Bayesian Inference: Posterior Intervals

Simple values like the posterior mean $E[\theta \mid X]$ and posterior variance $\mathrm{var}[\theta \mid X]$ can be useful in learning about $\theta$. Quantiles of $\pi(\theta \mid X)$ (especially the posterior median) can also be a useful summary of $\theta$.

The ideal summary of $\theta$ is an interval (or region) with a certain probability of containing $\theta$. Note that a classical (frequentist) confidence interval does not exactly have this interpretation.

Definitions of Coverage

Defn.: A random interval $(L(X), U(X))$ has $100(1-\alpha)\%$ frequentist coverage for $\theta$ if, before the data are gathered,

$$P[L(X) < \theta < U(X) \mid \theta] = 1 - \alpha.$$

(Pre-experimental $1-\alpha$ coverage.)

Note that if we observe $X = x$ and plug $x$ into our confidence interval formula, then

$$P[L(x) < \theta < U(x) \mid \theta] = \begin{cases} 0 & \text{if } \theta \notin (L(x), U(x)) \\ 1 & \text{if } \theta \in (L(x), U(x)). \end{cases}$$

(NOT post-experimental $1-\alpha$ coverage.)

Defn.: An interval $(L(x), U(x))$, based on the observed data $X = x$, has $100(1-\alpha)\%$ Bayesian coverage for $\theta$ if

$$P[L(x) < \theta < U(x) \mid X = x] = 1 - \alpha.$$

(Post-experimental $1-\alpha$ coverage.)

The frequentist interpretation is less desirable if we are performing inference about $\theta$ based on a single interval.

Frequentist Coverage for Bayesian Intervals

Hartigan (1966) showed that for standard posterior intervals, an interval with $100(1-\alpha)\%$ Bayesian coverage will have

$$P[L(X) < \theta < U(X) \mid \theta] = (1 - \alpha) + \epsilon_n,$$

where $|\epsilon_n| < a/n$ for some constant $a$. Hence the frequentist coverage $\to 1 - \alpha$ as $n \to \infty$.

Note that many classical CI methods only achieve $100(1-\alpha)\%$ frequentist coverage asymptotically, as well.
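The frequentist coverage of a Bayesian interval can be computed numerically in a simple case. The sketch below is an illustration only, not Hartigan's argument: it assumes a binomial model with a uniform prior (so the posterior is beta with integer parameters), and the helper names `beta_cdf`, `beta_quantile`, and `frequentist_coverage` are made up for this example. Because the data are discrete, the coverage oscillates around $1 - \alpha$ instead of approaching it smoothly.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b, using the identity
    I_x(a, b) = P(Bin(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(a, n + 1))

def beta_quantile(p, a, b, iters=50):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def frequentist_coverage(theta, n, alpha=0.05):
    """P[L(X) < theta < U(X) | theta] for the equal-tail posterior interval,
    summing the binomial pmf over every outcome x whose interval covers theta."""
    cover = 0.0
    for x in range(n + 1):
        a, b = x + 1, n - x + 1          # posterior is Beta(x + 1, n - x + 1)
        lower = beta_quantile(alpha / 2, a, b)
        upper = beta_quantile(1 - alpha / 2, a, b)
        if lower < theta < upper:
            cover += comb(n, x) * theta**x * (1 - theta)**(n - x)
    return cover

for n in (10, 50, 100):
    print(n, round(frequentist_coverage(0.3, n), 4))
```

The true value $\theta = 0.3$ and the sample sizes are arbitrary choices; other values can be substituted to see how the coverage behaves.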

Bayesian Credible Intervals

A credible interval (or in general, a credible set) is the Bayesian analogue of a confidence interval. A $100(1-\alpha)\%$ credible set $C$ is a subset of $\Theta$ such that

$$\int_C \pi(\theta \mid X)\, d\theta = 1 - \alpha.$$

If the parameter space $\Theta$ is discrete, a sum replaces the integral.

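Quantile-Based Intervals

If $\theta_L$ is the $\alpha/2$ posterior quantile for $\theta$, and $\theta_U$ is the $1 - \alpha/2$ posterior quantile for $\theta$, then $(\theta_L, \theta_U)$ is a $100(1-\alpha)\%$ credible interval for $\theta$.

Note: $P[\theta < \theta_L \mid X] = \alpha/2$ and $P[\theta > \theta_U \mid X] = \alpha/2$, so

$$P[\theta \in (\theta_L, \theta_U) \mid X] = 1 - P[\theta \notin (\theta_L, \theta_U) \mid X] = 1 - \big(P[\theta < \theta_L \mid X] + P[\theta > \theta_U \mid X]\big) = 1 - \alpha.$$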

[Picture: quantile-based interval]

Example: Quantile-Based Interval

Suppose $X_1, \ldots, X_n$ are the durations of cabinets for a sample of cabinets from Western European countries. We assume the $X_i$'s follow an exponential distribution:

$$p(x_i \mid \theta) = \theta e^{-\theta x_i}, \quad x_i > 0,$$

$$L(\theta \mid X) = \theta^n e^{-\theta \sum_{i=1}^n x_i}.$$

Suppose our prior distribution for $\theta$ is $p(\theta) \propto 1/\theta$, $\theta > 0$. Larger values of $\theta$ are less likely a priori.

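Then

$$\pi(\theta \mid X) \propto p(\theta) L(\theta \mid X) \propto \left(\frac{1}{\theta}\right) \theta^n e^{-\theta \sum x_i} = \theta^{n-1} e^{-\theta \sum x_i}.$$

This is the kernel of a gamma distribution with shape parameter $n$ and rate parameter $\sum_{i=1}^n x_i$. So including the normalizing constant,

$$\pi(\theta \mid X) = \frac{\left(\sum x_i\right)^n}{\Gamma(n)}\, \theta^{n-1} e^{-\theta \sum x_i}, \quad \theta > 0.$$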
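As a check on the normalizing constant, write $s = \sum_{i=1}^n x_i$ (a shorthand introduced here only for this step) and substitute $u = \theta s$:

$$\int_0^\infty \theta^{n-1} e^{-\theta s}\, d\theta = \frac{1}{s^n} \int_0^\infty u^{n-1} e^{-u}\, du = \frac{\Gamma(n)}{s^n},$$

so dividing the kernel $\theta^{n-1} e^{-\theta s}$ by $\Gamma(n)/s^n$ gives a density that integrates to 1, matching the gamma density with shape $n$ and rate $s$.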

Now, given the observed data $x_1, \ldots, x_n$, we can calculate any quantiles of this gamma distribution. The 0.05 and 0.95 quantiles will give us a 90% credible interval for $\theta$. See R example with real data on course web page.

Suppose we feel $p(\theta) \propto 1/\theta$ is too subjective and favors small values of $\theta$ too much. Instead, let's consider the noninformative prior (which favors all values of $\theta$ equally):

$$p(\theta) = 1, \quad \theta > 0.$$

Then our posterior is

$$\pi(\theta \mid X) \propto p(\theta) L(\theta \mid X) = (1)\, \theta^n e^{-\theta \sum x_i} = \theta^{(n+1)-1} e^{-\theta \sum x_i}.$$

This posterior is a gamma with parameters $(n + 1)$ and $\sum x_i$. We can similarly find the equal-tail credible interval.
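A minimal sketch of the equal-tail calculation for the exponential model under both priors follows. It is not the R example from the course page: the `durations` values are made up purely for illustration, and `erlang_cdf` and `gamma_quantile` are hypothetical helpers that exploit the fact that the posterior shape is an integer, so the gamma CDF has a closed form.

```python
from math import exp

def erlang_cdf(theta, shape, rate):
    """CDF of Gamma(shape, rate) for a positive integer shape:
    F(theta) = 1 - sum_{k=0}^{shape-1} exp(-z) z^k / k!, with z = rate * theta."""
    z = rate * theta
    term, total = exp(-z), 0.0
    for k in range(shape):
        total += term
        term *= z / (k + 1)
    return 1.0 - total

def gamma_quantile(p, shape, rate, iters=200):
    """Invert the CDF by bisection after bracketing the quantile."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, rate) < p:   # expand until the quantile is bracketed
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical durations (illustration only, not the course data)
durations = [0.5, 1.2, 2.0, 3.1, 0.8, 1.6]
n, s = len(durations), sum(durations)

# Prior proportional to 1/theta gives Gamma(n, s); flat prior gives Gamma(n + 1, s)
ci_inverse = (gamma_quantile(0.05, n, s), gamma_quantile(0.95, n, s))
ci_flat = (gamma_quantile(0.05, n + 1, s), gamma_quantile(0.95, n + 1, s))

print("90% interval, prior 1/theta:", ci_inverse)
print("90% interval, flat prior:   ", ci_flat)
```

The flat prior adds one to the shape parameter, so its interval sits at larger values of $\theta$ than the interval under the $1/\theta$ prior.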

Example 2: Quantile-Based Interval

Consider 10 flips of a coin having $P\{\text{Heads}\} = \theta$. Suppose we observe 2 heads. We model the count of heads as binomial:

$$p(x \mid \theta) = \binom{10}{x} \theta^x (1 - \theta)^{10 - x}, \quad x = 0, 1, \ldots, 10.$$

Let's use a uniform prior for $\theta$: $p(\theta) = 1$, $0 \le \theta \le 1$.

Then the posterior is:

$$\pi(\theta \mid x) \propto p(\theta) L(\theta \mid x) = (1) \binom{10}{x} \theta^x (1 - \theta)^{10 - x} \propto \theta^x (1 - \theta)^{10 - x}, \quad 0 \le \theta \le 1.$$

This is a beta distribution for $\theta$ with parameters $x + 1$ and $10 - x + 1$. Since $x = 2$ here, $\pi(\theta \mid x = 2)$ is beta(3, 9). The 0.025 and 0.975 quantiles of a beta(3, 9) are (0.0602, 0.5178), which is a 95% credible interval for $\theta$.
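The beta(3, 9) quantiles can be checked with a short script. This is a sketch using only the Python standard library: `beta_cdf` and `beta_quantile` are illustrative helpers that work only for integer beta parameters. A statistics library would normally be used instead, for example `scipy.stats.beta.ppf`.

```python
from math import comb

def beta_cdf(t, a, b):
    """CDF of Beta(a, b) for positive integers a, b, using the identity
    I_t(a, b) = P(Bin(a + b - 1, t) >= a)."""
    m = a + b - 1
    return sum(comb(m, k) * t**k * (1 - t)**(m - k) for k in range(a, m + 1))

def beta_quantile(p, a, b, iters=60):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x, n = 2, 10                    # 2 heads in 10 flips
a, b = x + 1, n - x + 1         # uniform prior gives a Beta(3, 9) posterior
lower = beta_quantile(0.025, a, b)
upper = beta_quantile(0.975, a, b)
print(f"95% equal-tail credible interval: ({lower:.4f}, {upper:.4f})")
```

The printed endpoints should agree with the interval quoted above up to rounding in the last digit.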

HPD Intervals / Regions

The equal-tail credible interval approach is ideal when the posterior distribution is symmetric. But what if $\pi(\theta \mid x)$ is skewed?

[Picture: a skewed posterior density, plotted as $\pi(\theta)$ against $\theta$ for $\theta$ from 0 to 15.]

Note that values of $\theta$ around 2.2 have much higher posterior density than values around 11.5. Yet 11.5 is in the equal-tails interval and 2.2 is not! A better approach here is to build our interval from the $\theta$-values having the Highest Posterior Density.

Defn: A $100(1-\alpha)\%$ HPD region for $\theta$ is a subset $C \subset \Theta$ defined by

$$C = \{\theta : \pi(\theta \mid x) \ge k\},$$

where $k$ is the largest number such that

$$\int_{\{\theta \,:\, \pi(\theta \mid x) \ge k\}} \pi(\theta \mid x)\, d\theta = 1 - \alpha.$$

The value $k$ can be thought of as a horizontal line placed over the posterior density whose intersection(s) with the posterior define regions with probability $1 - \alpha$.
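The definition can be turned into a simple grid approximation: evaluate the density on a fine grid, admit grid points from highest density downward until the accumulated probability reaches $1 - \alpha$, and read off $k$ as the density of the last point admitted. The sketch below assumes the beta(3, 9) posterior from Example 2 and a unimodal density, so that the region is a single interval. The function names and the grid size are illustrative choices.

```python
def beta39_density(theta):
    # Beta(3, 9) density: theta^2 (1 - theta)^8 / B(3, 9), where 1 / B(3, 9) = 495
    return 495.0 * theta**2 * (1.0 - theta)**8

def hpd_region_grid(density, lo, hi, mass=0.95, m=100_000):
    """Approximate the HPD region {theta : density(theta) >= k} on a midpoint grid.
    Returns (left endpoint, right endpoint, k). The endpoints describe the
    region only when the density is unimodal (a single interval)."""
    dx = (hi - lo) / m
    grid = [lo + (i + 0.5) * dx for i in range(m)]
    dens = [density(t) for t in grid]
    # Visit grid cells from highest density to lowest (lowering the line k)
    order = sorted(range(m), key=lambda i: dens[i], reverse=True)
    total, kept = 0.0, []
    for i in order:
        total += dens[i] * dx
        kept.append(i)
        if total >= mass:
            break
    k = dens[kept[-1]]
    return grid[min(kept)], grid[max(kept)], k

theta_lo, theta_hi, k = hpd_region_grid(beta39_density, 0.0, 1.0)
print(f"95% HPD interval: ({theta_lo:.4f}, {theta_hi:.4f}), k = {k:.4f}")
print(f"width = {theta_hi - theta_lo:.4f} (equal-tail width is 0.5178 - 0.0602 = 0.4576)")
```

Because this posterior is right-skewed, both HPD endpoints fall to the left of the equal-tail endpoints, and the HPD interval is shorter while holding the same 95% posterior probability.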

[Picture: 95% HPD interval on the skewed posterior density, plotted as $\pi(\theta)$ against $\theta$.]

$P[\theta_L < \theta < \theta_U] = 0.95$. The values between $\theta_L$ and $\theta_U$ here have the highest posterior density.