Remarks on Improper Ignorance Priors


Two caveats relating to computations with improper priors, based on their relationship with finitely additive, but not countably additive, probability.

1. Failure of the law of conditional expectations and integrating-out.
2. Equivalent (transformed) random variables and integrating-out.

As a limit of proper priors

You have seen (HW #1) that for X ~ N(θ, σ²), with the variance specified, if θ has the (Normal) conjugate prior θ ~ N(µ, τ²), then the posterior for θ satisfies P(θ | x) ~ N(µ′, τ′²), where µ′ = (σ²µ + τ²x)/(σ² + τ²) and τ′² = σ²τ²/(τ² + σ²). If we let τ² → ∞ in the posterior probability, P(θ | x) → N(x, σ²). This is a case where Confidence Interval theory and the Bayesian posterior probability for these intervals agree. The corresponding improper prior is the uniform (Lebesgue) density for θ, dθ.
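As a quick numerical check of this limiting argument (the particular values x = 1.3, σ² = 4, µ = 0 are illustrative, not from the text):

```python
def normal_posterior(x, sigma2, mu, tau2):
    """Posterior mean and variance of theta for one observation
    x ~ N(theta, sigma2) under the conjugate prior theta ~ N(mu, tau2)."""
    post_mean = (sigma2 * mu + tau2 * x) / (sigma2 + tau2)
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    return post_mean, post_var

x, sigma2, mu = 1.3, 4.0, 0.0   # illustrative values
for tau2 in (1.0, 1e2, 1e4, 1e8):
    print(tau2, normal_posterior(x, sigma2, mu, tau2))
# As tau2 grows, the posterior approaches N(x, sigma2) = N(1.3, 4.0),
# matching the confidence-interval answer.
```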

However, though Lebesgue measure is a σ-finite measure, it corresponds to a (class of) finitely, but not countably, additive probabilities. That is, with Lebesgue measure used to depict a probability distribution for θ, the resulting probability satisfies shift-invariance: for all i and j, P(i ≤ θ ≤ i+1) = P(j ≤ θ ≤ j+1). Then, even by finitely additive reasoning, P(i ≤ θ ≤ i+1) = 0. As the line is a countable union of unit intervals, P is not countably additive! What are the consequences of using a finitely additive, but not countably additive, prior probability for θ, disguised as an improper prior?
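A small illustration of why the limit fails countable additivity: under the proper prior N(0, τ²), every unit interval's probability shrinks toward 0 (and evens out across intervals) as τ² grows, yet the intervals exhaust the line. A sketch using the standard-normal CDF:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_prob(i, tau):
    # P(i <= theta <= i+1) when theta ~ N(0, tau^2)
    return norm_cdf((i + 1) / tau) - norm_cdf(i / tau)

for tau in (1.0, 10.0, 1000.0):
    print(tau, [interval_prob(i, tau) for i in (0, 5, 50)])
# Each unit interval's probability tends to 0 as tau grows, and the
# probabilities become equal across intervals: the shift-invariant,
# merely finitely additive limit described above.
```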

Anomalous Example 1: Let X and Y be independent Poisson random variables, where X has mean θ and Y has mean λθ. Let the prior for the parameters be the product of the uniform improper Lebesgue density for λ > 0 and a proper Gamma Γ(2,1) density for θ. Observe Y = y first. For computing a formal Bayes posterior, the product of the prior and likelihood densities satisfies:

f(y, θ, λ) = exp[−θ(λ+1)] (θλ)^y θ / y!

Not surprisingly, one observation of Y proves uninformative, as the integral of f(y, θ, λ) with respect to θ and λ equals 1 for every y: the formal marginal for Y is uniform over y = 0, 1, 2, …, hence improper. To find the marginal posterior for θ given Y, evaluate the integral of f(y, θ, λ) with respect to λ, leading to g(y, θ) = exp[−θ], a proper Γ(1,1) distribution. BUT, this formal application of Bayes' theorem with the (partially) improper prior yields the same proper Γ(1,1) marginal posterior for θ given y, regardless of the value of Y observed!
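The claim that the λ-integral equals exp[−θ] for every observed y can be checked by quadrature; the following sketch uses a pure-Python Simpson rule, with the truncation point for λ an implementation choice:

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson quadrature on [a, b] with n (even) panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0

def f(y, theta, lam):
    # Joint "density": Poisson(lam*theta) likelihood for Y, times the
    # Gamma(2,1) prior theta*exp(-theta), uniform improper prior on lam.
    return math.exp(-theta * (lam + 1.0)) * (theta * lam) ** y * theta / math.factorial(y)

theta = 1.7  # arbitrary test point
for y in (0, 1, 5):
    g = simpson(lambda lam: f(y, theta, lam), 0.0, 400.0)
    print(y, g, math.exp(-theta))
# g matches exp(-theta) for every y: the marginal posterior for theta
# is Gamma(1,1) no matter what value of Y was observed.
```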

The (proper) marginal posterior distribution for θ, Γ(1,1), is a different distribution from the (proper) marginal prior distribution for θ, Γ(2,1). Evidently, E[θ] ≠ E[ E[θ | Y] ]: the prior mean is 2, while every posterior mean is 1. But the anomaly is not restricted to inference about the parameters. Consider the distribution of X. Prior to observing Y = y, we have

P(X = 2) = ∫₀^∞ (θ² exp[−θ] / 2)(θ exp[−θ]) dθ = 3/16.

However, conditioned on Y = y, as X and Y are independent given (λ, θ), we get

P(X = 2 | Y = y) = ∫₀^∞ (θ² exp[−θ] / 2) exp[−θ] dθ = 1/8,

independent of the observed value y. This, too, is an evident violation of the law of conditional expectations, as E[X] ≠ E[ E[X | Y] ].
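Both predictive probabilities reduce to Gamma integrals, ∫₀^∞ θ^k exp[−cθ] dθ = k!/c^(k+1); a minimal arithmetic check:

```python
from math import factorial

def gamma_int(k, c):
    # Integral of theta^k * exp(-c*theta) over (0, inf) = k! / c^(k+1)
    return factorial(k) / c ** (k + 1)

# Prior predictive, prior Gamma(2,1) with density theta*exp(-theta):
#   P(X=2) = integral of (theta^2 e^-theta / 2)(theta e^-theta) = (1/2) * gamma_int(3, 2)
prior_pred = 0.5 * gamma_int(3, 2.0)
# Posterior predictive, posterior Gamma(1,1) with density exp(-theta):
#   P(X=2 | Y=y) = integral of (theta^2 e^-theta / 2) e^-theta = (1/2) * gamma_int(2, 2)
post_pred = 0.5 * gamma_int(2, 2.0)
print(prior_pred, post_pred)  # 0.1875 0.125, i.e. 3/16 and 1/8
```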

Even though these posterior distributions are proper, that they are the consequence of a finitely, but not countably, additive (improper) prior probability affects them as follows.

Theorem: Every finitely additive but not countably additive probability P admits an infinite partition π = {h₁, h₂, …, hₙ, …}, an event F, and a δ > 0 where:

P(F) = k and yet P(F | hᵢ) < k − δ (i = 1, 2, …).

Evidently, then, E[F] ≠ E[ E[F | h] ], and one cannot write P(F) = ∫ P(F | h) dP(h), as always is possible in the countably additive case.

In the example above, this anomaly occurs in the partition by the random variable Y, and affects the conditional probability for the parameter θ and the conditional probability for the other observable X, given Y. Here is a second version of the same problem, using Jeffreys' improper prior for Normally distributed data.

Anomalous Example 2 (Buehler-Feddersen, 1963): Let (X₁, X₂) be conditionally iid N(µ, σ²). Trivially, as µ is the median of the distribution, for each pair (µ, σ²), P(X_min ≤ µ ≤ X_max | µ, σ²) = .50.

It is a straightforward calculation that, using H. Jeffreys' improper prior p(µ, σ) ∝ dµ dσ/σ, which also is a limit of conjugate priors, then for each pair (x₁, x₂),

P(x_min ≤ µ ≤ x_max | x₁, x₂) = .50.

Define the statistic t = (x₁ + x₂)/(x₁ − x₂). So, for pairs (x₁, x₂) satisfying |t| ≤ 1.5,

P(x_min ≤ µ ≤ x_max | x₁, x₂, |t| ≤ 1.5) = .50. (*)

Buehler and Feddersen (1963) show that, for each pair (µ, σ²), however,

P(X_min ≤ µ ≤ X_max | µ, σ², |t| ≤ 1.5) > .518. (**)
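The conditional-coverage claim (**) can be illustrated by Monte Carlo; a sketch with arbitrary parameter values (the bound is asserted for every pair (µ, σ²), though a simulation can only sample a few):

```python
import random

def conditional_coverage(mu, sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(X_min <= mu <= X_max given |t| <= 1.5)
    for (X1, X2) iid N(mu, sigma^2), where t = (x1 + x2)/(x1 - x2)."""
    rng = random.Random(seed)
    kept = covered = 0
    for _ in range(n_draws):
        x1 = rng.gauss(mu, sigma)
        x2 = rng.gauss(mu, sigma)
        if x1 == x2:
            continue  # t undefined on this null set
        if abs((x1 + x2) / (x1 - x2)) <= 1.5:
            kept += 1
            if min(x1, x2) <= mu <= max(x1, x2):
                covered += 1
    return covered / kept

# Arbitrary parameter choices; the unconditional coverage is exactly .50,
# but conditional on |t| <= 1.5 the estimates come out well above .518:
for mu, sigma in ((0.0, 1.0), (1.0, 1.0)):
    print(mu, sigma, conditional_coverage(mu, sigma))
```

The infimum over (µ, σ²) is the .518 bound; at parameter values like these the conditional coverage is substantially higher.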

o Integrating-out in the (X₁, X₂)-partition, using (*), we get: P(X_min ≤ µ ≤ X_max | |t| ≤ 1.5) = .50.
o Integrating-out in the (µ, σ²)-partition, using (**), we get: P(X_min ≤ µ ≤ X_max | |t| ≤ 1.5) > .518.

Evidently, with Jeffreys' improper prior, at least one of these two partitions must fail to allow integrating-out. But which one is the culprit, or is integrating-out illicit in them both?

(2) Equivalent random variables and integrating-out a nuisance parameter under an improper prior.

Here is an illustration of the general problem. Let (X₁, X₂) be positive, real-valued random variables. Consider the (uniform) improper prior density p(x₁, x₂) ∝ dx₁ dx₂. By the usual rules for factoring, p(x₁, x₂) = p(x₁ | x₂) p(x₂) = p(x₂ | x₁) p(x₁), the joint density is the independent product of two improper uniform marginal distributions: p(x₁, x₂) ∝ p(x₁) p(x₂).

Transform to the equivalent pair (X₁, Y), where Y = X₂/X₁. The improper joint density for (X₁, Y) is p(x₁, y) ∝ x₁ dx₁ dy. By the usual rules for factoring, this is the independent product of two improper marginal priors, p(x₁, y) = p*(x₁) p(y), where p*(x₁) ∝ x₁ dx₁ ≠ p(x₁) ∝ dx₁: p*(x₁) is off by the Jacobian of the transformation from (X₁, X₂) to (X₁, Y).

Note: This is a magnification of a familiar (measure-0) problem with σ-additive probability, the so-called Borel paradox, where, for a set of measure 0, it is possible that, though (X, Y) is an equivalent pair to (X, Z), and though Y = y₀ is equivalent to Z = z₀, nonetheless P(X | Y = y₀) ≠ P(X | Z = z₀).
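The factor x₁ is exactly the Jacobian determinant of the change of variables (x₁, y) ↦ (x₁, x₂) with x₂ = y·x₁, which can be checked numerically by central differences:

```python
def inverse_map(x1, y):
    # From (x1, y) back to the original pair: x2 = y * x1
    return (x1, y * x1)

def jacobian_det(x1, y, h=1e-6):
    """Determinant of the Jacobian d(x1, x2)/d(x1, y), by central differences."""
    def column(i):
        p = [x1, y]; m = [x1, y]
        p[i] += h; m[i] -= h
        fp, fm = inverse_map(*p), inverse_map(*m)
        return [(a - b) / (2 * h) for a, b in zip(fp, fm)]
    c0, c1 = column(0), column(1)   # derivatives w.r.t. x1 and y
    return c0[0] * c1[1] - c0[1] * c1[0]

# The uniform density dx1 dx2 therefore pulls back to x1 dx1 dy:
for x1, y in ((0.5, 2.0), (3.0, 0.25)):
    print(x1, y, jacobian_det(x1, y))  # matches x1, up to rounding
```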

Example (Stone and Dawid, 1972): Let X = (X₁, X₂, …, Xₙ) (n > 1) be conditionally iid N(µ, σ²), and use Jeffreys' improper (joint) prior density p(µ, σ) ∝ dµ dσ/σ, which can be arrived at as a limit of conjugate priors. Recall that (x̄, s²) are jointly sufficient statistics for the parameters. The joint posterior density p(µ, σ | x̄, s²), where s² = Σ(xᵢ − x̄)²/(n − 1), is proportional to

σ^−(n+1) exp[−{n(µ − x̄)² + (n − 1)s²} / (2σ²)] dµ dσ,

which can be described as follows in terms of the equivalent pair (µ, σ²): p(µ | σ², x̄, s²) is Normal N(x̄, σ²/n), and p(σ² | x̄, s²) is Inverse Gamma with shape (n − 1)/2 and scale (n − 1)s²/2.

Recall that, by integrating out σ from the joint posterior, p(µ | x̄, s²) is such that √n(µ − x̄)/s has a Student's t-distribution with n − 1 d.f. Let θ = µ/σ and transform the joint posterior p(µ, σ | x̄, s²) to the joint posterior for the equivalent pair (θ, σ):

p(θ, σ | x̄, s²) ∝ σ^−n exp[−nθ²/2 + nθx̄/σ − R²/(2σ²)] dθ dσ,

where R² = Σxᵢ². Integrate out σ (again!) to give the marginal posterior

p(θ | x) ∝ exp[−nθ²/2] ∫₀^∞ ω^(n−2) exp[−ω²/2 + rθω] dω = g(θ, r), (*)

where r = nx̄/R. Thus, p(θ | x) depends on the data solely through (r, n).
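The reduction to (*) can be verified numerically: integrating the (θ, σ) joint posterior over σ directly, and evaluating the closed form g(θ, r), give the same shape in θ. A sketch with illustrative data summaries (n = 5, x̄ = 1, s² = 1 are not from the text):

```python
import math

def simpson(f, a, b, m=20000):
    # Composite Simpson quadrature on [a, b] with m (even) panels
    h = (b - a) / m
    s = f(a) + f(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0

n, xbar, s2 = 5, 1.0, 1.0                # illustrative data summaries
R2 = n * xbar ** 2 + (n - 1) * s2        # R^2 = sum of x_i^2
r = n * xbar / math.sqrt(R2)             # r = n*xbar/R

def direct(theta):
    # Integrate the (theta, sigma) joint posterior kernel over sigma.
    return simpson(
        lambda sig: sig ** -n * math.exp(
            -n * theta ** 2 / 2 + n * theta * xbar / sig - R2 / (2 * sig ** 2)),
        1e-4, 60.0)

def g(theta):
    # The reduction (*): exp[-n theta^2/2] * integral of w^(n-2) e^{-w^2/2 + r*theta*w}
    return math.exp(-n * theta ** 2 / 2) * simpson(
        lambda w: w ** (n - 2) * math.exp(-w * w / 2 + r * theta * w),
        0.0, 40.0)

# The two routes agree up to a theta-free constant, so ratios match:
print(direct(0.5) / direct(1.0), g(0.5) / g(1.0))
```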

The distribution of r likewise depends solely on θ, given the ancillary n. (See Stone & Dawid, 1972, for the details.)

p(r | θ) ∝ exp[−nθ²/2] (1 − r²/n)^((n−3)/2) ∫₀^∞ ω^(n−1) exp[−ω²/2 + rθω] dω. (**)

However, we cannot write g(θ, r) = p(r | θ) h(θ) for any prior h on θ. That is, we cannot interpret (*) as a Bayesian posterior for θ, given the data r, its likelihood (**), and some prior h on θ. However, if we compute p(θ | x) using the improper prior dµ dσ/σ², no such anomaly arises. Of course, this is a change in the prior matching the Jacobian of the transformation from (µ, σ) to (θ, σ).
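The non-factorability claim can be probed numerically. If (*) were p(r | θ)h(θ), the ratio of the two ω-integrals would factor into a function of θ times a function of r, forcing the cross-ratio below to equal 1; it does not. A sketch at n = 3, with arbitrary grid points:

```python
import math

def simpson(f, a, b, m=20000):
    # Composite Simpson quadrature on [a, b] with m (even) panels
    h = (b - a) / m
    s = f(a) + f(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0

def I(k, a):
    # I_k(a) = integral of w^k * exp(-w^2/2 + a*w) over (0, inf), truncated at 40
    return simpson(lambda w: w ** k * math.exp(-w * w / 2.0 + a * w), 0.0, 40.0)

n = 3  # smallest interesting sample size; (1 - r^2/n)^((n-3)/2) is then constant

def q(theta, r):
    # Ratio of the omega-integrals appearing in (*) and (**).  Were (*) a
    # posterior built from likelihood (**) and some prior h(theta), q would
    # factor as h~(theta)*k(r), making the cross-ratio below equal 1.
    return I(n - 2, r * theta) / I(n - 1, r * theta)

t1, t2, r1, r2 = 0.5, 2.0, 0.5, 1.5  # arbitrary grid points with r^2 < n
cross = (q(t1, r1) * q(t2, r2)) / (q(t1, r2) * q(t2, r1))
print(cross)  # well away from 1: no prior h reconciles (*) with (**)
```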

Summary

(1) In general, it is not valid to integrate out random variables from proper posterior distributions that are based on improper prior distributions. Just when this is allowed depends upon the details of the (merely) finitely additive probability that the improper prior represents. The theory of finitely additive probability does not yet offer a transparent answer as to when integration over improper priors is permitted.

(2) This is a controversial point: In general, the familiar mathematical rules for manipulating transformations of a proper joint posterior density may not apply when that posterior is the result of an improper prior. That is, even when integrating out a (nuisance) parameter is warranted in a particular problem, how to do so when equivalent parameters are used is not settled!

References

Kadane, J.B., Schervish, M.J., and Seidenfeld, T. (1996) Reasoning to a Foregone Conclusion, J. Amer. Stat. Assoc. 91, 1228-1236.

Seidenfeld, T. (1982) Paradoxes of Conglomerability and Fiducial Inference, in Logic, Methodology and Philosophy of Science VI: Proceedings of the 6th International Congress of Logic, Methodology and Philosophy of Science, Hannover 1979. North Holland: pp. 395-412.

Stone, M. and Dawid, A.P. (1972) Un-Bayesian implications of improper Bayes inference in routine statistical problems, Biometrika 59, 369-375.