Default priors and model parametrization

1 / 16 Default priors and model parametrization
Nancy Reid, O-Bayes09, June 6, 2009
with Don Fraser, Elisabeta Marras, Grace Yun-Yi

2 / 16 Well-calibrated priors

Model $f(y; \theta)$, $F(y; \theta)$; log-likelihood $\ell(\theta) = \log f(y; \theta)$.
- Can we find priors that are guaranteed to be well-calibrated, at least approximately?
- What structure do such priors need to have?
- Can we do this for all component parameters simultaneously?

Welch & Peers, 1963: $\pi(\theta)\,d\theta \propto i^{1/2}(\theta)\,d\theta$, where $i(\theta) = E\{-\ell''(\theta)\}$ is the expected Fisher information (matrix).

Peers, 1965; Tibshirani, 1989: with parameter of interest $\psi$ in $\theta = (\psi, \lambda)$: $\pi(\theta)\,d\theta \propto i_{\psi\psi}^{1/2}(\theta)\,g(\lambda)\,d\theta$.

Matching priors derived using Edgeworth expansions.
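As a quick check of the Welch & Peers result, here is a minimal Monte Carlo sketch (my illustration, not from the talk): for an exponential rate $\theta$, $i(\theta) = n/\theta^2$, so the Welch-Peers/Jeffreys prior is $\pi(\theta) \propto 1/\theta$ and the posterior is Gamma$(n, \Sigma y_i)$; in this scale model the one-sided matching is in fact exact.

```python
# A minimal sketch (assumptions mine): frequentist coverage of a one-sided
# posterior bound under the Welch-Peers prior pi(theta) ∝ 1/theta for
# Y_i ~ Exp(rate theta).  The posterior is then Gamma(n, sum(y)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true, n, alpha, reps = 2.0, 10, 0.05, 20000

covered = 0
for _ in range(reps):
    y = rng.exponential(scale=1 / theta_true, size=n)
    upper = stats.gamma.ppf(1 - alpha, a=n, scale=1 / y.sum())
    covered += theta_true <= upper
print(f"coverage of the nominal {1 - alpha:.2f} bound: {covered / reps:.3f}")
```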

4 / 16 Approximate location models

Location model $Y \sim f(y - \theta)$: $\pi(\theta)\,d\theta \propto d\theta$.
In a location model the shift $\theta \to \theta + d\theta$, $y \to y + d\theta$ leaves $F(y; \theta)$ unchanged, i.e. $dF(y; \theta) = 0$ (scalar $Y$, $\theta$).

General model: require $dF(y^0; \theta) = 0$ at the sample point $y^0$:
$F_y(y^0; \theta)\,dy + F_{;\theta}(y^0; \theta)\,d\theta = 0$,
giving $dy = -\{F_{;\theta}(y^0; \theta)/F_y(y^0; \theta)\}\,d\theta = V(\theta)\,d\theta$ (scalar $Y$, vector $\theta$).

Sample $y_1, \dots, y_n$: $dy_i = V_i(\theta)\,d\theta$ (i.i.d. $Y_i$, vector $\theta$).
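The $V(\theta)$ construction is easy to carry out numerically; the sketch below (my illustration: the exponential model and the finite-difference step are my choices) reproduces the closed form $V_i(\theta) = -y_i/\theta$ for $F(y; \theta) = 1 - e^{-\theta y}$.

```python
# A sketch (assumptions mine): V_i(theta) = -F_theta(y_i; theta)/F_y(y_i; theta)
# at the observed y^0, for Y_i ~ Exp(rate theta); closed form is -y_i/theta.
import numpy as np

def V(theta, y0, eps=1e-6):
    F = lambda y, th: 1.0 - np.exp(-th * y)
    f = theta * np.exp(-theta * y0)                      # F_y is the density
    F_theta = (F(y0, theta + eps) - F(y0, theta - eps)) / (2 * eps)
    return -F_theta / f

y0 = np.array([0.3, 1.2, 0.7])
print(V(2.0, y0))    # numerical
print(-y0 / 2.0)     # closed form
```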

5 / 16 Default prior

$dy = V(\theta)\,d\theta$, where $V(\theta) = \{V_1(\theta), \dots, V_n(\theta)\}$ stacks the rows $V_i(\theta)$ into an $n \times p$ matrix.
A possible default prior: $\pi(\theta)\,d\theta \propto |V'(\theta)V(\theta)|^{1/2}\,d\theta$.

Convert to maximum likelihood coordinates: differentiating the score equation $\ell_\theta(\hat\theta; y) = 0$ gives
$\ell_{\theta\theta}(\hat\theta; y)\,d\hat\theta + \ell_{\theta;y}(\hat\theta; y)\,dy = 0$,
so $d\hat\theta = \hat{j}^{-1} H\,dy = \hat{j}^{-1} H V(\theta)\,d\theta$, with $H = \ell_{\theta;y}$ a $p \times n$ matrix.

Proposed default prior, with $\hat{j} = j(\hat\theta^0)$ the observed Fisher information:
$\pi(\theta)\,d\theta \propto |\hat{j}^{-1} H V(\theta)|\,d\theta$.
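For the exponential-rate model the proposed prior can be assembled directly. In this sketch (assumptions mine) $\ell(\theta; y) = n\log\theta - \theta\Sigma y_i$, so $H = \ell_{\theta;y} = -\mathbf{1}'$ and $\hat j = n/\hat\theta^2$, and the product $|\hat j^{-1} H V(\theta)|$ collapses to $\hat\theta/\theta$, recovering the right-invariant scale prior $\pi(\theta) \propto 1/\theta$.

```python
# A sketch (assumptions mine): the data-dependent default prior
# |jhat^{-1} H V(theta)| for Y_i ~ Exp(rate theta), where
# l(theta; y) = n log(theta) - theta*sum(y), so l_{theta;y_i} = -1,
# jhat = n/thetahat^2, and V(theta) = -y0/theta from the location step.
import numpy as np

y0 = np.array([0.3, 1.2, 0.7, 2.1])
n = y0.size
thetahat = n / y0.sum()
jhat = n / thetahat**2
H = -np.ones((1, n))                  # l_{theta;y} at (thetahat, y0)

def default_prior(theta):
    Vth = -y0 / theta                 # V(theta), an n x 1 column
    return abs((H @ Vth).item()) / jhat

for th in (0.5, 1.0, 2.0):
    print(th, default_prior(th), thetahat / th)   # matches thetahat/theta
```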

6 / 16 ...default prior

$V(\theta)$ links $dy$ to $d\theta$; $\hat{j}^{-1}H$ links $dy$ to $d\hat\theta$; together they give the default prior
$\pi(\theta)\,d\theta \propto |\hat{j}^{-1} H V(\theta)|\,d\theta$,
which
- gives the right-invariant prior for the transformation parameter in transformation models;
- provides an extension of the right-invariant prior to general (continuous) models.

Here
$V(\theta) = \dfrac{dy}{d\theta} = -\left.\dfrac{F_{;\theta}(y; \theta)}{f(y; \theta)}\right|_{y=y^0}$, $\quad H = H(y^0; \hat\theta^0) = \left.\dfrac{\partial^2 \ell(\theta; y)}{\partial\theta\,\partial y}\right|_{(\hat\theta^0, y^0)}$, $\quad \hat{j} = j(\hat\theta^0) = -\ell_{\theta\theta}(\hat\theta^0)$.

7 / 16 Example: regression model

$y_i = x_i'\beta + \sigma\epsilon_i$, $i = 1, \dots, n$; $\theta = (\beta, \sigma^2)$; write $z^0(\theta) = (y^0 - X\beta)/\sigma$ and $\hat z^0 = (y^0 - X\hat\beta^0)/\hat\sigma$.

$V(\theta) = \dfrac{dy}{d\theta} = \{X, \; (y^0 - X\beta)/(2\sigma^2)\}$, $\quad H = \begin{pmatrix} X'/\hat\sigma^2 \\ \hat z^{0\prime}/\hat\sigma^3 \end{pmatrix}$, $\quad \hat{j} = \mathrm{diag}\{X'X/\hat\sigma^2, \; n/(2\hat\sigma^4)\}$,

$W(\theta) = \hat j^{-1} H V(\theta) = \begin{pmatrix} I & (X'X)^{-1}X'z^0(\theta)/(2\sigma) \\ 2\hat\sigma\hat z^{0\prime}X/n & \hat\sigma\hat z^{0\prime}z^0(\theta)/(n\sigma) \end{pmatrix} = \begin{pmatrix} I & (\hat\beta^0 - \beta)/(2\sigma^2) \\ 0 & \hat\sigma^2/\sigma^2 \end{pmatrix}$.

8 / 16 ...regression

$y \sim N(X\beta, \sigma^2)$, $\theta = (\beta, \sigma^2)$, $y = X\beta + \sigma\epsilon$:
$V(\theta) = \{X, \; (y^0 - X\beta)/\sigma\}$: design and residuals.

$d\hat\beta = d\beta + (\hat\beta^0 - \beta)\,d\sigma^2/(2\sigma^2)$, $\quad d\hat\sigma^2 = \hat\sigma^2\,d\sigma^2/\sigma^2$,

so $\pi(\theta)\,d\theta \propto d\beta\,d\sigma^2/\sigma^2 \propto d\beta\,d\sigma/\sigma$.

Nonlinear regression, $y \sim N(x(\beta), \sigma^2)$, leads to $d\tilde\beta\,d\sigma/\sigma$, where $\tilde\beta$ are coordinates on the tangent plane $\dot x(\hat\beta^0)(\beta - \hat\beta^0)$.
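A numerical check of the reduction of $W(\theta)$ (my sketch; the design and parameter values are arbitrary choices): the determinant equals $\hat\sigma^2/\sigma^2$, so the default prior is $d\beta\,d\sigma^2/\sigma^2$.

```python
# A sketch (assumptions mine): check that for the normal linear model with
# theta = (beta, sigma^2), W(theta) = jhat^{-1} H V(theta) reduces to
# [[I, (betahat - beta)/(2 sigma^2)], [0, sigmahat^2/sigma^2]], so that
# |W(theta)| = sigmahat^2/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
y0 = X @ np.array([1.0, -0.5, 2.0]) + 0.7 * rng.normal(size=n)

betahat = np.linalg.solve(X.T @ X, X.T @ y0)
resid = y0 - X @ betahat
s2hat = resid @ resid / n                     # MLE of sigma^2 (divide by n)

# information pieces at (thetahat, y0)
jhat = np.block([[X.T @ X / s2hat, np.zeros((p, 1))],
                 [np.zeros((1, p)), np.array([[n / (2 * s2hat**2)]])]])
H = np.vstack([X.T / s2hat, resid[None, :] / s2hat**2])   # (p+1) x n

def detW(beta, sigma2):
    V = np.hstack([X, ((y0 - X @ beta) / (2 * sigma2))[:, None]])
    return np.linalg.det(np.linalg.solve(jhat, H @ V))

beta, sigma2 = np.array([0.5, 0.0, 1.0]), 1.3
print(detW(beta, sigma2), s2hat / sigma2)     # the two values agree
```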

9 / 16 Scalar parameter

- The default prior is data dependent: it changes with $y^0$; it is based on an approximate local location model.
- The Jeffreys prior $\pi_J(\theta)\,d\theta \propto i^{1/2}(\theta)\,d\theta$ gives frequentist matching to second order; it depends on the model but not the data: an unconditional prior.

[Figure: "Flat priors for Gamma shape": the default prior and the Jeffreys prior plotted against the shape parameter θ.]
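The figure can be reproduced in outline; the sketch below (assumptions mine: scale fixed at 1, $V(\theta)$ by finite differences in the shape parameter) evaluates the data-dependent default prior and the Jeffreys prior $\{\psi'(\theta)\}^{1/2}$ on a grid.

```python
# A sketch (assumptions mine): data-dependent default prior vs Jeffreys prior
# for the gamma shape parameter with scale fixed at 1, as in the
# "Flat priors for Gamma shape" figure.
import numpy as np
from scipy import stats, special, optimize

rng = np.random.default_rng(3)
y0 = rng.gamma(shape=4.0, size=25)

# MLE of the shape solves mean(log y) = digamma(thetahat)
thetahat = optimize.brentq(
    lambda th: special.digamma(th) - np.log(y0).mean(), 0.1, 100.0)
jhat = y0.size * special.polygamma(1, thetahat)  # observed = expected info here
H = 1.0 / y0                                     # l_{theta;y_i} = 1/y_i

def V(theta, eps=1e-5):
    F_theta = (stats.gamma.cdf(y0, a=theta + eps) -
               stats.gamma.cdf(y0, a=theta - eps)) / (2 * eps)
    return -F_theta / stats.gamma.pdf(y0, a=theta)

default = lambda th: abs(H @ V(th)) / jhat
jeffreys = lambda th: np.sqrt(special.polygamma(1, th))

for th in (2.0, 4.0, 6.0, 8.0):
    print(th, default(th), jeffreys(th))
```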

10 / 16 Targetted priors

If the parameter of interest is curved, then the prior needs to be targetted on the parameter of interest: the marginalization paradox (Dawid, Stone and Zidek).

Example (Stein): normal circle, $y_i \sim N(\mu_i, 1/n)$, $i = 1, \dots, k$; $\psi = (\Sigma\mu_i^2)^{1/2}$.
Default prior: $\pi(\mu)\,d\mu \propto d\mu$.
Posterior: $n\psi^2 \mid y \sim \chi^2_k(n\|y\|^2)$. Exact: $n\|y\|^2 \sim \chi^2_k(n\psi^2)$.

[Figure: "Normal circle, k = 2, 5, 10": posterior survivor value and p-value plotted against ψ.]
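The miscalibration is easy to simulate; in the sketch below (my illustration; $k$, $n$ and $\mu$ are arbitrary choices) the posterior probability $P(\psi \le \psi_{\mathrm{true}} \mid y)$ would be uniform over repeated samples if the posterior were calibrated, but for $k = 10$ it concentrates far from uniformity.

```python
# A sketch (assumptions mine): Stein's normal-circle example.  With a flat
# prior on mu, n*psi^2 | y is noncentral chi^2_k(n*||y||^2), while the exact
# frequentist distribution of n*||y||^2 is chi^2_k(n*psi^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
k, n, reps = 10, 4, 5000
mu = np.full(k, 1.0) / np.sqrt(k)        # so psi = ||mu|| = 1
psi_true = np.linalg.norm(mu)

y = mu + rng.normal(size=(reps, k)) / np.sqrt(n)
ncp = n * (y**2).sum(axis=1)             # posterior noncentrality n*||y||^2
U = stats.ncx2.cdf(n * psi_true**2, df=k, nc=ncp)
print("mean of U (about 0.5 if calibrated):", U.mean())
print("P(U < 0.05) (about 0.05 if calibrated):", (U < 0.05).mean())
```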

11 / 16 ...normal circle

[Figure: "Normal circle, k = 2, 5, 10": posterior survivor value vs p-value, and actual coverage vs nominal probability.]

Bayes $-$ frequentist: $\Phi\{(k-1)/(\psi\sqrt{n})\}$; the discrepancy is not fixed by a hierarchy of priors.

12 / 16 Targetted priors: strong matching

- Use Laplace approximations to the posterior and to the frequentist p-value; the structure of the approximations makes comparison easy.

$s(\psi) \doteq \Phi\{r + \frac{1}{r}\log\frac{q_B}{r}\}$: Bayesian survivor value
$p(\psi) \doteq \Phi\{r + \frac{1}{r}\log\frac{q_F}{r}\}$: frequentist p-value
$r = r(\psi) = \pm\sqrt{2\{\ell(\hat\theta) - \ell(\psi, \hat\lambda_\psi)\}}$

$q_B$ contains the prior; $q_F$ contains various information functions and sample-space derivatives.

Setting $q_B = q_F$ determines $\pi(\hat\theta)/\pi(\hat\theta_\psi)$, i.e. a default prior along the curve $\theta = \hat\theta_\psi$ (Fraser & Reid, 2002); this needs to be extended to the full parameter space.
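For a scalar parameter with no nuisance part the comparison is easy to exhibit. The sketch below is my construction, using $q_B = \ell'(\psi)\,\hat j^{-1/2}\,\pi(\hat\theta)/\pi(\psi)$ for the exponential rate under the Jeffreys prior, and compares $\Phi\{r + \frac{1}{r}\log(q_B/r)\}$ with the exact Gamma posterior survivor value.

```python
# A sketch (assumptions mine): scalar-parameter Laplace survivor approximation
# for the exponential rate under pi(theta) ∝ 1/theta, where the exact
# posterior is Gamma(n, sum(y)).  Here
#   r(psi) = sign(thetahat - psi) * sqrt(2{l(thetahat) - l(psi)}),
#   q_B    = l'(psi) * jhat^{-1/2} * pi(thetahat)/pi(psi).
import numpy as np
from scipy import stats

y = np.array([0.8, 1.9, 0.4, 1.1, 0.6])
n, S = y.size, y.sum()
thetahat = n / S
loglik = lambda th: n * np.log(th) - th * S
score = lambda th: n / th - S
jhat = n / thetahat**2

def s_approx(psi):
    r = np.sign(thetahat - psi) * np.sqrt(2 * (loglik(thetahat) - loglik(psi)))
    qB = score(psi) * jhat**-0.5 * (psi / thetahat)  # prior ratio psi/thetahat
    return stats.norm.cdf(r + np.log(qB / r) / r)

for psi in (0.5, 1.0, 2.0):
    print(psi, s_approx(psi), stats.gamma.sf(psi, a=n, scale=1 / S))
```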

13 / 16 ...details

$q_B = \ell_\psi(\hat\theta_\psi)\,|j_{\lambda\lambda}(\hat\theta_\psi)|^{1/2}\,|j(\hat\theta)|^{-1/2}\,\dfrac{\pi(\hat\theta)}{\pi(\hat\theta_\psi)}$

$q_F = \dfrac{|\ell_{;V}(\hat\theta) - \ell_{;V}(\hat\theta_\psi) \;\; \ell_{\lambda;V}(\hat\theta_\psi)|}{|\ell_{\theta;V}(\hat\theta)|}\;\dfrac{|j(\hat\theta)|^{1/2}}{|j_{\lambda\lambda}(\hat\theta_\psi)|^{1/2}}$

Setting $q_B = q_F$ gives
$\dfrac{\pi(\hat\theta_\psi)}{\pi(\hat\theta)} = \dfrac{\ell_\psi(\hat\theta_\psi)\,|\ell_{\theta;V}(\hat\theta)|\,|j_{\lambda\lambda}(\hat\theta_\psi)|}{|\ell_{;V}(\hat\theta) - \ell_{;V}(\hat\theta_\psi) \;\; \ell_{\lambda;V}(\hat\theta_\psi)|\,|j(\hat\theta)|}$

along the profile curve $C_\psi = \{\theta : \theta = (\psi, \hat\lambda_\psi)\}$; based on the exponential family approximation at $y^0$.

Use observed information off the curve to extend the prior to the full parameter space, to get a version that, when integrated via Laplace approximation, brings us back to the frequentist $r^*$ approximation:
$\pi(\theta) \propto |j_{(\theta\theta)}(\hat\theta_\psi)|^{1/2}\,|j_{(\lambda\lambda)}(\theta)|^{-1/2}$.
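In the scalar exponential-rate model the $q_B = q_F$ calculation can be done symbolically; the sketch below (my reduction of the formulas to the no-nuisance case, with $q_F$ the standardized maximum likelihood departure in the canonical scale $\varphi = -\theta$) recovers the scale prior $\pi(\psi) \propto 1/\psi$, up to the sign convention for $r$.

```python
# A sketch (assumptions mine): symbolic q_B = q_F in the scalar
# exponential-rate model, with
#   q_B = l'(psi) jhat^{-1/2} pi(thetahat)/pi(psi)
#   q_F = {phi(thetahat) - phi(psi)} j_phiphi(thetahat)^{1/2},  phi = -theta.
# Solving for the prior ratio recovers pi(psi) ∝ 1/psi.
import sympy as sp

n, S, psi = sp.symbols("n S psi", positive=True)
thetahat = n / S
score = n / psi - S                      # l'(psi)
jhat = n / thetahat**2                   # observed information at thetahat
qF = (thetahat - psi) * sp.sqrt(jhat)    # canonical-scale standardized MLE

prior_ratio = sp.simplify(score / (sp.sqrt(jhat) * qF))  # pi(psi)/pi(thetahat)
print(prior_ratio)                       # n/(S*psi), i.e. proportional to 1/psi
```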

14 / 16 ...details

- Any continuous model can be approximated to $O(n^{-1})$ by an exponential family model (at $y^0$).
- Canonical parameter: $\varphi(\theta) = \ell_{;V}(\theta; y^0) = \sum_i \ell_{y_i}(\theta; y^0)\,V_i$, with $V = V(\hat\theta^0)$ the same matrix as in the default prior.
- Connection through the location model approximation: ancillarity, flat prior.
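A small sketch (assumptions mine) of the canonical parameter $\varphi(\theta) = \ell_{;V}(\theta; y^0)$ in the exponential-rate example, where $\ell_{y_i}(\theta; y) = -\theta$ and $V_i = -y_i/\hat\theta$:

```python
# A sketch (assumptions mine): the tangent-exponential-model canonical
# parameter phi(theta) = l_{;V}(theta; y0) = sum_i l_{y_i}(theta; y0) V_i
# for Y_i ~ Exp(rate theta), where dl/dy_i = -theta and V_i = -y_i/thetahat.
import numpy as np

y0 = np.array([0.3, 1.2, 0.7, 2.1])
thetahat = y0.size / y0.sum()
V = -y0 / thetahat                      # V(thetahat) from the location step

def phi(theta):
    l_y = -theta * np.ones_like(y0)     # dl/dy_i at (theta, y0)
    return l_y @ V                      # sum_i l_{y_i} V_i

print(phi(thetahat), phi(2.0))          # a monotone reparametrization of theta
```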

15 / 16 Conclusions

- Calibrated priors are data dependent.
- The focus is motivated by asymptotic theory for likelihood inference.
- Reference priors are also targetted on the parameter of interest.
- Marginalization to curved parameters using flat priors may lead to poorly calibrated inferences.
- Hierarchical Poisson models: $E(y_{ij}) = c_0 x_{ij}\exp(\mu + \alpha_i + \beta_j + \gamma_{ij})$, with non-informative uniform priors on $\mu, \alpha, \sigma_\beta, \sigma_\gamma$: difficulties and opportunities with new large data sets.
- Checking sensitivity to the prior specification can be done simply, using the asymptotic approximation $\Phi(r_B)$.
- Connections to empirical Bayes?

16 / 16 Some references

Fraser, D.A.S., Marras, E., Reid, N. and Yi, G. (2009). Default priors for Bayesian and frequentist inference.
Fraser, D.A.S. and Reid, N. (2002). Strong matching of frequentist and Bayesian inference. J. Statist. Plan. Infer. 103, 263-285.
Berger, J.O., Bernardo, J.M. and Sun, D. (2009). The formal definition of reference priors. Ann. Statist. 37, 905-938.
Datta, G.S. and Ghosh, M. (1995). Some remarks on noninformative priors. J. Amer. Statist. Assoc. 90, 1357-1363.
Dawid, A.P., Stone, M. and Zidek, J.V. (1973). Marginalization paradoxes in Bayesian and structural inference. J. Roy. Statist. Soc. B 35, 189-233.
Bernardo, J.M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. B 41, 113-147.