Monte Carlo conditioning on a sufficient statistic

Seminar, UC Davis, 24 April 2008

Monte Carlo conditioning on a sufficient statistic

Bo Henry Lindqvist, Norwegian University of Science and Technology, Trondheim
Joint work with Gunnar Taraldsen, NTNU

Outline

- Definition of sufficiency
- Sufficiency in goodness-of-fit testing
- Conditional sampling given a sufficient statistic: basic algorithm
- Conditional sampling given a sufficient statistic: weighted sampling
- The Euclidean case
- Relation to Bayesian and fiducial statistics
- Other applications and concluding remarks

Sufficient statistics

(X, T) is a pair of random vectors with joint distribution indexed by θ. Typically, X = (X_1, ..., X_n) is a sample and T = T(X_1, ..., X_n) is a statistic.

T is assumed to be sufficient for θ compared to X, meaning that the conditional distribution of X given T = t does not depend on θ, or, equivalently, that conditional expectations E{φ(X) | T = t} do not depend on the value of θ.

Useful criterion, Neyman's factorization theorem: T = (T_1, ..., T_k) is sufficient for θ compared to X = (X_1, ..., X_n) if the joint density can be factorized as

    f(x | θ) = h(x) g(T(x) | θ),

i.e.

    f(x_1, ..., x_n | θ) = h(x_1, ..., x_n) g(T_1(x_1, ..., x_n), ..., T_k(x_1, ..., x_n) | θ).

Sufficiency

Applications: construction of optimal estimators and tests, nuisance parameter elimination, goodness-of-fit testing.

Motivation for the present Monte Carlo approach: it is usually difficult to derive the conditional distributions analytically, so simulation methods are sought rather than formulas for conditional densities.

Goal: to sample X conditionally given T = t.

Goodness-of-fit testing

H_0: the observation X comes from a particular distribution indexed by θ.

Suppose a test statistic W ≡ W(X) is given, with large values expected when H_0 is violated. Let T ≡ T(X) be a sufficient statistic under the null model.

Conditional test: reject H_0, conditionally given T = t, when W ≥ k(t), where the critical value k(t) is such that

    P_{H_0}(W ≥ k(t) | T = t) = α.

Here k(t) is found from the conditional distribution of W given T = t, which in principle is known.

Equivalently, we can calculate the conditional p-value

    p_obs = P_{H_0}(W ≥ w_obs | T = t),

where w_obs is the observed value of W(X), and reject H_0 if p_obs ≤ α.

Remark: this test is also unconditionally an α-level test.
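
Given any way of drawing X from its conditional distribution given T = t (the subject of the algorithms below), the conditional p-value can be estimated by plain Monte Carlo. A minimal sketch in Python; sample_x_given_t and test_stat are hypothetical callables standing in for the conditional sampler and the statistic W:

```python
import numpy as np

def mc_conditional_pvalue(w_obs, sample_x_given_t, test_stat,
                          n_rep=10_000, seed=None):
    """Estimate p_obs = P_H0(W >= w_obs | T = t) by Monte Carlo.

    sample_x_given_t(rng) -- one draw of X given T = t (placeholder;
                             any conditional sampler can be plugged in)
    test_stat(x)          -- the goodness-of-fit statistic W(x)
    """
    rng = np.random.default_rng(seed)
    hits = sum(test_stat(sample_x_given_t(rng)) >= w_obs
               for _ in range(n_rep))
    return hits / n_rep
```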

References

Point of departure:

ENGEN, S. and LILLEGÅRD, M. (1997). Stochastic simulations conditioned on sufficient statistics. Biometrika 84, 235-240.

LINDQVIST, B. H., TARALDSEN, G., LILLEGÅRD, M. and ENGEN, S. (2003). A counterexample to a claim about stochastic simulations. Biometrika 90, 489-490.

Our papers:

LINDQVIST, B. H. and TARALDSEN, G. (2005). Monte Carlo conditioning on a sufficient statistic. Biometrika 92, 451-464.

LINDQVIST, B. H. and TARALDSEN, G. (2007). Conditional Monte Carlo based on sufficient statistics with applications. In: Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (ed. Vijay Nair), pp. 545-562. World Scientific, Singapore.

Recent literature:

CHEN, Y., DIACONIS, P., HOLMES, S. P. and LIU, J. S. (2005). Sequential Monte Carlo methods for statistical analysis of tables. Journal of the American Statistical Association 100, 109-120.

LANGSRUD, Ø. (2005). Rotation tests. Statistics and Computing 15, 53-60.

LOCKHART, R. A., O'REILLY, F. J. and STEPHENS, M. A. (2007). Use of the Gibbs sampler to obtain conditional tests, with applications. Biometrika 94, 992-998.

General setup

Given (X, T), with T sufficient for θ compared to X.

Basic assumption: there are a random vector U with known distribution and known functions χ(·, ·), τ(·, ·) such that, for each θ,

    (χ(U, θ), τ(U, θ)) ~ (X, T).

Interpretation: these are ways of simulating (X, T) for given θ.

EXAMPLE: EXPONENTIAL SAMPLES. X = (X_1, ..., X_n) are i.i.d. from Exp(θ), i.e. with hazard rate θ. Then T = Σ_{i=1}^n X_i is sufficient for θ. Let U = (U_1, ..., U_n) be i.i.d. Exp(1) variables. Then

    χ(U, θ) = (U_1/θ, ..., U_n/θ),    τ(U, θ) = Σ_{i=1}^n U_i / θ.

Conditional sampling of X given T = t

EXAMPLE (continued). We want to sample X = (X_1, X_2, ..., X_n) conditional on T = Σ X_i = t for given t.

Idea: draw U = (U_1, ..., U_n) i.i.d. Exp(1). Recall that, for each θ,

    χ(U, θ) = (U_1/θ, ..., U_n/θ) ~ X,    τ(U, θ) = Σ_{i=1}^n U_i / θ ~ T.

Solve: τ(U, θ) = t gives θ = θ̂(U, t) = Σ U_i / t.

Conditional sample:

    X_t(U) = χ{U, θ̂(U, t)} = ( t U_1 / Σ_{i=1}^n U_i, ..., t U_n / Σ_{i=1}^n U_i ),

which is known to have the correct distribution!
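
As a quick sketch, this conditional sampler is a few lines of Python (here the conditional law is that of t times a flat Dirichlet vector, which the construction reproduces):

```python
import numpy as np

def exp_sample_given_sum(n, t, seed=None):
    """One draw of X = (X_1, ..., X_n), i.i.d. Exp(theta), conditional
    on sum(X) = t, via Algorithm 1: draw U_i ~ Exp(1), solve
    tau(U, theta) = sum(U)/theta = t, return chi(U, theta_hat) = t*U/sum(U)."""
    rng = np.random.default_rng(seed)
    u = rng.exponential(1.0, size=n)
    return t * u / u.sum()

x = exp_sample_given_sum(5, t=3.0, seed=1)
print(x, x.sum())   # positive components summing to exactly 3.0
```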

Algorithm 1: conditional sampling of X given T = t

The algorithm used in the example can be described more generally as follows. Recall that χ(U, θ) ~ X and τ(U, θ) ~ T for each θ.

ALGORITHM 1
- Generate U from the known density f(u).
- Solve τ(U, θ) = t for θ. The (unique) solution is θ̂(U, t).
- Return X_t(U) = χ{U, θ̂(U, t)}.
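
A generic sketch of Algorithm 1 for scalar θ, assuming the root of τ(u, θ) = t is unique and lies in a user-supplied bracket (the bracket and the callables are assumptions of this sketch):

```python
import numpy as np
from scipy.optimize import brentq

def algorithm1(chi, tau, draw_u, t, theta_bracket, seed=None):
    """One draw of X given T = t via Algorithm 1 (valid under the
    pivotal condition discussed below).

    chi(u, th), tau(u, th) -- simulation representation of (X, T)
    draw_u(rng)            -- one draw of U from its known density f(u)
    theta_bracket          -- (lo, hi) bracketing the unique root in theta
    """
    rng = np.random.default_rng(seed)
    u = draw_u(rng)
    theta = brentq(lambda th: tau(u, th) - t, *theta_bracket)
    return chi(u, theta)

# Exponential example: tau(u, theta) = sum(u)/theta, decreasing in theta.
x = algorithm1(chi=lambda u, th: u / th,
               tau=lambda u, th: u.sum() / th,
               draw_u=lambda rng: rng.exponential(1.0, size=10),
               t=4.0, theta_bracket=(1e-8, 1e8), seed=0)
print(x.sum())   # 4.0 up to root-finding tolerance
```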

Problems with Algorithm 1

Algorithm 1 does not in general give samples from the correct distribution, even when θ̂ is unique.

Moreover, there may not be a unique solution θ̂ of τ(u, θ) = t. For discrete distributions the solutions for θ are typically intervals; for continuous distributions there may be a finite number of solutions, depending on u.

What may go wrong with Algorithm 1?

Assume that for each fixed u and t the equation τ(u, θ) = t has the unique solution θ = θ̂(u, t). Under Algorithm 1 we obtain conditional samples by X_t = χ{U, θ̂(U, t)}.

A tentative proof that this gives samples from the conditional distribution of X given T = t goes as follows. Let φ be any function. For all θ and t we can formally write

    E{φ(X) | T = t} = E[φ{χ(U, θ)} | τ(U, θ) = t]
                    = E[φ{χ(U, θ)} | θ̂(U, t) = θ]
                    = E(φ[χ{U, θ̂(U, t)}] | θ̂(U, t) = θ)
                    = E(φ[χ{U, θ̂(U, t)}])
                    ≡ E{φ(X_t)}.

If correct, this would imply that X_t has the correct distribution, i.e. X_t is distributed like the conditional distribution of X given T = t.

The key equality: a possible Borel paradox

The key equality is

    E[φ{χ(U, θ)} | τ(U, θ) = t] = E[φ{χ(U, θ)} | θ̂(U, t) = θ].

It apparently follows from the equivalence of the events {τ(U, θ) = t} and {θ̂(U, t) = θ}. This is unproblematic if these events have positive probability; otherwise the equality may be invalid due to a Borel paradox. The equality does hold if the two events can be described by the same function of U (the same σ-algebra).

SUFFICIENT CONDITION FOR ALGORITHM 1

The pivotal condition: assume that τ(u, θ) depends on u only through a function r(u), for which we have the unique representation r(u) = v(θ, t) obtained by solving τ(u, θ) = t. Thus v(θ, T) is a pivot in the classical sense.

IN THE EXAMPLE: τ(U, θ) = Σ_{i=1}^n U_i / θ ≡ r(U)/θ, so v(θ, t) = θt.

Algorithms 2 and 3: weighted conditional sampling of X given T = t

It turns out that a weighted sampling scheme is needed in the general case. Let Θ be a random variable with some conveniently chosen distribution, with Θ and U independent. The key result is that the conditional distribution of X given T = t is the same as that of χ(U, Θ) given τ(U, Θ) = t.

Notation: let t ↦ W_t(u) be the density of τ(u, Θ).

ALGORITHM 2 (assumes the equation τ(u, θ) = t has the unique solution θ = θ̂(u, t))
- Generate V from a density proportional to W_t(u) f(u).
- Return X_t(V) = χ{V, θ̂(V, t)}.

ALGORITHM 3 (general case)
- Generate V from a density proportional to W_t(u) f(u), and let the result be V = v.
- Generate Θ_t from the conditional distribution of Θ given τ(v, Θ) = t.
- Return X_t(V) = χ(v, Θ_t).
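
One simple way to realize the step "generate V from a density proportional to W_t(u) f(u)" is sampling-importance resampling; that substitution is an assumption of this sketch (any exact or MCMC sampler for that density could be used instead, and the resampled draw is only approximately from the target). W_t is whatever weight function the problem provides, e.g. the Euclidean-case formula on the next slide:

```python
import numpy as np

def algorithm2_sir(chi, theta_hat, W_t, draw_u, t, n_prop=5000, seed=None):
    """Approximate one draw of X given T = t via Algorithm 2.

    Proposals U^(1), ..., U^(B) are drawn from f(u); one is resampled
    with probability proportional to W_t(U^(b)), approximating a draw V
    from the density proportional to W_t(u) f(u).
    """
    rng = np.random.default_rng(seed)
    proposals = [draw_u(rng) for _ in range(n_prop)]
    w = np.array([W_t(u) for u in proposals], dtype=float)
    v = proposals[rng.choice(n_prop, p=w / w.sum())]
    return chi(v, theta_hat(v, t))
```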

The weight function W_t(u) in the Euclidean case

X = (X_1, ..., X_n) has distribution depending on a k-dimensional parameter θ, and T(X) is a k-dimensional sufficient statistic. Choose a density π(θ) for Θ, and let f(u) be the density of U. Recall that W_t(u) is the density of τ(u, Θ).

The standard transformation formula, using τ(u, θ) = t ⇔ θ = θ̂(u, t), gives

    W_t(u) = π{θ̂(u, t)} |det ∂_t θ̂(u, t)| = π{θ̂(u, t)} / |det ∂_θ τ(u, θ)|_{θ=θ̂(u,t)},

and further

    E{φ(X) | T = t} = ∫ φ[χ{u, θ̂(u, t)}] W_t(u) f(u) du / ∫ W_t(u) f(u) du
                    = E_U( φ[χ{U, θ̂(U, t)}] W_t(U) ) / E_U( W_t(U) ).

This can be computed by simulation, using a pseudo-sample from the distribution of U.
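
The final ratio is a self-normalized importance-sampling estimator. A sketch for scalar θ (so the determinant reduces to an absolute value), with all callables supplied by the model at hand:

```python
import numpy as np

def conditional_expectation(phi, chi, theta_hat, dtau_dtheta, pi, draw_u,
                            t, n_rep=10_000, seed=None):
    """Self-normalized estimate of E{phi(X) | T = t} for scalar theta.

    theta_hat(u, t)    -- unique root of tau(u, theta) = t
    dtau_dtheta(u, th) -- derivative of tau in theta (|.| replaces |det|)
    pi(th)             -- chosen density for Theta
    """
    rng = np.random.default_rng(seed)
    num = den = 0.0
    for _ in range(n_rep):
        u = draw_u(rng)
        th = theta_hat(u, t)
        w = pi(th) / abs(dtau_dtheta(u, th))   # the weight W_t(u)
        num += phi(chi(u, th)) * w
        den += w
    return num / den
```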

Example: truncated exponential

X = (X_1, ..., X_n) are i.i.d. on [0, 1] with density

    f(x, θ) = θ e^{θx} / (e^θ − 1)  if θ ≠ 0,
             = 1                    if θ = 0,      for 0 ≤ x ≤ 1.

T = Σ_{i=1}^n X_i is sufficient compared to X.

The conditional distribution of X given T = t is that of n independent uniform [0, 1] random variables given their sum (for which there seems to be no simple expression).

Simulation of data: let U = (U_1, U_2, ..., U_n) be i.i.d. uniform on [0, 1], and take

    χ(U, θ) = ( log{1 + (e^θ − 1)U_1} / θ, ..., log{1 + (e^θ − 1)U_n} / θ ),
    τ(U, θ) = Σ_{i=1}^n log{1 + (e^θ − 1)U_i} / θ.

The equation τ(u, θ) = t has a unique solution θ = θ̂(u, t). However, X_t does not have the correct distribution (e.g. let n = 2), so Algorithm 1 fails and the weighted scheme is needed.

Computation

For the computation of E{φ(X) | T = t} we thus need

    ∂_θ τ(u, θ)|_{θ=θ̂(u,t)} = ( e^{θ̂(u,t)} / θ̂(u,t) ) Σ_{i=1}^n u_i / {1 + (e^{θ̂(u,t)} − 1) u_i} − t / θ̂(u,t),

to be substituted in

    E{φ(X) | T = t} = E_U( φ[χ{U, θ̂(U, t)}] π{θ̂(U, t)} / |∂_θ τ(U, θ)|_{θ=θ̂(U,t)} )
                      / E_U( π{θ̂(U, t)} / |∂_θ τ(U, θ)|_{θ=θ̂(U,t)} ).

In principle we can use any choice of π for which the integrals exist. A good choice is the Jeffreys prior

    π(θ) = √( 1/θ² − 1/(e^{θ/2} − e^{−θ/2})² ).
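
For this example the ingredients of the weighted estimator above can be coded directly. A sketch; the cutoffs near θ = 0 are assumptions of the sketch, guarding the removable singularities of τ, the derivative, and the prior at θ = 0:

```python
import numpy as np
from scipy.optimize import brentq

def tau(u, th):
    """tau(u, theta) = sum_i log{1 + (e^theta - 1) u_i} / theta,
    extended by continuity to tau(u, 0) = sum(u)."""
    if abs(th) < 1e-9:
        return u.sum()
    return np.log1p(np.expm1(th) * u).sum() / th

def theta_hat(u, t):
    """Unique root of tau(u, theta) = t (tau is increasing in theta)."""
    return brentq(lambda th: tau(u, th) - t, -50.0, 50.0)

def dtau_dtheta_at_root(u, t):
    """The derivative formula above, at theta = theta_hat(u, t)."""
    th = theta_hat(u, t)
    th = th if abs(th) > 1e-9 else 1e-9   # crude guard near theta = 0
    return (np.exp(th) / th) * (u / (1.0 + np.expm1(th) * u)).sum() - t / th

def jeffreys(th):
    """pi(theta) = sqrt(1/theta^2 - 1/(e^{theta/2} - e^{-theta/2})^2),
    with the limiting value sqrt(1/12) near theta = 0."""
    if abs(th) < 1e-3:
        return np.sqrt(1.0 / 12.0)
    return np.sqrt(1.0 / th**2 - 1.0 / (np.exp(th / 2) - np.exp(-th / 2))**2)

rng = np.random.default_rng(0)
u = rng.uniform(size=10)
print(theta_hat(u, t=6.0))   # root is positive exactly when 6.0 > u.sum()
```

These functions plug into conditional_expectation above, with φ for instance the indicator that a test statistic exceeds its observed value.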

Distribution of weights

[Figure: simulated distribution of the weights W_t(u).] Recall:

    W_t(u) = π{θ̂(u, t)} / |∂_θ τ(u, θ)|_{θ=θ̂(u,t)}.

Discussion of method

Disadvantage: we need to solve the equation

    Σ_{i=1}^n log{1 + (e^θ − 1)U_i} = θt

at each step to find θ̂(U, t).

Quicker: use a Gibbs algorithm to simulate X = (X_1, ..., X_n) i.i.d. uniform on [0, 1] given Σ_{i=1}^n X_i = t (a code sketch follows the steps below):

- Start with X_i^0 = t/n for i = 1, ..., n.
- Given (X_1^m, ..., X_n^m) with Σ_{i=1}^n X_i^m = t, draw integers i < j randomly and compute a = X_i^m + X_j^m.
- Draw X_i^{m+1} ~ uniform[0, a] if a ≤ 1, and X_i^{m+1} ~ uniform[a − 1, 1] if a > 1.
- Let X_j^{m+1} = a − X_i^{m+1}.
- Continue with m ← m + 1.
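
A direct sketch of this Gibbs sampler (the number of steps is an arbitrary choice here; in practice one would monitor mixing):

```python
import numpy as np

def gibbs_uniform_given_sum(n, t, n_steps=5000, seed=None):
    """Gibbs sampler for X_1, ..., X_n i.i.d. uniform[0, 1] given sum = t.

    Each step redraws a randomly chosen pair (X_i, X_j) from its
    conditional distribution given X_i + X_j = a and the rest.
    """
    assert 0 < t < n
    rng = np.random.default_rng(seed)
    x = np.full(n, t / n)                  # start on the constraint surface
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)
        a = x[i] + x[j]
        lo, hi = (0.0, a) if a <= 1.0 else (a - 1.0, 1.0)
        x[i] = rng.uniform(lo, hi)
        x[j] = a - x[i]
    return x

x = gibbs_uniform_given_sum(10, t=6.0, seed=0)
print(x.min() >= 0, x.max() <= 1, x.sum())   # stays in [0, 1], sum is 6.0
```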

Relationship to Bayesian and fiducial distributions

Simple case: suppose θ is one-dimensional and τ(u, θ) is strictly monotone in θ for fixed u. Then the distribution of θ̂(U, t) (i.e. of the θ solving τ(U, θ) = t) corresponds to Fisher's fiducial distribution.

Lindley (1958): the distribution of θ̂(U, t) is a posterior distribution for some (possibly improper) prior distribution for θ if and only if T, or a transformation of it, has a location distribution.

Fraser (1961), multiparameter case: the fiducial distribution is a posterior distribution if the sample and parameter sets are transformation groups, and the distributions are given by density functions with respect to right Haar measure.

The cases above essentially correspond to the cases in which Algorithm 1 can be used (the pivotal property).

Example: multivariate normal distribution

Generation of multivariate normal samples conditional on the sample mean and empirical covariance matrix. X = (X_1, ..., X_n) is a sample from N_p(µ, Σ), and T = (X̄, S) is sufficient compared to X, with

    X̄ = n^{−1} Σ_{i=1}^n X_i,    S = (n − 1)^{−1} Σ_{i=1}^n (X_i − X̄)(X_i − X̄)′.

Reparameterize from (µ, Σ) to θ ≡ (µ, A), where Σ = AA′ is the Cholesky decomposition. Simulate by letting U = (U_1, ..., U_n) be i.i.d. N_p(0, I):

    χ(U, θ) = (µ + AU_1, ..., µ + AU_n),    τ(U, θ) = (µ + AŪ, A S_U A′),

where Ū and S_U are defined in the same way as X̄ and S. The pivotal condition holds, and the desired conditional sample given t = (x̄, s) is

    χ{U, θ̂(U, t)} = ( x̄ + c L_U^{−1}(U_1 − Ū), ..., x̄ + c L_U^{−1}(U_n − Ū) ),

where s = cc′ and S_U = L_U L_U′ are Cholesky decompositions.
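
A sketch of the resulting sampler (it needs n > p so that S_U is nonsingular); the conditioning is exact, which the printed checks confirm:

```python
import numpy as np

def mvn_sample_given_mean_cov(n, xbar, S, seed=None):
    """Sample X_1, ..., X_n i.i.d. N_p(mu, Sigma) conditional on the
    sample mean being xbar and the sample covariance being S.

    Following the slide: draw U_i ~ N_p(0, I), standardize by their own
    mean Ubar and Cholesky factor L_U of S_U, then relocate and rescale
    with xbar and the Cholesky factor c of S.
    """
    rng = np.random.default_rng(seed)
    p = len(xbar)
    U = rng.standard_normal((n, p))
    Ubar = U.mean(axis=0)
    S_U = np.cov(U, rowvar=False)            # (n-1)^{-1} sum of outer products
    c = np.linalg.cholesky(S)
    L_U = np.linalg.cholesky(S_U)
    Z = np.linalg.solve(L_U, (U - Ubar).T)   # L_U^{-1} (U_i - Ubar), columnwise
    return xbar + (c @ Z).T

X = mvn_sample_given_mean_cov(20, xbar=np.zeros(3), S=np.eye(3), seed=0)
print(np.allclose(X.mean(axis=0), 0.0))              # sample mean equals xbar
print(np.allclose(np.cov(X, rowvar=False), np.eye(3)))  # sample covariance equals S
```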

Other applications

- Inverse Gaussian samples given the sufficient statistics: the standard algorithm for generating inverse Gaussian variates leads to multiple roots of τ(u, θ) = t. Algorithm 3 must be used.
- Type II censored exponential samples: the equation τ(u, θ) = t has one or no solutions for θ. Algorithm 3 can be used.
- Discrete distributions (e.g. the Poisson distribution, logistic regression): the solutions for θ of the equation τ(u, θ) = t are typically intervals. Algorithm 3 is essentially used.

Concluding remarks

The idea of weighted sampling (Algorithm 2) is similar to the classical conditional Monte Carlo approach of Trotter and Tukey (1956). Trotter and Tukey essentially consider the case of a unique solution of the equation τ(u, θ) = t, but their method works without assuming sufficiency of the conditioning variable.

The approach presented here can also be used to compute conditional expectations E{φ(X) | T = t} in non-statistical problems: construct an artificial statistical model for which the conditioning variable T is sufficient, e.g. an exponential model

    f(x, θ) = c(θ) h(x) e^{θ T(x)},

where h(x) is the density of X and T(X) is sufficient for θ.

Methods based on conditional distributions given sufficient statistics are not in widespread use; the literature is scarce even for the normal and multinormal distributions. However, there seems to be an increasing interest in the recent literature.