A Kullback-Leibler Divergence for Bayesian Model Comparison with Applications to Diabetes Studies. Chen-Pin Wang, UTHSCSA; Malay Ghosh, U. Florida.


1 A Kullback-Leibler Divergence for Bayesian Model Comparison with Applications to Diabetes Studies. Chen-Pin Wang, UTHSCSA; Malay Ghosh, U. Florida. Lehmann Symposium, May 9.

2 Background. KLD: the expected (with respect to the reference model) logarithm of the ratio of the probability density functions (p.d.f.'s) of two models:
$$\int \log\left(\frac{r(t_n\mid\theta)}{f(t_n\mid\theta)}\right) r(t_n\mid\theta)\,dt_n.$$
KLD: a measure of the discrepancy of information about $\theta$ contained in the data revealed by two competing models (Kullback-Leibler; Lindley; Bernardo; Akaike; Schwarz; Goutis and Robert). Challenge in the Bayesian framework: identify priors that are compatible under the competing models so that the resulting integrated likelihoods are proper.
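As a concrete illustration (not from the slides), the KLD integral above can be evaluated numerically. The sketch below assumes two hypothetical normal working densities with equal unit variances, for which the divergence has the closed form $(\mu_r-\mu_f)^2/2$:

```python
import math

def norm_pdf(x, mu, sigma):
    # density of N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kld(r, f, lo=-10.0, hi=10.0, m=20001):
    # \int log(r/f) r dx approximated on a fine grid (trapezoidal rule)
    h = (hi - lo) / (m - 1)
    total = 0.0
    for i in range(m):
        x = lo + i * h
        rx = r(x)
        if rx > 0:
            w = 0.5 if i in (0, m - 1) else 1.0
            total += w * rx * math.log(rx / f(x)) * h
    return total

kl = kld(lambda x: norm_pdf(x, 0.0, 1.0), lambda x: norm_pdf(x, 0.5, 1.0))
# closed form for equal-variance normals: (mu_r - mu_f)^2 / 2 = 0.125
```

The quadrature agrees with the closed form to high accuracy over this truncation range, since the normal tails beyond $\pm 10$ are negligible.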

3 G-R KLD. Remedy: the Kullback-Leibler projection of Goutis and Robert (1998), or G-R KLD: the infimum of the KLD between the likelihood under the reference model and all possible likelihoods arising from the competing model. The G-R KLD equals the KLD between the reference model and the competing model evaluated at its MLE when the reference model is correctly specified (cf. Akaike 1974). The G-R KLD overcomes the challenges associated with prior elicitation in calculating the KLD under the Bayesian framework.

4 G-R KLD. The Bayesian estimate of the G-R KLD: integrate the G-R KLD with respect to the posterior distribution of the model parameters under the reference model. The Bayesian estimate of the G-R KLD is not subject to impropriety of the prior as long as the posterior under the reference model is proper. The G-R KLD is suitable for comparing the predictivity of the competing models. The G-R KLD was originally developed for comparing nested GLMs with a known true model, and its extension to general model comparison remains limited.

5 Proposed KLD.
$$\int \log\left(\frac{r(t_n\mid\theta)}{f(t_n\mid\hat\theta_f)}\right) r(t_n\mid\theta)\,dt_n. \quad (1)$$
Bayes estimate of (1):
$$\int\left\{\int \log\left(\frac{r(t_n\mid\theta)}{f(t_n\mid\hat\theta_f)}\right) r(t_n\mid\theta)\,dt_n\right\}\pi(\theta\mid U_n)\,d\theta. \quad (2)$$
Objective: to study the properties of the KLD estimate given in (2).
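A minimal Monte Carlo sketch of the Bayes estimate (2): draw $\theta$ from the posterior, evaluate the inner KLD in closed form, and average. All concrete choices here (a normal posterior, normal reference and fitted distributions for $T_n$, and the numbers themselves) are hypothetical, not from the paper:

```python
import math, random

def kl_normal(m1, s1, m2, s2):
    # closed-form KL( N(m1, s1^2) || N(m2, s2^2) )
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# hypothetical setup: posterior of theta under r is N(0.3, 0.1^2);
# T_n has sd 0.1 under r(.|theta) and sd 0.15 under f(.|theta_f_hat = 0)
random.seed(1)
draws = [random.gauss(0.3, 0.1) for _ in range(50000)]
bayes_kld = sum(kl_normal(th, 0.1, 0.0, 0.15) for th in draws) / len(draws)
```

Averaging the divergence over posterior draws is exactly the integration against $\pi(\theta\mid U_n)$ in (2); only the posterior under the reference model needs to be proper.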

6 Notation. $X_i$'s are i.i.d., originating from model $g$ governed by $\theta \in \Theta$. $T_n = T(X_1,\dots,X_n)$: the statistic for model diagnostics. Two competing models: $r$ for the reference model and $f$ for the fitted model. Assume that the prior $\pi_r(\theta)$ leads to a proper posterior under $r$.

7 Our proposed KLD. $\mathrm{KLD}_t(r,f\mid\theta)$ quantifies the relative model fit for the statistic $T_n$ between models $r$ and $f$. $\mathrm{KLD}_t(r,f\mid\theta)$ is identical to the G-R KLD when the reference model $r$ is the correct model, but is not necessarily the same as the G-R KLD in general. $\mathrm{KLD}_t(r,f\mid\theta)$ needs no additional adjustment for non-nested situations, which makes it more practical than the G-R KLD.

8 Regularity Conditions I.
(A1) For each $x$, both $\log r(x\mid\theta)$ and $\log f(x\mid\theta)$ are three times continuously differentiable in $\theta$. Further, there exist neighborhoods $N_r(\delta)=(\theta-\delta_r,\theta+\delta_r)$ and $N_f(\delta)=(\theta-\delta_f,\theta+\delta_f)$ of $\theta$ and integrable functions $H_{\theta,\delta_r}(x)$ and $H_{\theta,\delta_f}(x)$ such that
$$\sup_{\theta' \in N_r(\delta)}\left|\frac{\partial^k \log r(x\mid\theta')}{\partial\theta'^k}\right| \le H_{\theta,\delta_r}(x) \quad\text{and}\quad \sup_{\theta' \in N_f(\delta)}\left|\frac{\partial^k \log f(x\mid\theta')}{\partial\theta'^k}\right| \le H_{\theta,\delta_f}(x)$$
for $k=1,2,3$.
(A2) For all sufficiently large $\lambda>0$,
$$E_r\left[\sup_{|\theta'-\theta|>\lambda} \log\frac{r(x\mid\theta')}{r(x\mid\theta)}\right] < 0; \qquad E_f\left[\sup_{|\theta'-\theta|>\lambda} \log\frac{f(x\mid\theta')}{f(x\mid\theta)}\right] < 0.$$

9 Regularity Conditions II.
(A3) $E_r\left[\sup_{\theta'\in(\theta-\delta,\theta+\delta)} \log r(x\mid\theta')\right] \to E_r[\log r(x\mid\theta)]$ as $\delta \to 0$; likewise, $E_f\left[\sup_{\theta'\in(\theta-\delta,\theta+\delta)} \log f(x\mid\theta')\right] \to E_f[\log f(x\mid\theta)]$ as $\delta \to 0$.
(A4) The prior density $\pi(\theta)$ is continuously differentiable in a neighborhood of $\theta$, and $\pi(\theta) > 0$.
(A5) $T_n$ is asymptotically normally distributed under both models, in the sense that
$$r(t_n\mid\theta) = \sigma_r^{-1}(\theta)\,\phi\big(\sqrt{n}\{t_n-\mu_r(\theta)\}/\sigma_r(\theta)\big) + O(n^{-1/2}); \quad f(t_n\mid\theta) = \sigma_f^{-1}(\theta)\,\phi\big(\sqrt{n}\{t_n-\mu_f(\theta)\}/\sigma_f(\theta)\big) + O(n^{-1/2}).$$

10 Theorem 1. Assume the regularity conditions (A1)-(A5). Then
$$\frac{2\,\mathrm{KLD}_t(r,f\mid U_n)}{n} - \frac{\{\hat\mu_f(U_n)-\hat\mu_r(U_n)\}^2}{\hat\sigma_f^2(U_n)} = o_p(1) \quad (3)$$
when $\mu_f(\theta)\neq\mu_r(\theta)$, and
$$2\,\mathrm{KLD}_t(r,f\mid U_n) - Q\!\left(\frac{\hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)}\right) = o_p(1) \quad (4)$$
when $\mu_r(\theta)=\mu_f(\theta)$ but $\sigma_r^2(\theta)\neq\sigma_f^2(\theta)$, where $Q(y)=y-\log(y)-1$.
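The variance-only case (4) can be sanity-checked directly: for two normal densities with a common mean, twice the KL divergence equals $Q(\sigma_r^2/\sigma_f^2)$ exactly. A small illustrative sketch (the variance values are arbitrary):

```python
import math

def Q(y):
    # Q(y) = y - log(y) - 1; nonnegative, and zero iff y = 1
    return y - math.log(y) - 1.0

def kl_normal(m1, s1, m2, s2):
    # closed-form KL( N(m1, s1^2) || N(m2, s2^2) )
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# equal means, different variances: 2*KL equals Q(sigma_r^2 / sigma_f^2)
sr2, sf2 = 2.0, 1.5
lhs = 2 * kl_normal(0.0, math.sqrt(sr2), 0.0, math.sqrt(sf2))
rhs = Q(sr2 / sf2)
```

Since $Q$ vanishes only at $y=1$, the limit in (4) is positive whenever the two models disagree on the variance of $T_n$.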

11 Remarks on Theorem 1. $\mathrm{KLD}_t(r,f\mid\theta)$ is also a divergence of model parameter estimates. Model comparison in real applications may rely on the fit to a multi-dimensional statistic; the results in Theorem 1 are applicable to the multivariate case with a fixed dimension. $\mathrm{KLD}_t(r,f\mid\theta)$ can be viewed as the discrepancy between $r$ and $f$ in terms of their posterior predictivity of $T_n$. We study how $\mathrm{KLD}_t(r,f\mid\theta)$ is connected to a weighted posterior predictive p-value, a standard Bayesian technique for assessing model discrepancy (see Rubin 1984; Gelman et al. 1996).

12 Weighted Posterior Predictive P-value.
$$\mathrm{WPPP}_r(U_n) \equiv \int\left\{\int\left(\int_{-\infty}^{t_n} f^*(y_n\mid\hat\theta_f)\,dy_n\right) r^*(t_n\mid\theta)\,dt_n\right\}\pi_r(\theta\mid U_n)\,d\theta, \quad (5)$$
where $r^*$ and $f^*$ are the predictive density functions of $T_n$ under $r$ and $f$, respectively. WPPP is equivalent to the weighted posterior predictive p-value of $T_n$ under $f$ with respect to the posterior predictive distribution of $T_n$ under $r$.
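WPPP in (5) can be approximated by Monte Carlo: average the fitted-model CDF of $T_n$ over draws of $T_n$ from the reference model. The sketch below uses hypothetical normal predictive distributions and ignores posterior uncertainty in $\theta$; with equal means it returns 0.5, in line with (8):

```python
import math, random

def phi_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical setup: under r, T_n ~ N(mu_r, s_r^2); under f (at its plug-in
# estimate) T_n ~ N(mu_f, s_f^2). WPPP averages the f-CDF of T_n over the
# predictive distribution of T_n under r.
random.seed(2)
mu_r, s_r, mu_f, s_f = 0.0, 1.0, 0.0, 2.0
draws = [random.gauss(mu_r, s_r) for _ in range(200000)]
wppp = sum(phi_cdf((t - mu_f) / s_f) for t in draws) / len(draws)
# with equal means, WPPP is 0.5 regardless of the variance mismatch
```

This is exactly the behavior that makes WPPP blind to variance-only discrepancies, which the proposed KLD detects.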

13 Theorem 2.
$$\frac{2\,\mathrm{KLD}_t(r,f\mid U_n)}{n} = \frac{\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2}{n}\cdot\frac{(\hat\mu_r(U_n)-\hat\mu_f(U_n))^2+\hat\sigma_f^2(U_n)+\hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)} + o_p(1) \quad (6)$$
when $\mu_f(\theta)\neq\mu_r(\theta)$. Let $Q(y)=y-\log(y)-1$. Then
$$2\,\mathrm{KLD}_t(r,f\mid U_n) - Q\!\left(\frac{\hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)}\right) = o_p(1) \quad (7)$$
and
$$\mathrm{WPPP}_r(U_n) - 0.5 = o_p(1) \quad (8)$$
when $\mu_r(\theta)=\mu_f(\theta)$ but $\sigma_r^2(\theta)\neq\sigma_f^2(\theta)$.

14 Remarks on Theorem 2. It gives the asymptotic relationship between $\mathrm{KLD}_t(r,f\mid U_n)$ and WPPP. Suppose that $\mu_f(\theta)\neq\mu_r(\theta)$. Then both $2\,\mathrm{KLD}_t(r,f\mid U_n)$ and $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ are of order $O_p(n)$, and $2\,\mathrm{KLD}_t(r,f\mid U_n)$ exceeds $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ by an $O_p(n)$ term that assumes positive values with probability 1. When $\mu_r(\theta)=\mu_f(\theta)$ (i.e., both $r$ and $f$ assume the same mean of $T_n$) but $\sigma_f^2(\theta)\neq\sigma_r^2(\theta)$: $\Phi^{-1}(\mathrm{WPPP}_r(U_n))$ converges to 0 (equivalently, $\mathrm{WPPP}_r(U_n)$ converges to 0.5), while $\mathrm{KLD}_t(r,f\mid U_n)$ converges to a positive quantity of order $O_p(1)$.

15 Example 1. $X_i$ i.i.d. $g_\theta(x_i)=\phi((x_i-\theta_1)/\sqrt{\theta_2})/\sqrt{\theta_2}$, where $\theta_2>0$. Let $T_n=\sqrt{n}\,[(\textstyle\sum_i X_i)/n-\theta_1]/\sqrt{\kappa}$. Let $r=g$ and $f_\theta(x_i)=\phi((x_i-\theta_1)/\sqrt{\kappa})/\sqrt{\kappa}$. Then $\mu_r(\theta)=E_r(T_n)=\mu_f(\theta)=E_f(T_n)=\theta_1$, $\sigma_r^2(\theta)=\theta_2$, $\sigma_f^2(\theta)=\kappa$, and
$$2\lim_{n\to\infty}\mathrm{KLD}_t(r,f\mid u_n) = \frac{\hat\theta_2(u_n)}{\kappa} - \log\left(\frac{\hat\theta_2(u_n)}{\kappa}\right) - 1 \;\begin{cases} > 0 & \text{if } \kappa\neq\theta_2, \\ = 0 & \text{if } \kappa=\theta_2. \end{cases}$$
$T_n$ is the MLE for $\theta_1$ under both $r$ and $f$. $\lim_{n\to\infty}\mathrm{WPPP}(U_n)=0.5$, consistent with the asymptotic relationship between WPPP and the KLD given in Theorem 2.
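Example 1's limit can be sketched numerically under assumed parameter values (illustrative only, not from the slides): simulate from the true normal model, plug the variance MLE into $Q(\hat\theta_2/\kappa)$:

```python
import math, random

def Q(y):
    # Q(y) = y - log(y) - 1
    return y - math.log(y) - 1.0

# Example 1: true model N(theta1, theta2); fitted model fixes the variance
# at kappa. The limiting 2*KLD_t is Q(theta2_hat / kappa), zero iff kappa = theta2.
random.seed(3)
theta1, theta2, kappa = 0.0, 1.8, 1.0
xs = [random.gauss(theta1, math.sqrt(theta2)) for _ in range(100000)]
mean = sum(xs) / len(xs)
theta2_hat = sum((x - mean) ** 2 for x in xs) / len(xs)  # MLE of the variance
two_kld_limit = Q(theta2_hat / kappa)
```

With $\kappa$ deliberately misspecified ($\kappa=1$ against a true variance of 1.8), the limit is strictly positive; setting $\kappa=\theta_2$ would drive it to zero.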

16 Example 2. Assume $X_i$ i.i.d. $g_\theta(x_i)=\exp\{-\theta/(1-\theta)\}\{\theta/(1-\theta)\}^{x_i}/x_i!$, where $0<\theta<1$. Let $T_n=\bar X_n/(1+\bar X_n)$, $r=g$, and $f_\theta(x_i)=\theta^{x_i}(1-\theta)$. Then $\mu_r(\theta)=\mu_f(\theta)=\theta$, $\sigma_r^2(\theta)=\theta(1-\theta)^3$, and $\sigma_f^2(\theta)=\theta(1-\theta)^2$. Here $\theta=E(X_i)/(1+E(X_i))$, and $T_n$ is the MLE for $\theta$ under both $r$ and $f$.
$$2\lim_{n\to\infty}\mathrm{KLD}_t(r,f\mid u_n) = -\log(1-\hat\theta(u_n)) + (1-\hat\theta(u_n)) - 1 > 0 \quad\text{for } 0<\theta<1.$$
$\lim_{n\to\infty}\mathrm{WPPP}(U_n)=0.5$.
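A quick simulation check of Example 2's claimed moments, a sketch under an arbitrary $\theta=0.5$ (Poisson sampling is done with Knuth's method since the standard library offers none): across replications, the mean of $T_n$ should be near $\theta$ and $n\cdot\mathrm{Var}(T_n)$ near $\theta(1-\theta)^3$:

```python
import math, random

def poisson(lam, rng):
    # Knuth's method; adequate for small lam
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(4)
theta = 0.5
lam = theta / (1 - theta)  # Poisson mean under the true model g
n, R = 400, 2000
ts = []
for _ in range(R):
    xbar = sum(poisson(lam, rng) for _ in range(n)) / n
    ts.append(xbar / (1 + xbar))  # T_n = Xbar / (1 + Xbar)
m = sum(ts) / R
v = n * sum((t - m) ** 2 for t in ts) / R
# Example 2 claims mu_r(theta) = theta and sigma_r^2(theta) = theta*(1-theta)^3
```

For $\theta=0.5$ the claimed variance is $0.5\cdot 0.5^3 = 0.0625$, which the delta method also gives for this transformed mean.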

17 Example 3. Assume $X_i$ i.i.d. $g_\theta(x_i) = \frac{\Gamma((\theta_2+1)/2)}{\Gamma(\theta_2/2)\sqrt{\pi\theta_2}}\left(1+(x_i-\theta_1)^2/\theta_2\right)^{-(1+\theta_2)/2}$, where $\theta_2>2$. Let $T_n=\bar X$. Let $r=g$ and $f_\theta(x_i)=\phi(x_i-\theta_1)$. Then $\mu_f(\theta)=\mu_r(\theta)=\theta_1$, $\sigma_r^2(\theta)=\theta_2/(\theta_2-2)$, and $\sigma_f^2(\theta)=1$.
$$2\lim_{n\to\infty}\mathrm{KLD}_t(r,f\mid u_n) = -\log\big(\hat\theta_2(u_n)/(\hat\theta_2(u_n)-2)\big) + \hat\theta_2(u_n)/(\hat\theta_2(u_n)-2) - 1 \ge 0$$
for all $\theta_2$, with equality only in the limit $\theta_2\to\infty$.
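Example 3's limit is $Q(\theta_2/(\theta_2-2))$, which shrinks as the $t$ degrees of freedom grow and the reference model approaches the fitted normal. A tiny numeric illustration over a few arbitrary degrees of freedom:

```python
import math

def Q(y):
    # Q(y) = y - log(y) - 1
    return y - math.log(y) - 1.0

# limiting 2*KLD_t for a t reference model vs. a unit-variance normal:
# positive for finite degrees of freedom theta2 > 2, vanishing as theta2 -> inf
vals = [Q(t2 / (t2 - 2)) for t2 in (3.0, 10.0, 100.0, 10000.0)]
```

The sequence is strictly decreasing toward zero, matching the statement that equality holds only in the limit.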

18 Example 4. Assume $X_i$ i.i.d. $g_\theta(x_i)=\exp(-x_i/\theta)/\theta$. Let $r=g$, $f(x_i)=\exp(-x_i)$, and $T_n=\min\{X_1,\dots,X_n\}$. Then $r_\theta(t_n)=n\exp(-nt_n/\theta)/\theta$ and $f(t_n)=n\exp(-nt_n)$, so that
$$\mathrm{WPPP}_f(\bar x_n)=E_f\big(\Pr(T_n^*<T_n)\mid \bar x_n\big) \approx \frac{\bar x_n}{\bar x_n+1}, \qquad \mathrm{KLD}_t(r,f\mid \bar x_n) \approx -\log(\bar x_n)+n(\bar x_n-1),$$
where $T_n^*$ denotes a predictive replicate of the statistic under $f$. The asymptotic equivalence between $\mathrm{KLD}_t(r,f\mid u_n)$ and $\mathrm{WPPP}_f(u_n)$ in the sense of Theorem 2 does not hold here, owing to the violation of the asymptotic normality assumption (A5).
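For Example 4, the crossing probability behind the WPPP approximation has the closed form $\Pr(T_f<T_r)=\theta/(\theta+1)$ for two independent exponential minima with rates $n$ and $n/\theta$, which can be checked by simulation; $\theta=2$ and $n=50$ are arbitrary illustrative values:

```python
import random

# Example 4: T_n = min(X_1,...,X_n). Under r it is Exp(n/theta); under f, Exp(n).
# For independent exponentials, Pr(T_f < T_r) = rate_f / (rate_f + rate_r)
#                                             = n / (n + n/theta) = theta/(theta+1).
rng = random.Random(5)
n, theta = 50, 2.0
R = 200000
hits = 0
for _ in range(R):
    t_f = rng.expovariate(n)          # min statistic under f
    t_r = rng.expovariate(n / theta)  # min statistic under r
    hits += t_f < t_r
p = hits / R
# closed form: theta/(theta+1) = 2/3
```

Replacing $\theta$ with its estimate $\bar x_n$ gives the $\bar x_n/(\bar x_n+1)$ approximation on the slide.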

19 A Study of Glucose Change in Veterans with Type 2 Diabetes. A clinical cohort of 507 veterans with type 2 diabetes who had poor glucose control at baseline and were then treated with metformin as the sole oral glucose-lowering agent. Goal: to compare models that assessed whether obesity was associated with the net change in glucose level between baseline and the end of the 5-year follow-up. The empirical mean of the net change in HbA1c over time was similar between the obese and nonobese groups, while the empirical variance was greater in the obese group (1.207). The distribution of HbA1c was reasonably symmetric. Two candidate models were considered for fitting the HbA1c change: a mixture of normals vs. a t-distribution.

20 $\mathrm{KLD}_t(r,f\mid u_n)=10.75$, suggesting that $r$ was superior to $f$. This result was consistent with Figures 1 and 2, which contrasted the empirical quantiles with the predicted quantiles under $r$ and $f$. Note that both $r$ and $f$ yielded unbiased estimators of $E(X_i)$; thus the model discrepancy between $r$ and $f$ assessed by $\mathrm{KLD}_t(r,f\mid u_n)$ is primarily attributable to the difference in the variance assumption between $r$ and $f$ (as evident in Figure 1). $\mathrm{WPPP}=0.522$ suggested that the overall fit was similar between the two models (the estimated net change in HbA1c was similar under both).

21 [Figures 1 and 2: empirical quantiles vs. predicted quantiles under $r$ and $f$.]

22 A Study of Functioning in the Elderly with Diabetes. The study cohort arose from the subset of 119 participants with diabetes in the San Antonio Longitudinal Study of Aging, a community-based study of the disablement process in Mexican American and European American older adults. Goal: to compare models that assessed whether glucose control trajectory class (poorer vs. better) was associated with the lower-extremity physical functional limitation score (measured by the SPPB) during the first follow-up period. The SPPB score is discrete in nature with a bounded range. Two candidate models were considered for fitting the SPPB: a negative binomial vs. a Poisson. The empirical variance of the SPPB (15.60 in one class) was greater than the mean (7.23 vs. 8.02) in both glucose control classes.

23 $\mathrm{KLD}_t(r,f\mid u_n)=32.63$ suggested that $r$ was a better fit than $f$. Both $r$ and $f$ yielded similar estimates of $E(X_i)$; the model discrepancy assessed by $\mathrm{KLD}_t(r,f\mid u_n)$ could primarily be attributed to the difference in variance estimation between $r$ and $f$ (as evident in Figures 3 and 4). $\mathrm{WPPP}(U_n)=0.539$ suggested similar fit between $r$ and $f$.

24 [Figures 3 and 4.]

25 Summary. This paper considers a Bayesian estimate of the G-R-A KLD as given in (2). The G-R-A KLD is appropriate for quantifying the information discrepancy between the competing models $r$ and $f$. We derive the asymptotic properties of the G-R-A KLD in Theorem 1, and its relationship to a weighted posterior predictive p-value (WPPP) in Theorem 2. Our results need further refinement when the MLE of the mean of $T_n$ differs between $r$ and $f$, or when the normality assumption given in (A5) is not suitable. Model comparison in medical research may rely on the fit to a multidimensional statistic; Theorem 1 holds for a multivariate statistic $T_n$ with a

26 fixed dimension. Further investigation is needed to assess the properties of our proposed KLD in situations where the dimension of $T_n$ increases with $n$. The G-R-A KLD provides the relative fit between competing models. For the purpose of assessing absolute model adequacy, a KLD should be used in conjunction with absolute model-departure indices such as posterior predictive p-values or residuals. Nevertheless, a KLD is also a measure of the absolute fit of model $f$ when the reference model $r$ is the true model.


More information

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45 Two hours MATH20802 To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER STATISTICAL METHODS 21 June 2010 9:45 11:45 Answer any FOUR of the questions. University-approved

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS

A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS Hanfeng Chen, Jiahua Chen and John D. Kalbfleisch Bowling Green State University and University of Waterloo Abstract Testing for

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

M. J. Bayarri and M. E. Castellanos. University of Valencia and Rey Juan Carlos University

M. J. Bayarri and M. E. Castellanos. University of Valencia and Rey Juan Carlos University 1 BAYESIAN CHECKING OF HIERARCHICAL MODELS M. J. Bayarri and M. E. Castellanos University of Valencia and Rey Juan Carlos University Abstract: Hierarchical models are increasingly used in many applications.

More information

Lecture notes on statistical decision theory Econ 2110, fall 2013

Lecture notes on statistical decision theory Econ 2110, fall 2013 Lecture notes on statistical decision theory Econ 2110, fall 2013 Maximilian Kasy March 10, 2014 These lecture notes are roughly based on Robert, C. (2007). The Bayesian choice: from decision-theoretic

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

5.3 METABOLIC NETWORKS 193. P (x i P a (x i )) (5.30) i=1

5.3 METABOLIC NETWORKS 193. P (x i P a (x i )) (5.30) i=1 5.3 METABOLIC NETWORKS 193 5.3 Metabolic Networks 5.4 Bayesian Networks Let G = (V, E) be a directed acyclic graph. We assume that the vertices i V (1 i n) represent for example genes and correspond to

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

The information complexity of sequential resource allocation

The information complexity of sequential resource allocation The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

Lecture 1: Introduction

Lecture 1: Introduction Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Improper mixtures and Bayes s theorem

Improper mixtures and Bayes s theorem and Bayes s theorem and Han Han Department of Statistics University of Chicago DASF-III conference Toronto, March 2010 Outline Bayes s theorem 1 Bayes s theorem 2 Bayes s theorem Non-Bayesian model: Domain

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Bayes spaces: use of improper priors and distances between densities

Bayes spaces: use of improper priors and distances between densities Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Section 8.2. Asymptotic normality

Section 8.2. Asymptotic normality 30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that

More information

BAYESIAN ANALYSIS OF BIVARIATE COMPETING RISKS MODELS

BAYESIAN ANALYSIS OF BIVARIATE COMPETING RISKS MODELS Sankhyā : The Indian Journal of Statistics 2000, Volume 62, Series B, Pt. 3, pp. 388 401 BAYESIAN ANALYSIS OF BIVARIATE COMPETING RISKS MODELS By CHEN-PIN WANG University of South Florida, Tampa, U.S.A.

More information

Statistics 135 Fall 2007 Midterm Exam

Statistics 135 Fall 2007 Midterm Exam Name: Student ID Number: Statistics 135 Fall 007 Midterm Exam Ignore the finite population correction in all relevant problems. The exam is closed book, but some possibly useful facts about probability

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Generalized confidence intervals for the ratio or difference of two means for lognormal populations with zeros

Generalized confidence intervals for the ratio or difference of two means for lognormal populations with zeros UW Biostatistics Working Paper Series 9-7-2006 Generalized confidence intervals for the ratio or difference of two means for lognormal populations with zeros Yea-Hung Chen University of Washington, yeahung@u.washington.edu

More information

Statistics GIDP Ph.D. Qualifying Exam Theory Jan 11, 2016, 9:00am-1:00pm

Statistics GIDP Ph.D. Qualifying Exam Theory Jan 11, 2016, 9:00am-1:00pm Statistics GIDP Ph.D. Qualifying Exam Theory Jan, 06, 9:00am-:00pm Instructions: Provide answers on the supplied pads of paper; write on only one side of each sheet. Complete exactly 5 of the 6 problems.

More information

Methods of Estimation

Methods of Estimation Methods of Estimation MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Methods of Estimation I 1 Methods of Estimation I 2 X X, X P P = {P θ, θ Θ}. Problem: Finding a function θˆ(x ) which is close to θ.

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

MML Invariant Linear Regression

MML Invariant Linear Regression MML Invariant Linear Regression Daniel F. Schmidt and Enes Makalic The University of Melbourne Centre for MEGA Epidemiology Carlton VIC 3053, Australia {dschmidt,emakalic}@unimelb.edu.au Abstract. This

More information