A Kullback-Leibler Divergence for Bayesian Model Comparison with Applications to Diabetes Studies. Chen-Pin Wang, UTHSCSA; Malay Ghosh, U. Florida.
1 A Kullback-Leibler Divergence for Bayesian Model Comparison with Applications to Diabetes Studies. Chen-Pin Wang, UTHSCSA; Malay Ghosh, U. Florida. Lehmann Symposium, May 9.
2 Background. KLD: the expected (with respect to the reference model) logarithm of the ratio of the probability density functions (p.d.f.s) of two models,

$$\int \log\left(\frac{r(t_n \mid \theta)}{f(t_n \mid \theta)}\right) r(t_n \mid \theta)\, dt_n.$$

KLD: a measure of the discrepancy of information about $\theta$ contained in the data as revealed by two competing models (Kullback and Leibler; Lindley; Bernardo; Akaike; Schwarz; Goutis and Robert). Challenge in the Bayesian framework: identifying priors that are compatible under the competing models so that the resulting integrated likelihoods are proper.
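As a sanity check on this definition (not part of the original slides), the KLD between two fully specified densities can be computed by numerical integration and compared against the known closed form for a pair of normals. A minimal sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import integrate, stats

def kld(r_pdf, f_pdf, lo=-30.0, hi=30.0):
    # KLD(r || f) = integral of log(r/f) * r over the (effective) support
    integrand = lambda t: r_pdf(t) * np.log(r_pdf(t) / f_pdf(t))
    val, _ = integrate.quad(integrand, lo, hi)
    return val

# Reference r = N(0, 1), competing f = N(1, 2^2). Closed form for two normals:
# KLD = log(s_f/s_r) + (s_r^2 + (m_r - m_f)^2) / (2 s_f^2) - 1/2
num = kld(stats.norm(0.0, 1.0).pdf, stats.norm(1.0, 2.0).pdf)
closed = np.log(2.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
```

The numerical and closed-form values agree to quadrature accuracy, and the divergence is strictly positive because the two models differ.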
3 G-R KLD. Remedy: the Kullback-Leibler projection of Goutis and Robert (1998), or G-R KLD: the infimum of the KLD between the likelihood under the reference model and all possible likelihoods arising from the competing model. The G-R KLD equals the KLD between the reference model and the competing model evaluated at its MLE when the reference model is correctly specified (cf. Akaike 1974). The G-R KLD overcomes the challenges associated with prior elicitation in calculating the KLD under the Bayesian framework.
4 G-R KLD. The Bayesian estimate of the G-R KLD: integrate the G-R KLD with respect to the posterior distribution of the model parameters under the reference model. The Bayesian estimate of the G-R KLD is not subject to impropriety of the prior as long as the posterior under the reference model is proper. The G-R KLD is suitable for comparing the predictivity of the competing models. The G-R KLD was originally developed for comparing nested GLMs with a known true model, and its extension to general model comparison remains limited.
5 Proposed KLD.

$$\mathrm{KLD}_t(r, f \mid \theta) = \int \log\left(\frac{r(t_n \mid \theta)}{f(t_n \mid \hat\theta_f)}\right) r(t_n \mid \theta)\, dt_n. \qquad (1)$$

Bayes estimate of (1):

$$\mathrm{KLD}_t(r, f \mid U_n) = \int \left\{\int \log\left(\frac{r(t_n \mid \theta)}{f(t_n \mid \hat\theta_f)}\right) r(t_n \mid \theta)\, dt_n\right\} \pi(\theta \mid U_n)\, d\theta. \qquad (2)$$

Objective: to study the properties of the KLD estimate given in (2).
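A minimal sketch of how an estimate like (2) can be computed in practice: draw from the posterior of $\theta$ under the reference model and average the $\theta$-level KLD over the draws. The Gamma "posterior" and the equal-means normal pair below are illustrative stand-ins, not models from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

def two_kld_equal_means(theta2, kappa=1.0):
    # 2*KLD for a N(mu, theta2) reference vs. a N(mu, kappa) fitted model
    y = theta2 / kappa
    return y - np.log(y) - 1.0

# Stand-in posterior for theta2: a hypothetical Gamma posterior centered at 2
post_draws = rng.gamma(shape=50.0, scale=2.0 / 50.0, size=10_000)

# Monte Carlo analogue of (2): posterior-average the theta-level KLD
bayes_est = 0.5 * two_kld_equal_means(post_draws).mean()
```

With the posterior centered at $\theta_2 = 2$ and $\kappa = 1$, the average sits near the plug-in value $\tfrac{1}{2}(2 - \log 2 - 1) \approx 0.153$, slightly inflated by posterior spread.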
6 Notations. $X_i$'s are i.i.d., originating from model $g$ governed by $\theta \in \Theta$. $T_n = T(X_1, \ldots, X_n)$: the statistic for model diagnostics. Two competing models: $r$ for the reference model and $f$ for the fitted model. Assume that the prior $\pi_r(\theta)$ leads to a proper posterior under $r$.
7 Our proposed KLD. $\mathrm{KLD}_t(r, f \mid \theta)$ quantifies the relative fit of models $r$ and $f$ to the statistic $T_n$. $\mathrm{KLD}_t(r, f \mid \theta)$ is identical to the G-R KLD when the reference model $r$ is the correct model, but is not necessarily the same as the G-R KLD in general. $\mathrm{KLD}_t(r, f \mid \theta)$ needs no additional adjustment for non-nested situations, which makes it more practical than the G-R KLD.
8 Regularity Conditions I.

(A1) For each $x$, both $\log r(x \mid \theta)$ and $\log f(x \mid \theta)$ are three times continuously differentiable in $\theta$. Further, there exist neighborhoods $N_r(\delta) = (\theta - \delta_r, \theta + \delta_r)$ and $N_f(\delta) = (\theta - \delta_f, \theta + \delta_f)$ of $\theta$ and integrable functions $H_{\theta,\delta_r}(x)$ and $H_{\theta,\delta_f}(x)$ such that

$$\sup_{\theta' \in N_r(\delta)} \left|\frac{\partial^k \log r(x \mid \theta')}{\partial \theta'^k}\right| \le H_{\theta,\delta_r}(x) \quad \text{and} \quad \sup_{\theta' \in N_f(\delta)} \left|\frac{\partial^k \log f(x \mid \theta')}{\partial \theta'^k}\right| \le H_{\theta,\delta_f}(x)$$

for $k = 1, 2, 3$.

(A2) For all sufficiently large $\lambda > 0$,

$$E_r\left[\sup_{|\theta' - \theta| > \lambda} \log \frac{r(x \mid \theta')}{r(x \mid \theta)}\right] < 0; \qquad E_f\left[\sup_{|\theta' - \theta| > \lambda} \log \frac{f(x \mid \theta')}{f(x \mid \theta)}\right] < 0.$$
9 Regularity Conditions II.

(A3) $E_r\left[\sup_{\theta' \in (\theta-\delta,\, \theta+\delta)} \log r(x \mid \theta')\right] \to E_r[\log r(x \mid \theta)]$ as $\delta \to 0$; $E_f\left[\sup_{\theta' \in (\theta-\delta,\, \theta+\delta)} \log f(x \mid \theta')\right] \to E_f[\log f(x \mid \theta)]$ as $\delta \to 0$.

(A4) The prior density $\pi(\theta)$ is continuously differentiable in a neighborhood of $\theta$ and $\pi(\theta) > 0$.

(A5) $T_n$ is asymptotically normally distributed under both models, in the sense that

$$r(t_n \mid \theta) = \sigma_r^{-1}(\theta)\,\phi(\sqrt{n}\{t_n - \mu_r(\theta)\}/\sigma_r(\theta)) + O(n^{-1/2});$$
$$f(t_n \mid \theta) = \sigma_f^{-1}(\theta)\,\phi(\sqrt{n}\{t_n - \mu_f(\theta)\}/\sigma_f(\theta)) + O(n^{-1/2}).$$
10 Theorem 1. Assume the regularity conditions (A1)-(A5). Then

$$\frac{2\,\mathrm{KLD}_t(r, f \mid U_n)}{n} - \frac{\{\hat\mu_f(U_n) - \hat\mu_r(U_n)\}^2}{\hat\sigma_f^2(U_n)} = o_p(1) \qquad (3)$$

when $\mu_f(\theta) \ne \mu_r(\theta)$, and

$$2\,\mathrm{KLD}_t(r, f \mid U_n) - Q\left(\frac{\hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)}\right) = o_p(1) \qquad (4)$$

when $\mu_r(\theta) = \mu_f(\theta)$ but $\sigma_r^2(\theta) \ne \sigma_f^2(\theta)$, where $Q(y) = y - \log(y) - 1$.
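The equal-mean case (4) can be checked directly: for two normal densities with a common mean, twice the closed-form KLD equals $Q$ of the variance ratio exactly. A small sketch (the specific variances are arbitrary):

```python
import numpy as np

def Q(y):
    # Q(y) = y - log(y) - 1, nonnegative with Q(1) = 0
    return y - np.log(y) - 1.0

def kld_normal(m_r, s2_r, m_f, s2_f):
    # Closed-form KLD( N(m_r, s2_r) || N(m_f, s2_f) )
    return 0.5 * (np.log(s2_f / s2_r) + (s2_r + (m_r - m_f) ** 2) / s2_f - 1.0)

s2_r, s2_f = 2.5, 1.0
lhs = 2.0 * kld_normal(0.0, s2_r, 0.0, s2_f)   # equal means
rhs = Q(s2_r / s2_f)
```

The identity holds exactly here (not just up to $o_p(1)$) because the two densities are exactly normal with equal means.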
11 Remarks on Theorem 1. $\mathrm{KLD}_t(r, f \mid \theta)$ is also a divergence of model-parameter estimates. Model comparison in real applications may rely on the fit to a multi-dimensional statistic; the results in Theorem 1 are applicable to the multivariate case with a fixed dimension. $\mathrm{KLD}_t(r, f \mid \theta)$ can be viewed as the discrepancy between $r$ and $f$ in terms of their posterior predictivity of $T_n$. We study how $\mathrm{KLD}_t(r, f \mid \theta)$ is connected to a weighted posterior predictive p-value, a typical Bayesian technique for assessing model discrepancy (see Rubin 1984; Gelman et al. 1996).
12 Weighted Posterior Predictive P-value.

$$\mathrm{WPPP}_r(U_n) = \int \left\{\int \left(\int_{-\infty}^{t_n} f^*(y_n \mid \hat\theta_f)\, dy_n\right) r^*(t_n \mid \theta)\, dt_n\right\} \pi_r(\theta \mid U_n)\, d\theta, \qquad (5)$$

where $r^*$ and $f^*$ are the predictive density functions of $T_n$ under $r$ and $f$, respectively. WPPP is equivalent to the posterior predictive p-value of $T_n$ under $f$, weighted with respect to the posterior predictive distribution of $T_n$ under $r$.
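A Monte Carlo sketch of (5) under simplifying assumptions: the posterior for $\theta$ is taken as degenerate (so the outer integral drops out) and both predictive densities of $T_n$ are normal, in which case the double integral reduces to $\Pr(T_f \le T_r)$ and has a closed form. The means and scales below are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical normal predictive densities for T_n under r and f
mu_r, s_r = 1.0, 1.5
mu_f, s_f = 0.0, 1.0

# Inner integral of (5) is the f-predictive CDF evaluated at T_n;
# averaging it over draws of T_n from the r-predictive gives Pr(T_f <= T_r).
t_r = rng.normal(mu_r, s_r, size=200_000)
wppp_mc = stats.norm.cdf(t_r, loc=mu_f, scale=s_f).mean()

# Closed form when both predictives are normal
wppp_exact = stats.norm.cdf((mu_r - mu_f) / np.hypot(s_r, s_f))
```

With equal predictive means the same computation returns 0.5, matching the degenerate behavior WPPP exhibits in Theorem 2.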
13 Theorem 2. When $\mu_f(\theta) \ne \mu_r(\theta)$,

$$\frac{2\,\mathrm{KLD}_t(r, f \mid U_n)}{n} = \frac{\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2}{n} \cdot \frac{\hat\sigma_f^2(U_n) + \hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)} + o_p(1), \qquad (6)$$

where $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2/n = (\hat\mu_r(U_n) - \hat\mu_f(U_n))^2/\{\hat\sigma_r^2(U_n) + \hat\sigma_f^2(U_n)\} + o_p(1)$. Let $Q(y) = y - \log(y) - 1$. Then

$$2\,\mathrm{KLD}_t(r, f \mid U_n) - Q\left(\frac{\hat\sigma_r^2(U_n)}{\hat\sigma_f^2(U_n)}\right) = o_p(1) \qquad (7)$$

and

$$\mathrm{WPPP}_r(U_n) - 0.5 = o_p(1) \qquad (8)$$

when $\mu_r(\theta) = \mu_f(\theta)$ but $\sigma_r^2(\theta) \ne \sigma_f^2(\theta)$.
14 Remarks on Theorem 2. It gives the asymptotic relationship between $\mathrm{KLD}_t(r, f \mid U_n)$ and WPPP. Suppose that $\mu_f(\theta) \ne \mu_r(\theta)$. Then both $2\,\mathrm{KLD}_t(r, f \mid U_n)$ and $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ are of order $O_p(n)$, and $2\,\mathrm{KLD}_t(r, f \mid U_n)$ exceeds $\{\Phi^{-1}(\mathrm{WPPP}_r(U_n))\}^2$ by an $O_p(n)$ term that assumes positive values with probability 1. When $\mu_r(\theta) = \mu_f(\theta)$ (i.e., both $r$ and $f$ assume the same mean of $T_n$) but $\sigma_f^2(\theta) \ne \sigma_r^2(\theta)$: $\Phi^{-1}(\mathrm{WPPP}_r(U_n))$ converges to 0 (equivalently, $\mathrm{WPPP}_r(U_n)$ converges to 0.5), while $\mathrm{KLD}_t(r, f \mid U_n)$ converges to a positive $O_p(1)$ quantity.
15 Example 1. Let $X_i$ be i.i.d. with $g_\theta(x_i) = \phi((x_i - \theta_1)/\sqrt{\theta_2})/\sqrt{\theta_2}$, where $\theta_2 > 0$. Let $T_n = \sqrt{n}\,[(\sum_i X_i)/n - \theta_1]/\sqrt{\kappa}$, $r = g$, and $f_\theta(x_i) = \phi((x_i - \theta_1)/\sqrt{\kappa})/\sqrt{\kappa}$. Then $\mu_r(\theta) = E_r(T_n) = \mu_f(\theta) = E_f(T_n) = \theta_1$, $\sigma_r^2(\theta) = \theta_2$, $\sigma_f^2(\theta) = \kappa$, and

$$2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log\left(\frac{\hat\theta_2(u_n)}{\kappa}\right) + \frac{\hat\theta_2(u_n)}{\kappa} - 1 \;\begin{cases} > 0 & \text{if } \kappa \ne \theta_2, \\ = 0 & \text{if } \kappa = \theta_2. \end{cases}$$

$T_n$ is the MLE for $\theta_1$ under both $r$ and $f$, and $\lim_{n\to\infty} \mathrm{WPPP}(U_n) = 0.5$, as given by (8) of Theorem 2: WPPP cannot distinguish $r$ from $f$ asymptotically, whereas $\mathrm{KLD}_t$ can when $\kappa \ne \theta_2$.
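Example 1's limit can be checked numerically: estimate $\theta_2$ from simulated normal data and plug it into $Q(\hat\theta_2/\kappa)$. The parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta2, kappa = 0.5, 2.0, 1.0

# Simulate from the reference model g and form the MLE of theta2
x = rng.normal(theta1, np.sqrt(theta2), size=200_000)
theta2_hat = x.var()

def Q(y):
    return y - np.log(y) - 1.0

limit_hat = Q(theta2_hat / kappa)    # plug-in value of 2 * lim KLD_t
limit_true = Q(theta2 / kappa)
```

The plug-in value converges to the true limit as the sample grows, and the limit vanishes exactly when $\kappa = \theta_2$ (i.e., $Q(1) = 0$).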
16 Example 2. Assume $X_i$ i.i.d. with $g_\theta(x_i) = \exp\{-\theta/(1-\theta)\}\{\theta/(1-\theta)\}^{x_i}/x_i!$, where $0 < \theta < 1$ (a Poisson model with mean $\theta/(1-\theta)$). Let $T_n = \bar X_n/(1 + \bar X_n)$, $r = g$, and $f_\theta(x_i) = \theta^{x_i}(1-\theta)$ (a geometric model). Then $\mu_r(\theta) = \mu_f(\theta) = \theta$, $\sigma_r^2(\theta) = \theta(1-\theta)^3$, and $\sigma_f^2(\theta) = \theta(1-\theta)^2$, where $\theta = E(X_i)/(1 + E(X_i))$. $T_n$ is the MLE for $\theta$ under both $r$ and $f$,

$$2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log(1 - \hat\theta(u_n)) + (1 - \hat\theta(u_n)) - 1 > 0 \quad \text{for } 0 < \theta < 1,$$

and $\lim_{n\to\infty} \mathrm{WPPP}(U_n) = 0.5$.
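The variance formula $\sigma_r^2(\theta) = \theta(1-\theta)^3$ for $T_n$ under the Poisson reference model can be verified by simulation (the delta method in action); the sample sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.4
lam = theta / (1.0 - theta)    # Poisson mean, so E(X)/(1 + E(X)) = theta

# Many replicate samples of size n; form T_n = xbar/(1 + xbar) for each
n, reps = 400, 20_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
t_n = xbar / (1.0 + xbar)

mean_sim = t_n.mean()            # should be near theta
var_sim = n * t_n.var()          # should be near theta * (1 - theta)^3
var_theory = theta * (1.0 - theta) ** 3
```

The simulated $n \cdot \mathrm{Var}(T_n)$ matches $\theta(1-\theta)^3$ up to Monte Carlo and delta-method error, confirming the slide's $\sigma_r^2(\theta)$.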
17 Example 3. Assume $X_i$ i.i.d. with

$$g_\theta(x_i) = \frac{\Gamma((\theta_2 + 1)/2)}{\Gamma(\theta_2/2)\sqrt{\pi\theta_2}}\left(1 + (x_i - \theta_1)^2/\theta_2\right)^{-(1+\theta_2)/2},$$

where $\theta_2 > 2$ (a t-distribution with location $\theta_1$ and $\theta_2$ degrees of freedom). Let $T_n = \bar X$, $r = g$, and $f_\theta(x_i) = \phi(x_i - \theta_1)$. Then $\mu_f(\theta) = \mu_r(\theta) = \theta_1$, $\sigma_r^2(\theta) = \theta_2/(\theta_2 - 2)$, $\sigma_f^2(\theta) = 1$, and

$$2 \lim_{n\to\infty} \mathrm{KLD}_t(r, f \mid u_n) = -\log\left(\frac{\hat\theta_2(u_n)}{\hat\theta_2(u_n) - 2}\right) + \frac{\hat\theta_2(u_n)}{\hat\theta_2(u_n) - 2} - 1 \ge 0$$

for all $\theta_2$, with equality if and only if $\theta_2 = \infty$.
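Example 3's limit $Q(\theta_2/(\theta_2 - 2))$ can be tabulated for a few degrees of freedom to see it shrink toward zero as the t reference approaches the normal fit:

```python
import numpy as np

def Q(y):
    return y - np.log(y) - 1.0

# 2 * lim KLD_t = Q(theta2 / (theta2 - 2)) for a t_{theta2} reference
# vs. a N(theta1, 1) fitted model; illustrative degrees of freedom
limits = {df: Q(df / (df - 2.0)) for df in (3.0, 10.0, 100.0, 1e6)}
```

The values decrease monotonically in the degrees of freedom and are essentially zero for very large $\theta_2$, consistent with equality only at $\theta_2 = \infty$.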
18 Example 4. Assume $X_i$ i.i.d. with $g_\theta(x_i) = \exp(-x_i/\theta)/\theta$. Let $r = g$, $f(x_i) = \exp(-x_i)$, and $T_n = \min\{X_1, \ldots, X_n\}$. Then $r_\theta(t_n) = n\exp(-n t_n/\theta)/\theta$ and $f(t_n) = n\exp(-n t_n)$,

$$\mathrm{WPPP}_f(\bar x_n) = E_f(\Pr(T_n^* < T_n) \mid \bar x_n) \approx \frac{\bar x_n}{\bar x_n + 1}, \qquad \mathrm{KLD}_t(r, f \mid \bar x_n) \approx -\log(\bar x_n) + (\bar x_n - 1).$$

The asymptotic equivalence between $\mathrm{KLD}_t(r, f \mid u_n)$ and $\mathrm{WPPP}_f(u_n)$ does not hold in the sense of Theorem 2, due to the violation of the asymptotic normality assumption (A5): the minimum of i.i.d. exponentials is itself exactly exponentially distributed for every $n$.
19 A Study of Glucose Change in Veterans with Type 2 Diabetes. A clinical cohort of 507 veterans with type 2 diabetes who had poor glucose control at baseline and were then treated with metformin as the sole oral glucose-lowering agent. Goal: to compare models that assessed whether obesity was associated with the net change in glucose level between baseline and the end of 5-year follow-up. The empirical mean of the net change in HbA1c over time was similar between the obese and nonobese groups. The empirical variance was greater in the obese group (1.207, vs. a smaller value in the nonobese group). The distribution of HbA1c was reasonably symmetric. Two candidate models were considered for fitting the HbA1c change: a mixture of normals vs. a t-distribution.
20 $\mathrm{KLD}_t(r, f \mid u_n) = 10.75$, suggesting that $r$ was superior to $f$. The $\mathrm{KLD}_t(r, f \mid u_n)$ result was consistent with Figures 1 and 2, which contrasted the empirical quantiles with the predicted quantiles under $r$ and $f$. Note that both $r$ and $f$ yielded unbiased estimators of $E(X_i)$; thus the model discrepancy between $r$ and $f$ assessed by $\mathrm{KLD}_t(r, f \mid u_n)$ is primarily attributable to the difference in the variance assumption between $r$ and $f$ (as evident in Figure 1). WPPP = 0.522 suggested that the overall fit was similar between the two models (the estimated net change in HbA1c was similar between the two models).
21 [Figures 1 and 2: empirical vs. predicted quantiles of the HbA1c change under $r$ and $f$.]
22 A Study of Functioning in the Elderly with Diabetes. The study cohort arose from the subset of 119 participants with diabetes in the San Antonio Longitudinal Study of Aging, a community-based study of the disablement process in Mexican American and European American older adults. Goal: to compare models that assessed whether glucose-control trajectory class (poorer vs. better) was associated with the lower-extremity physical functional limitation score (measured by SPPB) during the first follow-up period. The SPPB score is discrete in nature with a bounded range. Two candidate models were considered for fitting SPPB: a negative binomial vs. a Poisson. The empirical variance of SPPB (15.60 vs. ) was greater than the mean (7.23 vs. 8.02) in both glucose control classes.
23 $\mathrm{KLD}_t(r, f \mid u_n) = 32.63$ suggested that $r$ was a better fit than $f$. Both $r$ and $f$ yielded similar estimates of $E(X_i)$; the model discrepancy assessed by $\mathrm{KLD}_t(r, f \mid u_n)$ could primarily be attributed to the difference in variance estimation between $r$ and $f$ (as evident in Figures 3 and 4). $\mathrm{WPPP}(U_n) = 0.539$ suggested similar fit between $r$ and $f$.
24 [Figures 3 and 4: empirical vs. predicted fit of SPPB under $r$ and $f$.]
25 Summary. This paper considers a Bayesian estimate of the G-R-A KLD as given in (2). The G-R-A KLD is appropriate for quantifying the information discrepancy between the competing models $r$ and $f$. We derive the asymptotic properties of the G-R-A KLD in Theorem 1, and its relationship to a weighted posterior predictive p-value (WPPP) in Theorem 2. Our results need further refinement when the MLE of the mean of $T_n$ differs between $r$ and $f$, or when the normality assumption given in (A5) is not suitable. Model comparison in medical research may rely on the fit to a multidimensional statistic; Theorem 1 holds for a multivariate statistic $T_n$ with a fixed dimension. Further investigation is needed to assess the properties of our proposed KLD when the dimension of $T_n$ increases with $n$. The G-R-A KLD provides the relative fit between competing models. For the purpose of assessing absolute model adequacy, a KLD should be used in conjunction with absolute model-departure indices such as posterior predictive p-values or residuals. Nevertheless, a KLD is also a measure of the absolute fit of model $f$ when the reference model $r$ is the true model.
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationSelection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models
Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation
More informationLecture 1: Introduction
Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationImproper mixtures and Bayes s theorem
and Bayes s theorem and Han Han Department of Statistics University of Chicago DASF-III conference Toronto, March 2010 Outline Bayes s theorem 1 Bayes s theorem 2 Bayes s theorem Non-Bayesian model: Domain
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationBayes spaces: use of improper priors and distances between densities
Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de
More informationBayesian Inference. Chapter 9. Linear models and regression
Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering
More informationSection 8.2. Asymptotic normality
30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that
More informationBAYESIAN ANALYSIS OF BIVARIATE COMPETING RISKS MODELS
Sankhyā : The Indian Journal of Statistics 2000, Volume 62, Series B, Pt. 3, pp. 388 401 BAYESIAN ANALYSIS OF BIVARIATE COMPETING RISKS MODELS By CHEN-PIN WANG University of South Florida, Tampa, U.S.A.
More informationStatistics 135 Fall 2007 Midterm Exam
Name: Student ID Number: Statistics 135 Fall 007 Midterm Exam Ignore the finite population correction in all relevant problems. The exam is closed book, but some possibly useful facts about probability
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationGeneralized confidence intervals for the ratio or difference of two means for lognormal populations with zeros
UW Biostatistics Working Paper Series 9-7-2006 Generalized confidence intervals for the ratio or difference of two means for lognormal populations with zeros Yea-Hung Chen University of Washington, yeahung@u.washington.edu
More informationStatistics GIDP Ph.D. Qualifying Exam Theory Jan 11, 2016, 9:00am-1:00pm
Statistics GIDP Ph.D. Qualifying Exam Theory Jan, 06, 9:00am-:00pm Instructions: Provide answers on the supplied pads of paper; write on only one side of each sheet. Complete exactly 5 of the 6 problems.
More informationMethods of Estimation
Methods of Estimation MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Methods of Estimation I 1 Methods of Estimation I 2 X X, X P P = {P θ, θ Θ}. Problem: Finding a function θˆ(x ) which is close to θ.
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationChapter 3 : Likelihood function and inference
Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM
More informationMML Invariant Linear Regression
MML Invariant Linear Regression Daniel F. Schmidt and Enes Makalic The University of Melbourne Centre for MEGA Epidemiology Carlton VIC 3053, Australia {dschmidt,emakalic}@unimelb.edu.au Abstract. This
More information