Bayesian estimation of the discrepancy with misspecified parametric models


1 Bayesian estimation of the discrepancy with misspecified parametric models

Pierpaolo De Blasi, University of Torino & Collegio Carlo Alberto
Bayesian Nonparametrics workshop, ICERM, September 2012
Joint work with S. Walker

2 Outline

- Semiparametric density estimation
- Asymptotics and illustration
- References

3 BNP density estimation

Let $X_1, \dots, X_n$ be exchangeable (i.e. conditionally iid) observations from an unknown density $f$ on the real line. If $\mathcal{F}$ is the density space and $\Pi(df)$ the prior, then via Bayes theorem

$$\Pi(A \mid X_1, \dots, X_n) = \frac{\int_A \prod_{i=1}^n f(X_i)\,\Pi(df)}{\int_{\mathcal{F}} \prod_{i=1}^n f(X_i)\,\Pi(df)}.$$

Wealth of Bayesian nonparametric (BNP) models: Dirichlet process mixtures of continuous densities; log-spline models; Bernstein polynomials; log-Gaussian processes. All with well-studied asymptotic properties, e.g. posterior concentration rates

$$\Pi(f : d(f, f_0) > M\varepsilon_n \mid X_1, \dots, X_n) \to 0,$$

when $X_1, X_2, \dots$ are iid from some true $f_0$.
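As a sanity check on the update formula above, here is a minimal sketch with a discrete prior over a handful of candidate densities, so the integrals reduce to sums. The candidate densities, the data-generating density and the seed are illustrative choices, not part of the talk.

```python
import numpy as np

# Bayes update over a discrete prior on densities:
#   Pi(f_j | X_1..X_n)  ∝  Pi(f_j) * prod_i f_j(X_i).
# Candidate densities, data-generating density and seed are illustrative.

rng = np.random.default_rng(0)
x = rng.beta(1.0, 2.0, size=500)          # data from f_0(x) = 2(1 - x) on [0, 1]

candidates = {                             # candidate densities on [0, 1]
    "uniform":    lambda t: np.ones_like(t),
    "decreasing": lambda t: 2.0 * (1.0 - t),
    "increasing": lambda t: 2.0 * t,
}
prior = {name: 1.0 / len(candidates) for name in candidates}

loglik = {name: np.log(f(x)).sum() for name, f in candidates.items()}
m = max(loglik.values())                   # subtract max for numerical stability
unnorm = {name: prior[name] * np.exp(ll - m) for name, ll in loglik.items()}
Z = sum(unnorm.values())
posterior = {name: w / Z for name, w in unnorm.items()}
print(posterior)                           # mass piles up on "decreasing"
```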

4 Discrepancy from a parametric model

Suppose now we have a favorite parametric family $f_\theta(x)$, $\theta \in \Theta \subset \mathbb{R}^p$, likely to be misspecified: there is no $\theta$ such that $f_0 = f_\theta$. We want to learn about the best parameter value $\theta_0$, the one which minimizes the Kullback-Leibler divergence from the true $f_0$:

$$\theta_0 = \arg\min_{\theta \in \Theta} \int f_0 \log(f_0/f_\theta).$$

A nonparametric component $W$ is introduced to model the discrepancy between $f_0$ and the closest density $f_{\theta_0}$:

$$f_{\theta,W}(x) \propto f_\theta(x)\, W(x), \qquad \text{so that} \qquad C(x) := \frac{W(x)}{\int W(s)\, f_\theta(s)\, ds}$$

is designed to estimate $C_0(x) = f_0(x)/f_{\theta_0}(x)$.
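The KL projection $\theta_0$ is easy to compute numerically in simple cases. The sketch below does so for the setup of Illustration 1 later in the talk ($f_0(x) = 2(1-x)$ and a truncated exponential $f_\theta$ on $[0,1]$); the optimizer, quadrature and bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

# Sketch: theta_0 = argmin KL(f_0 || f_theta) by numerical quadrature.
# f_0 and f_theta follow Illustration 1: f_0(x) = 2(1 - x) on [0, 1],
# f_theta a truncated exponential on [0, 1].

f0 = lambda x: 2.0 * (1.0 - x)
ftheta = lambda x, th: th * np.exp(-th * x) / (1.0 - np.exp(-th))

def kl(th):
    integrand = lambda x: f0(x) * np.log(f0(x) / ftheta(x, th))
    return quad(integrand, 1e-9, 1.0 - 1e-9)[0]   # avoid log(0) at x = 1

res = minimize_scalar(kl, bounds=(0.1, 10.0), method="bounded")
theta0 = res.x
C0 = lambda x: f0(x) / ftheta(x, theta0)          # discrepancy to be estimated
print(theta0, C0(0.25))
```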

5 Related works - Frequentist

Hjort and Glad (1995). Start with a parametric density estimate $f_{\hat\theta}(x)$, $\hat\theta$ being, e.g., the MLE of $\theta$ maximizing the log-likelihood $\sum_{i=1}^n \log f_\theta(x_i)$. Then multiply it by a nonparametric kernel-type estimate of the correction function $r(x) = f_0(x)/f_{\hat\theta}(x)$:

$$\hat f(x) = f_{\hat\theta}(x)\, \hat r(x) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x)\, \frac{f_{\hat\theta}(x)}{f_{\hat\theta}(x_i)}$$

in a two-stage sequential analysis. $\hat f$ is shown to be more precise than the traditional kernel density estimator in a broad neighborhood of the parametric family, while losing little when $f_0$ is far from the parametric family.
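A sketch of this two-stage estimator; the Gaussian kernel, rule-of-thumb bandwidth, exponential start and simulated data are illustrative choices, not those of the paper.

```python
import numpy as np
from scipy.stats import norm, expon

# Sketch of the Hjort-Glad (1995) two-stage estimator:
#   fhat(x) = (1/n) * sum_i K_h(x_i - x) * f_thetahat(x) / f_thetahat(x_i).
# Kernel, bandwidth, parametric start and data are illustrative choices.

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=1.0, size=500)   # true f_0: Gamma(2, 1)

# Stage 1: parametric start, Exp(rate) with MLE rate = 1 / mean.
rate_hat = 1.0 / data.mean()
f_par = lambda x: expon.pdf(x, scale=1.0 / rate_hat)

# Stage 2: kernel correction of r(x) = f_0(x) / f_thetahat(x).
h = 1.06 * data.std() * len(data) ** (-0.2)        # rule-of-thumb bandwidth

def fhat(x):
    x = np.atleast_1d(x)[:, None]                  # broadcast over data points
    K = norm.pdf((x - data[None, :]) / h) / h      # Gaussian kernel K_h
    return (K * (f_par(x) / f_par(data[None, :]))).mean(axis=1)

grid = np.linspace(0.01, 8.0, 5)
print(fhat(grid))                                  # corrected density estimate
```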

6 Related works - Bayes

Nonparametric prior built around a parametric model via $f(x) = f_\theta(x)\, g(F_\theta(x))$, where $F_\theta$ is the cdf of $f_\theta$ and $g$ is a density on $[0,1]$ with prior $\Pi$.

- Verdinelli and Wasserman (1998): $\Pi$ an infinite-dimensional exponential family. Application to goodness-of-fit testing.
- Rousseau (2008): $\Pi$ a mixture of betas. Application to goodness-of-fit testing.
- Tokdar (2007): $\Pi$ a log-Gaussian process prior. Application to posterior inference for densities with unbounded support.

For $g(x) = e^{Z(x)} / \int_0^1 e^{Z(s)}\, ds$ and $Z$ a Gaussian process with covariance $\sigma(\cdot,\cdot)$, $f(x)$ can be written

$$f(x) \propto f_\theta(x)\, \underbrace{e^{\tilde Z(x)}}_{W(x)}$$

for $\tilde Z$ a Gaussian process with covariance $\sigma(F_\theta(\cdot), F_\theta(\cdot))$.

7 Posterior updating

$$f_{\theta,W}(x) \propto f_\theta(x)\, W(x), \qquad C(x) := \frac{W(x)}{\int W(s)\, f_\theta(s)\, ds}.$$

Truly semiparametric: the aim is first to learn about the best parameter $\theta_0$, then to see how close $f_{\theta_0}$ is to $f_0$ via $C(x)$.

A situation in which the updating process from prior to posterior may be seen as problematic: the model $f_{\theta,W}$ is intrinsically non-identified in $(\theta, C)$.

- The full Bayesian update $\pi(\theta, W \mid x_1,\dots,x_n) \propto \pi(\theta)\,\pi(W) \prod_{i=1}^n f_{\theta,W}(x_i)$ is appropriate for learning about $f_0$; it is not so for learning about $(\theta_0, C_0)$.
- The marginal posterior $\pi(\theta \mid x_1,\dots,x_n) = \int \pi(\theta, W \mid x_1,\dots,x_n)\, dW$ has no interpretation: it is not identified which parameter value this $\pi$ is targeting.

8 Posterior updating

What removes us from the formal Bayes set-up is the desire to specifically learn about $\theta_0$. Since $\theta_0$ is defined without any reference to $W$ or $C$, our beliefs about $\theta_0$ should not change whether we are interested in learning about $C_0$ or not. Hence, the appropriate update for $\theta$ is the parametric one:

$$\pi(\theta \mid x_1,\dots,x_n) \propto \pi(\theta) \prod_{i=1}^n f_\theta(x_i).$$

We keep updating $W$ according to the semiparametric model, so our updating scheme is

$$\pi(W \mid \theta, x_1,\dots,x_n) \propto \pi(W) \prod_{i=1}^n f_{\theta,W}(x_i), \qquad \pi(\theta, W \mid x_1,\dots,x_n) = \pi(W \mid \theta, x_1,\dots,x_n)\; \pi(\theta \mid x_1,\dots,x_n):$$

a non-full Bayesian update.
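A computational sketch of the two-stage update on $I = [0,1]$, discretizing $Z$ on a grid: draw $\theta$ from the parametric posterior, then run random-walk Metropolis on $Z$ (with $W = \Psi(Z)$) given $\theta$. The family, prior, GP kernel, link and tuning constants are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Non-full Bayes update on I = [0, 1], with Z discretized on a grid:
#   step 1: draw theta from the parametric posterior pi(theta) prod_i f_theta(x_i);
#   step 2: given theta, random-walk Metropolis on Z (W = Psi(Z)) targeting
#           pi(W) prod_i f_{theta,W}(x_i).
# Family, prior, GP kernel, link and tuning constants are illustrative.

rng = np.random.default_rng(2)
x = rng.beta(1.0, 2.0, size=200)                   # data, f_0(x) = 2(1 - x)
grid = np.linspace(0.0, 1.0, 50)

def f_par(t, th):                                  # truncated exponential on [0, 1]
    return th * np.exp(-th * t) / (1.0 - np.exp(-th))

# Step 1: parametric posterior of theta on a grid, prior pi(theta) ∝ 1/theta.
ths = np.linspace(0.1, 10.0, 400)
logpost = np.array([np.log(f_par(x, t)).sum() for t in ths]) - np.log(ths)
w = np.exp(logpost - logpost.max()); w /= w.sum()
theta = rng.choice(ths, p=w)

# Step 2: Metropolis for Z given theta; squared-exponential GP prior on Z.
K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2) + 1e-8 * np.eye(50)
L, Kinv = np.linalg.cholesky(K), np.linalg.inv(K)
Psi = lambda z: 1.0 / (1.0 + np.exp(-z))           # logistic link, W = Psi(Z)

def logtarget(z):
    W_data = Psi(np.interp(x, grid, z))            # W evaluated at the data
    nc = np.trapz(f_par(grid, theta) * Psi(z), grid)
    return np.log(f_par(x, theta) * W_data / nc).sum() - 0.5 * z @ Kinv @ z

z = np.zeros(50)
cur = logtarget(z)
for _ in range(2000):                              # random-walk Metropolis on Z
    prop = z + 0.05 * L @ rng.standard_normal(50)
    cand = logtarget(prop)
    if np.log(rng.uniform()) < cand - cur:
        z, cur = prop, cand

C = Psi(z) / np.trapz(Psi(z) * f_par(grid, theta), grid)  # one draw of C(x)
```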

9 Posterior updating

$$\pi(\theta, W \mid x_1,\dots,x_n) = \pi(W \mid \theta, x_1,\dots,x_n)\; \pi(\theta \mid x_1,\dots,x_n).$$

$(\theta, W)$ are estimated sequentially, with $W$ reflecting the additional uncertainty about $\theta$. Marginalization of the posterior over $W$ is well defined,

$$\pi(W \mid x_1,\dots,x_n) = \int_\Theta \pi(W \mid \theta, x_1,\dots,x_n)\; \pi(d\theta \mid x_1,\dots,x_n),$$

since $\pi(\theta \mid x_1,\dots,x_n)$ describes the beliefs about the real parameter $\theta_0$. Coherence is about properly defining the quantities of interest and showing that the Bayesian updates provide learning about these quantities; this is checked by what is yielded asymptotically. Hence we seek frequentist validation: we show that the posterior of $(\theta, C)$ converges to a point mass at $(\theta_0, C_0)$.

10 Lenk (2003)

Let $I$ be a compact interval on the real line and $Z$ a Gaussian process. Lenk (2003) considers the semiparametric density model

$$f(x) = \frac{f_\theta(x)\, e^{Z(x)}}{\int_I f_\theta(s)\, e^{Z(s)}\, ds}$$

for $f_\theta(x)$ a member of the exponential family. In the Karhunen-Loève expansion of $Z(x)$, the orthogonal basis is chosen so that the sample paths integrate to zero. A further assumption for identification: the orthogonal basis does not contain any of the canonical statistics of $f_\theta(x)$. Estimation is based on truncation of the series expansion or on imputation of the Gaussian process at a fixed grid of points, see Tokdar (2007).
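A sketch of a prior draw from a Lenk-type model via a truncated series: the cosine basis below integrates to zero on $[0,1]$ as the construction requires, while the centring density, coefficient variances and truncation level are illustrative assumptions.

```python
import numpy as np

# Prior draw from a Lenk-type model via a truncated series expansion of Z:
# the basis sqrt(2) cos(k pi x) integrates to zero on [0, 1], as required.
# Centring density, coefficient variances and truncation are illustrative.

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 200)
f_theta = 2.0 * (1.0 - grid)                 # parametric centring (illustrative)

Kmax = 20
ks = np.arange(1, Kmax + 1)
coefs = rng.standard_normal(Kmax) / ks       # decaying coefficient variances
basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(ks, grid))
Z = coefs @ basis                            # sample path with integral zero

f = f_theta * np.exp(Z)
f /= np.trapz(f, grid)                       # normalize: Lenk's density model
```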

11 Bounded W(x)

Building upon Lenk (2003), we keep working with Gaussian processes and consider

$$f_{\theta,W}(x) = \frac{f_\theta(x)\, W(x)}{\int_I f_\theta(s)\, W(s)\, ds}, \qquad W(x) = \Psi(Z(x)),$$

where $\Psi(u)$ is a cdf with a smooth, unimodal, symmetric density $\psi(u)$ on the real line. With an additional condition on $\Psi(u)$, we can show that $W(x)$ preserves the asymptotic properties of the log-Gaussian process prior. On the other hand, with $W(x) \le 1$, Walker (2011) describes a latent model which can deal with the intractable normalizing constant. It is based on

$$\sum_{k=0}^\infty \binom{n+k-1}{k} \left[ \int f_\theta(s)\,(1 - W(s))\, ds \right]^k = \left[ \int W(s)\, f_\theta(s)\, ds \right]^{-n}.$$
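A quick numerical check of the series identity above, a negative binomial expansion of the inverse normalizing constant, which is what makes the latent-variable construction work; the choices of $W$, $f_\theta$ and the truncation level are illustrative.

```python
import numpy as np
from scipy.special import comb
from scipy.integrate import quad

# Check: with q = ∫ f_theta (1 - W) and 0 < W < 1,
#   sum_k C(n+k-1, k) q^k = (1 - q)^{-n} = [∫ W f_theta]^{-n}.
# W, f_theta and the truncation level are illustrative.

W = lambda s: 1.0 / (1.0 + np.exp(-np.sin(3.0 * s)))   # some W with 0 < W < 1
f = lambda s: 2.0 * (1.0 - s)                          # a density on [0, 1]

q = quad(lambda s: f(s) * (1.0 - W(s)), 0.0, 1.0)[0]
I = quad(lambda s: f(s) * W(s), 0.0, 1.0)[0]           # equals 1 - q

n = 7
lhs = sum(comb(n + k - 1, k) * q ** k for k in range(200))
rhs = I ** (-n)
print(lhs, rhs)                                        # agree to high precision
```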

12 Link function Ψ(u)

Lipschitz condition on $\log \Psi(u)$: $\psi(u)/\Psi(u) \le m$ uniformly on $\mathbb{R}$. It is satisfied by the standard Laplace, logistic and Cauchy cdfs, but not by the standard normal cdf. For fixed $\theta$, write $p_z = f_{\theta,\Psi(z)}$. It can be shown that, when $\|z_1 - z_2\|_\infty < \varepsilon$,

$$h(p_{z_1}, p_{z_2}) \le m\varepsilon\, e^{m\varepsilon/2}, \qquad K(p_{z_1}, p_{z_2}) \le m^2\varepsilon^2\, e^{m\varepsilon}(1 + m\varepsilon).$$

The posterior asymptotic results of van der Vaart and van Zanten (2008) carry over to this setting: if $\Psi^{-1}(f_0/f_\theta)$ is contained in the support of $Z$, then

$$\Pi\{p_z : h(p_z, f_0) > \varepsilon \mid X_1,\dots,X_n\} \to 0, \quad F_0\text{-a.s.}$$

Results on posterior contraction rates can also be derived.
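A numerical look at the Lipschitz condition on the left tail, where it can fail; the grid of $u$ values is illustrative. The ratio $\psi(u)/\Psi(u)$ stays bounded for the Laplace, logistic and Cauchy cdfs but grows like $|u|$ for the normal cdf.

```python
import numpy as np
from scipy.stats import laplace, logistic, cauchy, norm

# The condition psi(u)/Psi(u) <= m matters on the left tail, where Psi -> 0.
# Bounded for Laplace, logistic, Cauchy; grows like |u| for the normal cdf.

u = np.linspace(-30.0, 0.0, 7)
for name, d in [("laplace", laplace), ("logistic", logistic),
                ("cauchy", cauchy), ("normal", norm)]:
    print(f"{name:8s}", np.round(d.pdf(u) / d.cdf(u), 3))
```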

13 Conditional posterior of W

(A) Lipschitz condition on $\log \Psi(u)$;
(B) $f_\theta(x)$ is continuous and bounded away from zero;
(C) the support of $Z$ contains the space $C(I)$ of continuous functions on $I$.

Theorem 1. Under assumptions (A), (B) and (C), the conditional posterior of $W$ given $\theta$ is exponentially consistent at every continuous density $f_0$ on $I$, i.e. for any $\varepsilon > 0$,

$$\pi\{W : h(f_{\theta,W}, f_0) > \varepsilon \mid \theta, X_1,\dots,X_n\} \le e^{-dn}, \quad F_0\text{-a.s.}$$

for some $d > 0$ as $n \to \infty$.

As a corollary, for fixed $\theta$, the posterior of $C(x) = W(x)/\int_I f_\theta(s)\, W(s)\, ds$ consistently estimates the discrepancy $f_0(x)/f_\theta(x)$. The exponential convergence to 0 is a by-product of standard techniques for proving posterior consistency.

14 Marginal posterior of θ

For given $f_0$, let $\theta_0$ be the parameter value that minimizes $\int_I f_0 \log(f_0/f_\theta)$:

$$\theta_0 = \arg\min_{\theta \in \Theta} \int f_0 \log(f_0/f_\theta).$$

Under some regularity conditions on the family $f_\theta$ and on the prior at $\theta_0$, the posterior accumulates at $\theta_0$ at rate $\sqrt{n}$:

$$\pi\big(\|\theta - \theta_0\| > M_n n^{-1/2} \mid X_1,\dots,X_n\big) \to 0, \quad F_0\text{-a.s.},$$

see Kleijn and van der Vaart (2012). One of the key regularity conditions on $f_\theta$ is the existence of an open neighborhood $U$ of $\theta_0$ and a square-integrable function $m_{\theta_0}(x)$ such that, for all $\theta_1, \theta_2 \in U$,

$$|\log(f_{\theta_1}/f_{\theta_2})| \le m_{\theta_0}\, \|\theta_1 - \theta_2\|, \quad P_0\text{-a.s.}$$

For our purposes, we focus on a different local property: there exist $\alpha > 0$ and an open neighborhood $U$ of $\theta_0$ such that, for all $\theta_1, \theta_2 \in U$,

$$\|\log(f_{\theta_1}/f_{\theta_2})\|_\infty \lesssim \|\theta_1 - \theta_2\|^\alpha. \tag{D}$$
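A quick numerical check that property (D) holds, with $\alpha = 1$, for the truncated exponential family of Illustration 1 on $I = [0,1]$; the neighborhood and grid are illustrative choices.

```python
import numpy as np

# Check of (D) with alpha = 1 for the truncated exponential family on [0, 1]:
# the ratio ||log(f_t1 / f_t2)||_inf / |t1 - t2| stays bounded for thetas in
# a small neighborhood. Neighborhood and grid are illustrative.

grid = np.linspace(0.0, 1.0, 1000)
logf = lambda th: np.log(th) - th * grid - np.log(1.0 - np.exp(-th))

rng = np.random.default_rng(5)
for _ in range(5):
    t1, t2 = 2.0 + 0.1 * rng.standard_normal(2)     # two thetas near theta_0
    sup = np.abs(logf(t1) - logf(t2)).max()
    print(sup / abs(t1 - t2))                       # approximately constant
```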

15 Marginal posterior of W

Assume the regularity conditions on $f_\theta$ and $\pi(\theta)$ are satisfied for $f_0 \in C(I)$. Recall the definition of the marginal posterior of $W$,

$$\pi(W \mid x_1,\dots,x_n) = \int_\Theta \pi(W \mid \theta, x_1,\dots,x_n)\, \pi(d\theta \mid x_1,\dots,x_n).$$

Theorem 2. Under assumptions (A), (B), (C) and (D), the marginal posterior of $W(x)$ satisfies, as $n \to \infty$,

$$\pi\{W : h(f_{\theta_0,W}, f_0) > \varepsilon \mid X_1,\dots,X_n\} \to 0, \quad F_0\text{-a.s.}$$

The marginal posterior of $W$ is evaluated outside a neighborhood defined in terms of $\theta_0$. Clearly, if $\pi(\theta)$ is degenerate at $\theta_0$, the result follows directly from Theorem 1.

Hint of the proof: it is sufficient to consider the posterior when the prior is restricted to $\|\theta - \theta_0\| \le M_n n^{-1/2}$. We then manipulate the numerator and the denominator by using (D) together with the inequalities

$$\exp\{-\|\log(f_{\theta_0}/f_\theta)\|_\infty\} \;\le\; \frac{\int_I f_{\theta_0}(x)\, W(x)\, dx}{\int_I f_\theta(x)\, W(x)\, dx} \;\le\; \exp\{\|\log(f_{\theta_0}/f_\theta)\|_\infty\}.$$

16 Marginal posterior of C

Recall the definition

$$C(x) = C_{\theta,W}(x) = \frac{W(x)}{\int W(s)\, f_\theta(s)\, ds},$$

which is designed to estimate $C_0(x) = f_0(x)/f_{\theta_0}(x)$.

Corollary. Under the hypotheses of Theorem 2, as $n \to \infty$,

$$\pi\Big( \int_I |C - C_0| > \varepsilon \,\Big|\, X_1,\dots,X_n \Big) \to 0, \quad F_0\text{-a.s.}$$

Together with $\pi(\|\theta - \theta_0\| > M_n n^{-1/2} \mid X_1,\dots,X_n) \to 0$, we conclude that the posterior of $(\theta, C)$ converges to a point mass at $(\theta_0, C_0)$.

Hint of the proof: Theorem 2 implies that $\int_I |C_{\theta_0,W} - C_0|$ goes to 0. By the triangle inequality, it is then sufficient to show that, uniformly over $\|\theta - \theta_0\| \le M n^{-1/2}$, $\int_I |C_{\theta,W} - C_{\theta_0,W}| \to 0$. This we show by using (D).

17 Illustration 1

$n = 500$ observations from $f_0(x) = 2(1 - x)$; $f_\theta(x) = \theta e^{-\theta x}/(1 - e^{-\theta})$ with improper prior $\pi(\theta) \propto 1/\theta$.

Figure: Estimated (bold) and true (dashed) $C_0(x)$.

18 Parametric Bayes update

Simulation from the posterior of $\theta$, to be compared with the minimum K-L parameter value $\theta_0$.

Figure: Posterior distribution of $\theta$ with parametric Bayes update (posterior mean indicated).

19 Full Bayes update

Using the proper conditional posterior

$$\pi(\theta \mid W, x_1,\dots,x_n) \propto \pi(\theta) \prod_i f_{\theta,W}(x_i).$$

Figure: Posterior distribution of $\theta$ with formal Bayes update (posterior mean indicated).

20 Illustration 2

$n = 500$ observations from $f_0(x) = 2x$; $f_\theta(x) = \theta x^{\theta - 1}$ with $0 < \theta < 1$ and a uniform prior for $\theta$. The minimum K-L parameter value $\theta_0$ is indicated in the figure.

Figure: Posterior distributions of $\theta$ with parametric Bayes update (top) and formal Bayes update (bottom).
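Under this setup the parametric posterior has a simple form up to normalization: the likelihood is $\theta^n e^{(\theta-1)T}$ with $T = \sum_i \log x_i$, so with a uniform prior $\pi(\theta \mid x) \propto \theta^n e^{\theta T}$ on $(0,1)$. A sketch, with simulated data and grid normalization as illustrative choices:

```python
import numpy as np

# Parametric Bayes update for Illustration 2: likelihood theta^n e^{(theta-1)T}
# with T = sum_i log x_i, so with a uniform prior on (0, 1) the posterior is
# proportional to theta^n e^{theta T}. Data and grid are illustrative.

rng = np.random.default_rng(3)
x = np.sqrt(rng.uniform(size=500))            # f_0(x) = 2x via inverse-cdf sampling
n, T = len(x), np.log(x).sum()

ths = np.linspace(1e-3, 1.0 - 1e-3, 1000)
logpost = n * np.log(ths) + ths * T           # log posterior up to a constant
w = np.exp(logpost - logpost.max()); w /= w.sum()
print("posterior mean:", (ths * w).sum())     # concentrates near the boundary
```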

21 Discussion

Both the proposed update and the formal Bayes update seem to provide a suitable estimate of $C$, which is not surprising given the flexibility of the nonparametric component in estimating $C_0$ under alternative values of $\theta$. Yet the posterior for $\theta$ is more accurate under the parametric Bayes update. This shows that semiparametric models need to be thought about carefully: the parametric part needs to make explicit which $\theta$ is being targeted.

Future work:
- deal with $f_\theta$ with unbounded support;
- extension to posterior contraction rates;
- connections with asymptotic properties of empirical Bayes;
- use of the $C(x)$ function for model selection.

22 References

De Blasi & Walker (2012). Bayesian estimation of the discrepancy with misspecified parametric models. Tech. Rep., submitted.
Hjort & Glad (1995). Nonparametric density estimation with a parametric start. Ann. Statist. 23.
Kleijn & van der Vaart (2012). The Bernstein-von Mises theorem under misspecification. Electron. J. Stat. 6.
Lenk (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. J. Amer. Statist. Assoc. 83.
Rousseau (2008). Approximating interval hypothesis: p-values and Bayes factors. In Bayesian Statistics 8.
Tokdar (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. J. Comp. Graph. Statist. 16.
van der Vaart & van Zanten (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36.
Verdinelli & Wasserman (1998). Bayesian goodness of fit testing using infinite dimensional exponential families. Ann. Statist. 26.
Walker (2011). Posterior sampling when the normalizing constant is unknown. Comm. Statist. Simulation Comput. 40.
