Bayesian estimation of the discrepancy with misspecified parametric models
1 Bayesian estimation of the discrepancy with misspecified parametric models
Pierpaolo De Blasi, University of Torino & Collegio Carlo Alberto
Bayesian Nonparametrics workshop, ICERM, September 2012
Joint work with S. Walker
2 Outline
- Semiparametric density estimation
- Asymptotics and illustration
- References
3 BNP density estimation
Let X_1, ..., X_n be exchangeable (i.e. conditionally iid) observations from an unknown density f on the real line. If F is the density space and \Pi(df) the prior, via Bayes theorem

\Pi(A \mid X_1, \dots, X_n) = \frac{\int_A \prod_{i=1}^n f(X_i) \, \Pi(df)}{\int_F \prod_{i=1}^n f(X_i) \, \Pi(df)}

Wealth of Bayesian nonparametric (BNP) models: Dirichlet process mixtures of continuous densities; log spline models; Bernstein polynomials; log Gaussian processes. All with well studied asymptotic properties, e.g. posterior concentration rates

\Pi(f : d(f, f_0) > M \epsilon_n \mid X_1, \dots, X_n) \to 0,

when X_1, X_2, \dots are iid from some true f_0.
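As a toy version of the Bayes update above, the posterior can be computed in closed form when the prior \Pi sits on a finite set of candidate densities. The three candidates and the sample size below are hypothetical choices, not taken from the talk; the true density is the triangular one, and the posterior should pile up on it.

```python
import math, random

# Toy Bayes update: a uniform prior over three candidate densities on (0, 1),
# a hypothetical finite stand-in for the nonparametric prior Pi(df).
candidates = {
    "uniform":    lambda x: 1.0,
    "triangular": lambda x: 2.0 * (1.0 - x),   # true data-generating density
    "increasing": lambda x: 2.0 * x,
}

random.seed(0)
# iid draws from f0(x) = 2(1 - x) by inverse-cdf sampling: F(x) = 2x - x^2.
data = [1.0 - math.sqrt(1.0 - random.random()) for _ in range(200)]

# Posterior mass of each candidate: prior x likelihood, then normalize
# (log-sum-exp for numerical stability).
log_lik = {name: sum(math.log(f(x)) for x in data) for name, f in candidates.items()}
m = max(log_lik.values())
weights = {name: math.exp(ll - m) for name, ll in log_lik.items()}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

With 200 observations the log Bayes factor in favor of the true candidate is large, so essentially all posterior mass lands on it; this is the finite-dimensional shadow of the posterior concentration displayed above.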
4 Discrepancy from a parametric model
Suppose now we have a favorite parametric family f_\theta(x), \theta \in \Theta \subset \mathbb{R}^p, likely to be misspecified: there is no \theta such that f_0 = f_\theta. We want to learn about the best parameter value \theta_0, which minimizes the Kullback-Leibler divergence from the true f_0:

\theta_0 = \arg\min_\Theta \int f_0 \log(f_0 / f_\theta)

A nonparametric component W is introduced to model the discrepancy between f_0 and the closest density f_{\theta_0}:

f_{\theta,W}(x) \propto f_\theta(x) \, W(x),

so that

C(x) := \frac{W(x)}{\int W(s) f_\theta(s) \, ds}

is designed to estimate C_0(x) = f_0(x) / f_{\theta_0}(x).
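The KL projection \theta_0 can be found numerically by grid search over a Riemann approximation of the divergence. The sketch below uses a hypothetical pair not taken from the talk: f_0 normal with standard deviation 1.5 and f_\theta a Laplace family indexed by its scale b, for which \theta_0 has the closed form b = E_0|X|.

```python
import math

SD = 1.5                                      # hypothetical truth: f0 = N(0, 1.5^2)

def f0(x):
    return math.exp(-0.5 * (x / SD) ** 2) / (SD * math.sqrt(2.0 * math.pi))

def f_theta(x, b):
    return math.exp(-abs(x) / b) / (2.0 * b)  # misspecified Laplace(0, b) family

def kl(b, lo=-12.0, hi=12.0, grid=6000):
    # midpoint-rule approximation of  int f0 log(f0 / f_theta)
    h = (hi - lo) / grid
    return sum(f0(x) * math.log(f0(x) / f_theta(x, b)) * h
               for x in ((lo + (i + 0.5) * h) for i in range(grid)))

# grid search for theta_0 = argmin KL; closed form here: b = E0|X| = SD*sqrt(2/pi)
bs = [0.01 * k for k in range(50, 300)]
b0 = min(bs, key=kl)
print(round(b0, 2), round(SD * math.sqrt(2.0 / math.pi), 2))
```

The grid minimizer agrees with the moment-matching closed form, illustrating that \theta_0 is a well-defined target even though no Laplace density equals f_0.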
5 Related works - Frequentist
Hjort and Glad (1995). Start with a parametric density estimate f_{\hat\theta}(x), \hat\theta being, e.g., the maximizer of the log-likelihood \sum_{i=1}^n \log f_\theta(x_i). Then multiply it by a nonparametric kernel-type estimate of the correction function r(x) = f_0(x)/f_{\hat\theta}(x):

\hat f(x) = f_{\hat\theta}(x) \, \hat r(x) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x) \, \frac{f_{\hat\theta}(x)}{f_{\hat\theta}(x_i)}

in a two-stage sequential analysis. \hat f is shown to be more precise than the traditional kernel density estimator in a broad neighborhood around the parametric family, while losing little when f_0 is far from the parametric family.
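A minimal sketch of this estimator, under hypothetical choices (Exp(1) data, an exponential parametric start with its rate set by MLE, a Gaussian kernel, bandwidth h = 0.3):

```python
import math, random

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def hjort_glad(xs, x, h, f_par):
    # parametric start f_par times a kernel estimate of r(x) = f0(x)/f_par(x):
    # fhat(x) = (1/n) sum_i K_h(x_i - x) * f_par(x) / f_par(x_i)
    n = len(xs)
    return f_par(x) * sum(gauss_kernel((xi - x) / h) / (h * f_par(xi)) for xi in xs) / n

random.seed(1)
xs = [random.expovariate(1.0) for _ in range(500)]   # data from Exp(1)
rate = 1.0 / (sum(xs) / len(xs))                     # MLE of the rate
f_par = lambda x: rate * math.exp(-rate * x)

est = hjort_glad(xs, 1.0, h=0.3, f_par=f_par)
print(round(est, 3))
```

Here the parametric start is well specified, so r is close to 1 and the estimate at x = 1 should sit near the true value e^{-1}; the payoff of the correction shows up when the start is only approximately right.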
6 Related works - Bayes
Nonparametric prior built around a parametric model via

f(x) = f_\theta(x) \, g(F_\theta(x)),

where F_\theta is the cdf of f_\theta and g is a density on [0, 1] with prior \Pi.
- Verdinelli and Wasserman (1998): \Pi as an infinite exponential family. Application to goodness of fit testing.
- Rousseau (2008): \Pi as a mixture of betas. Application to goodness of fit testing.
- Tokdar (2007): \Pi as a log Gaussian process prior. Application to posterior inference for densities with unbounded support.
For g(u) = e^{Z(u)} / \int_0^1 e^{Z(s)} ds and Z a Gaussian process with covariance \sigma(\cdot, \cdot), f(x) can be written

f(x) \propto f_\theta(x) \, e^{Z(x)}, \quad W(x) = e^{Z(x)},

for Z a Gaussian process with covariance \sigma(F_\theta(\cdot), F_\theta(\cdot)).
7 Posterior updating
f_{\theta,W}(x) \propto f_\theta(x) \, W(x), \quad C(x) := \frac{W(x)}{\int W(s) f_\theta(s) \, ds}.

Truly semiparametric: the aim is first to learn about the best parameter \theta_0, then to see how close f_{\theta_0} is to f_0 via C(x) = W(x) / \int W(s) f_\theta(s) ds.
This is a situation in which the updating process from prior to posterior may be seen as problematic: the model f_{\theta,W} is intrinsically non-identified in (\theta, C).
The full Bayesian update

\pi(\theta, W \mid x_1, \dots, x_n) \propto \pi(\theta) \, \pi(W) \prod_{i=1}^n f_{\theta,W}(x_i)

is appropriate for learning about f_0; it is not so for learning about (\theta_0, C_0). The marginal posterior \pi(\theta \mid x_1, \dots, x_n) = \int \pi(\theta, W \mid x_1, \dots, x_n) \, dW has no clear interpretation: it is not identified which parameter value this \pi is targeting.
8 Posterior updating
What removes us from the formal Bayes set-up is the desire to specifically learn about \theta_0. Since \theta_0 is defined without any reference to W or C, our beliefs about \theta_0 should not change whether or not we are interested in learning about C_0. Hence, the appropriate update for \theta is the parametric one:

\pi(\theta \mid x_1, \dots, x_n) \propto \pi(\theta) \prod_{i=1}^n f_\theta(x_i).

We keep updating W according to the semiparametric model, so our updating scheme is

\pi(W \mid \theta, x_1, \dots, x_n) \propto \pi(W) \prod_{i=1}^n f_{\theta,W}(x_i),
\pi(\theta, W \mid x_1, \dots, x_n) = \pi(W \mid \theta, x_1, \dots, x_n) \, \pi(\theta \mid x_1, \dots, x_n),

a non-full Bayesian update.
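The parametric stage of this scheme is directly computable on a grid. The sketch below reuses the triangular truth f_0(x) = 2(1 - x), the truncated exponential family and the improper prior \pi(\theta) \propto 1/\theta from the talk's first illustration; the W-stage, which requires a Gaussian process prior, is only indicated in a comment.

```python
import math, random

random.seed(2)
# data from f0(x) = 2(1 - x) on (0, 1), via inverse cdf F(x) = 2x - x^2
data = [1.0 - math.sqrt(1.0 - random.random()) for _ in range(500)]

# Parametric stage of the proposed update:
#   pi(theta | x_1..n)  proportional to  pi(theta) * prod_i f_theta(x_i),
# with f_theta(x) = theta exp(-theta x)/(1 - exp(-theta)) on (0, 1)
# and improper prior pi(theta) = 1/theta, computed on a grid.
thetas = [0.05 * k for k in range(1, 200)]

def log_post(theta):
    ll = sum(math.log(theta) - theta * x - math.log(1.0 - math.exp(-theta))
             for x in data)
    return ll - math.log(theta)           # improper prior 1/theta

lp = [log_post(t) for t in thetas]
m = max(lp)
w = [math.exp(v - m) for v in lp]
post_mean = sum(t * wi for t, wi in zip(thetas, w)) / sum(w)
print(round(post_mean, 2))
# The second stage would then draw W given theta from
#   pi(W | theta, x_1..n)  proportional to  pi(W) prod_i f_{theta,W}(x_i).
```

The posterior mass concentrates near the KL-minimizing value of \theta for this pair, which is the point of using the parametric update: it targets \theta_0 regardless of whether W is of interest.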
9 Posterior updating
\pi(\theta, W \mid x_1, \dots, x_n) = \pi(W \mid \theta, x_1, \dots, x_n) \, \pi(\theta \mid x_1, \dots, x_n).

(\theta, W) are estimated sequentially, with W reflecting additional uncertainty on \theta. Marginalization of the posterior over W is well defined,

\pi(W \mid x_1, \dots, x_n) = \int_\Theta \pi(W \mid \theta, x_1, \dots, x_n) \, \pi(d\theta \mid x_1, \dots, x_n),

since \pi(\theta \mid x_1, \dots, x_n) describes the beliefs about the real parameter \theta_0.
Coherence is about properly defining the quantities of interest and showing that the Bayesian updates provide learning about these quantities; this is checked by what is yielded asymptotically. Hence we seek frequentist validation: we show that the posterior of (\theta, C) converges to a point mass at (\theta_0, C_0).
10 Lenk (2003)
Let I be a compact interval on the real line and Z a Gaussian process. Lenk (2003) considers the semiparametric density model

f(x) = \frac{f_\theta(x) \, e^{Z(x)}}{\int_I f_\theta(s) \, e^{Z(s)} \, ds}

for f_\theta(x) a member of the exponential family. In the Karhunen-Loève expansion of Z(x), the orthogonal basis is chosen so that the sample paths integrate to zero. Further assumption for identification: the orthogonal basis does not contain any of the canonical statistics of f_\theta(x). Estimation is based on truncation of the series expansion or on imputation of the Gaussian process at a fixed grid of points, see Tokdar (2007).
11 Bounded W(x)
Building upon Lenk (2003), we keep working with Gaussian processes and consider

f_{\theta,W}(x) = \frac{f_\theta(x) \, W(x)}{\int_I f_\theta(s) \, W(s) \, ds}, \quad W(x) = \Psi(Z(x)),

where \Psi(u) is a cdf having a smooth unimodal symmetric density \psi(u) on the real line. With an additional condition on \Psi(u), we can show that W(x) preserves the asymptotic properties of the log Gaussian process prior. On the other hand, with W(x) \le 1, Walker (2011) describes a latent model which can deal with the intractable normalizing constant. It is based on

\left( \int_I W(s) f_\theta(s) \, ds \right)^{-n} = \sum_{k=0}^\infty \binom{n+k-1}{k} \left[ \int_I f_\theta(s) (1 - W(s)) \, ds \right]^k.
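The series underlying Walker's latent model (reconstructed here as the negative binomial expansion of the normalizing constant raised to the power -n, valid because 0 < W \le 1 makes \int f_\theta (1 - W) < 1) can be checked numerically. Both f_\theta and W below are hypothetical choices for the check, not objects from the talk.

```python
import math

# Check: (int W f_theta)^(-n) = sum_k C(n+k-1, k) * q^k,
# with q = int f_theta (1 - W) and 0 < W <= 1.
n = 5
grid = 4000
h = 1.0 / grid

def f_theta(x):                       # hypothetical: Exp(2) truncated to (0, 1)
    return 2.0 * math.exp(-2.0 * x) / (1.0 - math.exp(-2.0))

def W(x):                             # hypothetical bounded weight, 0 < W <= 1
    return 0.5 + 0.4 * math.sin(3.0 * x) ** 2

xs = [(i + 0.5) * h for i in range(grid)]
int_wf = sum(W(x) * f_theta(x) * h for x in xs)
q = sum((1.0 - W(x)) * f_theta(x) * h for x in xs)

lhs = int_wf ** (-n)
rhs = sum(math.comb(n + k - 1, k) * q ** k for k in range(200))
print(round(lhs, 6), round(rhs, 6))
```

The point of the expansion is that each power of q is an integral against f_\theta, so latent variables can replace the intractable normalizing constant in posterior sampling.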
12 Link function \Psi(u)
Lipschitz condition on \log \Psi(u): \psi(u)/\Psi(u) \le m uniformly on \mathbb{R}. It is satisfied by the standard Laplace cdf, the standard logistic cdf and the standard Cauchy cdf, but not by the standard normal cdf.
For fixed \theta, write p_z = f_{\theta,\Psi(z)}. It can be shown that, when \|z_1 - z_2\|_\infty < \epsilon,

h(p_{z_1}, p_{z_2}) \le m \epsilon \, e^{m\epsilon/2}, \quad K(p_{z_1}, p_{z_2}) \le m^2 \epsilon^2 e^{m\epsilon} (1 + m\epsilon).

The posterior asymptotic results of van der Vaart and van Zanten (2008) carry over to this setting: if \Psi^{-1}(f_0/f_\theta) is contained in the support of Z, then

\Pi\{p_z : h(p_z, f_0) > \epsilon \mid X_1, \dots, X_n\} \to 0, \quad F_0\text{-a.s.}

Results on the posterior contraction rate can also be derived.
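The Lipschitz condition is easy to probe numerically: the hazard-type ratio \psi(u)/\Psi(u) stays bounded (by 1) for the Laplace cdf but grows roughly like |u| in the left tail of the normal cdf. A small check on a grid, using erfc for a stable normal cdf in the far tail:

```python
import math

def laplace_ratio(u):
    # standard Laplace: psi(u) = exp(-|u|)/2; Psi(u) = exp(u)/2 for u <= 0
    psi = 0.5 * math.exp(-abs(u))
    Psi = 0.5 * math.exp(u) if u <= 0 else 1.0 - 0.5 * math.exp(-u)
    return psi / Psi

def normal_ratio(u):
    psi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    Psi = 0.5 * math.erfc(-u / math.sqrt(2.0))   # erfc avoids underflow of Phi(u)
    return psi / Psi

us = [-20.0 + 0.5 * k for k in range(81)]        # grid on [-20, 20]
lap_sup = max(laplace_ratio(u) for u in us)
nrm_sup = max(normal_ratio(u) for u in us)
print(round(lap_sup, 3), round(nrm_sup, 3))
```

So m = 1 works for the Laplace link, while no finite m bounds the normal ratio uniformly, which is why the standard normal cdf is excluded.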
13 Conditional posterior of W
(A) Lipschitz condition on \log \Psi(u);
(B) f_\theta(x) is continuous and bounded away from zero;
(C) the support of Z contains the space C(I) of continuous densities on I.

Theorem 1. Under assumptions (A), (B) and (C), the conditional posterior of W given \theta is exponentially consistent at all f_0 \in C(I), i.e. for any \epsilon > 0,

\pi\{W : h(f_{\theta,W}, f_0) > \epsilon \mid \theta, X_1, \dots, X_n\} \le e^{-dn}, \quad F_0\text{-a.s.}

for some d > 0 as n \to \infty. As a corollary, for fixed \theta, the posterior of C(x) = W(x)/\int_I f_\theta(s) W(s) \, ds consistently estimates the discrepancy f_0(x)/f_\theta(x). The exponential convergence to 0 is a by-product of standard techniques for proving posterior consistency.
14 Marginal posterior of \theta
For given f_0, let \theta_0 be the parameter value that minimizes \int_I f_0 \log(f_0/f_\theta):

\theta_0 = \arg\min_\Theta \int f_0 \log(f_0 / f_\theta)

Under some regularity conditions on the family f_\theta and on the prior at \theta_0, the posterior accumulates at \theta_0 at rate \sqrt{n}:

\pi\{\|\theta - \theta_0\| > M_n n^{-1/2} \mid X_1, \dots, X_n\} \to 0, \quad F_0\text{-a.s.},

see Kleijn and van der Vaart (2012). One of the key regularity conditions on f_\theta is the existence of an open neighborhood U of \theta_0 and a square-integrable function m_{\theta_0}(x) such that, for all \theta_1, \theta_2 \in U,

|\log(f_{\theta_1}/f_{\theta_2})| \le m_{\theta_0} \|\theta_1 - \theta_2\|, \quad P_0\text{-a.s.}

For our purposes, we focus on a different local property: there exist \alpha > 0 and an open neighborhood U of \theta_0 such that for all \theta_1, \theta_2 \in U,

\|\log(f_{\theta_1}/f_{\theta_2})\|_\infty \le \alpha \|\theta_1 - \theta_2\|. \quad (D)
15 Marginal posterior of W
Assume the regularity conditions on f_\theta and \pi(\theta) are satisfied for f_0 \in C(I). Recall the definition of the marginal posterior of W,

\pi(W \mid x_1, \dots, x_n) = \int_\Theta \pi(W \mid \theta, x_1, \dots, x_n) \, \pi(d\theta \mid x_1, \dots, x_n).

Theorem 2. Under assumptions (A), (B), (C) and (D), the marginal posterior of W(x) satisfies, as n \to \infty,

\pi\{W : h(f_{\theta_0,W}, f_0) > \epsilon \mid X_1, \dots, X_n\} \to 0, \quad F_0\text{-a.s.}

The marginal posterior of W is evaluated outside a neighborhood defined in terms of \theta_0. Clearly, if \pi(\theta) is degenerate at \theta_0, the result follows directly from Theorem 1.
Hint of the proof: it is sufficient to consider the posterior when the prior is restricted to \|\theta - \theta_0\| \le M_n n^{-1/2}. We then manipulate numerator and denominator by using (D) together with the inequalities

\exp\{-\|\log(f_{\theta_0}/f_\theta)\|_\infty\} \le \frac{\int_I f_{\theta_0}(x) W(x) \, dx}{\int_I f_\theta(x) W(x) \, dx} \le \exp\{\|\log(f_{\theta_0}/f_\theta)\|_\infty\}.
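The bracketing inequality in the proof hint holds for any positive weight W, since the integrand ratio f_{\theta_0}/f_\theta is pointwise bounded by its sup-norm exponential. A numerical check under hypothetical choices (the truncated exponential family at two nearby parameter values and an arbitrary positive weight):

```python
import math

# Check:  exp(-||log(f_theta0/f_theta)||_inf)
#           <= (int f_theta0 W) / (int f_theta W)
#           <= exp(+||log(f_theta0/f_theta)||_inf)
grid = 2000
h = 1.0 / grid
xs = [(i + 0.5) * h for i in range(grid)]

def f(x, theta):                      # truncated exponential density on (0, 1)
    return theta * math.exp(-theta * x) / (1.0 - math.exp(-theta))

def W(x):                             # hypothetical bounded positive weight
    return 0.2 + 0.6 * math.sin(5.0 * x) ** 2

theta0, theta = 2.15, 1.8
num = sum(f(x, theta0) * W(x) * h for x in xs)
den = sum(f(x, theta) * W(x) * h for x in xs)
sup_norm = max(abs(math.log(f(x, theta0) / f(x, theta))) for x in xs)

ratio = num / den
print(math.exp(-sup_norm) <= ratio <= math.exp(sup_norm))
```

Under (D), the sup-norm on the right shrinks like \|\theta - \theta_0\|, which is what lets the restricted prior neighborhood control both numerator and denominator.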
16 Marginal posterior of C
Recall the definition

C(x) = C_{\theta,W}(x) = \frac{W(x)}{\int W(s) f_\theta(s) \, ds},

which is designed to estimate C_0(x) = f_0(x)/f_{\theta_0}(x).

Corollary. Under the hypotheses of Theorem 2, as n \to \infty,

\pi\{\int_I |C - C_0| > \epsilon \mid X_1, \dots, X_n\} \to 0, \quad F_0\text{-a.s.}

Together with \pi\{\|\theta - \theta_0\| > M_n n^{-1/2} \mid X_1, \dots, X_n\} \to 0, we conclude that the posterior of (\theta, C) converges to (\theta_0, C_0).
Hint of the proof: Theorem 2 implies that \int_I |C_{\theta_0,W} - C_0| goes to 0. By the triangle inequality, it is sufficient to show that, uniformly over \|\theta - \theta_0\| \le M n^{-1/2}, \int_I |C_{\theta,W} - C_{\theta_0,W}| \to 0. This we show by using (D).
17 Illustration 1
n = 500 observations from f_0(x) = 2(1 - x); f_\theta(x) = \theta e^{-\theta x}/(1 - e^{-\theta}) with improper prior \pi(\theta) \propto 1/\theta.

Figure: Estimated (bold) and true (dashed) C_0(x).
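The dashed curve in the figure is the true discrepancy C_0(x) = f_0(x)/f_{\theta_0}(x). A minimal sketch computing it at a few points for this setup, with \theta_0 found by grid search (the slide does not report its numerical value, so the value below is derived, not quoted):

```python
import math

def f0(x):
    return 2.0 * (1.0 - x)

def f_t(x, t):
    return t * math.exp(-t * x) / (1.0 - math.exp(-t))

def kl(t, grid=2000):
    # midpoint-rule approximation of  int_0^1 f0 log(f0 / f_t)
    h = 1.0 / grid
    return sum(f0(x) * math.log(f0(x) / f_t(x, t)) * h
               for x in ((i + 0.5) * h for i in range(grid)))

theta0 = min((0.01 * k for k in range(50, 500)), key=kl)
c0 = {x: f0(x) / f_t(x, theta0) for x in (0.1, 0.5, 0.9)}
print(round(theta0, 2), {x: round(v, 3) for x, v in c0.items()})
```

C_0 crosses 1 and dips well below it near x = 1, so the discrepancy function is genuinely non-constant: the model is misspecified and W has real work to do.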
18 Parametric Bayes update
Simulation from the posterior of \theta. The minimum K-L parameter value is \theta_0 =

Figure: Posterior distribution of \theta with parametric Bayes update. Posterior mean =
19 Full Bayes update
Using the proper conditional posterior

\pi(\theta \mid W, x_1, \dots, x_n) \propto \pi(\theta) \prod_i f_{\theta,W}(x_i).

Figure: Posterior distribution of \theta with formal Bayes update. Posterior mean =
20 Illustration 2
n = 500 observations from f_0(x) = 2x; f_\theta(x) = \theta x^{\theta-1} with 0 < \theta < 1 and a uniform prior for \theta. \theta_0 =

Figure: Posterior distributions of \theta with parametric Bayes update (top) and formal Bayes update (bottom).
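In this setup f_0(x) = 2x is itself of the form \theta x^{\theta-1} with \theta = 2, outside the allowed range (0, 1), so the KL divergence should decrease all the way to the boundary. A sketch checking this numerically (the slide omits the value of \theta_0; the boundary claim below is derived from the stated setup, not quoted from the talk):

```python
import math

# Illustration 2: f0(x) = 2x, f_theta(x) = theta x^(theta-1) on (0, 1),
# with theta restricted to (0, 1).
def kl(theta, grid=4000):
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        f0 = 2.0 * x
        ft = theta * x ** (theta - 1.0)
        total += f0 * math.log(f0 / ft) * h
    return total

thetas = [0.01 * k for k in range(10, 100)]    # 0.10 .. 0.99
vals = [kl(t) for t in thetas]
# KL is strictly decreasing over the grid, so the minimizer sits at the
# boundary of the constrained range, suggesting theta_0 = 1.
print(all(a > b for a, b in zip(vals, vals[1:])), round(vals[-1], 4))
```

This matches the closed form: KL(\theta) = const - \log\theta + (\theta - 1)/2, whose derivative -1/\theta + 1/2 is negative for all \theta < 2.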
21 Discussion
Both the proposed update and the formal Bayes update seem to provide a suitable estimate of C, which is not surprising given the flexibility of estimating C_0 with alternative values of \theta. Yet the posterior for \theta is more accurate under the parametric Bayes update. This shows that semiparametric models need to be thought about carefully: the parametric part needs to define which \theta is being targeted.
Future work:
- f_\theta with unbounded support;
- extension to posterior contraction rates;
- connections with asymptotic properties of empirical Bayes;
- use of the C(x) function for model selection.
22 References
De Blasi & Walker (2012). Bayesian estimation of the discrepancy with misspecified parametric models. Tech. Rep., submitted.
Hjort & Glad (1995). Nonparametric density estimation with a parametric start. Ann. Statist. 23.
Kleijn & van der Vaart (2012). The Bernstein-von Mises theorem under misspecification. Electron. J. Stat. 6.
Lenk (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. J. Amer. Statist. Assoc. 83.
Rousseau (2008). Approximating interval hypothesis: p-values and Bayes factors. In Bayesian Statistics 8.
Tokdar (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. J. Comp. Graph. Statist. 16.
van der Vaart & van Zanten (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36.
Verdinelli & Wasserman (1998). Bayesian goodness of fit testing using infinite dimensional exponential families. Ann. Statist. 26.
Walker (2011). Posterior sampling when the normalizing constant is unknown. Comm. Statist. Simulation Comput. 40.
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More information5 Introduction to the Theory of Order Statistics and Rank Statistics
5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationBayesian Aggregation for Extraordinarily Large Dataset
Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work
More informationFlexible Regression Modeling using Bayesian Nonparametric Mixtures
Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Jonathan Marchini Department of Statistics University of Oxford MT 2013 Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 27 Course arrangements Lectures M.2
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationThe consequences of misspecifying the random effects distribution when fitting generalized linear mixed models
The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationSTAT215: Solutions for Homework 2
STAT25: Solutions for Homework 2 Due: Wednesday, Feb 4. (0 pt) Suppose we take one observation, X, from the discrete distribution, x 2 0 2 Pr(X x θ) ( θ)/4 θ/2 /2 (3 θ)/2 θ/4, 0 θ Find an unbiased estimator
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationNonparametric Bayes Inference on Manifolds with Applications
Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces
More informationPOSTERIOR CONSISTENCY IN CONDITIONAL DENSITY ESTIMATION BY COVARIATE DEPENDENT MIXTURES. By Andriy Norets and Justinas Pelenis
Resubmitted to Econometric Theory POSTERIOR CONSISTENCY IN CONDITIONAL DENSITY ESTIMATION BY COVARIATE DEPENDENT MIXTURES By Andriy Norets and Justinas Pelenis Princeton University and Institute for Advanced
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More information