Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data
1 Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data. Christel Faes. Co-authors: John Ormerod and Matt Wand. May 10, 2012
2 Introduction: Variational Bayes. Bayesian inference for parametric regression has a long history (e.g. Box and Tiao, 1973; Gelman, Carlin, Stern and Rubin, 2004); for non-parametric regression it covers, e.g., mixed model representations of penalized splines (Ruppert, Wand and Carroll, 2003); for dealing with missingness in data it allows incorporation of standard missing-data models (e.g. Little and Rubin, 2004; Daniels and Hogan, 2008). Complex models can be handled via hierarchical Bayesian models: easy via MCMC, but this can be costly in processing time.
3 Variational Approximate Bayesian Inference: fast Bayesian regression analysis. A deterministic approach that yields approximate inference: it approximates posterior densities by other densities for which inference is more tractable. Part of mainstream computer science methodology (e.g. Bishop, 2006): speech recognition and document retrieval (e.g. Jordan, 2004); functional magnetic resonance imaging (e.g. Flandin and Penny, 2007). Recently used in statistical problems (e.g. Ormerod and Wand, 2010): cluster analysis for gene-expression data (e.g. Teschendorff et al., 2005); finite mixture models (e.g. McGrory and Titterington, 2007).
4 Elements of Variational Bayes. Bayesian inference on θ ∈ Θ is based on the posterior density function p(θ|y) = p(y, θ)/p(y) = p(y|θ) p(θ)/p(y), with y the observed data vector. For an arbitrary density function q over Θ, the following inequality holds: p(y) ≥ p(y; q) ≡ exp( ∫ q(θ) log{p(y, θ)/q(θ)} dθ ). Equality holds if and only if q(θ) = p(θ|y) almost everywhere. The gap between log p(y) and log p(y; q) is the Kullback-Leibler (KL) divergence. For most models, the exact solution q_exact(θ) = p(θ|y) is intractable.
5 Elements of Variational Bayes. Variational Bayes relies on product density restrictions on q: q(θ) = ∏_{i=1}^{M} q_i(θ_i) for some partition {θ_1, ..., θ_M} of θ. This buys tractability at the cost of a posterior independence assumption. The optimal densities (with minimum KL divergence) can be shown to satisfy q_i*(θ_i) ∝ exp{E_{−θ_i} log p(θ, y)} ∝ exp{E_{−θ_i} log p(θ_i | rest)}, where E_{−θ_i} denotes expectation with respect to ∏_{j≠i} q_j(θ_j), rest ≡ {y, θ_1, ..., θ_{i−1}, θ_{i+1}, ..., θ_M}, and the p(θ_i | rest) are the full conditionals (e.g. Robert and Casella, 2004).
6 Elements of Variational Bayes. The expressions q_i*(θ_i) ∝ exp{E_{−θ_i} log p(θ_i | rest)} uniquely maximize p(y; q) with respect to the densities q_i(θ_i). The method of alternating variables (coordinate ascent) is used to attain convergence, which is assessed by monitoring the relative increase in log p(y; q). Typically, convergence is achieved within a few hundred iterations.
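As a concrete illustration of these coordinate-ascent updates, here is a minimal sketch (our own toy example, not from the talk) of mean-field variational Bayes for a normal sample y_i ~ N(µ, σ²), with independent priors µ ~ N(0, s0²) and 1/σ² ~ Gamma(a0, b0), under the product restriction q(µ, σ²) = q(µ) q(σ²):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 0.5, size=500)            # data: N(mu=2, sd=0.5)
n, s0_sq, a0, b0 = len(y), 100.0, 0.01, 0.01  # vague priors

# q(mu) = N(m, v) and q(1/sigma^2) = Gamma(a_n, b_n)
m, v = 0.0, 1.0
a_n = a0 + 0.5 * n                            # shape is fixed across iterations
b_n = b0 + 0.5 * np.sum((y - m) ** 2)

for _ in range(100):                          # coordinate ascent
    e_tau = a_n / b_n                         # E_q[1/sigma^2]
    prec = 1.0 / s0_sq + n * e_tau            # update q(mu)
    m, v = e_tau * y.sum() / prec, 1.0 / prec
    b_n = b0 + 0.5 * (np.sum((y - m) ** 2) + n * v)  # update q(1/sigma^2)

print(m, np.sqrt(b_n / a_n))  # variational posterior mean of mu, estimate of sd
```

Each update holds the other factor fixed at its current expectation, exactly the alternating scheme described above; here the fixed point is reached in a handful of iterations.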
7 Example: Mixture Model. ELISA test results for Bluetongue virus (BTV-8). Mixture model of normal distributions; Dirichlet prior on the weights of the mixture; conjugate priors on the parameters of the normal distributions.
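The talk does not give code, but a Dirichlet-weight normal mixture of this kind can be fitted by variational Bayes with scikit-learn's `BayesianGaussianMixture`; the sketch below uses synthetic two-component data as a stand-in for the ELISA scores (component locations and weights are our own illustrative values):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Synthetic stand-in for the ELISA scores: two well-separated normal components
x = np.concatenate([rng.normal(1.5, 0.3, 300), rng.normal(4.5, 0.4, 700)])

# Variational fit: Dirichlet prior on the mixture weights, conjugate
# priors on the component means and precisions
bgm = BayesianGaussianMixture(
    n_components=2,
    weight_concentration_prior_type="dirichlet_distribution",
    max_iter=500,
    random_state=0,
).fit(x.reshape(-1, 1))

print(np.sort(bgm.means_.ravel()))  # recovered component means
print(np.sort(bgm.weights_))        # recovered mixture weights
```

As on the slide, the variational point estimates agree closely with what MCMC would give for well-separated components.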
8 Example: Mixture Model. Prevalence estimation (estimated p1 per province, MCMC versus variational approximation):

Provincie         MCMC p1   VA p1
Antwerpen         0,771     0,771
Vlaams-Brabant    0,794     0,794
Waals-Brabant     0,760     0,761
West-Vlaanderen   0,914     0,915
Oost-Vlaanderen   0,781     0,781
Henegouwen        0,966     0,966
Luik              0,554     0,550
Limburg           0,533     0,533
Luxemburg         0,93      0,933
Namen             0,894     0,894

Mixture-component parameters (mean and se of µ_q(µ1) and µ_q(µ2), MCMC versus VA): 1,46  0,049  4,578  0,  ,45  0,0336  4,577  0,00435
9 Accuracy Estimation - Simulation: simple linear regression with missing predictor data. Assume the model y_i = β0 + β1 x_i + ε_i, ε_i ~ N(0, σ_ε²). Take β0, β1 ~ N(0, σ_β²) and σ_ε² ~ IG(A_ε, B_ε). Suppose that the predictors are susceptible to missingness and assume x_i ~ N(µ_x, σ_x²) with hyperpriors µ_x ~ N(0, σ²_µx) and σ_x² ~ IG(A_x, B_x). Let R_i be the missingness indicators and consider the missingness mechanisms: (1) P(R_i = 1) = p: MCAR; (2) P(R_i = 1) = Φ(φ0 + φ1 y_i) for φ0, φ1 ~ N(0, σ_φ²): MAR; (3) P(R_i = 1) = Φ(φ0 + φ1 x_i) for φ0, φ1 ~ N(0, σ_φ²): MNAR. Use auxiliary variables a_i | φ ~ N((Yφ)_i, 1) or a_i | φ ~ N((Xφ)_i, 1) for the probit regression components.
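A hedged sketch of how data from this model might be simulated; the function name, parameter values, and the convention that R_i = 1 flags an observed predictor are our own choices:

```python
import numpy as np
from scipy.stats import norm

def simulate(n=500, beta0=1.0, beta1=2.0, sd_eps=0.2, mechanism="MCAR",
             phi=(2.5, -1.0), p=0.3, seed=0):
    """Draw (y, x, R) from the simple linear model with missing predictors.
    Convention here: R_i = 1 means x_i is observed; phi drives the probit model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.5, 0.2, n)                     # x_i ~ N(mu_x, sigma_x^2)
    y = beta0 + beta1 * x + rng.normal(0, sd_eps, n)
    if mechanism == "MCAR":
        prob = np.full(n, 1 - p)                    # constant observation prob.
    elif mechanism == "MAR":                        # depends on observed y only
        prob = norm.cdf(phi[0] + phi[1] * y)
    else:                                           # MNAR: depends on x itself
        prob = norm.cdf(phi[0] + phi[1] * x)
    R = rng.binomial(1, prob)
    return y, np.where(R == 1, x, np.nan), R

y, x_obs, R = simulate(mechanism="MAR")
print(np.isnan(x_obs).mean())  # fraction of predictors missing
```

Swapping the `mechanism` argument reproduces the three regimes on the slide without changing the response model.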
10 Accuracy Estimation - Simulation: directed acyclic graphs (DAGs). [Figure: DAGs for predictor MCAR, MAR and MNAR, with nodes β, σ_ε², y, x_mis, x_obs, µ_x, σ_x², R, φ and a connected by directed edges.] Evidence nodes (observed data), hidden nodes (random variables) and directed edges (conditional dependence). The Markov blanket of a node is the set of its children, parents and co-parents. DAGs aid the algebra for variational Bayes: q_i*(θ_i) ∝ exp{E_{−θ_i} log p(θ_i | rest)} = exp{E_{−θ_i} log p(θ_i | Markov blanket of θ_i)}.
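The Markov-blanket computation is mechanical once each node's parents are listed; a small sketch, with node names and an MCAR-style edge set abstracted from the graph above (our own encoding, not code from the talk):

```python
# parents[v] lists the parents of node v in the DAG (MCAR-style edge set)
parents = {
    "y": ["beta", "sigma_eps", "x_mis", "x_obs"],
    "x_mis": ["mu_x", "sigma_x"],
    "x_obs": ["mu_x", "sigma_x"],
    "R": [],
    "beta": [], "sigma_eps": [], "mu_x": [], "sigma_x": [],
}

def markov_blanket(v, parents):
    """Union of v's parents, children, and co-parents (other parents of v's children)."""
    children = [c for c, ps in parents.items() if v in ps]
    co_parents = {p for c in children for p in parents[c] if p != v}
    return set(parents[v]) | set(children) | co_parents

print(markov_blanket("x_mis", parents))
```

For x_mis this returns its parents {µ_x, σ_x²}, its child y, and the co-parents {β, σ_ε², x_obs}, which is exactly the set the variational update for q*(x_mis) conditions on.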
11 Accuracy Estimation - Simulation. [Figure: DAGs for predictor MCAR, MAR and MNAR, as on the previous slide.] MAR: there is separation between the two hidden node sets {β, σ_ε², x_mis, µ_x, σ_x²} and {a, φ}, so Bayesian inference for the regression parameters is not impacted by the missing-data mechanism. MNAR: this separation does not occur; e.g. the Markov blanket of x_mis includes {a, φ}.
12 Accuracy Estimation - Simulation: approximate inference via variational Bayes. We impose the product density restrictions: MCAR: q(β, σ_ε², x_mis, µ_x, σ_x²) = q(β, µ_x) q(σ_ε², σ_x²) q(x_mis); MAR and MNAR: q(β, σ_ε², x_mis, µ_x, σ_x², φ, a) = q(β, µ_x, φ) q(σ_ε², σ_x²) q(x_mis) q(a). For MCAR, this leads to optimal densities of the form: q*(β) = bivariate normal density; q*(µ_x) = univariate normal density; q*(σ_ε²) = Inverse Gamma density; q*(σ_x²) = Inverse Gamma density; q*(x_mis) = product of univariate normal densities. For the MAR and MNAR situations, the optimal densities for φ and a have easy expressions as well.
13 Accuracy Estimation - Simulation: approximate inference via variational Bayes. Iterative scheme for obtaining the parameters in the optimal densities:

Initialize: µ_q(1/σ_ε²), µ_q(1/σ_x²) > 0, µ_q(β) (2 × 1) and Σ_q(β) (2 × 2).

Cycle:
  σ²_q(x_mis) ← 1 / [ µ_q(1/σ_x²) + µ_q(1/σ_ε²) {µ²_q(β1) + (Σ_q(β))₂₂} + µ²_q(φ1) + (Σ_q(φ))₂₂ ]
  for i = 1, ..., n_mis:
    µ_q(x_mis,i) ← σ²_q(x_mis) [ µ_q(1/σ_x²) µ_q(µ_x) + µ_q(1/σ_ε²) {y_{x_mis,i} µ_q(β1) − (Σ_q(β))₁₂ − µ_q(β0) µ_q(β1)} + µ_q(a_{x_mis,i}) µ_q(φ1) − (Σ_q(φ))₁₂ − µ_q(φ0) µ_q(φ1) ]
  update E_q(x_mis)(X) and E_q(x_mis)(XᵀX)
  Σ_q(β) ← { µ_q(1/σ_ε²) E_q(x_mis)(XᵀX) + σ_β⁻² I }⁻¹ ;  µ_q(β) ← Σ_q(β) µ_q(1/σ_ε²) E_q(x_mis)(X)ᵀ y
  σ²_q(µ_x) ← 1 / ( n µ_q(1/σ_x²) + 1/σ²_µx ) ;  µ_q(µ_x) ← σ²_q(µ_x) µ_q(1/σ_x²) (1ᵀ x_obs + 1ᵀ µ_q(x_mis))
  B_q(σ_ε²) ← B_ε + ½ [ ‖y‖² − 2 yᵀ E_q(x_mis)(X) µ_q(β) + tr{E_q(x_mis)(XᵀX) (Σ_q(β) + µ_q(β) µᵀ_q(β))} ]
  B_q(σ_x²) ← B_x + ½ ( ‖x_obs − µ_q(µ_x) 1‖² + ‖µ_q(x_mis) − µ_q(µ_x) 1‖² + n σ²_q(µ_x) + n_mis σ²_q(x_mis) )
  µ_q(1/σ_ε²) ← (A_ε + ½ n) / B_q(σ_ε²) ;  µ_q(1/σ_x²) ← (A_x + ½ n) / B_q(σ_x²)
  Σ_q(φ) ← { E_q(x_mis)(XᵀX) + σ_φ⁻² I }⁻¹ ;  µ_q(φ) ← Σ_q(φ) E_q(x_mis)(X)ᵀ µ_q(a)
  µ_q(a) ← E_q(x_mis)(X) µ_q(φ) + (2R − 1) ⊙ (2π)^{−1/2} exp{−½ (E_q(x_mis)(X) µ_q(φ))²} / Φ{(2R − 1) ⊙ E_q(x_mis)(X) µ_q(φ)}

until the increase in log p(y, x_obs, R; q) is negligible.
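A simplified, self-contained sketch of the MCAR version of this cycle (the φ and a terms drop out; the data, vague hyperparameters, and iteration count are our own illustrative choices, not those of the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta_true, sd_eps = 400, np.array([0.3, 1.7]), 0.1
x = rng.normal(0.5, 0.2, n)
y = beta_true[0] + beta_true[1] * x + rng.normal(0, sd_eps, n)
miss = rng.random(n) < 0.25                 # MCAR mask: True = x_i missing
x_obs, n_mis = x[~miss], int(miss.sum())

s_beta2, s_mux2, A, B = 1e8, 1e8, 0.01, 0.01  # vague hyperparameters

# Initialise variational parameters
inv_se, inv_sx = 1.0, 1.0                   # mu_q(1/sigma_eps^2), mu_q(1/sigma_x^2)
mu_b, Sig_b = np.zeros(2), np.eye(2)
mu_xm = np.full(n_mis, x_obs.mean())
mu_mux, s2_mux = 0.0, 1.0

for _ in range(200):
    # q(x_mis): common variance, per-observation means
    s2_xm = 1.0 / (inv_sx + inv_se * (mu_b[1] ** 2 + Sig_b[1, 1]))
    y_mis = y[miss]
    mu_xm = s2_xm * (inv_sx * mu_mux
                     + inv_se * (y_mis * mu_b[1] - Sig_b[0, 1] - mu_b[0] * mu_b[1]))
    # E_q[X] and E_q[X^T X] under q(x_mis)
    ex = np.where(miss, 0.0, x)
    ex[miss] = mu_xm
    EX = np.column_stack([np.ones(n), ex])
    EXtX = EX.T @ EX
    EXtX[1, 1] += n_mis * s2_xm             # variance correction for imputed x
    # q(beta): bivariate normal
    Sig_b = np.linalg.inv(inv_se * EXtX + np.eye(2) / s_beta2)
    mu_b = Sig_b @ (inv_se * EX.T @ y)
    # q(mu_x): univariate normal
    s2_mux = 1.0 / (n * inv_sx + 1.0 / s_mux2)
    mu_mux = s2_mux * inv_sx * (x_obs.sum() + mu_xm.sum())
    # q(sigma_eps^2), q(sigma_x^2): Inverse-Gamma rate updates
    B_eps = B + 0.5 * (y @ y - 2 * y @ EX @ mu_b
                       + np.trace(EXtX @ (Sig_b + np.outer(mu_b, mu_b))))
    B_x = B + 0.5 * (np.sum((x_obs - mu_mux) ** 2) + np.sum((mu_xm - mu_mux) ** 2)
                     + n * s2_mux + n_mis * s2_xm)
    inv_se, inv_sx = (A + 0.5 * n) / B_eps, (A + 0.5 * n) / B_x

print(mu_b, 1 / np.sqrt(inv_se))  # variational posterior mean of beta, sd_eps estimate
```

Each pass through the loop is one sweep of the cycle above; with 25% of the predictors missing the regression coefficients are still recovered accurately, consistent with the talk's accuracy findings.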
14 Accuracy Estimation - Simulation: accuracy of variational Bayes inference. Speedy approximate inference, but no guarantees of achieving an acceptable level of accuracy. Assessment of the algorithm via simulated data: compare q*(θ) with the exact posterior density p(θ|y). The KL divergence is dominated by the tail behavior of the densities (Hall, 1987), so use instead the L1 loss, or integrated absolute error (IAE), of q*: IAE(q*) = ∫ |q*(θ) − p(θ|y)| dθ. The accuracy measure is defined as accuracy(q*) = 1 − IAE(q*)/sup_q IAE(q) = 1 − ½ IAE(q*). Note that 0 ≤ accuracy(q*) ≤ 1. MCMC with large samples is used to approximate p(θ|y).
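The IAE-based accuracy measure is easy to approximate on a grid; a small sketch using two known normal densities in place of q* and the MCMC-based estimate of p(θ|y):

```python
import numpy as np
from scipy.stats import norm

def accuracy(q_pdf, p_pdf, grid):
    """1 - IAE/2, with IAE = integral |q - p| approximated on a uniform grid."""
    dx = grid[1] - grid[0]
    iae = np.sum(np.abs(q_pdf(grid) - p_pdf(grid))) * dx
    return 1.0 - 0.5 * iae

grid = np.linspace(-10.0, 10.0, 40001)
same = accuracy(norm(0, 1).pdf, norm(0, 1).pdf, grid)         # identical densities
shifted = accuracy(norm(0, 1).pdf, norm(0.5, 1).pdf, grid)    # mean shifted by 0.5
print(same, shifted)
```

Identical densities score 1, and since the supremum of the IAE over densities is 2, the measure always lands in [0, 1]; a mean shift of 0.5 standard deviations scores about 0.80.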
15 Variational Bayes Accuracy Estimation - Simulation: simple linear regression with predictor MNAR. [Figure: accuracy boxplots for β0, β1, σ_ε², µ_x, σ_x², φ0, φ1 and x_mis,1, ..., x_mis,4 across settings combining (φ0, φ1) values with σ_ε = 0.05, 0.2 and larger.] Accuracy is high for the regression part {β0, β1, σ_ε²}, but drops when there is a large amount of missing data and the data are noisy. Accuracy for the missing covariates {x_i} is high in all situations. Poor performance for the missing-mechanism parameters (φ and a): location is fine, but the spread is deflated.
16 Accuracy Estimation - Simulation: simple linear regression with predictor MNAR. [Figure: successive values of log p(y, x_obs, R; q), showing monotone convergence, alongside VB and MCMC posterior densities for β0, β1, σ_ε², µ_x, σ_x², φ0, φ1 and x_mis,1, ..., x_mis,4, annotated with accuracy percentages (e.g. 95%).]
17 Accuracy Estimation - Simulation: simple linear regression with predictor MNAR. Credible interval coverage: [Table: coverage of credible intervals for β0, β1, σ_ε², µ_x, σ_x², φ0, φ1 and x_mis,i under MCAR and MNAR with low and high missingness; the numerical entries did not survive transcription.]
18 Accuracy Estimation - Simulation: simple linear regression with predictor MNAR. Speed comparisons (two settings each):

             MAR models          MNAR models
MCMC         (5.89, 5.84)        (33.8, 33.9)
var. Bayes   (0.0849, 0.0850)    (0.705, 0.790)
ratio        (76.6, 78.7)        (59.5, 67.8)

General conclusion: a fast alternative method, excellent for the regression parameters.
19 Illustration with Missing Predictor Data. Replace the linear mean function by a smooth flexible function f(x). Use penalized splines with a mixed model representation: f(x) = β0 + β1 x + Σ_{k=1}^{K} u_k z_k(x), with u_k ~ N(0, σ_u²) and {z_k(·); 1 ≤ k ≤ K} a set of spline basis functions. Different spline functions are possible, e.g. O'Sullivan penalized splines (Wand and Ormerod, 2008) or penalized linear splines.
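A sketch of the design matrix C = [1, x, z_1(x), ..., z_K(x)] for the penalized linear spline case, z_k(x) = (x − κ_k)₊, with equally spaced knots (the knot-placement details here are our own illustrative choice):

```python
import numpy as np

def truncated_linear_design(x, knots):
    """Design matrix [1, x, (x - k_1)_+, ..., (x - k_K)_+] for penalized splines."""
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # z_k(x) = (x - kappa_k)_+
    return np.column_stack([np.ones_like(x), x, Z])

x = np.linspace(0.0, 1.0, 200)
knots = np.linspace(0.0, 1.0, 32)[1:-1]   # 30 equally spaced interior knots
C = truncated_linear_design(x, knots)
print(C.shape)                            # intercept + linear + 30 spline columns
```

In the mixed-model fit, the intercept and slope columns get the fixed-effect prior N(0, σ_β²) while the 30 spline columns share the random-effect variance σ_u², which is what makes the penalty a variance component.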
20 Illustration with Missing Predictor Data. [Figure: DAGs for predictor MCAR, MAR and MNAR, now with nodes σ_u², (β, u), y, σ_ε², x_mis, x_obs, µ_x, σ_x², R, φ and a.] The nonparametric extension enlarges the DAG. The nodes (β, u) are not broken up into separate nodes, because the expressions are easier if they are kept together. The variational Bayes algorithm is a modification of the previous algorithm, but it leads to non-standard forms of the optimal densities.
21 Illustration with Missing Predictor Data. Let C_x = (1, x, z_1(x), ..., z_K(x)). The optimal densities for x_mis,i take the form q*(x_mis,i) ∝ exp(−½ C_{x_mis,i} Ω_{mis,i} Cᵀ_{x_mis,i}), where the matrix Ω_{mis,i} collects terms corresponding to each entry of x_mis except x_mis,i. This does not have a closed-form integral, so numerical integration is required to obtain the normalizing factors. We take a basic quadrature approach, with the same quadrature grid (g_1, ..., g_M) over all 1 ≤ i ≤ n_mis: ∫ z_1(x) dx ≈ Σ_{j=1}^{M} w_j z_1(g_j), with (w_1, ..., w_M) the quadrature weights. Expressions for the mean and variance can then be derived.
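A sketch of the grid-based normalization step: given an unnormalized log-density evaluated on a common grid, the normalizing factor, mean and variance follow from the quadrature weights (here simple equal-width weights rather than the talk's particular rule, checked against a known normal density):

```python
import numpy as np

def moments_from_unnormalised(log_q, grid):
    """Normalise exp(log_q) on a quadrature grid and return (mean, variance)."""
    w = np.full(grid.size, grid[1] - grid[0])       # equal quadrature weights
    dens = np.exp(log_q(grid) - log_q(grid).max())  # stabilise before exponentiating
    dens /= np.sum(w * dens)                        # divide by the normalising factor
    mean = np.sum(w * grid * dens)
    var = np.sum(w * (grid - mean) ** 2 * dens)
    return mean, var

# Sanity check on a known density: unnormalised N(0.3, 0.2^2)
log_q = lambda t: -0.5 * ((t - 0.3) / 0.2) ** 2
m, v = moments_from_unnormalised(log_q, np.linspace(-2.0, 3.0, 4001))
print(m, v)
```

Because the same grid is reused for every i = 1, ..., n_mis, the basis functions z_k(g_j) need to be evaluated only once per cycle.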
22 Data examples, Variational Bayes Illustration. Example 1: simulated data, n = 300, y_i ~ N(f(x_i), σ_ε²) with f(x) = sin(4πx) and 20% of the x_i's removed completely at random. Penalized splines with a truncated linear spline basis with 30 knots, equally spaced over the range of the observed x_i's. MCMC: burn-in of size 20,000, thinning factor of 20, post-burn-in of 200,000. Example 2: simulated data, same setting, but missingness according to R_i ~ Bernoulli(Φ(φ0 + φ1 x_i)) with (φ0, φ1) = (3, 3). Example 3: ozone data; daily maximum one-hour-average ozone level versus daily temperature at El Monte, California; n = 361, with 137 of the predictor values missing. Predictors and errors are approximately normal and homoscedastic.
23 Results Example 1, Variational Bayes Illustration. [Figure: successive values of log p(y, x_obs; q) and VB vs. MCMC posterior densities with accuracy values: σ_ε² 93%, µ_x 85%, x_mis,7 98%, σ_x² 92%, x_mis,36 97%, σ_u² 62%, x_mis,65 98%.] Good to excellent accuracy of variational Bayes (except for σ_u²). Multimodal posteriors are well approximated by the variational Bayes approximations.
24 Illustration: Results, Examples. [Figure: three panels comparing MCMC and VB fitted functions: nonparametric MCAR example (y vs. x), nonparametric MNAR example (y vs. x), and the ozone data example (maximum one-hour-average ozone level vs. daily temperature in degrees Fahrenheit).] Good agreement between variational Bayes and MCMC in the fitted functions. Time needed: 75 seconds for variational Bayes versus 15.5 hours for MCMC.
25 Example 4, Variational Bayes Illustration. European study of antibiotic use: Defined Daily Dose (DDD) per country per year, with turkey production as a possible explanatory covariate. [Figure: observed data and variational approximation fit, DDD versus turkey production.]
26 Conclusions: Variational Bayes. Variational Bayes inference achieves good to excellent accuracy for the main parameters of interest. Poor accuracy is realized for the missing-data mechanism parameters; better accuracy may be achieved with a more elaborate variational scheme in situations where these are of interest. Variational Bayes approximates multimodal posterior densities with a high degree of accuracy. The speed-up is of the order of several hundreds.
27 References
Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Box, G.E.P. and Tiao, G.C. (1973). Bayesian Inference in Statistical Analysis. Reading, Massachusetts: Addison-Wesley.
Daniels, M.J. and Hogan, J.W. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Boca Raton, Florida: Chapman and Hall (CRC Press).
Faes, C., Ormerod, J.T. and Wand, M.P. (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. Journal of the American Statistical Association, 106.
Flandin, G. and Penny, W.D. (2007). Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage, 34.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis. Boca Raton, Florida: Chapman and Hall.
Jordan, M.I. (2004). Graphical models. Statistical Science, 19.
McGrory, C.A. and Titterington, D.M. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics and Data Analysis, 51.
Ormerod, J.T. and Wand, M.P. (2010). Explaining variational approximations. The American Statistician, 64.
Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer-Verlag.
Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. New York: Cambridge University Press.
Teschendorff, A.E., Wang, Y., Barbosa-Morais, N.L., Brenton, J.D. and Caldas, C. (2005). A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data. Bioinformatics, 21.
Wand, M.P. and Ormerod, J.T. (2008). On O'Sullivan penalised splines and semiparametric regression. Australian and New Zealand Journal of Statistics, 50.
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationHeriot-Watt University
Heriot-Watt University Heriot-Watt University Research Gateway Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution Dodd, Erengul;
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH
Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationWill Penny. SPM short course for M/EEG, London 2015
SPM short course for M/EEG, London 2015 Ten Simple Rules Stephan et al. Neuroimage, 2010 Model Structure The model evidence is given by integrating out the dependence on model parameters p(y m) = p(y,
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationMean Field Variational Bayes for Elaborate Distributions
Bayesian Analysis (011) 6, Number 4, pp. 1 48 Mean Field Variational Bayes for Elaborate Distributions Matthew P. Wand, John T. Ormerod, Simone A. Padoan and Rudolf Frührwirth Abstract. We develop strategies
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationBayesian Inference for DSGE Models. Lawrence J. Christiano
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Preliminaries. Probabilities. Maximum Likelihood. Bayesian
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationVariational Methods in Bayesian Deconvolution
PHYSTAT, SLAC, Stanford, California, September 8-, Variational Methods in Bayesian Deconvolution K. Zarb Adami Cavendish Laboratory, University of Cambridge, UK This paper gives an introduction to the
More informationScale Mixture Modeling of Priors for Sparse Signal Recovery
Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationBayesian Sparse Correlated Factor Analysis
Bayesian Sparse Correlated Factor Analysis 1 Abstract In this paper, we propose a new sparse correlated factor model under a Bayesian framework that intended to model transcription factor regulation in
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More information