Expectation Propagation performs smooth gradient descent
Guillaume Dehaene

1  Expectation Propagation performs smooth gradient descent (Guillaume Dehaene)

2  In a nutshell
Problem: posteriors are uncomputable.
Solution: parametric approximations.
But which one should we choose? Laplace? Variational Bayes? Expectation Propagation?
I will unite these three methods.

3  Outline
I. Gaussian approximation methods
   A. Laplace
   B. Variational Bayes
   C. Expectation Propagation
II. Smooth gradient methods
   A. Fixed-point conditions
   B. Reformulating gradient descent
   C. Gaussian smoothing
   D. Using the factor structure
III. Some consequences

4  I. Gaussian approximation methods
We have collected some data D_1, ..., D_n.
We have a great IID model:
- Prior: p(θ)
- Conditional: p(D_i | θ)
The posterior:
p(θ | D_1, ..., D_n) = (1 / Z(D_1, ..., D_n)) p(θ) ∏_i p(D_i | θ)
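As a concrete toy instance of this setup (purely illustrative, not from the talk), the sketch below assumes a N(0, 1) prior and a heavy-tailed likelihood factor per observation; the specific factors and data values are assumptions.

```python
# A toy concrete instance of the setup (assumed, for illustration only).
import numpy as np

data = np.array([1.0, 2.5, -0.5])                      # D_1, ..., D_n (toy values)

def log_prior(theta):
    return -0.5 * theta**2                             # log p(theta), up to a constant

def log_likelihood(d_i, theta):
    return -2.0 * np.log(1 + 0.5 * (theta - d_i)**2)   # log p(D_i | theta), up to a constant

def log_posterior_unnormalized(theta):
    # log p(theta | D_1, ..., D_n) + log Z: the prior plus a sum of likelihood terms
    return log_prior(theta) + sum(log_likelihood(d, theta) for d in data)

print(log_posterior_unnormalized(0.3))
```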

5  Uncomputability
Usually, the posterior is uncomputable:
- θ is high-dimensional
- The likelihoods have a complicated structure
Two solutions:
- Sampling methods
- Approximation methods
We are going to focus on Gaussian approximations.

6  A. The Laplace approximation
The problem: finding a Gaussian approximation q(θ) of p(θ), i.e. a quadratic approximation of ψ(θ) = -log p(θ).
The most basic quadratic approximation is the second-order Taylor expansion.

7  A. The Laplace approximation
We center at the global maximum of p(θ), i.e. the minimum of ψ: θ_MAP.
ψ(θ) ≈ ψ(θ_MAP) + (1/2) (θ - θ_MAP)ᵀ Hψ(θ_MAP) (θ - θ_MAP)
The Laplace approximation:
- A Gaussian
- Centered at θ_MAP
- With inverse-variance Hψ(θ_MAP)
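As an illustration, here is a minimal sketch of the Laplace approximation on a one-dimensional toy target; the model (normal prior, logistic-style likelihood factors) and the use of scipy.optimize.minimize for the MAP search are assumptions, not part of the talk.

```python
# A minimal sketch of the Laplace approximation on an assumed 1D toy target.
import numpy as np
from scipy.optimize import minimize

data = np.array([1.2, 0.5, -0.3, 2.0])                 # toy observations (assumed)

def psi(theta):
    # psi(theta) = -log p(theta | data), up to a constant
    return 0.5 * theta**2 + np.sum(np.log1p(np.exp(-theta * data)))

def hess_psi(theta):
    s = 1.0 / (1.0 + np.exp(-theta * data))            # logistic sigmoid
    return 1.0 + np.sum(data**2 * s * (1 - s))

res = minimize(lambda t: psi(t[0]), x0=[0.0])          # find theta_MAP = argmin psi
theta_map = res.x[0]

laplace_mean = theta_map                               # Gaussian centered at theta_MAP
laplace_var = 1.0 / hess_psi(theta_map)                # inverse variance = H psi(theta_MAP)
print(laplace_mean, laplace_var)
```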

8  B. The Variational Bayes approximation
Laplace is fine but not very principled.
Instead, let's find the Gaussian which minimizes a «sensible notion of distance» to p(θ):
q_VB = arg min_q KL(q, p)
Distance = reverse Kullback-Leibler divergence:
KL(q, p) = ∫ q(θ) log( q(θ) / p(θ) ) dθ

9  B. The Variational Bayes approximation
Parameterize Gaussians with their mean and std (μ, σ):
η_{μ,σ} = μ + σ η_0, with η_0 ~ N(0, 1)
q(θ | μ, σ) = exp( -(θ - μ)² / (2σ²) ) / (√(2π) σ)
KL(q, p) = E[ ψ(μ + σ η_0) ] - log σ + const
Gradient w.r.t. μ:  ∂KL/∂μ = E[ ∇ψ(μ + σ η_0) ]
Gradient w.r.t. σ:  ∂KL/∂σ = σ E[ Hψ(μ + σ η_0) ] - 1/σ

10  B. The Variational Bayes approximation
Gradient w.r.t. μ:  ∂KL/∂μ = E[ ∇ψ(μ + σ η_0) ]
Gradient w.r.t. σ:  ∂KL/∂σ = σ E[ Hψ(μ + σ η_0) ] - 1/σ
We can use stochastic gradient descent to optimize the reverse KL divergence, using samples from η_0.
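A minimal sketch of this stochastic scheme on a one-dimensional toy ψ, using the two gradient formulas above with Monte Carlo samples of η_0; the target, step size and sample size are assumptions for illustration.

```python
# A minimal sketch of stochastic VB with the reparameterization mu + sigma * eta0.
import numpy as np

rng = np.random.default_rng(0)

def grad_psi(theta):
    # toy target (assumed): psi(theta) = 0.5*theta**2 + log(1 + theta**2)
    return theta + 2 * theta / (1 + theta**2)

def hess_psi(theta):
    return 1 + 2 * (1 - theta**2) / (1 + theta**2)**2

mu, sigma = 0.5, 1.0
lr = 0.05
for step in range(2000):
    eta0 = rng.standard_normal(32)                     # samples of eta_0 ~ N(0, 1)
    theta = mu + sigma * eta0                          # reparameterized samples from q
    grad_mu = np.mean(grad_psi(theta))                 # dKL/dmu    = E[grad psi(mu + sigma eta0)]
    grad_sigma = sigma * np.mean(hess_psi(theta)) - 1.0 / sigma   # dKL/dsigma
    mu -= lr * grad_mu                                 # stochastic gradient descent on KL(q, p)
    sigma = max(sigma - lr * grad_sigma, 1e-3)         # keep sigma positive
print(mu, sigma)
```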

11  C. Expectation Propagation
First key idea:
- Instead of a global approximation q(θ) ≈ p(θ)
- We use the factor structure of p(θ):
  p(θ) = ∏_{i=1}^n f_i(θ)
- And compute n local approximations g_i(θ) ≈ f_i(θ)
We can recover a global approximation as:
q_EP(θ) = ∏_i g_i(θ)

12  C. Expectation Propagation
First key idea:
- Compute n local approximations g_i(θ) ≈ f_i(θ)
Second key idea:
- Iteratively refine the approximation g_i(θ) ≈ f_i(θ)
- Using the other approximations g_j(θ), j ≠ i, as context

13  C. Expectation Propagation
To update g_i(θ):
- Define the «hybrid»:
  h_i(θ) ∝ f_i(θ) ∏_{j≠i} g_j(θ)
- Compute the mean and variance of h_i(θ), and the Gaussian q(θ) which has that same mean and variance
- New approximation:
  g_i(θ) = q(θ) / ∏_{j≠i} g_j(θ)

14  C. Expectation Propagation
First key idea:
- Compute n local approximations g_i(θ)
Second key idea:
- Iteratively refine the approximation g_i(θ) ≈ f_i(θ)
- New approximation:
  g_i(θ) = Gauss[ f_i(θ) ∏_{j≠i} g_j(θ) ] / ∏_{j≠i} g_j(θ)
  where Gauss[·] denotes the Gaussian with the same mean and variance as its argument

15  C. Expectation Propagation
Computing the mean and the variance of the hybrid requires model-specific work:
- Analytic
- Quadrature methods
- Pre-compiled approximations
- Sampling
EP can also use non-Gaussian approximations.
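For concreteness, here is a minimal one-dimensional EP sketch in which the hybrid moments are obtained by brute-force quadrature on a grid; the factors (a Gaussian prior plus heavy-tailed likelihood terms), the grid, and all tuning constants are assumptions.

```python
# A minimal 1D EP sketch on an assumed toy model.  Sites are stored as natural
# parameters, g_i(theta) ∝ exp(a_i*theta - b_i*theta**2 / 2), and hybrid moments
# are computed by quadrature on a grid.
import numpy as np

grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]
data = np.array([1.0, 2.5, -0.5])
n = len(data) + 1                                      # one prior factor + likelihood factors

def log_f(i, theta):
    if i == 0:
        return -0.5 * theta**2                         # prior factor
    d = data[i - 1]
    return -2.0 * np.log(1 + 0.5 * (theta - d)**2)     # heavy-tailed likelihood factor

a = np.zeros(n)                                        # site natural parameters
b = np.full(n, 1e-3)

for sweep in range(30):
    for i in range(n):
        a_cav, b_cav = a.sum() - a[i], b.sum() - b[i]  # cavity = product of the other sites
        log_h = log_f(i, grid) + a_cav * grid - 0.5 * b_cav * grid**2
        h = np.exp(log_h - log_h.max())
        h /= h.sum() * dx                              # normalized hybrid h_i on the grid
        mu_h = np.sum(grid * h) * dx                   # moment matching:
        var_h = np.sum((grid - mu_h)**2 * h) * dx      # mean and variance of h_i
        b[i] = 1.0 / var_h - b_cav                     # new site = N(mu_h, var_h) / cavity
        a[i] = mu_h / var_h - a_cav

print(a.sum() / b.sum(), 1.0 / b.sum())                # mean and variance of q_EP
```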

16  Summary
To deal with an uncomputable target distribution
p(θ) = ∏_{i=1}^n f_i(θ):
- Laplace approximation: GD on ψ(θ)
- VB approximation: SGD on KL(q, p)
- EP: the EP iteration
These methods couldn't be further apart!

17  Summary
Gradient descent is well-understood and intuitive: the dynamics of an object sliding down a slope.
Stochastic optimization of KL(q, p) and the EP iteration are unintuitive. This makes them hard to use.

18  II. Smooth gradient methods
I will now unite these three methods under a single framework: smooth gradient methods.
These iterate on Gaussian approximations to p(θ).
They are closely related to GD, hence intuitive.
The three methods correspond to special cases.

19  A. Fixed-point conditions
The methods can be united because their fixed-point conditions are extremely similar.
Laplace: the gradient is 0:
∇ψ(θ_MAP) = 0

20  A. Fixed-point conditions
Variational Bayes: recall the gradient:
∂KL/∂μ = E[ ∇ψ(μ + σ η_0) ]
∂KL/∂σ = σ E[ Hψ(μ + σ η_0) ] - 1/σ
The optimal Gaussian q_VB must satisfy:
E_{q_VB}[ ∇ψ(θ) ] = 0
E_{q_VB}[ Hψ(θ) ] = Cov(q_VB)^{-1}

21  A. Fixed-point conditions
Variational Bayes:
E_{q_VB}[ ∇ψ(θ) ] = 0
E_{q_VB}[ Hψ(θ) ] = Cov(q_VB)^{-1}
The smoothed gradient must be 0.
The covariance is related to the peakedness of log p.

22  A. Fixed-point conditions
Expectation Propagation:
- Define the «hybrid»:
  h_i(θ) ∝ f_i(θ) ∏_{j≠i} g_j(θ)
- Compute the mean and variance of h_i(θ), and the Gaussian q(θ) which has that same mean and variance
- New approximation:
  g_i(θ) = q(θ) / ∏_{j≠i} g_j(θ)

23  A. Fixed-point conditions
Expectation Propagation has an easy fixed-point condition:
all the hybrids h_i(θ) and the global Gaussian approximation q_EP(θ) have the same mean / variance.
Using this, and a little bit of math (Dehaene, 2016):
Σ_i E_{h_i}[ ∇ log f_i(θ) ] = 0
-Σ_i Cov(h_i)^{-1} E_{h_i}[ (θ - μ_i) ∇ log f_i(θ) ] = Cov(q_EP)^{-1}

24  A. Fixed-point conditions
Laplace:
∇ψ(θ_MAP) = 0
VB:
E_{q_VB}[ ∇ψ(θ) ] = 0
E_{q_VB}[ Hψ(θ) ] = Cov(q_VB)^{-1}
EP:
Σ_i E_{h_i}[ ∇ log f_i(θ) ] = 0
-Σ_i Cov(h_i)^{-1} E_{h_i}[ (θ - μ_EP) ∇ log f_i(θ) ] = Cov(q_EP)^{-1}

25  A. Fixed-point conditions
Laplace:
∇ψ(θ_MAP) = 0
VB:
E_{q_VB}[ ∇ψ(θ) ] = 0
Cov(q_VB)^{-1} E_{q_VB}[ (θ - μ_VB) ∇ψ(θ) ] = Cov(q_VB)^{-1}
EP:
Σ_i E_{h_i}[ ∇ log f_i(θ) ] = 0
-Σ_i Cov(h_i)^{-1} E_{h_i}[ (θ - μ_EP) ∇ log f_i(θ) ] = Cov(q_EP)^{-1}

26  B. Reformulating gradient descent
The first step: reframing gradient descent as
- Iterating over Gaussian approximations to p(θ)
- Fixed point = Laplace approximation
Key idea: GD corresponds to using a linear approximation of ψ = -log p.

27  B. Reformulating gradient descent
Interpretation of GD:
θ_{n+1} = θ_n - λ ∇ψ(θ_n)
«Please find the maximizer of the equation»
-(θ - θ_n) ∇ψ(θ_n) - (1 / (2λ)) (θ - θ_n)²
This is the same as the mean of the Gaussian:
q_{n+1}(θ) ∝ exp( -(θ - θ_n) ∇ψ(θ_n) - (1 / (2λ)) (θ - θ_n)² )

28  B. Reformulating gradient descent
A trivial reformulation of GD. Iterate:
q_{n+1}(θ) ∝ exp( -(θ - θ_n) ∇ψ(θ_n) - (1 / (2λ)) (θ - θ_n)² )
θ_{n+1} = E_{q_{n+1}}[θ]
This iterates Gaussian approximations to p(θ).
But the fixed point isn't the Laplace approximation!

29  B. Reformulating gradient descent
We need to use an optimization algorithm which uses a quadratic approximation of ψ.
Newton's method:
θ_{n+1} = θ_n - Hψ(θ_n)^{-1} ∇ψ(θ_n)
«Please find the maximizer of the equation»
-(θ - θ_n) ∇ψ(θ_n) - (1/2) Hψ(θ_n) (θ - θ_n)²

30  B. Reformulating gradient descent
A trivial reformulation of Newton's method. Iterate:
q_{n+1}(θ) ∝ exp( -(θ - θ_n) ∇ψ(θ_n) - (Hψ(θ_n) / 2) (θ - θ_n)² )
θ_{n+1} = E_{q_{n+1}}[θ]
We iterate Gaussian approximations of p(θ) until we find the Laplace approximation.
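A minimal sketch of this reformulation on a one-dimensional toy ψ (assumed): each Newton step is read off as the mean of the Gaussian q_{n+1}.

```python
# Newton's method read as an iteration over Gaussian approximations q_n of
# p(theta) ∝ exp(-psi(theta)); the 1D toy psi is an assumption.
import numpy as np

def grad_psi(theta):
    return theta + np.tanh(theta)                      # psi = theta**2/2 + log cosh(theta)

def hess_psi(theta):
    return 1.0 + 1.0 / np.cosh(theta)**2

theta_n = 3.0
for n in range(20):
    g, H = grad_psi(theta_n), hess_psi(theta_n)
    # q_{n+1}(theta) ∝ exp(-g*(theta - theta_n) - (H/2)*(theta - theta_n)**2)
    mean = theta_n - g / H                             # mean of q_{n+1} = Newton step
    var = 1.0 / H
    theta_n = mean

# At convergence, (mean, var) is the Laplace approximation of p.
print(theta_n, 1.0 / hess_psi(theta_n))
```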

31  Algorithm 1: disguised gradient descent (Newton's method)
[Diagram: DGD builds quadratic approximations of ψ; equivalently, Gaussian approximations of p ∝ exp(-ψ).]

32  C. Gaussian smoothing
The Laplace approximation is a point approximation.
We could improve the algorithm:
- Smooth the objective function: ψ̃ = ψ ∗ exp( -θ² / (2σ²) )  (convolution with a Gaussian kernel)
- Run the algorithm on ψ̃

33  C. Gaussian smoothing
How should we choose the smoothing bandwidth σ?
We could choose it once and for all steps, OR, on each step, we could use the current Gaussian approximation q_n(θ) to smooth ψ.

34  Algorithm 2: smoothed gradient descent
- Initialize with any Gaussian q_0
- Loop:
  μ_n = E_{q_n}[θ]
  r = E_{q_n}[ ∇ψ(θ) ]
  β = E_{q_n}[ Hψ(θ) ]
  q_{n+1}(θ) ∝ exp( -r (θ - μ_n) - (β/2) (θ - μ_n)² )
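A minimal sketch of Algorithm 2 on a one-dimensional toy ψ, with the Gaussian expectations estimated by Monte Carlo (a real implementation might prefer quadrature); the target, sample size and iteration count are assumptions.

```python
# A minimal sketch of Algorithm 2 (smoothed gradient descent) on an assumed toy psi.
import numpy as np

rng = np.random.default_rng(1)

def grad_psi(theta):
    return theta + np.tanh(theta)                      # psi = theta**2/2 + log cosh(theta)

def hess_psi(theta):
    return 1.0 + 1.0 / np.cosh(theta)**2

mu, var = 3.0, 4.0                                     # q_0: any initial Gaussian
for step in range(100):
    theta = mu + np.sqrt(var) * rng.standard_normal(500)   # samples from q_n
    r = np.mean(grad_psi(theta))                       # r    = E_qn[grad psi]
    beta = np.mean(hess_psi(theta))                    # beta = E_qn[H psi]
    # q_{n+1}(theta) ∝ exp(-r*(theta - mu_n) - (beta/2)*(theta - mu_n)**2)
    mu, var = mu - r / beta, 1.0 / beta

print(mu, var)
```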

35  Algorithm 2: smoothed gradient descent

36  D. Using the factor structure
In order to use EP, p(θ) needs to have a nice factor structure:
p(θ) = ∏_{i=1}^n f_i(θ)
For ψ:
ψ(θ) = Σ_{i=1}^n φ_i(θ),  with φ_i(θ) = -log f_i(θ)

37  D. Using the factor structure
Crazy idea:
- We could use the VB algorithm (Algorithm 2)
- With non-Gaussian smoothing
- On each component φ_i of ψ

38  D. Using the factor structure
The Algorithm 2 update has two equivalent forms:
β = E_{q_n}[ Hψ(θ) ]
OR
β = Cov(q_n)^{-1} E_{q_n}[ (θ - μ_n) ∇ψ(θ) ]
(they coincide for a Gaussian q_n, by Stein's identity, but differ for non-Gaussian distributions)
When we replace q_n by h_i, we have a choice to make: which form should we use?
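A quick numerical check, on an assumed one-dimensional toy ψ, that the two forms of β coincide when q_n is Gaussian (Stein's identity); for a non-Gaussian hybrid the two forms would generally differ.

```python
# Monte Carlo check that E_q[H psi] = Cov(q)^-1 E_q[(theta - mu) grad psi]
# when q is Gaussian; the toy psi is an assumption.
import numpy as np

rng = np.random.default_rng(2)
grad_psi = lambda t: t + np.tanh(t)
hess_psi = lambda t: 1.0 + 1.0 / np.cosh(t)**2

mu, var = 0.7, 2.0                                     # a Gaussian q_n
theta = mu + np.sqrt(var) * rng.standard_normal(1_000_000)
form_1 = np.mean(hess_psi(theta))                      # E_qn[H psi]
form_2 = np.mean((theta - mu) * grad_psi(theta)) / var # Cov(qn)^-1 E_qn[(theta - mu) grad psi]
print(form_1, form_2)                                  # equal up to Monte Carlo error
```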

39  Algorithm 3: smooth EP
- Initialize with any Gaussians g_1, g_2, ..., g_n
- Loop (over i):
  h_i ∝ f_i ∏_{j≠i} g_j
  μ_i = E_{h_i}[θ]
  r_i = E_{h_i}[ ∇φ_i(θ) ]
  β_i = Var(h_i)^{-1} E_{h_i}[ (θ - μ_i) ∇φ_i(θ) ]
  g_i(θ) ∝ exp( -r_i (θ - μ_i) - (β_i/2) (θ - μ_i)² )
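A minimal one-dimensional sketch of Algorithm 3, with the hybrid expectations computed by quadrature on a grid; the factors f_i (with φ_i = -log f_i), the grid and the constants are assumptions. On this toy model the result should closely match the quadrature EP sketch given after slide 15, consistent with the equivalence claimed on the next slide.

```python
# A minimal 1D sketch of Algorithm 3 (smooth EP) on an assumed toy model:
# f_0 = N(0, 1) prior, heavy-tailed likelihood factors, phi_i = -log f_i.
import numpy as np

grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]
data = np.array([1.0, 2.5, -0.5])
n = len(data) + 1

def log_f(i, theta):
    if i == 0:
        return -0.5 * theta**2
    d = data[i - 1]
    return -2.0 * np.log(1 + 0.5 * (theta - d)**2)

def grad_phi(i, theta):                                # gradient of phi_i = -log f_i
    if i == 0:
        return theta
    d = data[i - 1]
    return 2.0 * (theta - d) / (1 + 0.5 * (theta - d)**2)

# sites stored as natural parameters: g_i(theta) ∝ exp(a_i*theta - b_i*theta**2/2)
a = np.zeros(n)
b = np.full(n, 1e-3)

for sweep in range(30):
    for i in range(n):
        a_cav, b_cav = a.sum() - a[i], b.sum() - b[i]
        log_h = log_f(i, grid) + a_cav * grid - 0.5 * b_cav * grid**2
        h = np.exp(log_h - log_h.max())
        h /= h.sum() * dx                              # hybrid h_i on the grid
        mu_i = np.sum(grid * h) * dx
        var_i = np.sum((grid - mu_i)**2 * h) * dx
        r_i = np.sum(grad_phi(i, grid) * h) * dx                        # E_hi[grad phi_i]
        beta_i = np.sum((grid - mu_i) * grad_phi(i, grid) * h) * dx / var_i
        # g_i(theta) ∝ exp(-r_i*(theta - mu_i) - (beta_i/2)*(theta - mu_i)**2)
        b[i] = beta_i
        a[i] = beta_i * mu_i - r_i

print(a.sum() / b.sum(), 1.0 / b.sum())                # mean and variance of q
```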

40  D. Using the factor structure
Smooth EP is actually exactly equivalent to EP!
This ties EP to a much more intuitive algorithm: Newton's method.
However, we have lost the most important feature: the explicit objective function.

41  Summary
I have presented a family of algorithms which:
- Iterate Gaussian approximations to p(θ)
- By computing smoothed quadratic approximations of log p(θ) (or parts of it)
Different smoothings correspond to different known methods:
- Laplace = No smoothing
- VB = Gaussian smoothing
- EP = Various hybrid smoothings

42  III. Consequences
Key observation: EP and Algorithm 2 are closely related to Newton's method.
They must behave in a similar fashion!

43  III. Consequences
Newton's method has two striking features:
- Very fast convergence near its fixed points
- Possible oscillations (overshooting)
Solution: complement it with line-search methods.
EP is also known for its oscillations! Are these also overshoots of the target?

44  III. Consequences
EP still needs improvements.
We could import good ideas from Newton's method and use them on EP:
- Line-search (but how?)
- Non-SPD second-order term β_i (but how?)
Finally, the smooth Newton view of EP might be useful on its own.

45  III. Consequences
Since Laplace, VB and EP are closely related, we can ask whether they behave similarly in some situations.
- Laplace = No smoothing
- VB = Gaussian smoothing
- EP = Various hybrid smoothings
Whenever the smoothing distributions are similar, the methods are similar!

46  III. Consequences
If all hybrids are almost equal to the global approximation: h_i ≈ q_n
(i.e. f_i / g_i is negligible compared to q_n),
then EP and VB have the same smoothing and behave similarly.

47  III. Consequences
Furthermore, if q_n is almost a Dirac distribution (i.e. its width is negligible compared to the oscillations of ψ and/or the φ_i functions),
then Algorithm 2 behaves similarly to Algorithm 1.
Thus, the VB approximation behaves similarly to the Laplace approximation.

48  Conclusion
Various approximation methods are closely related: they can be obtained through smooth Newton methods.
- Laplace = No smoothing
- VB = Gaussian smoothing
- EP = Hybrid smoothings
Corollary: VB and EP are very closely related.
EP behaves similarly to Newton's method.

49  Speculation
Smooth Newton variants might be computationally useful for VB and EP.
VB and EP should give better approximations than Laplace.
Can this give a path towards understanding or improving the convergence of EP and VB?

50  References
- Minka (2001), Expectation Propagation for approximate Bayesian inference
- Seeger (2007), Expectation Propagation for Exponential Families
- Dehaene & Barthelmé (2017), Expectation Propagation in the large-data limit
- Dehaene (2016), Expectation Propagation performs a smoothed gradient descent
