The Bayesian approach to inverse problems


1 The Bayesian approach to inverse problems
Youssef Marzouk
Department of Aeronautics and Astronautics
Center for Computational Engineering
Massachusetts Institute of Technology
7 July 2015

2 Statistical inference
Why is a statistical perspective useful in inverse problems?
- To characterize uncertainty in the inverse solution
- To understand how this uncertainty depends on the number and quality of observations, features of the forward model, prior information, etc.
- To make probabilistic predictions
- To choose good observations or experiments
- To address questions of model error, model validity, and model selection

3 Bayesian inference
Bayes' rule:
  p(θ | y) = p(y | θ) p(θ) / p(y)
Key idea: model parameters θ are treated as random variables. (For simplicity, we let our random variables have densities.)
Notation:
- θ are the model parameters; y are the data; assume both to be finite-dimensional unless otherwise indicated
- p(θ) is the prior probability density
- L(θ) := p(y | θ) is the likelihood function
- p(θ | y) is the posterior probability density
- p(y) is the evidence, or equivalently, the marginal likelihood
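As a concrete illustration of Bayes' rule, the following minimal sketch evaluates a one-dimensional posterior on a grid for a toy scalar inverse problem. The quadratic forward model, noise level, prior width, and observed value are all assumptions made up for this example, not anything from the slides.

```python
import numpy as np

# Toy scalar inverse problem: y = g(theta) + noise, with g(theta) = theta**2,
# Gaussian noise, and a Gaussian prior. Evaluate Bayes' rule on a grid.
sigma_noise, sigma_prior = 0.1, 1.0   # assumed noise and prior standard deviations
y_obs = 0.8                           # a single synthetic observation

theta = np.linspace(-3.0, 3.0, 2001)
prior = np.exp(-0.5 * (theta / sigma_prior) ** 2)
likelihood = np.exp(-0.5 * ((y_obs - theta ** 2) / sigma_noise) ** 2)

unnormalized = likelihood * prior
evidence = np.trapz(unnormalized, theta)   # p(y), approximated by quadrature
posterior = unnormalized / evidence        # p(theta | y) on the grid

print("posterior mean:", np.trapz(theta * posterior, theta))
```

Note that this toy posterior is bimodal (concentrated near ±√y_obs), so the posterior mean alone would be a misleading summary; this motivates the richer summaries on the next slide.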

4 Bayesian inference: summaries of the posterior distribution
What information to extract?
- Posterior mean of θ; maximum a posteriori (MAP) estimate of θ
- Posterior covariance or higher moments of θ
- Quantiles
- Credible intervals: C(y) such that P[θ ∈ C(y) | y] = 1 − α. Credible intervals are not uniquely defined by this condition; thus consider, for example, the HPD (highest posterior density) region.
- Posterior realizations: for direct assessment, or to estimate posterior predictions or other posterior expectations
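A minimal sketch of how such summaries are typically computed from posterior samples (e.g., MCMC output); the synthetic Gaussian "samples" and the 95% level below are placeholders chosen only for illustration.

```python
import numpy as np

# Compute common posterior summaries from samples; the synthetic draws below
# stand in for actual MCMC output from a 2-D posterior.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.5, scale=0.3, size=(10_000, 2))

post_mean = samples.mean(axis=0)
post_cov = np.cov(samples, rowvar=False)

# 95% equal-tailed credible interval for each component (an HPD region would
# instead keep the highest-density samples totalling probability 1 - alpha).
alpha = 0.05
lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)

print("posterior mean:", post_mean)
print("posterior covariance:\n", post_cov)
print("95% credible intervals:", list(zip(lo, hi)))
```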

5 Bayesian and frequentist statistics
Understanding both perspectives is useful and important. Key differences between these two statistical paradigms:
- Frequentists do not assign probabilities to unknown parameters θ. One can write likelihoods p_θ(y) ≡ p(y | θ) but not priors p(θ) or posteriors; θ is not a random variable.
- In the frequentist viewpoint, there is no single preferred methodology for inverting the relationship between parameters and data. Instead, consider various estimators θ̂(y) of θ.
- The estimator θ̂ is a random variable. Why? The frequentist paradigm considers y to result from a random and repeatable experiment.

6 Bayesian and frequentist statistics: key differences (continued)
- Evaluate the quality of θ̂ through various criteria: bias, variance, mean-square error, consistency, efficiency, ...
- One common estimator is maximum likelihood: θ̂_ML = argmax_θ p(y | θ). Here p(y | θ) defines a family of distributions indexed by θ.
- Link to the Bayesian approach: the MAP estimate maximizes a penalized likelihood.
- What about Bayesian versus frequentist prediction of y_new | y, θ?
  Frequentist: plug-in or other estimators of y_new.
  Bayesian: posterior prediction via integration.

7 Bayesian inference: likelihood functions
- In general, p(y | θ) is a probabilistic model for the data.
- In the inverse problem or parameter estimation context, the likelihood function is where the forward model appears, along with a noise model and (if applicable) an expression for model discrepancy.
- Contrasting example (but not really!): parametric density estimation, where the likelihood function results from the probability density itself.
Selected examples of likelihood functions:
1 Bayesian linear regression
2 Nonlinear forward model g(θ) with additive Gaussian noise
3 Nonlinear forward model with noise + model discrepancy

8 Bayesian inference: prior distributions
- In ill-posed parameter estimation problems, e.g., inverse problems, prior information plays a key role.
- Intuitive idea: assign lower probability to values of θ that you don't expect to see, higher probability to values of θ that you do expect to see.
Examples:
1 Gaussian processes with specified covariance kernel
2 Gaussian Markov random fields
3 Gaussian priors derived from differential operators
4 Hierarchical priors
5 Besov space priors
6 Higher-level representations (objects, marked point processes)

9 Gaussian process priors
- Key idea: any finite-dimensional distribution of the stochastic process θ(x, ω) : D × Ω → R is multivariate normal.
- In other words: θ(x, ω) is a collection of jointly Gaussian random variables, indexed by x.
- Specify via a mean function and a covariance function:
  E[θ(x)] = μ(x)
  E[(θ(x) − μ(x)) (θ(x') − μ(x'))] = C(x, x')
- Smoothness of the process is controlled by the behavior of the covariance function as x' → x.
- Possible restrictions: stationarity, isotropy, ...
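A minimal sketch of drawing realizations from a zero-mean GP prior on D = [0, 1]; the squared-exponential kernel, its variance and length scale, and the grid size are illustrative assumptions rather than choices made in the slides.

```python
import numpy as np

# Draw realizations from a zero-mean Gaussian process prior on [0, 1] with a
# squared-exponential covariance kernel C(x, x') = v * exp(-|x - x'|^2 / (2 l^2)).
def sq_exp_kernel(x, xp, variance=1.0, length_scale=0.2):
    return variance * np.exp(-0.5 * ((x[:, None] - xp[None, :]) / length_scale) ** 2)

x = np.linspace(0.0, 1.0, 200)
C = sq_exp_kernel(x, x) + 1e-10 * np.eye(x.size)   # small jitter for numerical stability

L = np.linalg.cholesky(C)
rng = np.random.default_rng(1)
samples = L @ rng.standard_normal((x.size, 5))     # five prior realizations, one per column
```

Rougher or smoother realizations follow simply by swapping in a different covariance function (e.g., an exponential kernel), which is exactly the contrast shown on the next slide.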

10 Example: stationary Gaussian process priors
The prior is a stationary Gaussian random field θ(x, ω) : D × Ω → R, with D = [0, 1].
[Figures: prior realizations with an exponential covariance kernel and with a Gaussian covariance kernel.]
Both can be represented via a Karhunen-Loève expansion,
  θ(x, ω) = μ(x) + Σ_{i=1}^{K} √λ_i c_i(ω) φ_i(x),
where (λ_i, φ_i) are eigenpairs of the covariance operator and the c_i are independent standard normal random variables.
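A sketch of sampling such a field via a truncated (discrete) Karhunen-Loève expansion; the exponential kernel, correlation length, grid size, and truncation level K are assumptions made for illustration.

```python
import numpy as np

# Sample a stationary Gaussian field on D = [0, 1] from a truncated Karhunen-Loeve
# expansion built from a discretized exponential covariance kernel.
n, K = 300, 20
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
corr_length = 0.1
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)   # exponential kernel

# Discrete KL modes: eigenpairs of the (quadrature-weighted) covariance matrix
eigvals, eigvecs = np.linalg.eigh(C * dx)
idx = np.argsort(eigvals)[::-1][:K]
lam, phi = eigvals[idx], eigvecs[:, idx] / np.sqrt(dx)       # approx. L2-normalized modes

rng = np.random.default_rng(2)
c = rng.standard_normal(K)
theta = phi @ (np.sqrt(lam) * c)   # one realization: sum_i sqrt(lam_i) c_i phi_i(x)
```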

11 Gaussian Markov random fields
- Key idea: discretize space and specify a sparse inverse covariance ("precision") matrix W:
  p(θ) ∝ exp( −(γ/2) θ^T W θ ),
  where γ controls the scale.
- Full conditionals p(θ_i | θ_{−i}) are available analytically and may simplify dramatically.
- Represent as an undirected graphical model.
- Example: E[θ_i | θ_{−i}] is just an average of site i's nearest neighbors.
- Quite flexible; even used to simulate textures.
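A small sketch of a 1-D GMRF with a sparse second-difference precision matrix; the grid size, the scale γ, and the tiny diagonal shift that makes the prior proper are all assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp

# 1-D GMRF prior p(theta) ∝ exp(-0.5 * gamma * theta^T W theta), with W a sparse
# second-difference precision matrix plus a small diagonal shift so it is invertible.
n, gamma = 500, 50.0
W = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1]).toarray()
W += 1e-4 * np.eye(n)

# If gamma * W = L L^T, then theta = L^{-T} z with z ~ N(0, I) has covariance (gamma W)^{-1}.
L = np.linalg.cholesky(gamma * W)
rng = np.random.default_rng(3)
theta = np.linalg.solve(L.T, rng.standard_normal(n))

# The full conditional mean of an interior site is (up to the small diagonal shift)
# the average of its two nearest neighbors, as noted on the slide.
i = n // 2
cond_mean = 0.5 * (theta[i - 1] + theta[i + 1])
```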

12 Priors through differential operators
- Key idea: return to the infinite-dimensional setting; again penalize roughness in θ(x).
- Stuart 2010: define the prior using fractional negative powers of the Laplacian A = −Δ:
  θ ∼ N( θ_0, β A^{−α} )
- Sufficiently large α (α > d/2), along with conditions on the likelihood, ensures that the posterior measure is well defined.

13 GPs, GMRFs, and SPDEs
In fact, all three types of Gaussian priors just described are closely connected.
- Linear fractional SPDE:
  (κ² − Δ)^{β/2} θ(x) = W(x),  x ∈ R^d,  β = ν + d/2,  κ > 0,  ν > 0,
  where W(x) is spatial white noise.
- Then θ(x) is a Gaussian field with Matérn covariance:
  C(x, x') = σ² / (2^{ν−1} Γ(ν)) (κ ‖x − x'‖)^ν K_ν(κ ‖x − x'‖)
- The covariance kernel is the Green's function of the differential operator:
  (κ² − Δ)^β C(x, x') = δ(x − x')
- ν = 1/2 is equivalent to the exponential covariance; ν → ∞ is equivalent to the squared exponential covariance.
- One can construct a discrete GMRF that approximates the solution of the SPDE (see Lindgren, Rue, Lindström, JRSSB 2011).
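A sketch of evaluating the Matérn covariance above and checking the ν = 1/2 limit; the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.special import kv, gamma

# Matern covariance C(r) = sigma^2 / (2^(nu-1) Gamma(nu)) * (kappa r)^nu * K_nu(kappa r),
# the stationary covariance associated with the SPDE (kappa^2 - Laplacian)^(beta/2) theta = W.
def matern(r, sigma2=1.0, nu=1.5, kappa=10.0):
    r = np.atleast_1d(np.asarray(r, dtype=float))
    c = np.full_like(r, sigma2)                       # C(0) = sigma^2 (limit as r -> 0)
    rr = kappa * r[r > 0]
    c[r > 0] = sigma2 / (2 ** (nu - 1) * gamma(nu)) * rr ** nu * kv(nu, rr)
    return c

r = np.linspace(0.0, 0.5, 6)
print(matern(r, nu=0.5, kappa=10.0))   # nu = 1/2 reproduces the exponential kernel
print(np.exp(-10.0 * r))               # exp(-kappa r), for comparison
```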

14 Hierarchical Gaussian priors
[Figure: three realizations drawn from the prior with constant variance θ_j = θ_0 (left), and from the corresponding prior where the variance is 100-fold larger at two points indicated by arrows (right). Calvetti & Somersalo, Inverse Problems 24 (2008).]

15 Hierarchical Gaussian priors (continued)
[Figures: approximations of the MAP estimate of the image (top rows) and of the variance (bottom rows) after 1, 3, and 5 iterations of the cyclic algorithm, using the GMRES and CGLS methods to compute the image update at each iteration step. Calvetti & Somersalo, Inverse Problems 24 (2008).]

16 Non-Gaussian priors: Besov space priors
Besov space B^s_pq(T): expand θ in a wavelet basis,
  θ(x) = c_0 + Σ_{j=0}^{∞} Σ_{h=0}^{2^j − 1} w_{j,h} ψ_{j,h}(x),
and require
  ‖θ‖_{B^s_pq(T)} := ( |c_0|^q + Σ_{j=0}^{∞} 2^{jq(s + 1/2 − 1/p)} ( Σ_{h=0}^{2^j − 1} |w_{j,h}|^p )^{q/p} )^{1/q} < ∞.
Consider p = q = s = 1:
  ‖θ‖_{B^1_11(T)} = |c_0| + Σ_{j=0}^{∞} Σ_{h=0}^{2^j − 1} 2^{j/2} |w_{j,h}|.
Then the distribution of θ is a Besov prior if α c_0 and α 2^{j/2} w_{j,h} are independent Laplace(1) random variables.
Loosely, π(θ) ∝ exp( −α ‖θ‖_{B^1_11(T)} ).
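A sketch of drawing one realization from such a B^1_11 Besov prior using a Haar wavelet basis; the basis choice, the truncation level J, and the value of α are assumptions made for illustration.

```python
import numpy as np

# Besov B^1_11 prior draw: theta(x) = c0 + sum_{j,h} w_{j,h} psi_{j,h}(x), where
# alpha * c0 and alpha * 2^{j/2} * w_{j,h} are independent Laplace(1) variables.
rng = np.random.default_rng(4)
alpha, J = 10.0, 8                       # prior scale and truncation level (illustrative)
x = np.linspace(0.0, 1.0, 1024, endpoint=False)

def haar(j, h, x):
    # L2-normalized Haar wavelet: psi_{j,h}(x) = 2^{j/2} psi(2^j x - h)
    t = 2.0 ** j * x - h
    return 2.0 ** (j / 2) * (((0.0 <= t) & (t < 0.5)) * 1.0 - ((0.5 <= t) & (t < 1.0)) * 1.0)

theta = rng.laplace(scale=1.0 / alpha) * np.ones_like(x)        # c0 term
for j in range(J):
    for h in range(2 ** j):
        # scale chosen so that alpha * 2^{j/2} * w_{j,h} ~ Laplace(1)
        w = rng.laplace(scale=1.0 / (alpha * 2.0 ** (j / 2)))
        theta += w * haar(j, h, x)
```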

17 Higher-level representations
Marked point processes, and more: see Rue & Hurn, Biometrika 86 (1999).

18 Bayesian inference: hierarchical modeling
- One of the key flexibilities of the Bayesian construction!
- Hierarchical modeling has important implications for the design of efficient MCMC samplers (later in the lecture).
Examples:
1 Unknown noise variance
2 Unknown variance of a Gaussian process prior (cf. choosing the regularization parameter)
3 Many more, as dictated by the physical models at hand

19 Example: prior variance hyperparameter in an inverse diffusion problem
[Figure: posterior marginal density of the variance hyperparameter θ, p(θ | d), for ς = 10⁻¹ with 13 sensors and for ς = 10⁻² with 25 sensors, contrasted with its hyperprior density p(θ). Regularization ∝ ς²/θ.]

20 The linear Gaussian model
A key building-block problem:
- Parameters θ ∈ R^n, observations y ∈ R^m
- Forward model f(θ) = Gθ, where G ∈ R^{m×n}
- Additive noise yields the observations: y = Gθ + ε, with ε ∼ N(0, Γ_obs) independent of θ
- Endow θ with a Gaussian prior, θ ∼ N(0, Γ_pr)
Posterior probability density:
  p(θ | y) ∝ p(y | θ) p(θ) = L(θ) p(θ)
          = exp( −(1/2) (y − Gθ)^T Γ_obs^{−1} (y − Gθ) ) exp( −(1/2) θ^T Γ_pr^{−1} θ )
          ∝ exp( −(1/2) (θ − μ_pos)^T Γ_pos^{−1} (θ − μ_pos) )

22 The linear Gaussian model
The posterior is again Gaussian:
  Γ_pos = ( G^T Γ_obs^{−1} G + Γ_pr^{−1} )^{−1}
        = Γ_pr − Γ_pr G^T ( G Γ_pr G^T + Γ_obs )^{−1} G Γ_pr
        = (I − KG) Γ_pr
  μ_pos = Γ_pos G^T Γ_obs^{−1} y
- In the context of filtering, K is known as the (optimal) Kalman gain.
- H := G^T Γ_obs^{−1} G is the Hessian of the negative log-likelihood.
- How does low rank of H affect the structure of the posterior? How does H interact with the prior?
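A minimal sketch of these formulas on a random linear Gaussian problem; the matrices G, Γ_obs, Γ_pr and the problem sizes are placeholders, not the example used in the slides.

```python
import numpy as np

# Linear Gaussian model y = G theta + eps, eps ~ N(0, Gamma_obs), theta ~ N(0, Gamma_pr):
# compute the posterior mean and covariance in both forms given on the slide.
rng = np.random.default_rng(5)
m, n = 10, 50
G = rng.standard_normal((m, n))
Gamma_obs = 0.01 * np.eye(m)
Gamma_pr = np.eye(n)

theta_true = rng.standard_normal(n)
y = G @ theta_true + rng.multivariate_normal(np.zeros(m), Gamma_obs)

H = G.T @ np.linalg.solve(Gamma_obs, G)                  # Hessian of the negative log-likelihood
Gamma_pos = np.linalg.inv(H + np.linalg.inv(Gamma_pr))   # (G^T Gamma_obs^-1 G + Gamma_pr^-1)^-1
mu_pos = Gamma_pos @ G.T @ np.linalg.solve(Gamma_obs, y)

# Equivalent Kalman-gain form: Gamma_pos = (I - K G) Gamma_pr
K = Gamma_pr @ G.T @ np.linalg.inv(G @ Gamma_pr @ G.T + Gamma_obs)
assert np.allclose(Gamma_pos, (np.eye(n) - K @ G) @ Gamma_pr)
```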

23 Likelihood-informed directions
Consider the Rayleigh ratio
  R(w) = (w^T H w) / (w^T Γ_pr^{−1} w).
When R(w) is large, the likelihood dominates the prior in direction w. The ratio is maximized by solutions of the generalized eigenvalue problem
  H w = λ Γ_pr^{−1} w.
The posterior covariance can be written as a negative update of the prior along these likelihood-informed directions, and an approximation can be obtained by using only the r largest eigenvalues:
  Γ_pos = Γ_pr − Σ_{i=1}^{n} [λ_i / (1 + λ_i)] w_i w_i^T ≈ Γ_pr − Σ_{i=1}^{r} [λ_i / (1 + λ_i)] w_i w_i^T    (1)
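A sketch of forming this low-rank update numerically: solve the generalized eigenproblem with Γ_pr^{−1}-orthonormal eigenvectors and truncate. The matrices are random placeholders of the same kind as in the previous sketch, and the rank r is an arbitrary choice.

```python
import numpy as np
from scipy.linalg import eigh

# Low-rank posterior covariance via likelihood-informed directions:
# solve H w = lam * Gamma_pr^{-1} w (with w normalized so w^T Gamma_pr^{-1} w = 1),
# then truncate the negative update of the prior at rank r.
rng = np.random.default_rng(6)
m, n, r = 10, 50, 8
G = rng.standard_normal((m, n))
Gamma_obs = 0.01 * np.eye(m)
Gamma_pr = np.eye(n)

H = G.T @ np.linalg.solve(Gamma_obs, G)
Gamma_pr_inv = np.linalg.inv(Gamma_pr)

lam, W = eigh(H, Gamma_pr_inv)          # generalized eigenpairs; eigenvectors are B-orthonormal
lam, W = lam[::-1], W[:, ::-1]          # sort in descending order

Gamma_pos_exact = np.linalg.inv(H + Gamma_pr_inv)
Gamma_pos_lowrank = Gamma_pr - W[:, :r] @ np.diag(lam[:r] / (1 + lam[:r])) @ W[:, :r].T

err = np.linalg.norm(Gamma_pos_exact - Gamma_pos_lowrank, 2)
print(f"rank-{r} approximation error (spectral norm): {err:.3e}")
```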

26 Optimality results for Γ̂_pos
It turns out that the approximation
  Γ̂_pos = Γ_pr − Σ_{i=1}^{r} [λ_i / (1 + λ_i)] w_i w_i^T    (2)
is optimal in a class of loss functions L(Γ̂_pos, Γ_pos) over approximations of the form Γ̂_pos = Γ_pr − KK^T, where rank(K) ≤ r.¹
- Γ̂_pos minimizes the Hellinger distance and the KL divergence between N(μ_pos(y), Γ̂_pos) and N(μ_pos(y), Γ_pos).
- These results can also be used to devise efficient approximations of the posterior mean.
- λ = 1 means that the prior and the likelihood are roughly balanced. Truncate at λ = 0.1, for instance.
¹ For details see Spantini et al., "Optimal low-rank approximations of Bayesian linear inverse problems."

30 Remarks on the optimal approximation
  Γ̂_pos = Γ_pr − KK^T,   KK^T = Σ_{i=1}^{r} [λ_i / (1 + λ_i)] w_i w_i^T
- The form of the optimal update is widely used (Flath et al. 2011).
- Compute with Lanczos, randomized SVD, etc.
- The directions w̃_i = Γ_pr^{−1} w_i maximize the relative difference between prior and posterior variance:
  [ Var(w̃_i^T θ) − Var(w̃_i^T θ | y) ] / Var(w̃_i^T θ) = λ_i / (1 + λ_i)
- Using the Frobenius norm as a loss would instead yield directions of greatest absolute difference between prior and posterior variance.

31 A metric between covariance matrices
Förstner metric: let A, B ≻ 0, and let (σ_i) be the generalized eigenvalues of the pencil (A, B). Then
  d_F²(A, B) = tr[ ln²( B^{−1/2} A B^{−1/2} ) ] = Σ_i ln²(σ_i).
- Compare curvatures: sup_u (u^T A u) / (u^T B u) = σ_1
- Invariance properties: d_F(A, B) = d_F(A^{−1}, B^{−1}) and d_F(A, B) = d_F(M A M^T, M B M^T)
- The Frobenius distance d(A, B) = ‖A − B‖_F does not share the same properties.
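A sketch of computing the Förstner distance from generalized eigenvalues and checking its invariance properties; the SPD matrices below are random examples.

```python
import numpy as np
from scipy.linalg import eigh

# Forstner distance d_F(A, B) = sqrt(sum_i ln^2(sigma_i)), where sigma_i are the
# generalized eigenvalues of the pencil (A, B); A and B must be SPD.
def forstner(A, B):
    sigma = eigh(A, B, eigvals_only=True)
    return np.sqrt(np.sum(np.log(sigma) ** 2))

rng = np.random.default_rng(7)
X, Y, M = (rng.standard_normal((20, 20)) for _ in range(3))
A = X @ X.T + 20 * np.eye(20)
B = Y @ Y.T + 20 * np.eye(20)

print(forstner(A, B))
print(forstner(np.linalg.inv(A), np.linalg.inv(B)))   # same value: invariance under inversion
print(forstner(M @ A @ M.T, M @ B @ M.T))             # same value: congruence invariance
```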

32 Example: computerized tomography
X-rays travel from sources to detectors through an object of interest. The intensities from the sources are measured at the detectors, and the goal is to reconstruct the density of the object.
[Figure: measurement geometry showing source intensity, detectors, and pixels.]
This synthetic example is motivated by a real application: real-time X-ray imaging of logs that enter a sawmill, for the purpose of automatic quality control.

33 Example: computerized tomography
Weaker data ⇒ faster decay of the generalized eigenvalues ⇒ lower-order approximations possible.
[Figures: generalized eigenvalues λ_i versus index i, and Förstner distance d_F versus rank of the update, for the limited-angle and full-angle cases; prior, posterior, and rank-50, 100, and 200 approximations.]
In the limited-angle case, roughly r = 200 is enough to get a good approximation (with full angle, r ≈ 800 is needed).

34 Example: computerized tomography
Approximation of the mean:
  μ_pos(y) = Γ_pos G^T Γ_obs^{−1} y ≈ A_r y
[Figures: reconstructions of the posterior mean using the low-rank approximation A_r.]

35 Questions yet to answer
- How to simulate from or explore the posterior distribution?
- How to make Bayesian inference computationally tractable when the forward model is expensive (e.g., a PDE) and the parameters are high- or infinite-dimensional?
- Downstream questions: model selection, optimal experimental design, decision-making, etc.
