The Bayesian approach to inverse problems
1 The Bayesian approach to inverse problems
Youssef Marzouk
Department of Aeronautics and Astronautics, Center for Computational Engineering, Massachusetts Institute of Technology
7 July 2015
2 Statistical inference
Why is a statistical perspective useful in inverse problems?
- To characterize uncertainty in the inverse solution
- To understand how this uncertainty depends on the number and quality of observations, features of the forward model, prior information, etc.
- To make probabilistic predictions
- To choose good observations or experiments
- To address questions of model error, model validity, and model selection
3 Bayesian inference
Bayes' rule:
$p(\theta \mid y) = \dfrac{p(y \mid \theta)\, p(\theta)}{p(y)}$
Key idea: model parameters θ are treated as random variables. (For simplicity, we let our random variables have densities.)
Notation:
- θ are model parameters; y are the data; assume both to be finite-dimensional unless otherwise indicated
- p(θ) is the prior probability density
- $L(\theta) := p(y \mid \theta)$ is the likelihood function
- p(θ | y) is the posterior probability density
- p(y) is the evidence, or equivalently, the marginal likelihood
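A minimal sketch of Bayes' rule in action, not taken from the slides: a scalar parameter θ is discretized on a grid, and the posterior is formed as (likelihood × prior) / evidence. The Gaussian prior, the noise level, and the single datum are all made-up placeholders.

```python
# Sketch: Bayes' rule for a scalar parameter on a 1-D grid (toy model; all numbers assumed).
import numpy as np

theta = np.linspace(-5.0, 5.0, 2001)           # grid over the parameter
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta**2 / 2.0**2)       # p(theta): N(0, 2^2), unnormalized
prior /= prior.sum() * dtheta

y, sigma = 1.3, 0.5                            # one observed datum and its noise std (made up)
likelihood = np.exp(-0.5 * (y - theta)**2 / sigma**2)   # L(theta) = p(y | theta), up to a constant

evidence = np.sum(likelihood * prior) * dtheta # p(y), the marginal likelihood (same constant absorbed)
posterior = likelihood * prior / evidence      # Bayes' rule: p(theta | y)

post_mean = np.sum(theta * posterior) * dtheta
print(f"posterior mean = {post_mean:.3f}")
```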
4 Bayesian inference: summaries of the posterior distribution
What information to extract?
- Posterior mean of θ; maximum a posteriori (MAP) estimate of θ
- Posterior covariance or higher moments of θ
- Quantiles
- Credible intervals: C(y) such that $P[\theta \in C(y) \mid y] = 1 - \alpha$. Credible intervals are not uniquely defined by this condition; thus consider, for example, the HPD (highest posterior density) region.
- Posterior realizations: for direct assessment, or to estimate posterior predictions or other posterior expectations
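A sketch of how these summaries are typically computed from posterior samples; the Gaussian draws below stand in for actual MCMC output and are purely illustrative.

```python
# Sketch: posterior summaries from samples (the samples here are a stand-in for MCMC draws).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.3, size=50_000)   # placeholder for posterior draws of theta | y

post_mean, post_var = samples.mean(), samples.var()
lo, hi = np.quantile(samples, [0.025, 0.975])            # 95% equal-tailed credible interval

# Highest-posterior-density (HPD) interval: shortest interval containing 95% of the samples.
s = np.sort(samples)
k = int(np.ceil(0.95 * len(s)))
widths = s[k:] - s[:-k]
i = int(np.argmin(widths))

print(f"mean = {post_mean:.3f}, variance = {post_var:.4f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}], 95% HPD interval: [{s[i]:.3f}, {s[i + k]:.3f}]")
```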
5 Bayesian and frequentist statistics
Understanding both perspectives is useful and important. Key differences between these two statistical paradigms:
- Frequentists do not assign probabilities to unknown parameters θ. One can write likelihoods $p_\theta(y) \equiv p(y \mid \theta)$ but not priors p(θ) or posteriors. θ is not a random variable.
- In the frequentist viewpoint, there is no single preferred methodology for inverting the relationship between parameters and data. Instead, consider various estimators $\hat{\theta}(y)$ of θ.
- The estimator $\hat{\theta}$ is a random variable. Why? The frequentist paradigm considers y to result from a random and repeatable experiment.
6 Bayesian and frequentist statistics
Key differences (continued):
- Evaluate the quality of $\hat{\theta}$ through various criteria: bias, variance, mean-square error, consistency, efficiency, ...
- One common estimator is maximum likelihood: $\hat{\theta}_{\mathrm{ML}} = \arg\max_\theta \, p(y \mid \theta)$. Here p(y | θ) defines a family of distributions indexed by θ.
- Link to the Bayesian approach: the MAP estimate maximizes a penalized likelihood.
- What about Bayesian versus frequentist prediction of $y_{\mathrm{new}} \mid y, \theta$?
  - Frequentist: plug-in or other estimators of $y_{\mathrm{new}}$
  - Bayesian: posterior prediction via integration
7 Bayesian inference: likelihood functions
- In general, p(y | θ) is a probabilistic model for the data.
- In the inverse problem or parameter estimation context, the likelihood function is where the forward model appears, along with a noise model and (if applicable) an expression for model discrepancy.
- Contrasting example (but not really!): parametric density estimation, where the likelihood function results from the probability density itself.
Selected examples of likelihood functions (a sketch of the second follows below):
1. Bayesian linear regression
2. Nonlinear forward model g(θ) with additive Gaussian noise
3. Nonlinear forward model with noise + model discrepancy
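A sketch of example 2 above: the log-likelihood for y = g(θ) + ε with ε ~ N(0, Γ_obs). The particular forward model g, the observation covariance, and the synthetic data are placeholders chosen only to make the snippet runnable.

```python
# Sketch: log-likelihood for a nonlinear forward model with additive Gaussian noise (toy g, assumed Gamma_obs).
import numpy as np

def g(theta):
    """Toy nonlinear forward model mapping parameters to predicted observations."""
    return np.array([np.sin(theta[0]) + theta[1], theta[0] * theta[1]])

Gamma_obs = 0.1**2 * np.eye(2)                 # observation-noise covariance (assumed)
L_obs = np.linalg.cholesky(Gamma_obs)

def log_likelihood(theta, y):
    """log p(y | theta) up to an additive constant."""
    r = y - g(theta)                            # data-model misfit
    z = np.linalg.solve(L_obs, r)               # whitened residual
    return -0.5 * np.dot(z, z)

y_obs = np.array([1.2, 0.3])                    # synthetic data (made up)
print(log_likelihood(np.array([0.5, 0.7]), y_obs))
```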
8 Bayesian inference: prior distributions
- In ill-posed parameter estimation problems, e.g., inverse problems, prior information plays a key role.
- Intuitive idea: assign lower probability to values of θ that you don't expect to see, higher probability to values of θ that you do expect to see.
Examples:
1. Gaussian processes with specified covariance kernel
2. Gaussian Markov random fields
3. Gaussian priors derived from differential operators
4. Hierarchical priors
5. Besov space priors
6. Higher-level representations (objects, marked point processes)
9 Gaussian process priors
- Key idea: any finite-dimensional distribution of the stochastic process $\theta(x, \omega): D \times \Omega \to \mathbb{R}$ is multivariate normal.
- In other words, θ(x, ω) is a collection of jointly Gaussian random variables, indexed by x.
- Specify via a mean function and a covariance function:
  $E[\theta(x)] = \mu(x), \qquad E[(\theta(x) - \mu(x))(\theta(x') - \mu(x'))] = C(x, x')$
- Smoothness of the process is controlled by the behavior of the covariance function as $x' \to x$.
- Restrictions: stationarity, isotropy, ...
10 Example: stationary Gaussian random field priors
Prior is a stationary Gaussian random field $\theta(x, \omega): D \times \Omega \to \mathbb{R}$ with $D = [0, 1]$. [Figure: realizations with an exponential covariance kernel and with a Gaussian covariance kernel.] Both can be represented through a truncated Karhunen-Loève expansion,
$\theta(x, \omega) = \mu(x) + \sum_{i=1}^{K} \sqrt{\lambda_i}\, c_i(\omega)\, \phi_i(x).$
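A sketch, under assumed settings (1-D domain, exponential kernel, 20 retained modes), of drawing such a prior sample via a discrete Karhunen-Loève expansion computed from an eigendecomposition of the covariance matrix.

```python
# Sketch: sample a stationary GP prior on [0, 1] via a truncated, discretized KL expansion.
# The kernel, correlation length, grid, and number of modes K are all assumptions for illustration.
import numpy as np

x = np.linspace(0.0, 1.0, 200)
corr_len, sigma2 = 0.2, 1.0
C = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)   # exponential covariance kernel

eigvals, eigvecs = np.linalg.eigh(C)            # discrete KL modes
idx = np.argsort(eigvals)[::-1]                 # sort modes by decreasing variance
eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

K = 20                                          # number of retained KL modes
rng = np.random.default_rng(1)
c = rng.standard_normal(K)                      # i.i.d. N(0, 1) KL coefficients
theta = eigvecs[:, :K] @ (np.sqrt(eigvals[:K]) * c)   # sample with mu(x) = 0
print(theta.shape)
```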
11 Gaussian Markov random fields
- Key idea: discretize space and specify a sparse inverse covariance ("precision") matrix W:
  $p(\theta) \propto \exp\!\left(-\tfrac{1}{2}\, \gamma\, \theta^T W \theta\right)$, where γ controls the scale.
- Full conditionals $p(\theta_i \mid \theta_{-i})$ are available analytically and may simplify dramatically. Represent as an undirected graphical model.
- Example: $E[\theta_i \mid \theta_{-i}]$ is just an average of site i's nearest neighbors.
- Quite flexible; even used to simulate textures.
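A sketch of a first-order GMRF on a 1-D chain graph. The tridiagonal, graph-Laplacian-like precision, the small diagonal shift (added only to make the prior proper), and all numbers are assumptions for illustration; the last two lines check the nearest-neighbor form of the full conditional mean.

```python
# Sketch: GMRF prior p(theta) ∝ exp(-0.5 * gamma * theta^T W theta) on a 1-D chain (all settings assumed).
import numpy as np
import scipy.sparse as sp

n, gamma = 400, 25.0
main = 2.0 * np.ones(n); main[0] = main[-1] = 1.0            # chain-graph Laplacian diagonal
off = -np.ones(n - 1)
W = sp.diags([off, main, off], [-1, 0, 1]).tocsc() + 1e-4 * sp.eye(n, format="csc")
Q = (gamma * W).toarray()                                    # precision matrix (dense only for this small demo)

# Draw theta ~ N(0, Q^{-1}) using a Cholesky factor of the precision: Q = L L^T, theta = L^{-T} z.
rng = np.random.default_rng(3)
L = np.linalg.cholesky(Q)
theta = np.linalg.solve(L.T, rng.standard_normal(n))

# Full conditional mean of an interior site is (nearly) the average of its two neighbors.
i = n // 2
cond_mean = -(Q[i, :i] @ theta[:i] + Q[i, i + 1:] @ theta[i + 1:]) / Q[i, i]
print(cond_mean, 0.5 * (theta[i - 1] + theta[i + 1]))
```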
12 Priors through differential operators
- Key idea: return to the infinite-dimensional setting; again penalize roughness in θ(x).
- Stuart (2010): define the prior using fractional negative powers of the Laplacian, $A = -\triangle$:
  $\theta \sim N\!\left(\theta_0, \beta A^{-\alpha}\right)$
- Sufficiently large α (α > d/2), along with conditions on the likelihood, ensures that the posterior measure is well defined.
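A sketch of this construction on a 1-D grid: A is the standard finite-difference negative Laplacian with homogeneous Dirichlet boundary conditions, and the prior covariance β A^{-α} is built from its eigendecomposition. The boundary conditions, α, β, and the grid size are assumptions made for the demo.

```python
# Sketch: smoothness prior theta ~ N(0, beta * A^{-alpha}) with A = -Laplacian (1-D, Dirichlet BCs, all assumed).
import numpy as np

n = 300
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2

alpha, beta = 1.0, 1e-3                          # alpha > d/2 = 1/2 here
lam, V = np.linalg.eigh(A)                       # A = V diag(lam) V^T with lam > 0
Gamma_pr = beta * (V * lam**(-alpha)) @ V.T      # prior covariance beta * A^{-alpha}

rng = np.random.default_rng(4)
theta = V @ (np.sqrt(beta * lam**(-alpha)) * rng.standard_normal(n))  # one prior sample
print(Gamma_pr.shape, theta[:3])
```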
13 GPs, GMRFs, and SPDEs
In fact, all three types of Gaussian priors just described are closely connected.
- Linear fractional SPDE:
  $(\kappa^2 - \triangle)^{\beta/2}\, \theta(x) = \mathcal{W}(x), \quad x \in \mathbb{R}^d, \quad \beta = \nu + d/2, \ \kappa > 0, \ \nu > 0$
- Then θ(x) is a Gaussian field with Matérn covariance:
  $C(x, x') = \dfrac{\sigma^2}{2^{\nu - 1}\, \Gamma(\nu)}\, (\kappa \|x - x'\|)^{\nu}\, K_{\nu}(\kappa \|x - x'\|)$
- The covariance kernel is the Green's function of the differential operator:
  $(\kappa^2 - \triangle)^{\beta}\, C(x, x') = \delta(x - x')$
- ν = 1/2 is equivalent to the exponential covariance; ν → ∞ is equivalent to the squared exponential covariance.
- Can construct a discrete GMRF that approximates the solution of the SPDE. (See Lindgren, Rue, Lindström, JRSSB 2011.)
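A sketch implementing the Matérn covariance above, with a numerical check that ν = 1/2 recovers the exponential kernel σ² exp(−κ r); the particular κ and grid values are arbitrary.

```python
# Sketch: Matern covariance C(r) = sigma^2 / (2^{nu-1} Gamma(nu)) * (kappa r)^nu * K_nu(kappa r),
# checked against the exponential kernel at nu = 1/2. Parameter values are arbitrary.
import numpy as np
from scipy.special import kv, gamma as gamma_fn

def matern_cov(r, kappa, nu, sigma2=1.0):
    r = np.asarray(r, dtype=float)
    c = np.empty_like(r)
    small = r < 1e-12
    c[small] = sigma2                                          # C(0) = sigma^2
    kr = kappa * r[~small]
    c[~small] = sigma2 / (2**(nu - 1) * gamma_fn(nu)) * kr**nu * kv(nu, kr)
    return c

r = np.linspace(0.0, 2.0, 5)
print(matern_cov(r, kappa=3.0, nu=0.5))          # should match the exponential kernel below
print(np.exp(-3.0 * r))
```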
14 Hierarchical Gaussian priors
[Figure from Calvetti & Somersalo, Inverse Problems 24 (2008): three realizations drawn from the prior (6) with constant variance θ_j = θ_0 (left), and from the corresponding prior where the variance is 100-fold larger at two points indicated by arrows (right).]
15 Hierarchical Gaussian priors
[Figures from Calvetti & Somersalo, Inverse Problems 24 (2008): approximations of the MAP estimate of the image (top rows) and of the variance (bottom rows) after successive iterations of the cyclic algorithm, using the GMRES and CGLS methods to compute the update of the image at each iteration step. With the CGLS iteration and an inverse gamma hyperprior, the value of the objective function levels off after five iterations.]
16 Non-Gaussian priors
Besov space $B^s_{pq}(T)$: expand
$\theta(x) = c_0 + \sum_{j=0}^{\infty} \sum_{h=0}^{2^j - 1} w_{j,h}\, \psi_{j,h}(x)$
and require
$\|\theta\|_{B^s_{pq}(T)} := \left( |c_0|^q + \sum_{j=0}^{\infty} 2^{jq(s + 1/2 - 1/p)} \Big( \sum_{h=0}^{2^j - 1} |w_{j,h}|^p \Big)^{q/p} \right)^{1/q} < \infty.$
Consider p = q = s = 1:
$\|\theta\|_{B^1_{11}(T)} = |c_0| + \sum_{j=0}^{\infty} \sum_{h=0}^{2^j - 1} 2^{j/2}\, |w_{j,h}|.$
Then the distribution of θ is a Besov prior if $\alpha c_0$ and $\alpha\, 2^{j/2} w_{j,h}$ are independent and Laplace(1). Loosely, $\pi(\theta) \propto \exp\!\left(-\alpha \|\theta\|_{B^1_{11}(T)}\right)$.
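A sketch of drawing one realization from this B¹₁₁ Besov prior, using a Haar wavelet basis on T = [0, 1]. The choice of Haar wavelets, the truncation level J, and the scale α are assumptions for illustration only; the coefficient scaling follows the Laplace(1) condition stated above.

```python
# Sketch: one B^1_11 Besov-prior realization with Haar wavelets (basis, level J, and alpha assumed).
import numpy as np

def haar(j, h, x):
    """Haar wavelet psi_{j,h}(x) = 2^{j/2} psi(2^j x - h), psi = +1 on [0, 1/2), -1 on [1/2, 1)."""
    t = 2.0**j * x - h
    return 2.0**(j / 2) * (((0.0 <= t) & (t < 0.5)).astype(float) - ((0.5 <= t) & (t < 1.0)).astype(float))

rng = np.random.default_rng(5)
alpha, J = 10.0, 8                                # prior scale and truncation level (assumed)
x = np.linspace(0.0, 1.0, 1024, endpoint=False)

c0 = rng.laplace(scale=1.0) / alpha               # alpha * c_0 ~ Laplace(1)
theta = np.full_like(x, c0)
for j in range(J):
    w = rng.laplace(scale=1.0, size=2**j) / (alpha * 2.0**(j / 2))   # alpha * 2^{j/2} w_{j,h} ~ Laplace(1)
    for h, w_jh in enumerate(w):
        theta += w_jh * haar(j, h, x)
print(theta[:5])
```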
17 Higher-level representations
Marked point processes, and more: Rue & Hurn, Biometrika 86 (1999).
18 Bayesian inference: hierarchical modeling
- One of the key flexibilities of the Bayesian construction!
- Hierarchical modeling has important implications for the design of efficient MCMC samplers (later in the lecture).
Examples (a sketch of the first follows below):
1. Unknown noise variance
2. Unknown variance of a Gaussian process prior (cf. choosing the regularization parameter)
3. Many more, as dictated by the physical models at hand
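A sketch of example 1: treating the noise variance σ² as unknown with an inverse-gamma hyperprior. Given θ, the conditional posterior of σ² is available in closed form, which is a typical building block of a Gibbs-style sampler; the linear model, the hyperprior parameters, and all numbers below are assumptions.

```python
# Sketch: unknown noise variance with an inverse-gamma hyperprior (conjugate conditional update).
#   y | theta, sigma^2 ~ N(G theta, sigma^2 I),  sigma^2 ~ InvGamma(a, b)
#   => sigma^2 | theta, y ~ InvGamma(a + m/2, b + 0.5 * ||y - G theta||^2)
import numpy as np

rng = np.random.default_rng(6)
m, n = 50, 3
G = rng.standard_normal((m, n))
theta_true = np.array([1.0, -0.5, 2.0])
y = G @ theta_true + 0.3 * rng.standard_normal(m)             # synthetic data, true sigma = 0.3

a, b = 2.0, 0.1                                               # inverse-gamma hyperprior parameters (assumed)
theta = theta_true                                            # pretend this is the current Gibbs state
a_post = a + 0.5 * m
b_post = b + 0.5 * np.sum((y - G @ theta)**2)

sigma2_draw = 1.0 / rng.gamma(shape=a_post, scale=1.0 / b_post)   # one InvGamma(a_post, b_post) draw
print(f"conditional posterior mean of sigma^2: {b_post / (a_post - 1):.4f}, one draw: {sigma2_draw:.4f}")
```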
19 Example: prior variance hyperparameter in an inverse diffusion problem
[Figure: posterior marginal density of the variance hyperparameter θ versus quality of data (e.g., ς = 10⁻¹ with 13 sensors, ς = 10⁻² with 25 sensors), contrasted with its hyperprior density. Regularization ∝ ς²/θ.]
20 The linear Gaussian model
A key building-block problem:
- Parameters $\theta \in \mathbb{R}^n$, observations $y \in \mathbb{R}^m$
- Forward model $f(\theta) = G\theta$, where $G \in \mathbb{R}^{m \times n}$
- Additive noise yields observations: $y = G\theta + \epsilon$, with $\epsilon \sim N(0, \Gamma_{\mathrm{obs}})$ independent of θ
- Endow θ with a Gaussian prior, $\theta \sim N(0, \Gamma_{\mathrm{pr}})$.
Posterior probability density:
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) = L(\theta)\, p(\theta)$
$= \exp\!\left(-\tfrac{1}{2}(y - G\theta)^T \Gamma_{\mathrm{obs}}^{-1} (y - G\theta)\right) \exp\!\left(-\tfrac{1}{2}\theta^T \Gamma_{\mathrm{pr}}^{-1} \theta\right)$
$\propto \exp\!\left(-\tfrac{1}{2}(\theta - \mu_{\mathrm{pos}})^T \Gamma_{\mathrm{pos}}^{-1} (\theta - \mu_{\mathrm{pos}})\right)$
22 The linear Gaussian model
Posterior is again Gaussian:
$\Gamma_{\mathrm{pos}} = \left(G^T \Gamma_{\mathrm{obs}}^{-1} G + \Gamma_{\mathrm{pr}}^{-1}\right)^{-1} = \Gamma_{\mathrm{pr}} - \Gamma_{\mathrm{pr}} G^T \left(G \Gamma_{\mathrm{pr}} G^T + \Gamma_{\mathrm{obs}}\right)^{-1} G \Gamma_{\mathrm{pr}} = (I - KG)\, \Gamma_{\mathrm{pr}}$
$\mu_{\mathrm{pos}} = \Gamma_{\mathrm{pos}} G^T \Gamma_{\mathrm{obs}}^{-1} y$
- In the context of filtering, K is known as the (optimal) Kalman gain.
- $H := G^T \Gamma_{\mathrm{obs}}^{-1} G$ is the Hessian of the negative log-likelihood.
- How does low rank of H affect the structure of the posterior? How does H interact with the prior?
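A sketch that computes μ_pos and Γ_pos both ways, via the precision form and via the Kalman-gain form, and checks that the two expressions agree; G, the covariances, and the synthetic data are placeholders.

```python
# Sketch: linear-Gaussian posterior computed two equivalent ways (all inputs are placeholders).
import numpy as np

rng = np.random.default_rng(7)
m, n = 20, 50
G = rng.standard_normal((m, n))                                # forward-model matrix (placeholder)
Gamma_obs = 0.1**2 * np.eye(m)
Gamma_pr = np.eye(n)
y = G @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)  # synthetic data

# (a) Precision form: Gamma_pos = (G^T Gamma_obs^{-1} G + Gamma_pr^{-1})^{-1}
H = G.T @ np.linalg.solve(Gamma_obs, G)                        # Hessian of the negative log-likelihood
Gamma_pos = np.linalg.inv(H + np.linalg.inv(Gamma_pr))
mu_pos = Gamma_pos @ G.T @ np.linalg.solve(Gamma_obs, y)

# (b) Kalman-gain form: K = Gamma_pr G^T (G Gamma_pr G^T + Gamma_obs)^{-1}, Gamma_pos = (I - K G) Gamma_pr
K = Gamma_pr @ G.T @ np.linalg.inv(G @ Gamma_pr @ G.T + Gamma_obs)
Gamma_pos_b = (np.eye(n) - K @ G) @ Gamma_pr

print(np.allclose(Gamma_pos, Gamma_pos_b))                     # the two expressions agree
print(mu_pos[:3])
```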
23 Likelihood-informed directions
Consider the Rayleigh ratio
$R(w) = \dfrac{w^T H w}{w^T \Gamma_{\mathrm{pr}}^{-1} w}.$
When R(w) is large, the likelihood dominates the prior in direction w. The ratio is maximized by solutions of the generalized eigenvalue problem
$H w = \lambda\, \Gamma_{\mathrm{pr}}^{-1} w.$
The posterior covariance can be written as a negative update along these likelihood-informed directions, and an approximation can be obtained by using only the r largest eigenvalues:
$\Gamma_{\mathrm{pos}} = \Gamma_{\mathrm{pr}} - \sum_{i=1}^{n} \dfrac{\lambda_i}{1 + \lambda_i}\, w_i w_i^T \;\approx\; \Gamma_{\mathrm{pr}} - \sum_{i=1}^{r} \dfrac{\lambda_i}{1 + \lambda_i}\, w_i w_i^T \qquad (1)$
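A sketch of equation (1) on a small random test problem. It uses scipy.linalg.eigh for the generalized eigenproblem H w = λ Γ_pr⁻¹ w, whose eigenvectors are returned normalized so that wᵀ Γ_pr⁻¹ w = 1, which is the normalization under which the identity holds; the test matrices are placeholders.

```python
# Sketch: likelihood-informed directions and the rank-r update of eq. (1) (test matrices are placeholders).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
m, n, r = 15, 60, 10
G = rng.standard_normal((m, n))
Gamma_obs = 0.05**2 * np.eye(m)
Gamma_pr = np.diag(np.linspace(0.5, 2.0, n))                   # a simple non-identity prior covariance

H = G.T @ np.linalg.solve(Gamma_obs, G)                        # Hessian of the negative log-likelihood
lam, W = eigh(H, np.linalg.inv(Gamma_pr))                      # H w = lambda * Gamma_pr^{-1} w (ascending)
lam, W = lam[::-1], W[:, ::-1]                                  # reorder by decreasing lambda

Gamma_pos = np.linalg.inv(H + np.linalg.inv(Gamma_pr))        # exact posterior covariance
full_update = (W * (lam / (1.0 + lam))) @ W.T                  # all n directions
low_rank = (W[:, :r] * (lam[:r] / (1.0 + lam[:r]))) @ W[:, :r].T

print(np.allclose(Gamma_pos, Gamma_pr - full_update))          # True: eq. (1) with all n terms is exact
print(np.linalg.norm(Gamma_pos - (Gamma_pr - low_rank)))       # error of the rank-r truncation
```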
26 Optimality results for $\hat{\Gamma}_{\mathrm{pos}}$
It turns out that the approximation
$\hat{\Gamma}_{\mathrm{pos}} = \Gamma_{\mathrm{pr}} - \sum_{i=1}^{r} \dfrac{\lambda_i}{1 + \lambda_i}\, w_i w_i^T \qquad (2)$
is optimal in a class of loss functions $L(\hat{\Gamma}_{\mathrm{pos}}, \Gamma_{\mathrm{pos}})$ for approximations of the form $\hat{\Gamma}_{\mathrm{pos}} = \Gamma_{\mathrm{pr}} - KK^T$, where rank(K) ≤ r.¹
- $\hat{\Gamma}_{\mathrm{pos}}$ minimizes the Hellinger distance and the KL divergence between $N(\mu_{\mathrm{pos}}(y), \hat{\Gamma}_{\mathrm{pos}})$ and $N(\mu_{\mathrm{pos}}(y), \Gamma_{\mathrm{pos}})$.
- The results can also be used to devise efficient approximations of the posterior mean.
- λ = 1 means that the prior and the likelihood are roughly balanced. Truncate at λ = 0.1, for instance.
¹ For details, see Spantini et al., "Optimal low-rank approximations of Bayesian linear inverse problems."
30 Remarks on the optimal approximation
$\hat{\Gamma}_{\mathrm{pos}} = \Gamma_{\mathrm{pr}} - KK^T, \qquad KK^T = \sum_{i=1}^{r} \dfrac{\lambda_i}{1 + \lambda_i}\, w_i w_i^T$
- The form of the optimal update is widely used (Flath et al. 2011).
- Compute with Lanczos, randomized SVD, etc.
- Directions $\tilde{w}_i = \Gamma_{\mathrm{pr}}^{-1} w_i$ maximize the relative difference between prior and posterior variance:
  $\dfrac{\mathrm{Var}(\tilde{w}_i^T x) - \mathrm{Var}(\tilde{w}_i^T x \mid y)}{\mathrm{Var}(\tilde{w}_i^T x)} = \dfrac{\lambda_i}{1 + \lambda_i}$
- Using the Frobenius norm as a loss would instead yield directions of greatest absolute difference between prior and posterior variance.
31 A metric between covariance matrices
Förstner metric: let A, B ≻ 0, and let $(\sigma_i)$ be the generalized eigenvalues of (A, B); then
$d_F^2(A, B) = \mathrm{tr}\!\left[\ln^2\!\left(B^{-1/2} A B^{-1/2}\right)\right] = \sum_i \ln^2(\sigma_i)$
- Compare curvatures: $\sup_u \dfrac{u^T A u}{u^T B u} = \sigma_1$
- Invariance properties: $d_F(A, B) = d_F(A^{-1}, B^{-1})$ and $d_F(A, B) = d_F(M A M^T, M B M^T)$
- The Frobenius distance $\|A - B\|_F$ does not share the same properties.
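A sketch computing the Förstner distance from the generalized eigenvalues and numerically checking the two invariance properties listed above; the SPD test matrices and the (almost surely invertible) congruence matrix M are placeholders.

```python
# Sketch: Forstner distance between SPD matrices, with checks of its invariance properties.
import numpy as np
from scipy.linalg import eigh

def forstner(A, B):
    """d_F(A, B) = sqrt( sum_i ln^2(sigma_i) ), sigma_i the generalized eigenvalues of (A, B)."""
    sigma = eigh(A, B, eigvals_only=True)
    return np.sqrt(np.sum(np.log(sigma)**2))

rng = np.random.default_rng(9)
X = rng.standard_normal((5, 5)); A = X @ X.T + 5 * np.eye(5)   # placeholder SPD matrices
Y = rng.standard_normal((5, 5)); B = Y @ Y.T + 5 * np.eye(5)
M = rng.standard_normal((5, 5))                                # almost surely invertible

print(forstner(A, B))
print(forstner(np.linalg.inv(A), np.linalg.inv(B)))            # invariance under inversion
print(forstner(M @ A @ M.T, M @ B @ M.T))                      # invariance under congruence M (.) M^T
```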
32 Example: computerized tomography
X-rays travel from sources to detectors through an object of interest. The intensities from the sources are measured at the detectors, and the goal is to reconstruct the density of the object. [Figure: measurement geometry, showing intensity, detector, and pixel.]
This synthetic example is motivated by a real application: real-time X-ray imaging of logs that enter a saw mill for the purpose of automatic quality control.
33 Example: computerized tomography
Weaker data ⇒ faster decay of the generalized eigenvalues ⇒ lower-order approximations possible.
[Figure: generalized eigenvalues versus index i, and Förstner distance d_F versus rank of the update, for the limited-angle and full-angle cases.]
In the limited-angle case, roughly r = 200 is enough to get a good approximation (with full angle, r ≈ 800 is needed).
[Figure: prior and posterior fields compared with rank-50, rank-100, and rank-200 approximations.]
34 Example: computerized tomography
Approximation of the mean:
$\mu_{\mathrm{pos}}(y) = \Gamma_{\mathrm{pos}}\, G^T \Gamma_{\mathrm{obs}}^{-1}\, y \approx A_r\, y$
35 Questions yet to answer
- How to simulate from or explore the posterior distribution?
- How to make Bayesian inference computationally tractable when the forward model is expensive (e.g., a PDE) and the parameters are high- or infinite-dimensional?
- Downstream questions: model selection, optimal experimental design, decision-making, etc.