Introduction to Gaussian Processes


1 Introduction to Gaussian Processes

2 Objectives
- to express prior knowledge/beliefs about model outputs using a Gaussian process (GP)
- to sample functions from the probability measure defined by a GP
- to build a Bayesian surrogate of a model using a GP
- to use the GP model for uncertainty propagation
- to use the GP model for global optimization

3 The Best Book on the Subject
Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, ISBN 0-262-18253-X. Free online at http://www.gaussianprocess.org/gpml/, with Matlab code.

4 The Best Code on the Subject
GPy (in Python), from the group of N. Lawrence, University of Sheffield. My lab's code.

5 Motivation
Input parameters x → physical model f → quantities of interest y. We'll think about it as a mathematical function: $y = f(x)$.

6 p(something) = probability that something is true
"The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency." H. Jeffreys (1939)
Probability Theory: The Logic of Science, by E. T. Jaynes

7 Some of the Problems of Uncertainty Quantification
Uncertainty propagation: $p(x) \xrightarrow{f} p(y)$
Model calibration: $y \xrightarrow{f} p(x \mid y)$
Design optimization under uncertainty: $x^* = \arg\max_x \mathbb{E}_\xi[O(f(x;\xi))]$

8 Why are these problems difficult?
- High computational cost of models.
- High dimensionality of inputs/outputs.
- Fusion of information from multiple sources.
- Quantification of model-form uncertainties.

9 The Surrogate Idea
- Do a finite number of simulations.
- Replace the model with an approximation: $y \approx \hat{f}(x)$
- The surrogate is usually cheap to evaluate.
- Solve the UQ problem with the surrogate.

10 The Surrogate Idea

11 Classic Approach to Surrogates
Usually $\hat{f}(x) = \sum_{j=1}^{M} w_j \phi_j(x)$, with the weights determined by looking at the data $D = \{(x_i, y_i)\}_{i=1}^{N}$, using either a quadrature rule (orthogonal basis), least squares, or machine learning techniques.

12 Examples of Surrogates
- generalized polynomial chaos
- Fourier expansions
- splines
- wavelets
- neural networks
- support vector machines
- compressive sensing

13 Limitations of Surrogates
- limited expressivity
- inability to quantify epistemic uncertainties due to the limited number of observations
- high dimensionality

14 Questions of interest
You can do 5 simulations. What is the best you can say about the solution of the X problem with this budget? If you could do one more simulation, where should it be?

15 The Bayesian surrogate idea
- Put a prior on functions.
- Evaluate the model output on a finite set of inputs.
- Compute the posterior on functions.
- Use Bayes rule to solve UQ problems.
"Most people, even Bayesians, think that this sounds crazy when they first hear about it." - Persi Diaconis (1988)

16 Bayesian surrogate

17 Bayesian Surrogate

18 Bayesian Surrogate
Bayesian surrogate = Gaussian process

19 Gaussian Process Regression
- is extremely expressive, since it is equivalent to an infinite expansion $\hat{f}(x) = \sum_{j=1}^{\infty} w_j \phi_j(x)$ with basis functions that can be tuned
- includes many standard methods as sub-cases
- is fully Bayesian
Ch. 7, Rasmussen (2006)

20 Let's set up our workspace before we start going into the mathematical details.

21 Definition of a Gaussian process
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Let's just explain in plain English what it is.

22 Definition of a Gaussian process
Input parameters x → physical model f → quantities of interest y. Treat f as unknown. Unknown = uncertain = random, i.e., described with probabilities. Let us denote our beliefs about f as follows: $f(\cdot) \sim p(f(\cdot))$

23 Definition of a Gaussian process
A Gaussian process needs two ingredients: a mean function and a covariance function. It uses them to define a probability measure on the space of functions. We write: $f(\cdot) \sim p(f(\cdot)) = \mathrm{GP}(f(\cdot) \mid m(\cdot), k(\cdot,\cdot))$

24 The mean function
What do you think f(x) could be? Define the mean function by $m(x) = \mathbb{E}[f(x)]$. It models your expectation about f(x).

25 The covariance function
How sure are you about this prediction? Consider the variance: $k(x,x) = \mathbb{E}\left[(f(x) - m(x))^2\right]$. It models your uncertainty about f(x).

26 The covariance function
Now consider two inputs x and x'. How close do you think the corresponding outputs are? Consider the covariance function: $k(x,x') = \mathbb{E}\left[(f(x) - m(x))(f(x') - m(x'))\right]$. It models your beliefs about the similarity of f(x) and f(x').

27 To wrap it up
We write $f(\cdot) \sim \mathrm{GP}(f(\cdot) \mid m(\cdot), k(\cdot,\cdot))$ and we interpret:
- m(x): What do I think f(x) could be?
- k(x, x): How sure am I about my expectation of f(x)?
- k(x, x'): How similar are f(x) and f(x')?

28 The most common covariance function: Squared Exponential (SE)
Also known as the radial basis function (RBF):
$k(x, x') = v \exp\left( -\frac{1}{2} \sum_{i=1}^{d} \frac{(x_i - x_i')^2}{\ell_i^2} \right)$
The variance v models the uncertainty about f(x). The length scales $\ell_i$ model the similarity of specific input dimensions.
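As a quick illustration, here is a minimal numpy sketch of the SE covariance; the names se_cov, v, and ell are ours, not from the slides:

```python
import numpy as np

def se_cov(x1, x2, v=1.0, ell=1.0):
    """Squared exponential: k(x, x') = v * exp(-0.5 * sum_i (x_i - x_i')^2 / ell_i^2)."""
    x1, x2 = np.atleast_1d(x1), np.atleast_1d(x2)
    return v * np.exp(-0.5 * np.sum((x1 - x2) ** 2 / np.asarray(ell) ** 2))
```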

29 Example 1.1: Drawing covariance functions. You have 15 minutes.

30 The covariance matrix
Consider an arbitrary selection of input points and their corresponding outputs: $X = \{x_1, \ldots, x_n\}$, $\mathbf{f} = \{f(x_1), \ldots, f(x_n)\}$. The covariance matrix is defined to be:
$\mathbb{E}\left[(\mathbf{f} - \mathbf{m})(\mathbf{f} - \mathbf{m})^T\right] := K := \begin{pmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n) \\ \vdots & \ddots & \vdots \\ k(x_n,x_1) & \cdots & k(x_n,x_n) \end{pmatrix}$
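A short sketch of assembling this matrix from any covariance function, e.g. the se_cov sketch above (cov_matrix is our name, not the slides'):

```python
import numpy as np

def cov_matrix(X1, X2, cov):
    # K[i, j] = k(x_i, x_j') for every pair of inputs.
    return np.array([[cov(a, b) for b in X2] for a in X1])
```

For the matrix on this slide: K = cov_matrix(X, X, se_cov).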

31 Restrictions on the covariance function
The covariance function has to be positive definite. That is, for any finite collection of inputs, the covariance matrix
$K := \begin{pmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n) \\ \vdots & \ddots & \vdots \\ k(x_n,x_1) & \cdots & k(x_n,x_n) \end{pmatrix}$
must be positive definite.

32 Covariance function factory
The sum of two covariance functions is a covariance function: $k(x,x') = k_1(x,x') + k_2(x,x')$. What does this model? The belief that the response comes from two sources: $f(x) = f_1(x) + f_2(x)$, with $f_i(\cdot) \sim \mathrm{GP}(f_i(\cdot) \mid m_i(\cdot), k_i(\cdot,\cdot))$, $i = 1, 2$.

33 Covariance function factory
The product of two covariance functions is a covariance function: $k(x,x') = k_1(x,x')\, k_2(x,x')$. What does this model? The belief that the response is the product of two sources: $f(x) = f_1(x) f_2(x)$, with $f_i(\cdot) \sim \mathrm{GP}(f_i(\cdot) \mid m_i(\cdot), k_i(\cdot,\cdot))$, $i = 1, 2$.

34 Example 1.2: The covariance matrix and some properties of covariance functions

35 Sampling a Gaussian process
A Gaussian process defines a probability measure over a function space: $f(\cdot) \sim \mathrm{GP}(f(\cdot) \mid m(\cdot), k(\cdot,\cdot))$. How can we sample functions from it? Sample f at a finite, albeit large, set of inputs.

36 Sampling a Gaussian process
Take a finite number of inputs, $X = \{x_1, \ldots, x_n\}$, and consider the model output on them: $\mathbf{f} = \{f(x_1), \ldots, f(x_n)\}$. We believe that they are distributed according to: $\mathbf{f} \sim \mathcal{N}(\mathbf{f} \mid \mathbf{m}, K)$

37 Sampling a Gaussian process
OK, so we need to be able to sample from $\mathbf{f} \sim \mathcal{N}(\mathbf{f} \mid \mathbf{m}, K)$, with
$\mathbf{m} = \begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix}, \qquad K := \begin{pmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n) \\ \vdots & \ddots & \vdots \\ k(x_n,x_1) & \cdots & k(x_n,x_n) \end{pmatrix}.$

38 Sampling a Gaussian process
To sample from $\mathbf{f} \sim \mathcal{N}(\mathbf{f} \mid \mathbf{m}, K)$:
- Take the lower Cholesky decomposition L of K: $K = LL^T$
- Sample a standard normal: $\mathbf{z} \sim \mathcal{N}(\mathbf{z} \mid 0, I_n)$
- Set $\mathbf{f} = \mathbf{m} + L\mathbf{z}$
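A minimal sketch of this recipe, assuming a zero mean function and a small diagonal jitter for numerical stability (discussed again later):

```python
import numpy as np

def sample_gp_prior(xs, cov, n_samples=5, jitter=1e-8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    K = np.array([[cov(a, b) for b in xs] for a in xs])
    L = np.linalg.cholesky(K + jitter * np.eye(len(xs)))  # K = L L^T (lower Cholesky)
    z = rng.standard_normal((len(xs), n_samples))         # z ~ N(0, I_n)
    return L @ z                                          # f = m + L z, with m = 0
```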

39 Sampling from a Gaussian process

40 Sampling from a Gaussian process

41 Changing the length scale

42 The samples are as smooth as the covariance: SE covariance, infinitely smooth

43 The samples are as smooth as the covariance: Matérn 3/2, 2 times differentiable

44 The samples are as smooth as the covariance: exponential, continuous, nowhere differentiable

45 Invariances may be built into covariance functions: periodic exponential, period = …

46 Example 2.1 & 2.2: Drawing samples from a Gaussian process

47 Example 1: Motivation

48 Selection of the starting pool of input points
Random (e.g., uniform) selection is a good starting point. A Latin hypercube design is a much better choice. Code developed by our lab.
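The lab's code is not reproduced here; as one alternative, scipy ships a Latin hypercube sampler (scipy >= 1.7), sketched below:

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)          # d = 2 input dimensions
X_pool = sampler.random(n=10)                      # 10 points in [0, 1]^2
X_box = qmc.scale(X_pool, [0.0, 0.0], [1.0, 2.0])  # rescale to the input box
```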

49 Adaptive selection: What is your goal? Demo: selecting observations with maximum predictive uncertainty

50 Gaussian process regression
Assume that we have observed $X = \{x_1, \ldots, x_N\}$, $\mathbf{f} = \{f(x_1), \ldots, f(x_N)\}$, and that we want to make predictions at an arbitrary set of test inputs: $X^* = \{x_1^*, \ldots, x_{N^*}^*\}$, $\mathbf{f}^* = \{f(x_1^*), \ldots, f(x_{N^*}^*)\}$

51 Gaussian process regression
Since we have assumed a priori that $f(\cdot) \sim \mathrm{GP}(f(\cdot) \mid m(\cdot), k(\cdot,\cdot))$, then by definition:
$\begin{pmatrix} \mathbf{f} \\ \mathbf{f}^* \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mathbf{f} \\ \mathbf{f}^* \end{pmatrix} \,\middle|\, \begin{pmatrix} \mathbf{m} \\ \mathbf{m}^* \end{pmatrix}, \begin{pmatrix} K(X,X) & K(X,X^*) \\ K(X^*,X) & K(X^*,X^*) \end{pmatrix} \right)$

52 Gaussian process regression
In the joint distribution above, $\mathbf{m}$ is the mean on the observations, $\mathbf{m}^*$ the mean on the test inputs, $K(X,X)$ the covariance matrix of the observations, $K(X^*,X^*)$ the covariance matrix of the test inputs, and $K(X^*,X)$ the cross-covariance matrix (test-observed).

53 Gaussian process regression
Applying Bayes rule to the joint distribution: $\mathbf{f}^* \mid X^*, X, \mathbf{f} \sim \,?$

54 Gaussian process regression
Bayes rule gives:
$\mathbf{f}^* \mid X^*, X, \mathbf{f} \sim \mathcal{N}(\mathbf{f}^* \mid \tilde{\mathbf{m}}, \tilde{K})$,
$\tilde{\mathbf{m}} = \mathbf{m}^* + K(X^*,X) K^{-1} (\mathbf{f} - \mathbf{m})$,
$\tilde{K} = K(X^*,X^*) - K(X^*,X) K^{-1} K(X,X^*)$
Proof in Ch. 2.3, Bishop (2006).

55 The posterior Gaussian process
Since the choice of test points was arbitrary, the procedure actually defines a posterior Gaussian process:
$f(\cdot) \mid X, \mathbf{f} \sim \mathrm{GP}(f(\cdot) \mid \tilde{m}(\cdot), \tilde{k}(\cdot,\cdot))$,
$\tilde{m}(x) = m(x) + K(x,X) K^{-1} (\mathbf{f} - \mathbf{m})$,
$\tilde{k}(x,x') = k(x,x') - K(x,X) K^{-1} K(X,x')$
This encodes our beliefs about the model output after seeing the data. Predictions require a Cholesky decomposition.

56 Gaussian process regression: Bayes rule takes the prior GP to the posterior GP (figure)

57 Posterior GP: The point predictive distribution
$f(\cdot) \mid X, \mathbf{f} \sim \mathrm{GP}(f(\cdot) \mid \tilde{m}(\cdot), \tilde{k}(\cdot,\cdot))$. Looking at just one point, we get the point predictive distribution: $y \mid x, X, \mathbf{f} \sim \mathcal{N}(y \mid \tilde{m}(x), \tilde{\sigma}^2(x))$, $\tilde{\sigma}^2(x) = \tilde{k}(x,x)$. You may use the mean as a surrogate.

58 Gaussian process regression
$y \mid x, X, \mathbf{f} \sim \mathcal{N}(y \mid \tilde{m}(x), \tilde{\sigma}^2(x))$, visualized as the band $f(x) = \tilde{m}(x) \pm 2\tilde{\sigma}(x)$

59 Gaussian process regression - Noisy observations
Assume that we have observed $X = \{x_1, \ldots, x_N\}$, $\mathbf{y} = \{y_1, \ldots, y_N\}$, where $y_i$ is a noisy measurement of the ideal $f(x_i)$ (e.g., an MD simulation). We need to model the measurement process using a likelihood (typically Gaussian): $y_i \mid f(x_i) \sim \mathcal{N}(y_i \mid f(x_i), \sigma^2)$, where $\sigma^2$ is the noise (likelihood) variance.

60 Gaussian process regression - Noisy observations
The posterior GP changes to:
$f(\cdot) \mid X, \mathbf{y}, \sigma^2 \sim \mathrm{GP}(f(\cdot) \mid \tilde{m}(\cdot), \tilde{k}(\cdot,\cdot))$,
$\tilde{m}(x) = m(x) + K(x,X) (K + \sigma^2 I_N)^{-1} (\mathbf{y} - \mathbf{m})$,
$\tilde{k}(x,x') = k(x,x') - K(x,X) (K + \sigma^2 I_N)^{-1} K(X,x')$
and the point predictive distribution to: $y \mid x, X, \mathbf{y} \sim \mathcal{N}(y \mid \tilde{m}(x), \tilde{\sigma}^2(x))$, $\tilde{\sigma}^2(x) = \tilde{k}(x,x) + \sigma^2$
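A minimal numpy sketch of these formulas, assuming a zero prior mean; cov can be the hypothetical se_cov helper from earlier:

```python
import numpy as np

def gp_posterior(X, y, X_star, cov, sigma2=1e-4):
    K = np.array([[cov(a, b) for b in X] for a in X])
    K_s = np.array([[cov(a, b) for b in X] for a in X_star])        # K(X*, X)
    K_ss = np.array([[cov(a, b) for b in X_star] for a in X_star])  # K(X*, X*)
    L = np.linalg.cholesky(K + sigma2 * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sigma^2 I)^-1 y
    m_star = K_s @ alpha                                 # posterior mean
    V = np.linalg.solve(L, K_s.T)
    K_post = K_ss - V.T @ V                              # posterior covariance
    return m_star, K_post
```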

61 Gaussian process regression - Noisy observations
Each choice of the noise corresponds to a different interpretation of the data.

62 Noise improves numerical stability
It is common to use a small noise even if there is not any in the data. Cholesky fails when the covariance matrix is close to being positive semi-definite (i.e., nearly singular). Adding a small noise improves numerical stability. It is known as the jitter, or as the nugget in this case.

63 Example 3.1, Questions 1-5: Gaussian process regression

64 Model Selection for GP regression
Our prior assumptions were conditional on the mean and covariance parameters θ: $f(\cdot) \mid \theta \sim \mathrm{GP}(f(\cdot) \mid m(\cdot;\theta), k(\cdot,\cdot;\theta))$. Observations are conditional on the noise level: $y_i \mid f(x_i), \sigma^2 \sim \mathcal{N}(y_i \mid f(x_i), \sigma^2)$. Thus, the likelihood of all the observations is: $\mathbf{y} \mid X, \theta, \sigma^2 \sim p(\mathbf{y} \mid X, \theta, \sigma^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{m}, K + \sigma^2 I_N)$

65 Model Selection for GP regression
The (marginal) likelihood of all the observations is $\mathbf{y} \mid X, \theta, \sigma^2 \sim p(\mathbf{y} \mid X, \theta, \sigma^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{m}, K + \sigma^2 I_N)$. To complete the prior specification, we must give $\theta, \sigma^2 \sim p(\theta, \sigma^2)$. Then, after seeing the data, our beliefs about the parameters should change to: $p(\theta, \sigma^2 \mid X, \mathbf{y}) \propto p(\mathbf{y} \mid X, \theta, \sigma^2)\, p(\theta, \sigma^2)$

66 Model Selection for GP regression
After seeing the data, our beliefs about the parameters are $p(\theta, \sigma^2 \mid X, \mathbf{y}) \propto p(\mathbf{y} \mid X, \theta, \sigma^2)\, p(\theta, \sigma^2)$. Ideally, we would sample from this posterior with MCMC. Alternatively, we can find the MAP estimate of the parameters: $\{\theta^*, (\sigma^*)^2\} = \arg\max_{\theta,\sigma} \left\{ \log p(\mathbf{y} \mid X, \theta, \sigma^2) + \log p(\theta, \sigma^2) \right\}$
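A sketch of the maximum (marginal) likelihood route using GPy, the package recommended at the start; the data here is synthetic and the settings illustrative:

```python
import numpy as np
import GPy

X = np.random.rand(20, 1)
Y = np.sin(10 * X) + 0.1 * np.random.randn(20, 1)

kernel = GPy.kern.RBF(input_dim=1)            # SE covariance: variance + lengthscale
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()                              # maximizes the log marginal likelihood
print(model)                                  # fitted lengthscale, variance, noise
```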

67 Model Selection for GP regression
MAP estimate of the parameters: $\{\theta^*, (\sigma^*)^2\} = \arg\max_{\theta,\sigma} \left\{ \log p(\mathbf{y} \mid X, \theta, \sigma^2) + \log p(\theta, \sigma^2) \right\}$. If our prior assumptions are vague, then $\log p(\theta, \sigma^2) = \text{const}$ and we are effectively just maximizing the (marginal) likelihood.

68 Model Selection for GP regression
(Figure: contour plot of the marginal likelihood as a function of the characteristic lengthscale and the noise standard deviation, for a specific example in Rasmussen (2006).)

69 Careful: Different optima correspond to different interpretations
(Figure: (a) contour plot of the marginal likelihood over characteristic lengthscale and noise standard deviation, with (b) and (c) the fits of output y vs. input x at two different local optima; example from Rasmussen (2006).)

70 Example 3.1, Questions 6-8; Example …

71 Bayesian global optimization - The problem
Problem: $x^* = \arg\min_x f(x)$, when the objective:
- is very expensive to evaluate
- gives you no gradients
- might be noisy
- has dimensionality < 30 parameters

72 Bayesian global optimization - The Idea
Assume that we have observed $X = \{x_1, \ldots, x_N\}$, $\mathbf{y} = \{y_1, \ldots, y_N\}$, and that we can make one more observation. Which observation do we choose?

73 Bayesian global optimization - The Idea
Let's say that we make an observation at x and we see y. The improvement we would observe is:
$I(x, y) = \begin{cases} 0, & y > \min_n y_n \\ \min_n y_n - y, & \text{otherwise} \end{cases}$
But y could be anything...

74 Bayesian global optimization - The Idea
Use the data to build a GP that represents our state of knowledge about the model output. The point predictive distribution summarizes everything: $y \mid x, X, \mathbf{y} \sim p(y \mid x, X, \mathbf{y}) = \mathcal{N}(y \mid \tilde{m}(x), \tilde{\sigma}^2(x))$. Integrate to get rid of y from the improvement: $\mathrm{EI}(x) = \int I(x, y)\, p(y \mid x, X, \mathbf{y})\, dy$

75 Bayesian global optimization - The Idea
The integral is analytically available:
$\mathrm{EI}(x) = \left( \min_n y_n - \tilde{m}(x) \right) \Phi\!\left( \frac{\min_n y_n - \tilde{m}(x)}{\tilde{\sigma}(x)} \right) + \tilde{\sigma}(x)\, \phi\!\left( \frac{\min_n y_n - \tilde{m}(x)}{\tilde{\sigma}(x)} \right)$
Jones et al. (1998)
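A sketch of this formula in scipy; m and s stand for the GP posterior mean and standard deviation at x, and y_best for $\min_n y_n$ (all names ours):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(m, s, y_best):
    s = np.maximum(s, 1e-12)  # guard against a vanishing predictive std
    u = (y_best - m) / s
    return (y_best - m) * norm.cdf(u) + s * norm.pdf(u)
```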

76 Bayesian global optimization - The algorithm
1. Observe an initial pool of inputs-outputs (e.g., randomly selected, or whatever is available).
2. Given the current observations, build a GP representing our state of knowledge about the output.
3. Select the input with maximum expected improvement. If it is below a threshold, STOP. Otherwise, do the new simulation and GO TO 2.
A sketch of this loop is given below.
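This sketch glues the pieces together under stated assumptions: GPy for step 2, the expected_improvement helper above for step 3, a fixed candidate grid instead of a continuous search, and an objective f returning a (1, 1) array (bgo, candidates, ei_tol are our names):

```python
import numpy as np
import GPy

def bgo(f, X, Y, candidates, n_iter=10, ei_tol=1e-6):
    for _ in range(n_iter):
        model = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=X.shape[1]))
        model.optimize()
        m, v = model.predict(candidates)        # posterior mean and variance
        ei = expected_improvement(m, np.sqrt(v), Y.min())
        if ei.max() < ei_tol:
            break                               # expected improvement below threshold
        x_new = candidates[[int(ei.argmax())]]  # input with maximum EI
        X = np.vstack([X, x_new])
        Y = np.vstack([Y, f(x_new)])
    return X, Y
```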

77 Minimizing the energy of a binary molecule
Consider the O2 molecule. Let r be the distance between the two O atoms. We wish to find the interatomic distance: $r^* = \arg\min_r V(r)$. We run BGO starting with 1 randomly chosen simulation.

78 Minimizing the energy of a binary molecule

79 Example 4: BGO application to finding the minimum energy structure

80 BGO for solving inverse problems
Suppose we observe a model output y and want to find the input x that gave rise to it. The simplest mathematical formulation is via a loss function: $x^* = \arg\min_x L(x) := \arg\min_x \| y - f(x) \|_2^2$. We represent the loss function with a GP and employ BGO.
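A sketch of the loss that would be handed to the BGO loop above; f_model and y_obs are hypothetical stand-ins for the forward model and the observed output:

```python
import numpy as np

def loss(x, f_model, y_obs):
    # L(x) = ||y - f(x)||_2^2
    return np.sum((y_obs - f_model(x)) ** 2)
```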

81 Example 5: BGO for solving inverse problems

82 Cool stuff you did not learn about today

83 Detecting discontinuities
(Figure: discontinuity detection in the x1-x2 input plane.)

84 Detecting important inputs
(Figure: convergence of the L2 error norm with the number of samples for SGC, ASGC (ε = 10^-1 to 10^-4), and RVM variants at N = 20 and N = 40; and the number of splits per input dimension for RVM GPC at δ = 10^-7 to 10^-3.)

85 Getting predictive error bars for EVERYTHING
(Figure from Inverse Problems 30 (2014), I. Bilionis and N. Zabaras: (a) Single, 20 Obs.; (b) Semi, 20 Obs.; (c) Full, 20 Obs.)

86 Learning non-linear dynamics from data (recursive GPs)

87 Doing multi-fidelity optimization under budget constraints
(Figure: "Six different stages, set up in two columns, in the Bayesian global optimization algorithm for learning…"; axes in meV/atom vs. % Al in NiAl.)

88 Doing multi-objective optimization with limited simulations

89 Doing high dimensions & encoding physics

90 Unifying all UQ problems

91 Incomplete References
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4), 455-492.
… and many, many more.

92 Thanks!
