GWAS V: Gaussian processes

1 GWAS V: Gaussian processes. Dr. Oliver Stegle, Christoph Lippert, Prof. Dr. Karsten Borgwardt. Max Planck Institutes Tübingen, Germany. Tübingen, Summer 2011.

2-4 Motivation: Why Gaussian processes? So far: linear models with a finite number of basis functions, e.g. $\phi(x) = (1, x, x^2, \ldots, x^K)$. Open questions: How to design a suitable basis? How many basis functions to pick? Gaussian processes: an accurate and flexible regression method that yields predictions together with error bars. [Figure: example regression fit with error bars, Y versus X.]

5-8 Motivation: Making predictions with variance component models. Linear model accounting for a set of measured SNPs $X$:
$$p(\mathbf{y} \mid X, \boldsymbol{\theta}, \sigma^2) = \mathcal{N}\!\left(\mathbf{y} \,\Big|\, \sum_{s=1}^{S} \mathbf{x}_s \theta_s,\; \sigma^2 I\right)$$
Prediction at an unseen test input $\mathbf{x}_\star$ given the maximum likelihood weights:
$$p(y_\star \mid \mathbf{x}_\star, \hat{\boldsymbol{\theta}}) = \mathcal{N}\!\left(y_\star \mid \mathbf{x}_\star^{\mathsf T}\hat{\boldsymbol{\theta}},\; \sigma^2\right)$$
Marginal likelihood:
$$p(\mathbf{y} \mid X, \sigma^2, \sigma_g^2) = \int_{\boldsymbol{\theta}} \mathcal{N}\!\left(\mathbf{y} \mid X\boldsymbol{\theta}, \sigma^2 I\right)\, \mathcal{N}\!\left(\boldsymbol{\theta} \mid \mathbf{0}, \sigma_g^2 I\right)\, \mathrm{d}\boldsymbol{\theta} = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0}, \underbrace{\sigma_g^2 X X^{\mathsf T} + \sigma^2 I}_{K}\big)$$
How do we make predictions with variance component models?
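To see where the covariance $K$ comes from, a short sketch (not on the slide, but a standard calculation): writing $\mathbf{y} = X\boldsymbol{\theta} + \boldsymbol{\epsilon}$ with independent $\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \sigma_g^2 I)$ and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$,
$$\operatorname{Cov}[\mathbf{y}] = X\,\mathbb{E}[\boldsymbol{\theta}\boldsymbol{\theta}^{\mathsf T}]\,X^{\mathsf T} + \mathbb{E}[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^{\mathsf T}] = \sigma_g^2\, X X^{\mathsf T} + \sigma^2 I = K,$$
so the marginal over $\mathbf{y}$ is the zero-mean Gaussian above.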

9 Motivation: Further reading. C. E. Rasmussen, C. K. Williams: Gaussian Processes for Machine Learning [Rasmussen, 2004], a comprehensive and freely available introduction (see the appendix!). A really good introductory lecture video to watch [MacKay, 2006]; several ideas used in this course are borrowed from that lecture. Christopher M. Bishop: Pattern Recognition and Machine Learning [Bishop, 2006].

10-11 Outline: Motivation; Intuitive approach; Function space view; GP classification & other extensions; Summary.

12 Intuitive approach: The Gaussian distribution. Gaussian processes are based on nothing more than the good old multivariate Gaussian:
$$\mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}, K\right) = \frac{1}{\sqrt{|2\pi K|}} \exp\!\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T} K^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$
$K$: covariance matrix or kernel matrix.

13-15 Intuitive approach: A 2D Gaussian. [Figure: probability contour of a 2D Gaussian over $(y_1, y_2)$ together with samples drawn from it, for a given covariance matrix $K$.] Varying the covariance matrix changes the shape of the contours and of the samples. [Figures: contours and samples for three different choices of the 2x2 covariance matrix $K$.]

16-18 Intuitive approach: A 2D Gaussian, inference. [Figures: conditioning the 2D Gaussian on an observed value of $y_1$ and reading off the resulting distribution over $y_2$.]

19 Intuitive approach: Inference. Joint probability:
$$p(y_1, y_2 \mid K) = \mathcal{N}\!\left([y_1, y_2] \mid \mathbf{0}, K\right)$$
Conditional probability:
$$p(y_2 \mid y_1, K) = \frac{p(y_1, y_2 \mid K)}{p(y_1 \mid K)} \propto \exp\!\left\{-\tfrac{1}{2}\, [y_1, y_2]\, K^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\right\}$$
Completing the square yields a Gaussian with non-zero mean as the posterior for $y_2$.

20 Intuitive approach: Inference, Gaussian conditioning in 2D.
$$\begin{aligned}
p(y_2 \mid y_1, K) &= \frac{p(y_1, y_2 \mid K)}{p(y_1 \mid K)} \propto \exp\!\left\{-\tfrac{1}{2}\, [y_1, y_2]\, K^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\right\} \\
&= \exp\!\left\{-\tfrac{1}{2}\big[\, y_1^2 K^{-1}_{1,1} + y_2^2 K^{-1}_{2,2} + 2\, y_1 K^{-1}_{1,2}\, y_2 \,\big]\right\} \\
&= \exp\!\left\{-\tfrac{1}{2}\big[\, y_2^2 K^{-1}_{2,2} + 2\, y_2 K^{-1}_{1,2}\, y_1 + C \,\big]\right\} \\
&= Z \exp\!\left\{-\tfrac{1}{2} K^{-1}_{2,2}\Big[\, y_2 + \frac{K^{-1}_{1,2}}{K^{-1}_{2,2}}\, y_1 \Big]^2\right\}
= \mathcal{N}\Big(y_2 \,\Big|\, \underbrace{-\tfrac{K^{-1}_{1,2}}{K^{-1}_{2,2}}\, y_1}_{\mu},\; \underbrace{\tfrac{1}{K^{-1}_{2,2}}}_{\sigma^2}\Big)
\end{aligned}$$
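As a quick numerical check (not part of the slides), a minimal NumPy sketch that conditions a 2D Gaussian on an observed $y_1$, using both the precision-matrix expressions from the derivation above and the familiar covariance-based formulas; the covariance values are made up for illustration:

```python
import numpy as np

# Illustrative 2D covariance matrix (made-up values) and an observed y1.
K = np.array([[1.0, 0.7],
              [0.7, 1.0]])
y1 = 1.5

# Conditioning via the precision matrix, as in the completing-the-square derivation.
K_inv = np.linalg.inv(K)
mu_prec = -K_inv[0, 1] / K_inv[1, 1] * y1
var_prec = 1.0 / K_inv[1, 1]

# Conditioning via the standard covariance-based formulas.
mu_cov = K[1, 0] / K[0, 0] * y1
var_cov = K[1, 1] - K[1, 0] ** 2 / K[0, 0]

print(mu_prec, mu_cov)    # both 1.05
print(var_prec, var_cov)  # both 0.51
```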

21-24 Intuitive approach: Extending the idea to higher dimensions. Let us interpret $y_1$ and $y_2$ as outputs in a regression setting. We can introduce an additional third point. [Figure: three output values plotted over an input axis, Y versus X.] Now
$$p([y_1, y_2, y_3] \mid K_3) = \mathcal{N}\!\left([y_1, y_2, y_3] \mid \mathbf{0}, K_3\right),$$
where $K_3$ is now a $3 \times 3$ covariance matrix!

25-27 Intuitive approach: Constructing covariance matrices. Analogously, we can look at the joint probability of arbitrarily many points and obtain predictions. Issue: how do we construct a good covariance matrix? A simple heuristic: example matrices $K_2 = [\;]$ and $K_3 = [\;]$ (entries given on the slide). Note: the ordering of the points $y_1, y_2, y_3$ matters, and it is important to ensure that covariance matrices remain positive definite (matrix inversion).

28-32 Intuitive approach: Constructing covariance matrices, a general recipe. Use a covariance function (kernel function) to construct $K$:
$$K_{i,j} = k(x_i, x_j; \Theta_K)$$
Example: the linear covariance function corresponds to a variance component model,
$$k_{\mathrm{LIN}}(x_i, x_j; A) = A^2\, x_i x_j$$
Example: the squared exponential covariance function embodies the belief that points further apart are less correlated,
$$k_{\mathrm{SE}}(x_i, x_j; A, L) = A^2 \exp\!\left\{-0.5\,\frac{(x_i - x_j)^2}{L^2}\right\}$$
$\Theta_K = \{A, L\}$: hyperparameters. $A^2$: overall correlation, amplitude. $L^2$: scaling parameter, smoothness. Denote the covariance matrix for a set of inputs $X = \{x_1, \ldots, x_N\}$ as $K_{X,X}(\Theta_K)$.
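A minimal NumPy sketch of this recipe (my own code; the function names and example inputs are not from the slides):

```python
import numpy as np

def k_lin(xi, xj, A=1.0, **unused):
    """Linear covariance function: k_LIN(x, x') = A^2 * x * x'."""
    return A ** 2 * xi * xj

def k_se(xi, xj, A=1.0, L=1.0):
    """Squared exponential covariance: A^2 * exp(-0.5 * (x - x')^2 / L^2)."""
    return A ** 2 * np.exp(-0.5 * (xi - xj) ** 2 / L ** 2)

def cov_matrix(X, kernel, **theta):
    """Build K with K[i, j] = k(x_i, x_j; Theta_K) for all pairs of inputs."""
    return np.array([[kernel(xi, xj, **theta) for xj in X] for xi in X])

X = np.linspace(-3, 3, 5)               # example 1D inputs
K = cov_matrix(X, k_se, A=1.0, L=0.5)   # 5 x 5 covariance matrix
print(K.shape, np.allclose(K, K.T))     # (5, 5) True
```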

33-35 Intuitive approach: GP samples using the squared exponential covariance function. [Figures: sample functions for different hyperparameter settings, e.g. A=1, L=1; A=1, L=0.5; A=3, L=...; and the 2D Gaussian over two marked function values $(y_1, y_2)$.] Reminder: every function line corresponds to a sample drawn from this 2D Gaussian!

36-38 Intuitive approach: Drawing samples from a Gaussian process. For each sample do: choose a discretization of the x-axis, $X = \{x_0, x_1, \ldots, x_N\}$, and evaluate the covariance $K = K_{X,X}(\Theta_K)$. Math: draw from $p(\mathbf{y} \mid K) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K)$. Matlab: draw independent Gaussian variables, ỹ = randn(N, 1), and rotate with a Cholesky factor of K, y = chol(K)' * ỹ (the transpose of Matlab's upper-triangular factor, so that y has covariance K).
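The same procedure as a runnable NumPy sketch (my own code, reusing the cov_matrix and k_se helpers from the earlier sketch; the jitter term is a standard numerical safeguard, not part of the slides):

```python
import numpy as np

def sample_gp_prior(X, kernel, n_samples=3, jitter=1e-10, **theta):
    """Draw sample functions from a zero-mean GP prior evaluated at inputs X."""
    K = cov_matrix(X, kernel, **theta)                         # covariance of the function values
    L_chol = np.linalg.cholesky(K + jitter * np.eye(len(X)))   # lower Cholesky factor
    Z = np.random.randn(len(X), n_samples)                     # independent standard normals
    return L_chol @ Z                                          # each column ~ N(0, K)

X = np.linspace(-5, 5, 200)
samples = sample_gp_prior(X, k_se, n_samples=3, A=1.0, L=1.0)
print(samples.shape)  # (200, 3): three sample functions on the grid
```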

39-40 Intuitive approach: Why this all works, consistency of, say, the 10D and the 500D Gaussian. A small quiz: let $y_1, y_2, y_3$ have covariance matrix $K_3$ with inverse $K_3^{-1}$, i.e.
$$p(\{y_1, y_2, y_3\} \mid K_3) = \mathcal{N}\!\left(\{y_1, y_2, y_3\} \mid \mathbf{0}, K_3\right).$$
Now focus on the variables $y_1, y_2$, integrating out $y_3$:
$$p(\{y_1, y_2\}) = \int_{y_3} \mathcal{N}\!\left(\{y_1, y_2, y_3\} \mid \mathbf{0}, K_3\right)\,\mathrm{d}y_3 = \mathcal{N}\!\left(\{y_1, y_2\} \mid \mathbf{0}, K_2\right).$$
Which of the following statements is true: (a) $K_2$ equals the corresponding $2 \times 2$ sub-block of $K_3$, or (b) $K_2^{-1}$ equals the corresponding sub-block of $K_3^{-1}$? (The slide gives concrete matrices for both options.)
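For reference, the marginalization property of the Gaussian that this quiz is probing (a standard result, stated here rather than taken from the slide): partitioning the joint covariance as
$$K_3 = \begin{bmatrix} K_2 & \mathbf{k} \\ \mathbf{k}^{\mathsf T} & k_{33} \end{bmatrix}, \qquad \int \mathcal{N}\!\left(\begin{bmatrix}\mathbf{y}_{12}\\ y_3\end{bmatrix} \,\Big|\, \mathbf{0}, K_3\right)\mathrm{d}y_3 = \mathcal{N}\!\left(\mathbf{y}_{12} \mid \mathbf{0}, K_2\right),$$
so statement (a) holds: marginalizing simply drops the rows and columns of the covariance matrix (not of its inverse) that belong to the integrated-out variable.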

41-43 Intuitive approach: Why this all works, the GP as an infinite object (philosophical). A valid covariance function $k(x, x')$ defines a recipe to calculate the covariance for any choice of inputs. Prior on functions: all points on the real line are inputs; $K_{\mathbb{R},\mathbb{R}}$ is an infinite object! Numerical implementation: choose a finite subset $X$ and evaluate the reduced, finite $K_{X,X}$, exploiting the consistency rule.

44 Function space view, outline: Motivation; Intuitive approach; Function space view; GP classification & other extensions; Summary.

45 Function space view. So far: 1. A joint Gaussian distribution over the set of all outputs $\mathbf{y}$. 2. A covariance function as a recipe to construct a suitable covariance matrix from the corresponding inputs $X$.

46 Function space view: The Gaussian process as a prior on functions. The covariance function and its hyperparameters reflect the prior belief about function smoothness, lengthscales etc. The general recipe allows a joint Gaussian to be constructed for an arbitrary selection of input locations $X$. Prior on the (infinite) function $f(x)$: $p(f(x)) = \mathcal{GP}(f(x) \mid k)$. Prior on function values $\mathbf{f} = (f_1, \ldots, f_N)$: $p(\mathbf{f} \mid X, \Theta_K) = \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K_{X,X}(\Theta_K))$.

47-48 Function space view: Noise-free observations. Given noise-free training data $\mathcal{D} = \{x_n, f_n\}_{n=1}^N$, we want to make predictions $\mathbf{f}_\star$ at test points $X_\star$. The joint distribution of $\mathbf{f}$ and $\mathbf{f}_\star$ is
$$p([\mathbf{f}, \mathbf{f}_\star] \mid X, X_\star, \Theta_K) = \mathcal{N}\!\left([\mathbf{f}, \mathbf{f}_\star] \,\Big|\, \mathbf{0}, \begin{bmatrix} K_{X,X} & K_{X,X_\star} \\ K_{X_\star,X} & K_{X_\star,X_\star} \end{bmatrix}\right)$$
(All kernel matrices $K$ depend on the hyperparameters $\Theta_K$, which are dropped for brevity.) Real data is rarely noise-free.

49-50 Function space view: Inference. Given observed noisy data $\mathcal{D} = \{X, \mathbf{y}\}$, the joint probability over the latent function values $\mathbf{f}$ and $\mathbf{f}_\star$ given $\mathbf{y}$ is
$$p([\mathbf{f}, \mathbf{f}_\star] \mid X, X_\star, \mathbf{y}, \Theta_K, \sigma^2) \propto \underbrace{\mathcal{N}\!\left([\mathbf{f}, \mathbf{f}_\star] \,\Big|\, \mathbf{0}, \begin{bmatrix} K_{X,X} & K_{X,X_\star} \\ K_{X_\star,X} & K_{X_\star,X_\star} \end{bmatrix}\right)}_{\text{Prior}} \; \underbrace{\prod_{n=1}^{N} \mathcal{N}\!\left(y_n \mid f_n, \sigma^2\right)}_{\text{Likelihood}}$$

51-52 Function space view: Inference. Applying Gaussian calculus and integrating out $\mathbf{f}$ yields
$$p([\mathbf{y}, \mathbf{f}_\star] \mid X, X_\star, \Theta_K, \sigma^2) = \mathcal{N}\!\left([\mathbf{y}, \mathbf{f}_\star] \,\Big|\, \mathbf{0}, \begin{bmatrix} K_{X,X} + \sigma^2 I & K_{X,X_\star} \\ K_{X_\star,X} & K_{X_\star,X_\star} \end{bmatrix}\right)$$
Note: assuming noisy instead of perfect observations merely corresponds to adding a diagonal component to the self-covariance $K_{X,X}$.

53-54 Function space view: Making predictions. The predictive distribution follows from the joint distribution by completing the square (conditioning):
$$p([\mathbf{y}, \mathbf{f}_\star] \mid X, X_\star, \Theta_K, \sigma^2) = \mathcal{N}\!\left([\mathbf{y}, \mathbf{f}_\star] \,\Big|\, \mathbf{0}, \begin{bmatrix} K_{X,X} + \sigma^2 I & K_{X,X_\star} \\ K_{X_\star,X} & K_{X_\star,X_\star} \end{bmatrix}\right)$$
Gaussian predictive distribution for $\mathbf{f}_\star$:
$$p(\mathbf{f}_\star \mid X_\star, \mathbf{y}, X, \Theta_K, \sigma^2) = \mathcal{N}(\mathbf{f}_\star \mid \boldsymbol{\mu}_\star, \Sigma_\star)$$
with
$$\boldsymbol{\mu}_\star = K_{X_\star,X}\left[K_{X,X} + \sigma^2 I\right]^{-1} \mathbf{y}, \qquad \Sigma_\star = K_{X_\star,X_\star} - K_{X_\star,X}\left[K_{X,X} + \sigma^2 I\right]^{-1} K_{X,X_\star}.$$
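A minimal NumPy sketch of these prediction equations (my own code; it reuses the k_se and cov_matrix helpers from above, and cross_cov plus the toy sine data are assumptions for illustration):

```python
import numpy as np

def cross_cov(X1, X2, kernel, **theta):
    """Covariance between two sets of inputs: K[i, j] = k(x1_i, x2_j)."""
    return np.array([[kernel(x1, x2, **theta) for x2 in X2] for x1 in X1])

def gp_predict(X, y, X_star, kernel, noise_var, **theta):
    """Predictive mean and covariance of f_star given noisy observations y."""
    K = cov_matrix(X, kernel, **theta) + noise_var * np.eye(len(X))
    K_star = cross_cov(X_star, X, kernel, **theta)        # K_{X*,X}
    K_star_star = cov_matrix(X_star, kernel, **theta)     # K_{X*,X*}
    alpha = np.linalg.solve(K, y)                         # [K_{X,X} + sigma^2 I]^{-1} y
    mu_star = K_star @ alpha
    Sigma_star = K_star_star - K_star @ np.linalg.solve(K, K_star.T)
    return mu_star, Sigma_star

# Toy example: noisy observations of a sine function.
rng = np.random.default_rng(0)
X = np.linspace(-4, 4, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-5, 5, 100)
mu, Sigma = gp_predict(X, y, X_star, k_se, noise_var=0.01, A=1.0, L=1.0)
std = np.sqrt(np.diag(Sigma))   # error bars: predictive standard deviation
```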

55-56 Function space view: Making predictions, example. [Figures: example GP predictions with error bars, Y versus X.]

57 Function space view: Learning hyperparameters. 1. Fixed covariance matrix: $p(\mathbf{y} \mid K)$. 2. Constructed covariance matrix: $K_{i,j} = k(x_i, x_j; \Theta_K)$. 3. Can we learn the hyperparameters $\Theta_K$?

58-60 Function space view: Learning hyperparameters. Formally we are interested in the posterior
$$p(\Theta_K \mid \mathcal{D}) \propto p(\mathbf{y} \mid X, \Theta_K)\, p(\Theta_K).$$
Inference is analytically intractable! Use a MAP estimate instead of a full posterior: set $\Theta_K$ to the most probable hyperparameter settings,
$$\begin{aligned}
\hat{\Theta}_K &= \operatorname*{argmax}_{\Theta_K}\; \ln\left[p(\mathbf{y} \mid X, \Theta_K)\, p(\Theta_K)\right] \\
&= \operatorname*{argmax}_{\Theta_K}\; \ln \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{0}, K_{X,X}(\Theta_K) + \sigma^2 I\right) + \ln p(\Theta_K) \\
&= \operatorname*{argmax}_{\Theta_K}\; \left[-\tfrac{1}{2}\log\det\!\left[K_{X,X}(\Theta_K) + \sigma^2 I\right] - \tfrac{1}{2}\, \mathbf{y}^{\mathsf T}\!\left[K_{X,X}(\Theta_K) + \sigma^2 I\right]^{-1}\mathbf{y} - \tfrac{N}{2}\log 2\pi + \ln p(\Theta_K)\right]
\end{aligned}$$
Optimization can be carried out using standard optimization techniques.
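As an illustration of the last line, a small NumPy/SciPy sketch that evaluates the negative log marginal likelihood and optimizes the squared exponential hyperparameters with a standard optimizer. It assumes a flat prior over $\ln \Theta_K$, reuses the k_se and cov_matrix helpers from above, and uses the toy X, y from the prediction sketch:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """-log N(y | 0, K_XX(Theta_K) + sigma^2 I), parameterized on the log scale."""
    A, L, sigma = np.exp(log_params)                 # amplitude, lengthscale, noise std
    K = cov_matrix(X, k_se, A=A, L=L) + (sigma ** 2 + 1e-8) * np.eye(len(X))
    chol = np.linalg.cholesky(K)                     # small jitter keeps K positive definite
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))   # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (log_det + y @ alpha + len(y) * np.log(2 * np.pi))

res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]),
               args=(X, y), method="L-BFGS-B")
A_hat, L_hat, sigma_hat = np.exp(res.x)
print(A_hat, L_hat, sigma_hat)
```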

61 Function space view: Choosing covariance functions. The covariance function embodies the prior belief about functions. Example: linear regression, $y_n = w x_n + c + \psi_n$. The covariance function denotes the covariation of the outputs:
$$k(x_n, x_{n'}) = \langle y_n\, y_{n'} \rangle = \langle (w x_n + c + \psi_n)(w x_{n'} + c + \psi_{n'}) \rangle = \underbrace{w^2\, x_n x_{n'} + c^2}_{\text{kernel: } k(x_n, x_{n'})} + \delta_{n,n'}\, \psi_n^2$$

62-63 Function space view: Choosing covariance functions, multidimensional input space. Generalise the squared exponential covariance function to multiple dimensions. One dimension:
$$k_{\mathrm{SE}}(x_i, x_j; A, L) = A^2 \exp\!\left\{-0.5\,\frac{(x_i - x_j)^2}{L^2}\right\}$$
$D$ dimensions:
$$k_{\mathrm{SE}}(\mathbf{x}_i, \mathbf{x}_j; A, \mathbf{L}) = A^2 \exp\!\left\{-0.5 \sum_{d=1}^{D} \frac{(x_i^d - x_j^d)^2}{L_d^2}\right\}$$
The lengthscale parameters $L_d$ denote the relevance of a particular data dimension; large $L_d$ correspond to irrelevant dimensions.
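A short NumPy sketch of this multidimensional squared exponential kernel with per-dimension lengthscales (the function name and test values are my own, not from the slides):

```python
import numpy as np

def k_se_ard(xi, xj, A=1.0, lengthscales=None):
    """A^2 * exp(-0.5 * sum_d (xi_d - xj_d)^2 / L_d^2), one lengthscale per dimension."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    L = np.ones_like(xi) if lengthscales is None else np.asarray(lengthscales, float)
    return A ** 2 * np.exp(-0.5 * np.sum((xi - xj) ** 2 / L ** 2))

# A very large lengthscale in dimension 2 makes that dimension nearly irrelevant:
print(k_se_ard([0.0, 0.0], [0.1, 5.0], A=1.0, lengthscales=[1.0, 1e6]))
# close to the 1D value k_se_ard([0.0], [0.1])
```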

64-65 Function space view: Choosing covariance functions, 2D regression. [Figures: GP regression surfaces over two input dimensions, Y versus X1 and X2.]

66 Function space view: Choosing covariance functions, any kernel will do. Established kernels are all valid covariance functions, allowing for a wide range of possible input domains $X$: graph kernels (molecules), kernels defined on strings (DNA sequences).

67 Function space view: Choosing covariance functions, combining existing covariance functions. The sum of two covariance functions is itself a valid covariance function: $k_S(x, x') = k_1(x, x') + k_2(x, x')$. The product of two covariance functions is itself a valid covariance function: $k_P(x, x') = k_1(x, x') \cdot k_2(x, x')$.
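A small illustration of these combination rules (my own code, reusing the k_lin, k_se and cov_matrix helpers defined earlier); the eigenvalue check simply confirms numerically that the combined matrices remain positive semi-definite:

```python
import numpy as np

def k_sum(xi, xj, **theta):
    """Sum of the linear and squared exponential kernels."""
    return k_lin(xi, xj, **theta) + k_se(xi, xj, **theta)

def k_prod(xi, xj, **theta):
    """Product of the linear and squared exponential kernels."""
    return k_lin(xi, xj, **theta) * k_se(xi, xj, **theta)

X = np.linspace(-3, 3, 30)
for kernel in (k_sum, k_prod):
    K = cov_matrix(X, kernel, A=1.0, L=1.0)
    # Smallest eigenvalue is non-negative up to round-off: still a valid covariance.
    print(kernel.__name__, np.linalg.eigvalsh(K).min())
```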

68-71 Function space view: GPs versus variance component models. Variance component (linear) model:
$$p(\mathbf{y} \mid X, \boldsymbol{\theta}, \sigma^2) = \mathcal{N}\!\left(\mathbf{y} \mid \Phi(X)\,\boldsymbol{\theta}, \sigma^2 I\right)$$
Marginalize over $\boldsymbol{\theta}$:
$$p(\mathbf{y} \mid X, \sigma_g^2, \sigma^2) = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0}, \underbrace{\sigma_g^2\,\Phi(X)\Phi(X)^{\mathsf T}}_{K} + \sigma^2 I\big)$$
Gaussian process: define the covariance through the recipe $K_{X,X}(\Theta_K)$, which implies the marginal likelihood
$$p(\mathbf{y} \mid X, \Theta_K, \sigma^2) = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0}, \underbrace{K_{X,X}(\Theta_K)}_{K} + \sigma^2 I\big)$$
Any feature map $\Phi$ implies a valid covariance function $K_{X,X}(\Theta_K)$. The converse is not necessarily true: a valid covariance function need not correspond to a finite set of basis functions!

72 GP classification & other extensions, outline: Motivation; Intuitive approach; Function space view; GP classification & other extensions; Summary.

73-74 GP classification & other extensions: GPs for classification. How to deal with binary observations? [Figures: binary observations and a latent GP function plotted over the input(s), Y versus X.]

75 GP classification & other extensions: GPs for classification, probit likelihood model. Posterior with a general likelihood model:
$$p(\mathbf{f} \mid X, \mathbf{y}, \Theta_K, \sigma^2) \propto \underbrace{\mathcal{N}\!\left(\mathbf{f} \mid \mathbf{0}, K_{X,X}(\Theta_K)\right)}_{\text{Prior}} \; \underbrace{\prod_{n=1}^{N} p(y_n \mid f_n)}_{\text{Likelihood}}$$
Classification: probit link model,
$$p(y_n = 1 \mid f_n) = \frac{1}{1 + \exp(-f_n)}.$$

76-77 GP classification & other extensions: GPs for classification, inference. Inference with a non-Gaussian likelihood is analytically intractable. Idea: approximate each of the true likelihood terms with a Gaussian,
$$\underbrace{\mathcal{N}\!\left(\mathbf{f} \mid \mathbf{0}, K_{X,X}(\Theta_K)\right)}_{\text{Prior}} \; \underbrace{\prod_{n=1}^{N} p(y_n \mid f_n)}_{\text{exact likelihood}} \;\overset{\mathrm{KL}}{\approx}\; \underbrace{\mathcal{N}\!\left(\mathbf{f} \mid \mathbf{0}, K_{X,X}(\Theta_K)\right)}_{\text{Prior}} \; \underbrace{\prod_{n=1}^{N} \mathcal{N}\!\left(f_n \mid \mu_n, \sigma_n\right)}_{\text{approximation}}$$
The KL divergence is a common measure of approximation accuracy:
$$\mathrm{KL}\left[P \,\|\, Q\right] = \int_{\theta} P(\theta) \ln \frac{P(\theta)}{Q(\theta)}\,\mathrm{d}\theta.$$

78-79 GP classification & other extensions: Robust regression. [Figures: GP regression with 15% and with 1% outliers; the plots show the predictive mean and a 2 standard deviation band.]

80-81 GP classification & other extensions: Robust regression, mixture likelihood model. Naive approach: filtering. We would rather like the likelihood model to embody the belief that a fraction of the datapoints is useless:
$$p(y_n \mid f_n) = \pi_{\mathrm{ok}}\, \mathcal{N}\!\left(y_n \mid f_n, \sigma^2\right) + (1 - \pi_{\mathrm{ok}})\, \mathcal{N}\!\left(y_n \mid f_n, \sigma_{\mathrm{outlier}}^2\right),$$
where the second component has a much larger variance to absorb the outliers.

82 GP classification & other extensions: Robust regression, mixture likelihood in action. [Figure: fit with the robust noise model; predictive mean and 2 standard deviation band.]

83-85 GP classification & other extensions: Why Gaussian processes and not something else? Tractable probabilistic model with uncertainty estimates. Equal or better performance than other methods. Many other approaches are special cases: linear regression, splines, neural networks, variance component models. Kernel method, with a flexible choice of covariance functions. Major limitation: inversion of an $N \times N$ matrix, scaling $\mathcal{O}(N^3)$; max. about 5,000 datapoints. General-purpose tricks: about 50,000 datapoints. Tricks for special cases (FastLMM): more than 100,000 datapoints.

86 Summary, outline: Motivation; Intuitive approach; Function space view; GP classification & other extensions; Summary.

87 Summary. The key ingredient of a Gaussian process is the covariance function, a recipe to construct covariance matrices. GP predictions boil down to conditioning joint Gaussian distributions. The most probable covariance function hyperparameters can be derived from the marginal likelihood. There is a close relationship between linear models, variance component models and Gaussian processes. Non-Gaussian likelihood models allow for classification and robust regression, but require approximate inference techniques.

88 Summary: References. C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006. D. J. MacKay. Gaussian process basics. Video Lectures, 2006. C. E. Rasmussen. Gaussian processes in machine learning. Advanced Lectures on Machine Learning, pages 63-71, 2004.


More information

Introduction to SVM and RVM

Introduction to SVM and RVM Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance

More information

MTTTS16 Learning from Multiple Sources

MTTTS16 Learning from Multiple Sources MTTTS16 Learning from Multiple Sources 5 ECTS credits Autumn 2018, University of Tampere Lecturer: Jaakko Peltonen Lecture 6: Multitask learning with kernel methods and nonparametric models On this lecture:

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

1 Bayesian Linear Regression (BLR)

1 Bayesian Linear Regression (BLR) Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,

More information

Kernel methods, kernel SVM and ridge regression

Kernel methods, kernel SVM and ridge regression Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;

More information

Tokamak profile database construction incorporating Gaussian process regression

Tokamak profile database construction incorporating Gaussian process regression Tokamak profile database construction incorporating Gaussian process regression A. Ho 1, J. Citrin 1, C. Bourdelle 2, Y. Camenen 3, F. Felici 4, M. Maslov 5, K.L. van de Plassche 1,4, H. Weisen 6 and JET

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Variational Model Selection for Sparse Gaussian Process Regression

Variational Model Selection for Sparse Gaussian Process Regression Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of Computer Science University of Manchester 7 September 2008 Outline Gaussian process regression and sparse

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information