Shared Gaussian Process Latent Variable Models. Carl Henrik Ek, Philip H. S. Torr, Neil D. Lawrence. Computer Science Departmental Seminar, University of Bristol, October 25th, 2007

1 Shared Gaussian Process Latent Variable Models. Carl Henrik Ek (Oxford Brookes University), Philip H. S. Torr (Oxford Brookes University), Neil D. Lawrence (University of Manchester). Computer Science Departmental Seminar, University of Bristol, October 25th, 2007

2 Source code and slides are available online. MATLAB toolboxes: neill/software.html. Contact: Carl Henrik Ek, Neil D. Lawrence

3 1 Introduction 2 Gaussian Processes 3 GP-LVM 4 Shared GP-LVM 5 Applications 6 Conclusion 7 References

5 Representation: a high-dimensional object. [Figure: example image] 1 code/image sample.m

11 Representation: the representation often reflects the collection of the data rather than the characteristics of the data. 1 code/image sample.m

14 Re-representation. Feature Selection: the technique of selecting a subset of relevant features for building robust learning models; by removing the most irrelevant and redundant features from the data, feature selection helps improve the performance of learning models (supervised dimensionality reduction). Feature Extraction: simplifying the amount of resources required to describe a large set of data accurately (unsupervised dimensionality reduction).

15 Today: Feature Extraction with supervised structure

16 1 Introduction 2 Gaussian Processes 3 GP-LVM 4 Shared GP-LVM 5 Applications 6 Conclusion 7 References

17 Gaussian Processes (GP) 2. Generalisation of the Gaussian distribution. Gaussian distribution: mean vector, covariance matrix. Gaussian process: mean function, covariance function. Distributions over functions: functions are infinite objects, so GPs are defined over infinite index sets, and finite instantiations of functions are jointly Gaussian. Provides a probabilistic framework for dealing with functions. 2 [Rasmussen and Williams(2006)]

18 Gaussian Processes: Design. f ~ GP(µ(x), k(x, x')). Mean function µ(x): often taken to be zero, i.e. centering of the data. Covariance function k(x, x'): defines the class of functions the GP contains; the class of valid covariance functions is the class of Mercer kernels.

19 GP covariance functions. Linear: k(x_i, x_j) = x_i^T x_j. Radial Basis Function (RBF): k(x_i, x_j) = θ exp(-(γ/2) ||x_i - x_j||²). Multi-Layered Perceptron (MLP): k(x_i, x_j) = θ sin⁻¹( (w x_i^T x_j + b) / sqrt((w x_i^T x_i + b + 1)(w x_j^T x_j + b + 1)) ). Notation: Φ collects the parameters of the covariance function.
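To make the three covariance functions concrete, here is a small MATLAB sketch evaluating them on a toy set of inputs; it is not the seminar's toolbox code, and the values of theta, gamma, w and b are illustrative stand-ins for the entries of Φ.

```matlab
% Sketch: evaluate the linear, RBF and MLP covariances on N inputs X (N x D).
N = 5;  X = linspace(-1, 1, N)';            % toy 1-D inputs
theta = 1; gamma = 10; w = 5; b = 0.5;      % illustrative covariance parameters (Phi)

Klin = X * X';                              % linear: k(x_i, x_j) = x_i' * x_j

s = sum(X.^2, 2);                           % squared norm of each input
sqdist = repmat(s, 1, N) + repmat(s', N, 1) - 2 * (X * X');
Krbf = theta * exp(-gamma/2 * sqdist);      % RBF

inner = w * (X * X') + b;                   % MLP kernel (argument stays in [-1,1] for w, b >= 0)
norms = w * s + b + 1;
Kmlp = theta * asin(inner ./ sqrt(norms * norms'));
```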

20 GP prior, linear kernel. [Figure: samples drawn from a GP prior with a linear kernel] 3 code/prior sample.m

21 GP prior, RBF kernel, width = 1. [Figure: samples drawn from a GP prior with an RBF kernel of width 1] 3 code/prior sample.m

22 GP prior, RBF kernel, width = 1e-1. [Figure: samples drawn from a GP prior with an RBF kernel of width 0.1] 3 code/prior sample.m

23 GP prior, RBF kernel, width = 1e-2. [Figure: samples drawn from a GP prior with an RBF kernel of width 0.01] 3 code/prior sample.m

24 GP prior, MLP kernel. [Figure: samples drawn from a GP prior with an MLP kernel] 3 code/prior sample.m

25 GP prior, RBF + linear + noise kernel. [Figure: samples drawn from a GP prior with a linear + RBF + noise kernel] 3 code/prior sample.m
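The prior-sample figures above can be reproduced in spirit with a few lines of MATLAB. This is a minimal sketch for the RBF case only, not the code/prior sample.m script used for the slides; the grid, width and jitter values are arbitrary.

```matlab
% Sketch: draw three functions from a zero-mean GP prior with an RBF covariance.
N = 200;  x = linspace(-1, 1, N)';          % input grid
theta = 1;  gamma = 20;                     % illustrative covariance parameters
s = sum(x.^2, 2);
K = theta * exp(-gamma/2 * (repmat(s, 1, N) + repmat(s', N, 1) - 2 * (x * x')));
L = chol(K + 1e-8 * eye(N), 'lower');       % small jitter for numerical stability
f = L * randn(N, 3);                        % three independent prior samples
plot(x, f);
```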

26 GP posterior. Observed input-output pairs D = {(x_i, y_i), i = 1,...,N}, x_i in R^D, y_i in R. Joint distribution with an unobserved test input x*: [y; y*] ~ N( 0, [ k(X, X) + σ²I, k(X, x*); k(x*, X), k(x*, x*) + σ² ] ). Predictions come from the posterior: y* ~ N(ȳ*, cov(y*)) (details in the appendix).
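A minimal sketch of computing the posterior predictive distribution above (the explicit equations are in the appendix). It is not the seminar's code/posterior sample.m; the training data, RBF width gamma and noise variance sigma2 are toy values.

```matlab
% Sketch: GP posterior prediction with an RBF covariance and observation noise sigma2.
X = (-1:0.25:1)';  y = sin(3*X) + 0.05*randn(size(X));   % toy training data
Xs = linspace(-1.5, 1.5, 100)';                          % test inputs x*
gamma = 10;  sigma2 = 0.01;
k = @(A, B) exp(-gamma/2 * (repmat(sum(A.^2, 2), 1, size(B, 1)) ...
        + repmat(sum(B.^2, 2)', size(A, 1), 1) - 2 * (A * B')));
Kxx = k(X, X) + sigma2 * eye(numel(X));
Ksx = k(Xs, X);
ymean = Ksx * (Kxx \ y);                                 % predictive mean
ycov  = k(Xs, Xs) + sigma2 * eye(numel(Xs)) - Ksx * (Kxx \ Ksx');   % predictive covariance
err = 2 * sqrt(diag(ycov));                              % two standard deviation error bars
plot(X, y, 'k+', Xs, ymean, 'b-', Xs, ymean + err, 'b--', Xs, ymean - err, 'b--');
```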

27 GP posterior. [Figure: samples from the GP posterior] 4 code/posterior sample.m

30 Regression. The regression problem: y_i = f(x_i) + ε, ε ~ N(0, σ²). How can we choose the covariance function? How do we choose the parameters of the covariance function?

31 GP training. Formulate the marginal likelihood p(y | X, Φ) = ∫ p(y | f, X, Φ) p(f | X, Φ) df, with p(f | X, Φ) = N(0, K). Find the parameters Φ that maximise the marginal likelihood: log p(y | X) = -(1/2) y^T (K + σ²I)^{-1} y [data fit] - (1/2) log det(K + σ²I) [complexity] - (N/2) log 2π.
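The marginal likelihood above can be evaluated directly once the covariance matrix is built. A sketch under an RBF-plus-noise assumption with toy data; the Cholesky factor is used for both the quadratic data-fit term and the log determinant.

```matlab
% Sketch: log marginal likelihood log p(y | X, Phi) for an RBF covariance plus noise.
X = (-1:0.25:1)';  y = sin(3*X) + 0.05*randn(size(X));   % toy data
gamma = 10;  sigma2 = 0.01;  N = numel(y);
s = sum(X.^2, 2);
K = exp(-gamma/2 * (repmat(s, 1, N) + repmat(s', N, 1) - 2*(X*X'))) + sigma2*eye(N);
L = chol(K, 'lower');
alpha = L' \ (L \ y);                                    % (K + sigma^2 I)^{-1} y
logml = -0.5*(y'*alpha) ...                              % data fit
        - sum(log(diag(L))) ...                          % complexity: -(1/2) log det(K + sigma^2 I)
        - 0.5*N*log(2*pi);                               % normalisation constant
```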

32 GP regression 5: learning kernel parameters. Can we determine length scales and noise levels from the data? (demOptimiseKern) [Figure: data fit and log-likelihood as a function of the length scale.] Objective: -(1/2) y^T (K + β⁻¹I)^{-1} y [data fit] - (1/2) log det(K + β⁻¹I) [complexity] - (N/2) log 2π. Model selection. 5 Images: N.D. Lawrence
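The demOptimiseKern demonstration in Lawrence's toolbox finds the kernel parameters by gradient-based optimisation of this objective; the same idea can be sketched crudely with a grid search over the RBF inverse width, everything else as in the previous snippet (toy data, fixed noise level).

```matlab
% Sketch: pick an RBF inverse width gamma by maximising the log marginal likelihood.
X = (-1:0.25:1)';  y = sin(3*X) + 0.05*randn(size(X));   % toy data
sigma2 = 0.01;  N = numel(y);
s = sum(X.^2, 2);
sq = repmat(s, 1, N) + repmat(s', N, 1) - 2*(X*X');      % pairwise squared distances
gammas = logspace(-1, 3, 50);
logml = zeros(size(gammas));
for i = 1:numel(gammas)
    K = exp(-gammas(i)/2 * sq) + sigma2*eye(N);
    L = chol(K, 'lower');
    logml(i) = -0.5*(y'*(L'\(L\y))) - sum(log(diag(L))) - 0.5*N*log(2*pi);
end
semilogx(gammas, logml);                                 % likelihood as a function of gamma
[best, idx] = max(logml);  bestGamma = gammas(idx);
```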

41 1 Introduction 2 Gaussian Processes 3 GP-LVM 4 Shared GP-LVM 5 Applications 6 Conclusion 7 References

42 Dimensionality Reduction. Un-raveling: assumes the manifold structure is preserved in the observed representation (MDS, PCA, Isomap, MVU 6, ...). Raveling: assumes the observed data are smoothly sampled from a low-dimensional manifold (PPCA, GTM, GP-LVM). 6 [Weinberger et al.(2004)Weinberger, Sha, and Saul]

43 GP-LVM 7. [Graphical model: latent X, parameters W, observations Y] PPCA: marginalise the latent locations, optimise the parameters; limited to linear relationships; closed-form solution. Dual PPCA: marginalise the parameters, optimise the latent locations; allows for non-linear relationships; no closed-form solution when non-linear. 7 [Lawrence(2005)]

44 GP-LVM. Observed data y_i in R^D generated from a latent variable x_i in R^q: y_i = f(x_i) + ε. Find the latent locations X and kernel parameters Φ maximising the marginal likelihood: {X̂, Φ̂} = argmax_{X,Φ} p(Y | X, Φ).
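A sketch of the GP-LVM objective evaluated for one candidate latent configuration X; the actual model optimises this jointly over X and Φ with gradient methods (e.g. scaled conjugate gradients in the referenced toolboxes), which is not shown. Y, X and the kernel parameters below are toy placeholders.

```matlab
% Sketch: GP-LVM log marginal likelihood log p(Y | X, Phi), RBF-plus-noise covariance.
N = 50;  D = 10;  q = 2;
Y = randn(N, D);  Y = Y - repmat(mean(Y, 1), N, 1);      % centred observations (toy)
X = randn(N, q);                                         % candidate latent locations
theta = 1;  gamma = 1;  sigma2 = 0.1;
s = sum(X.^2, 2);
K = theta*exp(-gamma/2*(repmat(s, 1, N) + repmat(s', N, 1) - 2*(X*X'))) + sigma2*eye(N);
L = chol(K, 'lower');
logp = -0.5*D*N*log(2*pi) ...
       - D*sum(log(diag(L))) ...                         % -(D/2) log det K
       - 0.5*sum(sum(Y .* (L'\(L\Y))));                  % -(1/2) tr(K^{-1} Y Y')
% maximising logp over X and Phi gives the GP-LVM latent locations and kernel parameters
```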

45 GP-LVM. Advantages: + correctly models the sampling process; + provides a mapping to the observed data; + associates uncertainty with latent and observed locations. Challenges: - non-convex optimisation problem for general covariance functions (initialisation of parameters); - computationally expensive; - manifold dimensionality is a free parameter.

46 Back-Constrained GP-LVM 8. The GP-LVM models the sampling process by a smooth function: points close in latent space are close in observed space, but smoothness from the observed space is not preserved. Constrain the latent coordinates to be given by a smooth mapping from the observed data, x_i = g(y_i, W), and indirectly optimise the latent locations X: {W, Φ} = argmax_{W,Φ} p(Y | W, Φ). 8 [Lawrence and Candela(2006)]

47 Dynamic GP-LVM 9. Learn a latent representation respecting the ordering of the observations. Auto-regressive function h: x_t = h(x_{t-1}) + ε_dyn, ε_dyn ~ N(0, σ²_dyn I). Place a GP prior over h and combine with the GP-LVM: {X̂, Φ̂_Y, Φ̂_dyn} = argmax_{X, Φ_Y, Φ_dyn} p(Y | X, Φ_Y) p(X | Φ_dyn). 9 [Wang et al.(2006)Wang, Fleet, and Hertzmann]

48 1 Introduction 2 Gaussian Processes 3 GP-LVM 4 Shared GP-LVM 5 Applications 6 Conclusion 7 References

49 Shared GP-LVM. Corresponding observations of the same underlying phenomenon. Examples: different language representations of a text; facial expressions and robot servos. Model both observations using a single latent representation; infer corresponding locations between the spaces.

50 Shared GP-LVM. [Graphical model: shared latent X generating Y and Z with noise ε_Y, ε_Z] Learn two separate kernels {Φ_Y, Φ_Z} from a single shared latent representation X. Objective: p(Y, Z | X, Φ_Y, Φ_Z) = p(Y | X, Φ_Y) p(Z | X, Φ_Z). Inference: see the appendix.
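Since Y and Z are conditionally independent given X, the shared GP-LVM objective is simply the sum of two GP-LVM log likelihoods evaluated at the same latent locations. A toy sketch (the two kernels and their parameters are illustrative, and no optimisation over X is performed here):

```matlab
% Sketch: shared GP-LVM objective log p(Y, Z | X) = log p(Y | X, PhiY) + log p(Z | X, PhiZ).
N = 50;  X = randn(N, 2);                                % shared latent locations (toy)
Y = randn(N, 10);  Z = randn(N, 5);                      % two observation spaces (toy)
Y = Y - repmat(mean(Y, 1), N, 1);  Z = Z - repmat(mean(Z, 1), N, 1);
gplvmLogp = @(Y, K) -0.5*size(Y, 2)*size(Y, 1)*log(2*pi) ...
    - size(Y, 2)*sum(log(diag(chol(K, 'lower')))) - 0.5*sum(sum(Y .* (K \ Y)));
s = sum(X.^2, 2);  sq = repmat(s, 1, N) + repmat(s', N, 1) - 2*(X*X');
KY = exp(-0.5*sq) + 0.1*eye(N);                          % covariance for Y (parameters PhiY)
KZ = exp(-2.5*sq) + 0.1*eye(N);                          % covariance for Z (parameters PhiZ)
objective = gplvmLogp(Y, KY) + gplvmLogp(Z, KZ);         % quantity maximised over X, PhiY, PhiZ
```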

51 Shared GP-LVM. The shared representation must represent the full variance of both observation spaces, i.e. a manifold alignment. It is not possible to align the manifolds when they are topologically different or when the shared variance is small relative to the full variance; this is not reflected by the objective function argmax_{X, Φ_Y, Φ_Z} p(Y, Z | X, Φ_Y, Φ_Z).

52 Shared GP-LVM Experiments 10. [Graphical model: shared latent X with dynamics, generating silhouette features Y and pose Z] Silhouette features: y_i in R^100; pose parameters: z_i in R^54. Multi-modal: the same silhouette could have been generated from several different poses, so it is NOT possible to model with a regression model. A generative model p(silhouette | pose) is hampered by 1) the dimensionality of the pose space and 2) the limited amount of training data. 10 [Ek et al.(2007)Ek, Torr, and Lawrence]

53 Shared GP-LVM Experiments. Most of the variance in the silhouette feature space is irrelevant for the pose.

55 Introduction. Manifold alignment: both observation spaces lie on manifolds of the same topology. New model: a subspace of each manifold shares the topology.

56 Assumptions. Observations y_i in Y and z_i in Z are generated from low-dimensional manifolds: y_ni = f_i^Y(u_n^Y) + ε_ni^Y, z_ni = f_i^Z(u_n^Z) + ε_ni^Z. Assume U^Y and U^Z share a non-zero subspace X^S: X^S ⊆ U^Y, X^S ⊆ U^Z, X^S ≠ 0.

57 Assumptions. Private spaces X^Y and X^Z complete the latent representation: y_ni = f_i^Y({x_n^S, x_n^Y}) + ε_ni^Y, z_ni = f_i^Z({x_n^S, x_n^Z}) + ε_ni^Z.

58 Assumptions. Shared subspace X^S: x_i^S = g^Y(y_i) = g^Z(z_i). Private subspaces X^Y and X^Z: x_i^Y = h^Y(y_i), x_i^Z = h^Z(z_i).

59 Graphical Model. [Graphical model: shared latent space X^S and private spaces X^Y, X^Z; mappings g and h from the observations to the latent spaces, mappings f^Y and f^Z to the observations Y and Z with noise ε_Y, ε_Z]

60 Canonical Correlation Analysis. Correlation: ρ_YZ = cov(Y, Z) / sqrt(var(Y) var(Z)). Canonical Correlation Analysis: find directions {W_Y, W_Z} in each observed space maximising the correlation of the canonical variates a_Y = Y W_Y and a_Z = Z W_Z. Solution through an eigenvalue problem (details and solution in the appendix).

61 Non-Consolidating Component Analysis. CCA explains the shared variance; Non-Consolidating Component Analysis finds the directions explaining the remaining variance: v̂_1^Y = argmax_{v_1^Y} (v_1^Y)^T cov(Y, Y) v_1^Y, subject to (v_1^Y)^T v_1^Y = 1 and (v_1^Y)^T W_Y = 0. Solution through an eigenvalue problem (details in the appendix).

62 Non-Consolidating Component Analysis. Successive directions are found through an additional eigenvalue problem (details in the appendix). Add directions until a sufficient amount of variance is explained by the basis. Summary: shared variance X^S from CCA; private variance X^Y, X^Z from NCCA.

63 Non-linear. Both algorithms can easily be kernelised: non-linearise through a kernel-induced feature space, Ψ_Y : Y → F_Y, Ψ_Z : Z → F_Z. Many kernels perform dimensionality expansion rather than reduction (e.g. RBF).

64 Non-linear. Sampled correlation: ρ_s = cov_s(y, z) / sqrt(var_s(y) var_s(z)) = {expand} = cos(y, z). The Gram matrix in the point-expanded feature space is close to full rank, so the feature spaces are effectively the same and CCA becomes trivial (details in the appendix).

65 Practical Non-Linearisation. 1) Represent each feature space by its dominant principal directions: this removes the trivial CCA solution and finds correlated directions explaining significant variance. 2) Apply CCA and NCCA in the reduced feature space (a sketch of step 1 follows below). Feature spaces: many possible choices, e.g. 1) linear kernel, 2) RBF, 3) Maximum Variance Unfolding, Isomap. The main interest is the topology of the latent space; for kernels not providing an explicit mapping, learn {g^{Y,Z}, h^{Y,Z}} by GP regression.
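Step 1 of the recipe can be sketched with a plain SVD; the 95% variance threshold and the random toy matrices are illustrative choices, not values from the talk.

```matlab
% Sketch: re-represent each observation space by its dominant principal directions.
N = 100;  Y = randn(N, 30);  Z = randn(N, 20);           % toy observations (rows = points)
Y = Y - repmat(mean(Y, 1), N, 1);  Z = Z - repmat(mean(Z, 1), N, 1);
[Uy, Sy, Vy] = svd(Y, 'econ');
[Uz, Sz, Vz] = svd(Z, 'econ');
ky = find(cumsum(diag(Sy).^2) / sum(diag(Sy).^2) >= 0.95, 1);   % keep 95% of the variance
kz = find(cumsum(diag(Sz).^2) / sum(diag(Sz).^2) >= 0.95, 1);
Yr = Y * Vy(:, 1:ky);  Zr = Z * Vz(:, 1:kz);             % reduced representations
% step 2: run CCA and NCCA on Yr and Zr (see the appendix sketches)
```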

66 Model Selection. Rank each embedding according to the generating function f: 1) data fit of f, 2) complexity of f, both encapsulated by the GP-LVM objective. Lawrence [Lawrence(2005)] suggested spectral initialisation of the latent locations; Harmeling [Harmeling(2007)] compared the GP-LVM likelihood with the Procrustes score to ground truth. If {g^{Y,Z}, h^{Y,Z}} have correctly unravelled the manifold, f^{Y,Z} should explain the data with a high likelihood. Select the embedding according to p(Y, Z | {X^S, X^Y, X^Z}, Φ_Y, Φ_Z).

67 Summary. 1) Map observations to feature space. 2) Re-represent each feature space by its dominant principal directions. 3) Extract shared directions using CCA. 4) Extract private directions using NCCA. 5) Choose the embedding maximising the GP-LVM likelihood. 6) Train GP regressors over the implicit mappings.

68 Illusion demo 11 DEMO: ILLUSION 11 code/demo illusion1.m

69 Illusion demo 12 DEMO: ILLUSION2 12 code/demo illusion2.m

70 Human Pose Estimation. [Graphical model: shared and private latent spaces with dynamics, generating silhouette features Y and pose Z] 1) Embed the observations using NCCA. 2) Learn a dynamic GP over the pose re-representation.

71 NCCA demo 13 DEMO: NCCA POSE 13 code/demo ncca.m

72 Conclusion. Introduced Gaussian Processes and the GP-LVM. Shared GP-LVM models for multiple observation spaces. Shared GP-LVM analogy to CCA. Application to multimodal regression.

73 eof.

74 References. C. H. Ek, P. H. Torr, and N. D. Lawrence. Gaussian process latent variable models for human pose estimation. In MLMI, 2007. S. Harmeling. Exploring model selection techniques for nonlinear dimensionality reduction. Technical Report EDI-INF-RR-0960, University of Edinburgh, 2007. N. D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. JMLR, 6:1783-1816, 2005. N. D. Lawrence and J. Q. Candela. Local distance preservation in the GP-LVM through back constraints. In ICML, 2006. C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006. J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models. In Y. Weiss, B. Schölkopf, and J. C. Platt, editors, NIPS, volume 18, Cambridge, MA, 2006. MIT Press.

75 K. Q. Weinberger, F. Sha, and L. K. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In R. Greiner and D. Schuurmans, editors, ICML, volume 21. Omnipress, 2004.

76 8 Appendix

77 GP Prediction. Predictive equations: y* ~ N(ȳ*, cov(y*)), with ȳ* = k(x*, X) (k(X, X) + σ²I)^{-1} y and cov(y*) = k(x*, x*) + σ² - k(x*, X) (k(X, X) + σ²I)^{-1} k(X, x*). Linear predictor: y* = Σ_{i=1}^N α_i k(x_i, x*), with α = (K(X, X) + σ²I)^{-1} y. Return
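The linear-predictor form is convenient when many test points are queried, since the weights α are computed once from the training data. A sketch under the same toy RBF setup as before:

```matlab
% Sketch: GP prediction via precomputed weights alpha = (K(X,X) + sigma2*I)^{-1} y.
X = (-1:0.25:1)';  y = sin(3*X) + 0.05*randn(size(X));   % toy training data
gamma = 10;  sigma2 = 0.01;  N = numel(y);
s = sum(X.^2, 2);
K = exp(-gamma/2*(repmat(s, 1, N) + repmat(s', N, 1) - 2*(X*X')));
alpha = (K + sigma2*eye(N)) \ y;                         % computed once, reused for any x*
xs = 0.3;                                                % a single test input
ks = exp(-gamma/2 * (X - xs).^2);                        % k(x_i, x*) for each training input
ystar = ks' * alpha;                                     % predictive mean at x*
```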

78 Shared GP-LVM: Inference. Infer the location of z* given the corresponding y*: x̂* = argmax_{x*} p(y* | x*, X, Φ_Y), ẑ* = f^Z(x̂*). Return

79 CCA. Correlation: ρ = tr(W_Y^T Y^T Z W_Z) / ( tr(W_Z^T Z^T Z W_Z) tr(W_Y^T Y^T Y W_Y) )^{1/2}. CCA: {W_Y, W_Z} = argmax_{W_Y, W_Z} tr(W_Y^T Y^T Z W_Z), subject to W_Y^T Y^T Y W_Y = I and W_Z^T Z^T Z W_Z = I. Return

80 CCA. CCA solution: (Y^T Y)^{-1} Y^T Z (Z^T Z)^{-1} Z^T Y w_y^i = λ_i² w_y^i and (Z^T Z)^{-1} Z^T Y (Y^T Y)^{-1} Y^T Z w_z^i = λ_i² w_z^i. Return
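The two eigenvalue problems above can be solved directly; a sketch on toy data with a small ridge term added for numerical stability (the ridge, the data and the ordering by eigenvalue are implementation choices, not part of the talk).

```matlab
% Sketch: CCA directions from (Y'Y)^{-1} Y'Z (Z'Z)^{-1} Z'Y w = lambda^2 w.
N = 200;  shared = randn(N, 2);                          % signal common to both spaces (toy)
Y = [shared, randn(N, 3)] * randn(5, 5);
Z = [shared, randn(N, 2)] * randn(4, 4);
Y = Y - repmat(mean(Y, 1), N, 1);  Z = Z - repmat(mean(Z, 1), N, 1);
ry = 1e-6*eye(size(Y, 2));  rz = 1e-6*eye(size(Z, 2));   % small ridge for stability
My = (Y'*Y + ry) \ (Y'*Z) * ((Z'*Z + rz) \ (Z'*Y));
[Wy, Dy] = eig(My);                                      % eigenvalues = squared canonical correlations
[rho2, order] = sort(real(diag(Dy)), 'descend');
Wy = real(Wy(:, order));                                 % canonical directions for Y
Wz = (Z'*Z + rz) \ ((Z'*Y) * Wy);                        % corresponding directions for Z (up to scale)
```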

81 CCA Geometry. Sampled correlation: ρ_s = cov_s(y, z) / sqrt(var_s(y) var_s(z)). Using cov_s(y, z) = y^T z and var_s(y) = y^T y = ||y||², this becomes ρ_s = y^T z / (||y|| ||z||); and since y^T z = ||y|| ||z|| cos(y, z), we get ρ_s = cos(y, z). Return

82 CCA Trivial Solution. Kernel trick: map points to feature space, Ψ_Y : Y → F_Y, Ψ_Z : Z → F_Z. Reduced feature-space representation: the subspace spanned by the training data, R^N; an arbitrary vector is v = Σ_{i=1}^N Ψ(y_i)^T α_i. Canonical variate: a = Ψ_Y Ψ_Y^T α = K_Y α, and rank(K_Y) = rank(Ψ_Y) = {rank-nullity theorem} = dim(im(Ψ_Y)). If K is full rank, each point is explained by a separate feature, giving perfect correlation by alignment. Return

83 NCCA. NCCA solution: (cov(Y, Y) - W_Y W_Y^T cov(Y, Y)) v_1 = λ_1 v_1. Successive directions, for the k-th: (cov(Y, Y) - (W_Y W_Y^T + Σ_{i=1}^{k-1} v_i v_i^T) cov(Y, Y)) v_k = λ_k v_k. Return
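A sketch of extracting the NCCA directions. Instead of the non-symmetric eigenproblem written above, it uses the equivalent route of repeatedly diagonalising the covariance projected away from the canonical directions (and from the NCCA directions already found); the toy data and the stand-in Wy are illustrative.

```matlab
% Sketch: NCCA directions, i.e. directions of variance in Y not explained by the CCA basis Wy.
N = 200;  d = 5;
Y = randn(N, d);  Y = Y - repmat(mean(Y, 1), N, 1);      % toy centred observations
Wy = orth(randn(d, 2));                                  % stand-in for orthonormalised CCA directions
C = (Y' * Y) / N;                                        % sample covariance cov(Y, Y)
V = [];                                                  % NCCA directions found so far
for k = 1:(d - size(Wy, 2))
    B = [Wy, V];
    P = eye(d) - B * B';                                 % projector removing Wy and earlier directions
    [vecs, vals] = eig(P * C * P);                       % deflated covariance
    [ignore, idx] = max(real(diag(vals)));
    V = [V, real(vecs(:, idx))];                         % next direction of remaining variance
end
```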
