Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Motivation. Reference: Marc Deisenroth's tutorial on Robot Learning.

Fast Learning for Autonomous Robots with Gaussian Processes. Demo 1: Cart-Pole Swing-up. Swing up and balance a freely swinging pendulum on a cart. No knowledge about the nonlinear dynamics is assumed, so the system must learn from scratch.

Fast Learning for Autonomous Robots with Gaussian Processes. Demo 2: Learning to Control a Low-Cost Manipulator.

Idea: Reinforcement Learning. The difference between optimal control and reinforcement learning is that in optimal control the dynamics $f(\cdot)$ are assumed known, whereas in reinforcement learning they must be learned.

At x = 7, what is going on? The point prediction is f(7) = -1.5, and a decision must be made based on this prediction alone, with no indication of how uncertain it is.

We need to characterize the model errors: use a Gaussian process to characterize the uncertainty of the prediction.

From a Statistical Perspective

Linear vs. nonlinear regression. We have i.i.d. data pairs $(x_i, y_i)$, $i = 1, \dots, n$, and want to make inference about the relation between input and output. The simplest assumption is a linear relationship, $y_i = x_i^\top w + \epsilon_i$.

One puts a prior on $w$ (e.g., normal) and derives the posterior $p(w \mid X, y)$. To make a prediction at a new input $x_*$, we use the posterior predictive distribution $p(y_* \mid x_*, X, y)$. Alternatively, one could first map $x$ to basis functions $\phi(x)$ and let $f(x) = \phi(x)^\top w$. This is still restrictive due to sensitivity to the choice of $\phi$.
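As a concrete illustration, here is a minimal R sketch of Bayesian linear regression with a conjugate normal prior and known noise variance; the function names (`bayes_linreg`, `predict_blr`) and the default prior/noise values are illustrative choices, not from the lecture.

```r
# Bayesian linear regression: prior w ~ N(0, tau2 * I),
# likelihood y | X, w ~ N(X w, sigma2 * I).
bayes_linreg <- function(X, y, sigma2 = 1, tau2 = 10) {
  d <- ncol(X)
  S_w  <- solve(crossprod(X) / sigma2 + diag(d) / tau2)  # posterior covariance
  mu_w <- S_w %*% crossprod(X, y) / sigma2               # posterior mean
  list(mean = mu_w, cov = S_w)
}

# Posterior predictive at a new input xstar:
# y* | x*, X, y ~ N(x*' mu_w, x*' S_w x* + sigma2)
predict_blr <- function(fit, xstar, sigma2 = 1) {
  m <- drop(crossprod(xstar, fit$mean))
  v <- drop(crossprod(xstar, fit$cov %*% xstar)) + sigma2
  c(mean = m, var = v)
}
```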

We want a more flexible form for $f(x)$: treat the whole function $f(\cdot)$ as a parameter. Assume $f(\cdot)$ takes values in a function space and is random; in particular, give it a Gaussian process prior. What does a Gaussian process look like?

Gaussian process. Assume that $f(\cdot)$ is a Gaussian process. Definition: a Gaussian process is an (infinite) collection of random variables, any finite number of which have a joint Gaussian distribution. A GP is completely specified by its mean function $m(x)$ and covariance function $k(x, x')$.

Gaussian process (cont'd). We write $f(\cdot) \sim \mathrm{GP}(m(\cdot), k(\cdot, \cdot))$, with $m(x) = \mathbb{E}[f(x)]$ and $k(x, x') = \mathrm{Cov}(f(x), f(x'))$. Interpretation: for any finite set of inputs $x_1, \dots, x_n$, the projection $(f(x_1), \dots, f(x_n))^\top \sim \mathcal{N}(m, K)$ with $m_i = m(x_i)$ and $K_{ij} = k(x_i, x_j)$. These finite-dimensional distributions are consistent, which guarantees well-definedness of the process.

Parametric forms of covariance functions. Squared exponential (SE): $k(x, x') = \sigma_f^2 \exp\!\big(-\tfrac{(x - x')^2}{2\ell^2}\big)$. Matérn: $k(r) = \sigma_f^2 \tfrac{2^{1-\nu}}{\Gamma(\nu)} \big(\tfrac{\sqrt{2\nu}\, r}{\ell}\big)^{\nu} K_\nu\!\big(\tfrac{\sqrt{2\nu}\, r}{\ell}\big)$, where $r = |x - x'|$ and $K_\nu$ is a modified Bessel function.
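As a concrete sketch, both kernels can be implemented in a few lines of R (for scalar inputs, with the Matérn family specialized to the common $\nu = 3/2$ case; the function names are mine, not from the lecture):

```r
# Squared exponential and Matern (nu = 3/2) covariance functions for
# scalar inputs; ell is the length-scale, sf2 the signal variance.
k_se <- function(x1, x2, ell = 1, sf2 = 1) {
  sf2 * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)
}
k_matern32 <- function(x1, x2, ell = 1, sf2 = 1) {
  r <- abs(outer(x1, x2, "-"))
  sf2 * (1 + sqrt(3) * r / ell) * exp(-sqrt(3) * r / ell)
}
```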

Reminder: conditional distribution of the multivariate normal. If $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$, then $x_1 \mid x_2 = a \sim \mathcal{N}\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big)$. Reference: https://en.wikipedia.org/wiki/multivariate_normal_distribution#conditional_distributions
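In R, the conditional formula translates directly; `cond_mvn` below is a small helper added here for illustration, not part of the lecture:

```r
# Conditional distribution of a partitioned multivariate normal:
# given (x1, x2) ~ N((mu1, mu2), [[S11, S12], [S21, S22]]),
# return the mean and covariance of x1 | x2 = a.
cond_mvn <- function(mu1, mu2, S11, S12, S22, a) {
  S22_inv <- solve(S22)
  list(mean = mu1 + S12 %*% S22_inv %*% (a - mu2),
       cov  = S11 - S12 %*% S22_inv %*% t(S12))  # S21 = t(S12) by symmetry
}
```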

Case 1: GP regression with noise-free observations. We observe training pairs $(X, f)$ with $f_i = f(x_i)$ measured exactly, and want predictions $f_*$ at test inputs $X_*$. Under a zero-mean GP prior, the joint distribution is $\begin{pmatrix} f \\ f_* \end{pmatrix} \sim \mathcal{N}\!\left(0, \begin{pmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{pmatrix}\right)$.

Case 1: GP regression with noise-free observations (cont'd). Applying the conditional-MVN formula gives the posterior $f_* \mid X, f, X_* \sim \mathcal{N}\big(K(X_*, X) K(X, X)^{-1} f,\; K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)\big)$.
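A minimal R sketch of these noise-free predictive equations; the SE kernel and the jitter value are my choices for illustration:

```r
k_se <- function(x1, x2, ell = 1, sf2 = 1)
  sf2 * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)

# Noise-free GP posterior at test inputs xs, given training pairs (x, f),
# assuming a zero prior mean.
gp_posterior_noisefree <- function(x, f, xs, ell = 1, sf2 = 1) {
  Kxx <- k_se(x, x, ell, sf2) + 1e-10 * diag(length(x))  # jitter for stability
  Kxs <- k_se(xs, x, ell, sf2)                           # K(X*, X)
  A   <- Kxs %*% solve(Kxx)
  list(mean = drop(A %*% f),
       cov  = k_se(xs, xs, ell, sf2) - A %*% t(Kxs))
}
```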

Case 2: GP regression with noisy observations. Now $y_i = f(x_i) + \epsilon_i$ with $\epsilon_i \overset{iid}{\sim} \mathcal{N}(0, \sigma_n^2)$, so $\mathrm{Cov}(y) = K(X, X) + \sigma_n^2 I$. The same conditioning argument gives $f_* \mid X, y, X_* \sim \mathcal{N}\big(K(X_*, X)(K(X, X) + \sigma_n^2 I)^{-1} y,\; K(X_*, X_*) - K(X_*, X)(K(X, X) + \sigma_n^2 I)^{-1} K(X, X_*)\big)$.

Model selection: hyperparameters. The covariance function depends on hyperparameters $\theta$ (e.g., the length-scale $\ell$, signal variance $\sigma_f^2$, and noise variance $\sigma_n^2$), which must be selected.

Optimizing the marginal likelihood. Note that $f(\cdot)$ has been marginalized out of the likelihood: $\log p(y \mid X, \theta) = -\tfrac{1}{2} y^\top (K + \sigma_n^2 I)^{-1} y - \tfrac{1}{2} \log |K + \sigma_n^2 I| - \tfrac{n}{2} \log 2\pi$, which can be maximized over $\theta$.
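A sketch of marginal-likelihood optimization in R for the SE kernel; the log-scale parameterization and the starting values are my own choices, and the lecture's actual demo may differ:

```r
# Negative log marginal likelihood of a zero-mean GP with SE kernel;
# theta = log(c(ell, sf2, sn2)) so the optimization is unconstrained.
nlml <- function(theta, x, y) {
  ell <- exp(theta[1]); sf2 <- exp(theta[2]); sn2 <- exp(theta[3])
  K <- sf2 * exp(-0.5 * outer(x, x, "-")^2 / ell^2) + sn2 * diag(length(x))
  U <- chol(K)                                  # K = U'U, U upper-triangular
  alpha <- backsolve(U, forwardsolve(t(U), y))  # alpha = K^{-1} y
  0.5 * sum(y * alpha) + sum(log(diag(U))) + 0.5 * length(y) * log(2 * pi)
}

## Usage (assuming numeric vectors x and y are available):
## fit <- optim(c(0, 0, -2), nlml, x = x, y = y, method = "BFGS")
## exp(fit$par)   # estimated (ell, sf2, sn2)
```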

Example 1: R code demo of GP regression.
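The demo code itself is not preserved in this transcription; below is a minimal stand-in in the same spirit, fitting noisy observations of sin(x) with fixed hyperparameters (all data and parameter values are illustrative):

```r
set.seed(1)
k_se <- function(x1, x2, ell, sf2)
  sf2 * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)

x  <- sort(runif(20, 0, 10))
y  <- sin(x) + rnorm(20, sd = 0.2)
xs <- seq(0, 10, length.out = 200)
ell <- 1; sf2 <- 1; sn2 <- 0.04

Ky <- k_se(x, x, ell, sf2) + sn2 * diag(length(x))  # K(X, X) + sn2 * I
Ks <- k_se(xs, x, ell, sf2)                         # K(X*, X)
A  <- Ks %*% solve(Ky)
mu <- drop(A %*% y)                                 # predictive mean
v  <- pmax(diag(k_se(xs, xs, ell, sf2) - A %*% t(Ks)), 0)

plot(x, y, main = "GP regression with noisy observations")
lines(xs, mu)
lines(xs, mu + 2 * sqrt(v), lty = 2)  # approximate 95% band
lines(xs, mu - 2 * sqrt(v), lty = 2)
```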

GP classification: the binary case.

GP classification: the binary case (cont'd). Let $y \in \{0, 1\}$ denote the class labels. Assume a latent function $f(x)$ with a GP prior, and assume $\pi(x) = p(y = 1 \mid x) = \sigma(f(x))$ for a sigmoid link function $\sigma$ (e.g., logistic or probit).

$f$ is a nuisance function (latent variable): we do not observe values of $f$ itself (we observe only the inputs $X$ and the class labels $y$), and we are not particularly interested in the values of $f$, but rather in $\pi$, in particular for test cases $\pi(x_*)$. The purpose of $f$ is solely to allow a convenient formulation of the model.

Steps for predicting $y_*$ given $x_*$: 1. First compute the distribution of the latent variable at the test case, $p(f_* \mid X, y, x_*) = \int p(f_* \mid X, x_*, f)\, p(f \mid X, y)\, df$. 2. Use this distribution over the latent $f_*$ to produce a probabilistic prediction, $\bar{\pi}(x_*) = p(y_* = 1 \mid X, y, x_*) = \int \sigma(f_*)\, p(f_* \mid X, y, x_*)\, df_*$.

In classification, $p(y \mid f)$ involves a link function, so conjugacy with the GP prior is lost and the integrals on the previous slide are intractable. We therefore need either analytic approximations to $p(f \mid X, y)$ or solutions based on Monte Carlo sampling, e.g., the Laplace approximation, expectation propagation (EP), or INLA.

Laplace's method utilizes a Gaussian approximation $q(f \mid X, y)$ to the posterior $p(f \mid X, y)$: a local (normal) approximation obtained by matching the mode, $q(f \mid X, y) = \mathcal{N}\big(f \mid \hat{f}, (K^{-1} + W)^{-1}\big)$, where $\hat{f} = \arg\max_f p(f \mid X, y)$ and $W = -\nabla\nabla \log p(y \mid \hat{f})$. We will discuss this further in the lecture on non-MCMC methods.
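A sketch of the mode-finding step in R, following the stable Newton iteration of Rasmussen & Williams (2006, Algorithm 3.1) for a logistic link; `laplace_gp` is my name for it, and the fixed iteration count is a simplification (the book uses a convergence check):

```r
# Laplace approximation for binary GP classification (logistic link).
# K: prior covariance K(X, X); y: labels in {0, 1}.
laplace_gp <- function(K, y, n_iter = 25) {
  n <- length(y)
  f <- rep(0, n)
  for (it in 1:n_iter) {
    p    <- 1 / (1 + exp(-f))            # pi(f) = sigmoid(f)
    grad <- y - p                        # gradient of log p(y | f)
    W    <- p * (1 - p)                  # negative Hessian is diag(W)
    sw   <- sqrt(W)
    B    <- diag(n) + outer(sw, sw) * K  # I + W^{1/2} K W^{1/2}
    b    <- W * f + grad
    a    <- b - sw * solve(B, sw * drop(K %*% b))
    f    <- drop(K %*% a)                # Newton update of the mode
  }
  list(f_hat = f, grad = grad, W = W)
}
```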

Posterior predictive mean and variance based on the Laplace approximation: $\mathbb{E}_q[f_* \mid X, y, x_*] = k(x_*)^\top \nabla \log p(y \mid \hat{f})$ and $\mathbb{V}_q[f_* \mid X, y, x_*] = k(x_*, x_*) - k(x_*)^\top (K + W^{-1})^{-1} k(x_*)$.
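These two formulas can be evaluated directly from the output of the `laplace_gp` sketch above (again, the names and structure are illustrative):

```r
# Predictive mean and variance of f* under the Laplace approximation.
# fit: output of laplace_gp(); K: K(X, X); ks: vector k(x*, X); kss: k(x*, x*).
predict_laplace <- function(fit, K, ks, kss) {
  mu <- sum(ks * fit$grad)                    # k*' grad log p(y | f_hat)
  sw <- sqrt(fit$W)
  B  <- diag(length(sw)) + outer(sw, sw) * K  # I + W^{1/2} K W^{1/2}
  # (K + W^{-1})^{-1} = W^{1/2} B^{-1} W^{1/2}, so:
  s2 <- kss - sum((sw * ks) * solve(B, sw * ks))
  c(mean = mu, var = s2)
}
```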