STAT 518 Intro Student Presentation


Wen Wei Loh. April 11, 2013.

Title of paper
Regression and classification using Gaussian process priors. Radford M. Neal [1999]. Bayesian Statistics, 6:475-501.

What the paper is about
- Regression and classification with flexible models not limited to simple parametric forms.
- The objective is to obtain a predictive distribution for the outcome of a future observation.
- Gaussian process priors: a Bayesian approach yields posterior distributions of the model parameters.
- Integrating over the model parameters gives a predictive distribution that depends only on the known observations.
- Gaussian processes (GPs) are used to define the covariance functions between outcomes.

Linear regression example
- Observed data: $(x^{(1)}, t^{(1)}), \ldots, (x^{(n)}, t^{(n)})$, where $t^{(i)}$ is the outcome (target) for case $i$ and $x^{(i)} = [x^{(i)}_1, \ldots, x^{(i)}_p]$ is the vector of $p$ fixed inputs (predictors) for case $i$.
- Fit a familiar model: $t^{(i)} = \alpha + \sum_{u=1}^{p} x^{(i)}_u \beta_u + \epsilon^{(i)}$, with $\epsilon^{(i)} \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)$.
- Put independent priors on the unknown parameters $\alpha$ and $\beta_u$: $\alpha \sim N(0, \sigma_\alpha^2)$, $\beta_u \stackrel{iid}{\sim} N(0, \sigma_u^2)$.
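To make the model concrete, here is a minimal R sketch (not from the paper; the inputs and the values of $\sigma_\alpha$, $\sigma_u$, $\sigma_\epsilon$ are made up for illustration) that draws one dataset from this prior:

```r
# Minimal sketch: simulate one dataset from the Bayesian linear model prior.
# All hyperparameter values below are assumptions chosen for illustration.
set.seed(1)
n <- 50; p <- 2
sigma_alpha <- 1; sigma_u <- rep(1, p); sigma_eps <- 0.1

x     <- matrix(runif(n * p, -1, 1), n, p)  # fixed inputs x^(i)
alpha <- rnorm(1, 0, sigma_alpha)           # intercept ~ N(0, sigma_alpha^2)
beta  <- rnorm(p, 0, sigma_u)               # slopes ~ N(0, sigma_u^2), iid
eps   <- rnorm(n, 0, sigma_eps)             # noise ~ N(0, sigma_eps^2), iid
t_obs <- drop(alpha + x %*% beta + eps)     # targets t^(i)
```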

Linear regression example
- The prior joint distribution of the $t^{(i)}$ is multivariate Gaussian, with
$$E\big[t^{(i)}\big] = E\Big[\alpha + \sum_{u=1}^{p} x^{(i)}_u \beta_u + \epsilon^{(i)}\Big] = 0,$$
$$\mathrm{Cov}\big[t^{(i)}, t^{(j)}\big] = E\Big[\Big(\alpha + \sum_{u=1}^{p} x^{(i)}_u \beta_u + \epsilon^{(i)}\Big)\Big(\alpha + \sum_{u=1}^{p} x^{(j)}_u \beta_u + \epsilon^{(j)}\Big)\Big] = \sigma_\alpha^2 + \sum_{u=1}^{p} x^{(i)}_u x^{(j)}_u \sigma_u^2 + \delta_{ij} \sigma_\epsilon^2,$$
where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise, i.e. the Kronecker delta.
- Write $C = \big\{\mathrm{Cov}[t^{(i)}, t^{(j)}]\big\}_{i,j \in [1,n]}$ for the $n \times n$ covariance matrix of the targets.
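Continuing the sketch above, the covariance matrix $C$ can be built directly from this closed form:

```r
# Sketch (continued): prior covariance matrix C of the targets from
# Cov[t^(i), t^(j)] = sigma_alpha^2 + sum_u x_u^(i) x_u^(j) sigma_u^2
#                     + delta_ij sigma_eps^2.
lin_cov <- function(x, sigma_alpha, sigma_u, sigma_eps) {
  C <- sigma_alpha^2 + x %*% diag(sigma_u^2, ncol(x)) %*% t(x)
  C + diag(sigma_eps^2, nrow(x))  # Kronecker delta: noise on the diagonal only
}
C <- lin_cov(x, sigma_alpha, sigma_u, sigma_eps)
```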

Linear regression example
- The observed targets $t = [t^{(1)}, \ldots, t^{(n)}]^T \sim N_n(0, C)$.
- Given the inputs $x^{(n+1)}$ for a new case, the predictive distribution is Gaussian:
$$E\big[t^{(n+1)} \mid t^{(1)}, \ldots, t^{(n)}\big] = k^T C^{-1} t,$$
$$\mathrm{var}\big[t^{(n+1)} \mid t^{(1)}, \ldots, t^{(n)}\big] = V - k^T C^{-1} k,$$
where $k = \big(\mathrm{Cov}[t^{(n+1)}, t^{(1)}], \ldots, \mathrm{Cov}[t^{(n+1)}, t^{(n)}]\big)^T$ and $V = \mathrm{Cov}[t^{(n+1)}, t^{(n+1)}]$ is the prior variance of $t^{(n+1)}$.
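The predictive equations translate directly into R. This continues the sketch, with a made-up test input $x^{(n+1)}$; solve() is adequate at this scale, though a Cholesky factorization would be preferred in practice:

```r
# Sketch (continued): Gaussian predictive distribution for a new case.
x_new <- c(0.3, -0.5)                           # made-up test input x^(n+1)
k <- sigma_alpha^2 + x %*% (sigma_u^2 * x_new)  # Cov[t^(n+1), t^(i)], i = 1..n
V <- sigma_alpha^2 + sum(sigma_u^2 * x_new^2) + sigma_eps^2  # prior variance

Ci <- solve(C)
pred_mean <- drop(t(k) %*% Ci %*% t_obs)        # k^T C^{-1} t
pred_var  <- drop(V - t(k) %*% Ci %*% k)        # V - k^T C^{-1} k
```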

Why use Gaussian processes?
- $\mathrm{Cov}[t^{(i)}, t^{(j)}]$ is the key term in the predictive distribution; for linear regression it is a linear combination of the prior hyperparameters $\sigma_\alpha^2$, $\sigma_u^2$, $\sigma_\epsilon^2$.
- The Gaussian process procedure can handle more interesting and flexible models simply by using a different covariance function. For example, a regression model based on a class of smooth functions may be obtained with the covariance function
$$\mathrm{Cov}\big[t^{(i)}, t^{(j)}\big] = \eta^2 \exp\Big(-\sum_{u=1}^{p} \rho_u^2 \big(x^{(i)}_u - x^{(j)}_u\big)^2\Big) + \delta_{ij} \sigma_\epsilon^2.$$
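A hedged sketch of this smooth covariance function in R (a plain double loop, fine for small $n$; the hyperparameter values in the example call are made up):

```r
# Sketch: the smooth covariance function above, with eta, rho (length p),
# and sigma_eps supplied by the caller.
se_cov <- function(x, eta, rho, sigma_eps) {
  n <- nrow(x)
  C <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n)
    C[i, j] <- eta^2 * exp(-sum(rho^2 * (x[i, ] - x[j, ])^2))
  C + diag(sigma_eps^2, n)                   # delta_ij sigma_eps^2
}
C_smooth <- se_cov(x, eta = 1, rho = rep(2, ncol(x)), sigma_eps = 0.1)
```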

Covariance functions
- May be constructed from various parts representing prior variances/covariances, e.g. constant, linear, and exponential terms.
- A valid covariance function must always yield a positive definite covariance matrix for the targets.
- Various hyperparameters may be used to control: the amount of noise in the model, the strength of association between each predictor and the target, and the sizes of the different additive components of the model.
- Posterior inference for these hyperparameters reveals high-level structure in the data, and makes model selection easier and more intuitive.

The title is "Regression and classification..."
- Suppose the targets $t^{(i)}$ take values in the set $\{0, \ldots, K-1\}$.
- The distribution of $t^{(i)}$ is then expressed in terms of unobserved real-valued latent variables $y^{(i)}_0, \ldots, y^{(i)}_{K-1}$ for each case $i$, e.g. class probabilities for $K$ classes:
$$\Pr\big(t^{(i)} = k\big) = \frac{\exp\big(y^{(i)}_k\big)}{\sum_{h=0}^{K-1} \exp\big(y^{(i)}_h\big)}.$$
- The $K$ latent values can then be given independent and identical Gaussian process priors.
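As a small sketch, the class probabilities are a softmax of the $K$ latent values for a case; subtracting the maximum before exponentiating avoids overflow (the latent values below are made up):

```r
# Sketch: class probabilities from K latent values y_0, ..., y_(K-1).
class_probs <- function(y) {
  w <- exp(y - max(y))   # subtract the max for numerical stability
  w / sum(w)
}
class_probs(c(-0.2, 1.5, 0.3))   # K = 3, made-up latent values
```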

Bayesian approach
- The latent values for the targets have to be integrated over, using Markov chain Monte Carlo methods, e.g. Gibbs sampling.
- Also integrate over the posterior distribution of the hyperparameters of the covariance function (e.g. $\eta^2$, $\rho_u^2$, $\sigma_\epsilon^2$), again by Markov chain sampling, e.g. hybrid Monte Carlo.
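Neal samples the hyperparameters with hybrid (Hamiltonian) Monte Carlo; as a simpler stand-in, the sketch below runs random-walk Metropolis on the log hyperparameters $\theta = (\log\eta, \log\rho, \log\sigma_\epsilon)$ of the smooth covariance function, using the Gaussian marginal likelihood $t \sim N_n(0, C(\theta))$ and the se_cov() function defined earlier. A single shared $\rho$ and a flat prior on $\theta$ are assumptions made for brevity:

```r
# Sketch: random-walk Metropolis over log hyperparameters (a simpler
# substitute for the hybrid Monte Carlo used in the paper).
log_lik <- function(theta, x, t_obs) {
  C  <- se_cov(x, exp(theta[1]), exp(theta[2]), exp(theta[3]))
  ch <- chol(C)                              # C = R^T R, R upper triangular
  z  <- backsolve(ch, t_obs, transpose = TRUE)
  -sum(log(diag(ch))) - 0.5 * drop(crossprod(z))  # log N_n(t; 0, C) + const
}
metropolis <- function(x, t_obs, n_iter = 2000, step = 0.1) {
  theta <- c(0, 0, 0)
  ll    <- log_lik(theta, x, t_obs)
  draws <- matrix(NA, n_iter, 3)
  for (s in 1:n_iter) {
    prop    <- theta + rnorm(3, 0, step)     # random-walk proposal
    ll_prop <- log_lik(prop, x, t_obs)
    if (log(runif(1)) < ll_prop - ll) { theta <- prop; ll <- ll_prop }
    draws[s, ] <- theta
  }
  exp(draws)                                 # back to (eta, rho, sigma_eps)
}
samples <- metropolis(x, t_obs)
```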

Other literature
- Gaussian processes have recently received more attention with the introduction of kernel machines in machine learning [Rasmussen and Williams, 2006].
- Various nonparametric and flexible regression and classification methods: Chapters 10-12 of [Wakefield, 2013].

Conclusion
Why I chose this paper:
- Gaussian process models are a form of flexible regression: letting the data speak for themselves.
- Covariance functions can be a natural way of thinking about the underlying physical processes.
What's next:
- Construct various covariance functions and the corresponding regression models (some graphs to plot).
- Reproduce the simulation example (classification problem).
- Implement the Bayesian procedure in R.

Components of a covariance function
- Constant: the same for any pair of cases, regardless of the inputs $x$, e.g. $\sigma_\alpha^2$.
- Linear: has form $\sum_{u=1}^{p} x^{(i)}_u x^{(j)}_u \sigma_u^2$.
- Exponential: has form $\eta^2 \exp\big(-\sum_{u=1}^{p} \big(\rho_u\,|x^{(i)}_u - x^{(j)}_u|\big)^R\big)$, with $R \in [0, 2]$ for the covariance matrix to be positive definite.
- Noise/jitter: has form $\delta_{ij} \sigma_\epsilon^2$ or $\delta_{ij} J^2$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.
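A sketch assembling these parts into one covariance function (hyperparameter values are supplied by the caller; the double loop is for clarity, not speed):

```r
# Sketch: covariance matrix built from constant + linear + exponential
# + jitter parts, following the components listed above.
full_cov <- function(x, sigma_alpha, sigma_u, eta, rho, R, jitter) {
  n <- nrow(x)
  C <- matrix(sigma_alpha^2, n, n)                   # constant part
  C <- C + x %*% diag(sigma_u^2, ncol(x)) %*% t(x)   # linear part
  for (i in 1:n) for (j in 1:n)                      # exponential, R in [0, 2]
    C[i, j] <- C[i, j] + eta^2 * exp(-sum((rho * abs(x[i, ] - x[j, ]))^R))
  C + diag(jitter^2, n)                              # jitter on the diagonal
}
```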

References
Radford M. Neal. Regression and classification using Gaussian process priors (with discussion). Bayesian Statistics, 6:475-501, 1999.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. ISBN 978-0-262-18253-9.
Jon Wakefield. Bayesian and Frequentist Regression Methods. Springer, 2013.