The Gaussian process latent variable model with Cox regression

The Gaussian process latent variable model with Cox regression. James Barrett, Institute for Mathematical and Molecular Biomedicine (IMMB), King's College London.

Layout: 1 Introduction; 2 GPLVM; 3 Results; 4 Extension to Cox proportional hazards model.

Integrating Multiple Data Sources via the GPLVM. The Gaussian process latent variable model (Lawrence, 2005) is a flexible non-parametric probabilistic dimensionality reduction method. We want to: 1. represent each dataset in terms of latent variables; 2. extract information common to each data source. Also: retain information unique to each source; account for dimension mismatch between multiple datasets; detect any intrinsic low-dimensional structure.

Layout: 1 Introduction; 2 GPLVM; 3 Results; 4 Extension to Cox proportional hazards model.

Model Definition. Observe S datasets $Y_1 \in \mathbb{R}^{N \times d_1}, \dots, Y_S \in \mathbb{R}^{N \times d_S}$. It is assumed each column of $Y_s$ is normalised to zero mean and unit variance. Represent these data in terms of $q$ latent variables $x_i \in \mathbb{R}^q$, where $q < \min_s(d_s)$. For individual $i$ and covariate $\mu$ in source $s$ we write
$$y^s_{i\mu} = \sum_{m=1}^{M} w^s_{\mu m}\,\phi^s_m(x_i) + \xi^s_{i\mu},$$
where $\phi^s_m : \mathbb{R}^q \to \mathbb{R}$, for $m = 1,\dots,M$, are non-linear mappings that may depend on hyperparameters $\phi^s$; $w^s_{\mu m}$ are mapping coefficients; and $\xi^s_{i\mu}$ are noise variables.
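To make the generative model concrete, here is a minimal numpy sketch that draws synthetic data from two sources sharing the same latent variables. The linear feature map, the dimensions, and the noise precisions are assumptions chosen for illustration, not the presenter's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

N, q = 100, 2                     # samples and latent dimension (assumed)
d = {"s1": 10, "s2": 30}          # observed dimensions per source (assumed)
beta = {"s1": 10.0, "s2": 10.0}   # noise precisions beta_s (assumed)

X = rng.normal(size=(N, q))       # shared latent variables

Y = {}
for s, d_s in d.items():
    # Linear feature map phi_m(x) = x_m (so M = q); weights w ~ N(0, 1)
    W = rng.normal(size=(q, d_s))
    noise = rng.normal(scale=beta[s] ** -0.5, size=(N, d_s))
    Y_s = X @ W + noise
    # Normalise each column to zero mean and unit variance, as the model assumes
    Y[s] = (Y_s - Y_s.mean(axis=0)) / Y_s.std(axis=0)
```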

Data Likelihood. Assume Gaussian priors for $p(W_s)$ and $p(\xi_s \mid \beta_s)$ with zero mean and covariances given by $\langle w^s_{\mu m}\, w^{s'}_{\nu n}\rangle = \delta_{ss'}\delta_{\mu\nu}\delta_{mn}$ and $\langle \xi^s_{i\mu}\, \xi^{s'}_{j\nu}\rangle = \beta_s^{-1}\delta_{ss'}\delta_{ij}\delta_{\mu\nu}$. For notational simplicity we define $\beta = \{\beta_1,\dots,\beta_S\}$, $\Phi = \{\phi^1,\dots,\phi^S\}$, $W = \{W_1,\dots,W_S\}$, $\xi = \{\xi_1,\dots,\xi_S\}$ and $Y = \{Y_1,\dots,Y_S\}$. The data likelihood factorises over samples,
$$p(Y \mid X, W, \xi, \beta, \Phi) = \prod_{i=1}^{N} p(y_i \mid x_i, W, \xi_i, \beta, \Phi).$$
Marginalising over $W$ and $\xi$ we get a Gaussian distribution for $Y$ with mean $\langle y_{i\mu}\rangle = 0$ and covariance
$$\langle y^s_{i\mu}\, y^{s'}_{j\nu}\rangle = \delta_{ss'}\delta_{\mu\nu}\Big(\sum_m \phi^s_m(x_i)\,\phi^s_m(x_j) + \beta_s^{-1}\delta_{ij}\Big) = \delta_{ss'}\delta_{\mu\nu}\, K_s(x_i, x_j).$$
The data likelihood can then be written as
$$p(Y \mid X, \beta, \Phi) = \prod_{s=1}^{S}\prod_{\mu=1}^{d_s} \frac{e^{-\frac{1}{2}(y^s_{:,\mu})^{\top} K_s^{-1}\, y^s_{:,\mu}}}{(2\pi)^{N/2}\,|K_s|^{1/2}}.$$
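Because the likelihood factorises over sources and columns, its negative logarithm is a sum of Gaussian terms with one N by N kernel per source. A minimal sketch of that computation, assuming a linear-plus-noise kernel $K_s = XX^\top + \beta_s^{-1}I$ (the kernel choice and function names are mine, not the presenter's code):

```python
import numpy as np

def neg_log_likelihood(X, Y, beta):
    """-log p(Y | X, beta) for a dict of sources Y = {name: (N, d_s) array}.

    Assumes a linear kernel K_s = X X^T + beta_s^{-1} I for every source.
    """
    N = X.shape[0]
    total = 0.0
    for s, Y_s in Y.items():
        d_s = Y_s.shape[1]
        K = X @ X.T + np.eye(N) / beta[s]
        _, logdet = np.linalg.slogdet(K)
        K_inv_Y = np.linalg.solve(K, Y_s)
        # Sum of 0.5 * y^T K^{-1} y over the d_s columns of this source
        quad = 0.5 * np.sum(Y_s * K_inv_Y)
        total += quad + 0.5 * d_s * logdet + 0.5 * N * d_s * np.log(2 * np.pi)
    return total
```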

Bayesian Inference. We specify three levels of uncertainty: microscopic parameters $\{X\}$; hyperparameters $\{\beta, \Phi\}$; models $H = \{q, \phi_m\}$. The posterior distributions are
$$p(X \mid Y, \beta, \Phi, H) = \frac{p(Y \mid X, \beta, \Phi, H)\, p(X \mid H)}{\int dX\, p(Y \mid X, \beta, \Phi, H)\, p(X \mid H)},$$
$$p(\beta, \Phi \mid Y, H) = \frac{p(Y \mid \beta, \Phi, H)\, p(\beta, \Phi \mid H)}{\int d\beta\, d\Phi\, p(Y \mid \beta, \Phi, H)\, p(\beta, \Phi \mid H)},$$
$$P(H \mid Y) = \frac{p(Y \mid H)\, p(H)}{\sum_{H'} p(Y \mid H')\, p(H')},$$
where $p(Y \mid \beta, \Phi, H) = \int dX\, p(Y \mid X, \beta, \Phi, H)\, p(X \mid H)$ and $p(Y \mid H) = \int d\beta\, d\Phi\, p(Y \mid \beta, \Phi, H)\, p(\beta, \Phi \mid H)$.

Inferring latent variables. To find the optimal latent variable representation $X^*$ we numerically minimise the negative log-likelihood of $p(X \mid Y, \beta, \Phi, H)$,
$$\mathcal{L}_X(X; \beta, \Phi) = \sum_s \Big[ \frac{d_s}{2N}\,\mathrm{tr}(K_s^{-1} S_s) + \frac{d_s}{2N}\log|K_s| + \frac{d_s}{2}\log 2\pi \Big],$$
where $S_s = \frac{1}{d_s} Y_s Y_s^{\top}$. Should we rescale the contribution from each source by $d_{\mathrm{tot}}/d_s$, where $d_{\mathrm{tot}} = \sum_s d_s$? Expanding to second order,
$$\mathcal{L}_X(X; \beta, \Phi) \approx \mathcal{L}_X(X^*; \beta, \Phi) + \frac{1}{2}\sum_{i,j=1}^{N}\sum_{\mu,\nu=1}^{q} (x_{i\mu} - x^*_{i\mu})(x_{j\nu} - x^*_{j\nu})\, A_{i\mu,j\nu},$$
where
$$A_{i\mu,j\nu} = \frac{\partial^2 \mathcal{L}_X(X; \beta, \Phi)}{\partial x_{i\mu}\,\partial x_{j\nu}}\bigg|_{X = X^*}.$$
Then
$$p(Y \mid \beta, \Phi, H) = \int dX\, e^{-N \mathcal{L}_X(X; \beta, \Phi)} \approx p(Y \mid X^*, \beta, \Phi, H)\int dX\, e^{-\frac{1}{2}\sum_{ij}\sum_{\mu\nu}(x_{i\mu} - x^*_{i\mu})(x_{j\nu} - x^*_{j\nu}) A_{i\mu,j\nu}} = p(Y \mid X^*, \beta, \Phi, H)\,(2\pi)^{Nq/2}\,|A(X^*, \beta, \Phi)|^{-1/2}.$$
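In practice the minimisation of $\mathcal{L}_X$ can be handed to a generic optimiser. The sketch below uses scipy with finite-difference gradients purely to keep the example short (a real implementation would supply analytic kernel gradients, and would fix the rotational symmetry discussed on the next slide); it relies on the neg_log_likelihood sketch above.

```python
import numpy as np
from scipy.optimize import minimize

def fit_latent_variables(Y, beta, N, q, seed=0):
    """Numerically minimise the negative log-likelihood over the latent X.

    Uses neg_log_likelihood() from the previous sketch; L-BFGS-B with
    finite-difference gradients is an illustrative choice only.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(scale=0.1, size=N * q)   # small random initialisation

    def objective(x_flat):
        return neg_log_likelihood(x_flat.reshape(N, q), Y, beta)

    res = minimize(objective, x0, method="L-BFGS-B")
    return res.x.reshape(N, q)
```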

Invariance under Unitary Transformations. The kernel functions considered here are all invariant under arbitrary unitary transformations. Let $U$ be a unitary matrix, such that $U^{\top}U = UU^{\top} = I$, and let $\tilde{x} = Ux$. Then $\tilde{x}_i \cdot \tilde{x}_j = x_i^{\top} U^{\top} U x_j = x_i \cdot x_j$ and $(\tilde{x}_i - \tilde{x}_j)^2 = (x_i - x_j)^{\top} U^{\top} U (x_i - x_j) = (x_i - x_j)^2$. This invariance under unitary transformations induces symmetries in the posterior search space of $X \in \mathbb{R}^{N \times q}$. We fix
$$X = \begin{pmatrix} x_{11} & 0 & 0 & 0 \\ x_{21} & x_{22} & 0 & 0 \\ x_{31} & x_{32} & x_{33} & 0 \\ x_{41} & x_{42} & x_{43} & x_{44} \\ \vdots & & & \vdots \end{pmatrix}.$$
This pins down the latent variables, and we optimise over the $Nq - (q^2 - q)/2$ non-zero entries.
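One way to implement this constraint is to optimise only the $Nq - (q^2 - q)/2$ free entries and rebuild X with zeros above the diagonal in its first q rows. A small helper, as a sketch (the function name and layout are mine):

```python
import numpy as np

def build_constrained_X(free_params, N, q):
    """Map a flat vector of free entries to an (N, q) latent matrix whose
    first q rows are lower triangular, removing the rotational symmetry."""
    X = np.zeros((N, q))
    idx = 0
    for i in range(N):
        ncols = min(i + 1, q)            # row i (i < q) has i + 1 free entries
        X[i, :ncols] = free_params[idx:idx + ncols]
        idx += ncols
    return X

# Number of free parameters: N*q - (q**2 - q)//2
```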

Layout: 1 Introduction; 2 GPLVM; 3 Results; 4 Extension to Cox proportional hazards model.

Synthetic Data. [Figure: (a) true latent variables and (b) retrieved latent variables, plotted in the $(q_1, q_2)$ plane.] We can define three ad hoc error measures,
$$E_{\mathrm{radial}} = \frac{1}{|C|}\sum_{i \in C} \frac{\big|\,\|x_i\| - r_C\,\big|}{r_C}, \qquad E_{\mathrm{angular}} = \frac{1}{|C|}\sum_{i \in C} \frac{|\theta_i - \theta_C|}{\theta_C}, \qquad E_{\mathrm{linear}} = \frac{SS_{\mathrm{err}}}{SS_{\mathrm{tot}}},$$
where $SS_{\mathrm{err}} = \sum_i (x_{i2} - \alpha x_{i1})^2$ and $SS_{\mathrm{tot}} = \sum_i (x_{i2} - \bar{x}_2)^2$.
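A sketch of how these three error measures could be computed for a cluster C of retrieved latent points. The exact normalisation is my reading of the slide, so treat this as illustrative rather than the presenter's definition.

```python
import numpy as np

def radial_error(X_C, r_C):
    """Mean relative deviation of the retrieved radii from the true radius r_C."""
    radii = np.linalg.norm(X_C, axis=1)
    return np.mean(np.abs(radii - r_C)) / r_C

def angular_error(X_C, theta_C):
    """Mean relative deviation of the retrieved angles from the true angle theta_C."""
    angles = np.arctan2(X_C[:, 1], X_C[:, 0])
    return np.mean(np.abs(angles - theta_C)) / theta_C

def linear_error(X, alpha):
    """SS_err / SS_tot for points that should lie on the line x2 = alpha * x1."""
    ss_err = np.sum((X[:, 1] - alpha * X[:, 0]) ** 2)
    ss_tot = np.sum((X[:, 1] - X[:, 1].mean()) ** 2)
    return ss_err / ss_tot
```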

Dependence on β and d.

(a) Dependence on β (fixed d):
β     E_radial   E_angular   E_linear
0.1   0.0060     0.0046      0.0079
0.5   0.0766     0.0813      0.2577
1.0   0.0998     0.1701      0.3263

(b) Dependence on d (fixed noise level):
d      E_radial   E_angular   E_linear
10     0.0944     0.0454      0.5491
100    0.0061     0.0051      0.0108
1000   0.0004     0.0008      0.0016

Table: (a) The magnitude of the errors increases as more noise is added (for fixed d) to the synthetic data. (b) For fixed noise levels, the greater d is, the better the true low-dimensional structure is extracted from the dataset.

Dimensionality detection. Figure: plot of the minimal values of $\mathcal{L}_{\mathrm{hyp}}$ obtained for different numbers of latent variables q (1 to 6) and two different kernels, the linear kernel and the polynomial kernel. Both kernel types detect that q = 2 is the optimal dimension. Furthermore, the model can distinguish that the linear kernel offers the best explanation of the data in this case.

Integration of two sources. [Figure: minimal $\mathcal{L}_{\mathrm{hyp}}$ against the number of latent variables q (2 to 10) for (a) the linear source, (b) the polynomial source, and (c) the two sources combined.]

Classification experiment. We generated N = 100 samples with q = 2: 50 samples from a Gaussian with unit variance and mean (1, 1), assigned class +1; and 50 samples from unit-variance Gaussians with means (-1/2, 1/2) and (1/2, -1/2), assigned class -1. These were projected into d = 100 dimensions with a linear mapping.

                      Y (d = 100)   X (q = 2)
Training success      86.3%         86.6%
Validation success    74.0%         83.0%

We repeated this 300 times. The mean improvement in validation success on the latent X over the raw Y was 8.7%, with a standard deviation of 4.8%.
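The comparison can be reproduced in outline with any off-the-shelf classifier; the sketch below uses scikit-learn logistic regression on the high-dimensional Y and on the two-dimensional retrieved X. The classifier choice and the 50/50 train/validation split are assumptions, since the slide does not name them.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def compare_representations(Y, X_latent, labels, seed=0):
    """Training/validation accuracy on the raw data Y versus the latent X."""
    scores = {}
    for name, data in {"Y (d=100)": Y, "X (q=2)": X_latent}.items():
        tr_x, va_x, tr_y, va_y = train_test_split(
            data, labels, test_size=0.5, random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(tr_x, tr_y)
        scores[name] = (clf.score(tr_x, tr_y), clf.score(va_x, va_y))
    return scores  # {name: (training accuracy, validation accuracy)}
```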

Layout: 1 Introduction; 2 GPLVM; 3 Results; 4 Extension to Cox proportional hazards model.

What is Survival Analysis? Suppose we have a group of N cancer patients. For each individual i we measure: the time $\tau_i \geq 0$ until an event of interest occurs, for example the time to metastasis; and a vector of covariates (also called features or input variables) $x_i \in \mathbb{R}^d$. We will assume one risk and use an indicator variable
$$\Delta_i = \begin{cases} 0 & \text{if individual } i \text{ is censored,} \\ 1 & \text{if the primary risk occurs.} \end{cases}$$
[Figure: event time τ (years) against covariate x, showing primary events and censoring events.]
Aim: to extract any statistical relationship between x and τ for each risk. Challenges: How can we incorporate information from censored individuals? How can we deal with non-negative outputs?

Linking Survival Data to Latent Variables. For each patient we observe covariates y, a time to event t, and the event type. Then
$$p(X \mid Y, \mathbf{t}) \propto p(Y, \mathbf{t} \mid X)\, p(X) = \underbrace{p(Y \mid X)}_{\text{GPLVM}}\; \underbrace{p(\mathbf{t} \mid X)}_{\text{Cox}}\; p(X),$$
so the latent variables X link the covariates Y (through the GPLVM) to the survival data (T, Δ) (through the Cox model). As above,
$$p(Y \mid X) = \prod_{\mu=1}^{d} \frac{e^{-\frac{1}{2}\, y_{:,\mu}^{\top} K^{-1} y_{:,\mu}}}{(2\pi)^{N/2}\,|K|^{1/2}},$$
and for the Cox model
$$p(\mathbf{t} \mid X) = \prod_{i=1}^{N} \lambda_0(t_i)\, e^{\beta \cdot x_i}\, e^{-e^{\beta \cdot x_i}\, \Lambda_0(t_i)}.$$
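For the survival factor, a minimal sketch of the Cox log-likelihood with an assumed constant baseline hazard $\lambda_0$ (so $\Lambda_0(t) = \lambda_0 t$); the slide leaves the baseline unspecified, and the censoring indicator delta is included here for completeness even though it does not appear explicitly in the formula above.

```python
import numpy as np

def cox_log_likelihood(beta_vec, lam0, X, t, delta):
    """log p(t, delta | X) for a Cox model with constant baseline hazard lam0.

    X: (N, q) latent covariates, t: event/censoring times, delta: 1 if the
    primary event occurred, 0 if censored (assumed handling of censoring).
    """
    risk = X @ beta_vec                      # beta . x_i
    cum_hazard = lam0 * t * np.exp(risk)     # Lambda_0(t_i) * exp(beta . x_i)
    log_hazard = np.log(lam0) + risk         # log lambda_0(t_i) + beta . x_i
    return np.sum(delta * log_hazard - cum_hazard)
```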

Predictions. If we observe a new patient with covariates y, we predict the corresponding event time t via y → (GPLVM) → x* → (Cox) → t. We can also use the Cox model to generate survival curves. [Figure: (left) true survival curves against the survival curves retrieved from the low-dimensional representation; (right) true survival curves against the survival curves obtained from the high-dimensional data; survival probability plotted against time from 0 to 10.]