Kernel Bayes' Rule: Nonparametric Bayesian inference with kernels
1 Kernel Bayes' Rule: Nonparametric Bayesian inference with kernels
Kenji Fukumizu, The Institute of Statistical Mathematics
NIPS 2012 Workshop "Confluence between Kernel Methods and Graphical Models", December 8, 2012, Lake Tahoe
2 Introduction
A new kernel methodology for nonparametric inference:
- Kernel means are used in representing and manipulating the probabilities of variables.
- Nonparametric Bayesian inference is also possible (today's main topic)!
- Completely nonparametric: computation is done by linear algebra with Gram matrices.
- Different from Bayesian nonparametrics.
3 Outline
1. Kernel mean: a method for nonparametric inference
2. Representing conditional probability
3. Kernel Bayes' Rule and its applications
4. Conclusions
4 Kernel mean: representing probabilities
Classical nonparametric methods for representing probabilities:
- Kernel density estimation: $\hat{p}_n(x) = \frac{1}{n}\sum_{i=1}^n K((x - X_i)/h_n)$
- Characteristic function: $\mathrm{Ch.f}_X(u) = E[e^{iu^T X}]$, estimated by $\widehat{\mathrm{Ch.f}}_X(u) = \frac{1}{n}\sum_{i=1}^n e^{iu^T X_i}$
New alternative: kernel mean.
$X$: random variable taking values on $\Omega$, with probability $P$.
$k$: positive definite kernel on $\Omega$; $H_k$: RKHS associated with $k$.
Def. Kernel mean of $X$ on $H_k$: $m_P := E[\Phi(X)] = \int k(\cdot, x)\, dP(x) \in H_k$, where $\Phi(x) = k(\cdot, x)$ is the feature vector.
Empirical estimation: $\hat{m}_P = \frac{1}{n}\sum_{i=1}^n \Phi(X_i)$ for $X_1, \dots, X_n \sim P$ i.i.d.
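As a minimal numerical sketch (not from the talk; the Gaussian kernel choice and all names here are mine), the empirical kernel mean can be evaluated at a query point by averaging kernel values, since $\hat{m}_P(x_0) = \frac{1}{n}\sum_i k(x_0, X_i)$:

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    """Gaussian kernel Gram matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # X_1, ..., X_n ~ P (i.i.d.)
x0 = np.zeros((1, 2))              # query point

# Empirical kernel mean evaluated at x0: m_P(x0) = (1/n) sum_i k(x0, X_i).
m_P_x0 = gauss_kernel(x0, X).mean(axis=1)
print(m_P_x0)
```

Evaluating $\hat{m}_P$ at $x_0$ is itself an instance of the reproducing property on the next slide, with $f = k(\cdot, x_0)$.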
5 Reproducing expectation: $\langle f, m_P \rangle = E[f(X)]$ for all $f \in H_k$.
The kernel mean carries information on the higher-order moments of $X$.
E.g., $k(u, x) = c_0 + c_1 ux + c_2 (ux)^2 + \cdots$ with $c_i \geq 0$ (e.g., $e^{ux}$); then
$m_P(u) = c_0 + c_1 E[X]\, u + c_2 E[X^2]\, u^2 + \cdots$ (moment generating function).
6 Characteristic kernel (Fukumizu et al. JMLR 2004, AoS 2009; Sriperumbudur et al. JMLR 2010)
Def. A bounded measurable kernel $k$ is characteristic if $m_P = m_Q \Rightarrow P = Q$.
The kernel mean $m_P$ with a characteristic kernel $k$ uniquely determines the probability.
Examples: Gaussian and Laplace kernels (the polynomial kernel is not characteristic).
Analogous to the characteristic function $\mathrm{Ch.f}_X(u) = E[e^{iu^T X}]$, which uniquely determines the probability of $X$.
Positive definite kernels give a better alternative:
- efficient computation by the kernel trick;
- applicable to non-vectorial data.
7 Nonparametric inference with kernels
Principle: with characteristic kernels, inference on $P$ becomes inference on $m_P$.
- Two-sample test: $m_P = m_Q$? (Gretton et al. NIPS 2006, JMLR 2012, NIPS 2009, 2012); a sketch of this statistic follows below.
- Independence test: $m_{XY} = m_X \otimes m_Y$? (Gretton et al. NIPS 2007)
- Bayesian inference: estimate the kernel mean of the posterior, given kernel representations of the prior and the conditional probability.
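As a hedged sketch of the two-sample principle (reusing the gauss_kernel helper above; this is the standard biased squared-MMD estimate, not code from the talk):

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of ||m_P - m_Q||^2 in the RKHS (squared MMD)."""
    Kxx = gauss_kernel(X, X, sigma)
    Kyy = gauss_kernel(Y, Y, sigma)
    Kxy = gauss_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(300, 1))   # sample from P
Y = rng.normal(0.5, 1.0, size=(300, 1))   # sample from Q
print(mmd2(X, Y))   # near 0 iff P = Q, when k is characteristic
```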
8 Conventional approaches to nonparametric inference
- Smoothing kernel (not necessarily positive definite): kernel density estimation, local polynomial fitting; $h^{-d} K(x/h)$.
- Characteristic function: $E[e^{iu^T X}]$, etc.
Curse of dimensionality: e.g., smoothing kernels face difficulty for high (or even several) dimensions.
Kernel methods for nonparametric inference: what can we do? How robust are they to high dimensionality?
9 Conditional probabilities
10 Conditional kernel mean
Conditional probabilities are important to inference:
- Graphical modeling: conditional independence / dependence
- Bayesian inference
Kernel mean of the conditional probability: $E[\Phi(Y) \mid X = x] = \int \Phi(y)\, p(y \mid x)\, dy$
Question: how can we estimate it in the kernel framework?
Accurate estimation of $p(y \mid x)$ is not easy; take a regression approach instead.
11 Covariance
$(X, Y)$: random vector taking values on $\Omega_X \times \Omega_Y$.
$(H_X, k_X)$, $(H_Y, k_Y)$: RKHS on $\Omega_X$ and $\Omega_Y$, respectively.
Def. (Uncentered) covariance operators $C_{YX}: H_X \to H_Y$, $C_{XX}: H_X \to H_X$:
$C_{YX} = E[\Phi_Y(Y)\, \Phi_X(X)^T]$, $C_{XX} = E[\Phi_X(X)\, \Phi_X(X)^T]$
Simply an extension of the covariance matrix (linear map) $V_{YX} = E[YX^T]$.
Reproducing property: $\langle g, C_{YX} f \rangle = E[f(X)\, g(Y)]$ for all $f \in H_X$, $g \in H_Y$.
$C_{YX}$ can be identified with the kernel mean $E[k_Y(\cdot, Y) \otimes k_X(\cdot, X)]$ on the product space $H_Y \otimes H_X$.
12 Conditional kernel mean
Review: $X, Y$ Gaussian random variables (on $\mathbb{R}^m$, $\mathbb{R}^{\ell}$, resp.):
$\operatorname{argmin}_{A \in \mathbb{R}^{\ell \times m}} \int \|Y - AX\|^2\, dP(X, Y) = V_{YX} V_{XX}^{-1}$, and $E[Y \mid X = x] = V_{YX} V_{XX}^{-1} x$.
For general $X$ and $Y$:
$\operatorname{argmin}_{F: H_X \to H_Y} \int \|\Phi_Y(Y) - F\, \Phi_X(X)\|_{H_Y}^2\, dP(X, Y) = C_{YX} C_{XX}^{-1}$
With a characteristic kernel $k_X$:
$E[\Phi_Y(Y) \mid X = x] = C_{YX} C_{XX}^{-1} \Phi_X(x)$ (conditional kernel mean given $X = x$).
13 Empirical estimation
$\widehat{E}[\Phi_Y(Y) \mid X = x] = \mathbf{k}_Y^T (G_X + n\varepsilon_n I_n)^{-1} \mathbf{k}_X(x)$
where $\mathbf{k}_X(x) = (k_X(x, X_1), \dots, k_X(x, X_n))^T \in \mathbb{R}^n$, $\mathbf{k}_Y(\cdot) = (k_Y(\cdot, Y_1), \dots, k_Y(\cdot, Y_n))^T \in H_Y^n$, and $\varepsilon_n$ is a regularization coefficient.
Note: the joint sample $(X_1, Y_1), \dots, (X_n, Y_n) \sim P_{XY}$ is used to give the conditional kernel mean with respect to $P_{Y|X}$.
c.f. kernel ridge regression: $\widehat{E}[Y \mid X = x] = \mathbf{Y}^T (G_X + n\varepsilon_n I_n)^{-1} \mathbf{k}_X(x)$
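In code, the estimator above reduces to one regularized linear solve for the weights on the $Y$ sample (a sketch reusing the gauss_kernel helper above; function and variable names are mine):

```python
import numpy as np

def cond_mean_weights(X, x, eps, sigma=1.0):
    """Weights a(x) with E[Phi_Y(Y) | X=x] ~ sum_i a_i(x) k_Y(., Y_i),
    i.e. a(x) = (G_X + n eps I)^{-1} k_X(x)."""
    n = len(X)
    G_X = gauss_kernel(X, X, sigma)
    k_x = gauss_kernel(X, x.reshape(1, -1), sigma)[:, 0]
    return np.linalg.solve(G_X + n * eps * np.eye(n), k_x)

# Kernel ridge regression uses the same weights: E[Y | X=x] ~ Y_sample.T @ a(x).
```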
14 Kernel Bayes' Rule
15 Inference with conditional kernel mean
- Sum rule: $q(y) = \int p(y \mid x)\, \pi(x)\, dx$
- Chain rule: $q(x, y) = p(y \mid x)\, \pi(x)$
- Bayes' rule: $q(x \mid y) = \dfrac{p(y \mid x)\, \pi(x)}{\int p(y \mid x)\, \pi(x)\, dx}$
Kernelization:
- Express the probabilities by kernel means.
- Express the statistical relations among variables with covariance operators.
- Realize the above inference rules with Gram matrix computation.
16 Kernel Sum Rule
Sum rule: $q(y) = \int p(y \mid x)\, \pi(x)\, dx$
Kernelization: $m_Y = C_{YX} C_{XX}^{-1} m_\pi$
Gram matrix expression:
Input: $\hat{m}_\pi = \sum_{j=1}^{\ell} \alpha_j \Phi_X(\tilde{X}_j)$ and a joint sample $(X_1, Y_1), \dots, (X_n, Y_n) \sim P_{XY}$.
Output: $\hat{m}_Y = \sum_{i=1}^n \beta_i \Phi_Y(Y_i)$, with $\beta = (G_X + n\varepsilon_n I_n)^{-1} G_{X\tilde{X}}\, \alpha$,
where $G_X = (k_X(X_i, X_j))_{ij}$ and $G_{X\tilde{X}} = (k_X(X_i, \tilde{X}_j))_{ij}$.
Proof: $\int \Phi(y)\, p(y \mid x)\, dy = C_{YX} C_{XX}^{-1} \Phi(x)$, hence
$\iint \Phi(y)\, p(y \mid x)\, \pi(x)\, dy\, dx = C_{YX} C_{XX}^{-1} m_\pi$.
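The Gram matrix expression translates directly into a few lines of linear algebra (a sketch reusing gauss_kernel; X_prior plays the role of the points $\tilde{X}_j$ expressing the prior):

```python
import numpy as np

def kernel_sum_rule(X, X_prior, alpha, eps, sigma=1.0):
    """Given prior weights alpha (m_pi = sum_j alpha_j Phi(X_prior_j)) and a joint
    sample with inputs X, return beta with m_Y ~ sum_i beta_i Phi_Y(Y_i)."""
    n = len(X)
    G_X = gauss_kernel(X, X, sigma)          # (k_X(X_i, X_j))_ij
    G_Xp = gauss_kernel(X, X_prior, sigma)   # (k_X(X_i, X~_j))_ij
    return np.linalg.solve(G_X + n * eps * np.eye(n), G_Xp @ alpha)
```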
17 Kernel Chain Rule
Chain rule: $q(x, y) = p(y \mid x)\, \pi(x)$
Kernelization: $m_Q = C_{(YX)X} C_{XX}^{-1} m_\pi$
Gram matrix expression:
Input: $\hat{m}_\pi = \sum_{j=1}^{\ell} \alpha_j \Phi_X(\tilde{X}_j)$ and a joint sample $(X_1, Y_1), \dots, (X_n, Y_n) \sim P_{XY}$.
Output: $\hat{m}_Q = \sum_{i=1}^n \beta_i\, \Phi_Y(Y_i) \otimes \Phi_X(X_i)$, with $\beta = (G_X + n\varepsilon_n I_n)^{-1} G_{X\tilde{X}}\, \alpha$.
Intuition: note $C_{(YX)X}: H_X \to H_Y \otimes H_X$ corresponds to $E[\Phi_Y(Y) \otimes \Phi_X(X) \otimes \Phi_X(X)]$, i.e., to $p(y, x \mid x')$. From the Sum Rule,
$C_{(YX)X} C_{XX}^{-1} m_\pi = \iiint \Phi(y) \otimes \Phi(x)\, p(y \mid x)\, \delta(x - x')\, \pi(x')\, dy\, dx\, dx' = \iint \Phi(y) \otimes \Phi(x)\, p(y \mid x)\, \pi(x)\, dy\, dx = m_Q$.
18 Kernel Bayes' Rule (KBR)
Bayes' rule is the regression of $x$ on $y$ under the joint probability $q(x, y) = p(y \mid x)\, \pi(x)$.
Kernel Bayes' Rule (KBR, Fukumizu et al. NIPS 2011):
$m_{Q_{X|y}} = C_{XY}^{\pi} (C_{YY}^{\pi})^{-1} \Phi(y)$, where $C_{XY}^{\pi} = C_{(XY)X} C_{XX}^{-1} m_\pi$ and $C_{YY}^{\pi} = C_{(YY)X} C_{XX}^{-1} m_\pi$.
(Recall: mean on the product space = covariance.)
Gram matrix expression:
Input: $\hat{m}_\pi = \sum_{j=1}^{\ell} \alpha_j \Phi_X(\tilde{X}_j)$ and a joint sample $(X_1, Y_1), \dots, (X_n, Y_n) \sim P_{XY}$.
Output: $\hat{m}_{Q_{X|y}} = \sum_{i=1}^n w_i(y)\, \Phi_X(X_i)$, with $w(y) = R_{X|Y}\, \mathbf{k}_Y(y)$,
$R_{X|Y} = \Lambda G_Y \big( (\Lambda G_Y)^2 + \delta_n I_n \big)^{-1} \Lambda$, $\Lambda = \mathrm{Diag}\big( (G_X + n\varepsilon_n I_n)^{-1} G_{X\tilde{X}}\, \alpha \big)$.
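A compact sketch of the Gram matrix expression (again reusing gauss_kernel and following the Tikhonov-type form on the slide; all names are mine):

```python
import numpy as np

def kbr_weights(X, Y, X_prior, alpha, y, eps, delta, sigma=1.0):
    """Kernel Bayes' Rule: weights w(y) with m_{Q_{X|y}} ~ sum_i w_i(y) Phi_X(X_i)."""
    n = len(X)
    G_X = gauss_kernel(X, X, sigma)
    G_Y = gauss_kernel(Y, Y, sigma)
    # Lambda = Diag((G_X + n eps I)^{-1} G_{X X~} alpha)  (kernel chain rule weights)
    beta = np.linalg.solve(G_X + n * eps * np.eye(n),
                           gauss_kernel(X, X_prior, sigma) @ alpha)
    Lam = np.diag(beta)
    LG = Lam @ G_Y
    k_y = gauss_kernel(Y, y.reshape(1, -1), sigma)[:, 0]
    # w(y) = Lambda G_Y ((Lambda G_Y)^2 + delta I)^{-1} Lambda k_Y(y)
    return LG @ np.linalg.solve(LG @ LG + delta * np.eye(n), Lam @ k_y)
```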
19 Inference with KBR
KBR estimates the kernel mean of the posterior $q(x \mid y)$, not the posterior itself. How can we use it for Bayesian inference?
- Expectation: for any $f \in H_X$, $\mathbf{f}_X^T R_{X|Y}\, \mathbf{k}_Y(y) \to \int f(x)\, q(x \mid y)\, dx$ (consistent), where $\mathbf{f}_X = (f(X_1), \dots, f(X_n))^T$.
- Point estimation: $\hat{x} = \operatorname{argmin}_x \| \hat{m}_{X|Y=y} - \Phi_X(x) \|_{H_X}$ (pre-image problem), solved numerically.
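Given the weights from the kbr_weights sketch above (with X, Y, X_prior, alpha, and an observation y_obs as in that setting), the posterior expectation of any $f$ is a weighted sum over the sample; the choice of $f$ here is purely illustrative:

```python
# Posterior expectation: E_q[f(X) | y] ~ f_X^T w(y), f_X = (f(X_1), ..., f(X_n)).
w = kbr_weights(X, Y, X_prior, alpha, y_obs, eps=0.1, delta=0.01)
f_X = X[:, 0]              # e.g. f(x) = first coordinate of x
posterior_mean = f_X @ w
```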
20 Completely nonparametric way of computing Bayes' rule
No parametric models are needed; data or samples are used to express the probabilistic relations nonparametrically.
Examples:
1. Nonparametric HMM (see next): hidden chain $X_0 \to X_1 \to X_2 \to \cdots \to X_T$, with observations $Y_0, Y_1, Y_2, \dots, Y_T$.
2. Kernel Approximate Bayesian Computation (Nakagome, F., Mano 2012): the explicit form of the likelihood $p(y \mid x)$ is unavailable, but sampling is possible; c.f. Approximate Bayesian Computation (ABC).
3. Kernelization of the Bellman equation in POMDPs (Nishiyama et al. UAI 2012).
21 Example: KBR for nonparametric HMM
Assume $p(X, Y) = p(X_0, Y_0) \prod_{t=1}^{T} p(Y_t \mid X_t)\, q(X_t \mid X_{t-1})$, where $p(y_t \mid x_t)$ and/or $q(x_t \mid x_{t-1})$ is not known,
but data $(X_t, Y_t)_{t=0}^{T}$ are available in a training phase.
Examples: measurement of hidden states is expensive; hidden states are measured with time delay.
Testing phase (e.g., filtering): given $y_0, \dots, y_t$, estimate the hidden state $x_s$.
KBR point estimator: $\operatorname{argmin}_{x_s} \| \hat{m}_{x_s | y_0, \dots, y_t} - \Phi(x_s) \|_{H_X}$
General sequential inference uses Bayes' rule, so KBR applies; a simplified filtering step is sketched below.
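A heavily simplified one-step filtering sketch, built from the kernel_sum_rule and kbr_weights helpers above; it assumes the posterior weights live on the training state points and ignores the normalization and low-rank details of the actual KBR filtering algorithm (all names are mine):

```python
import numpy as np

def kbr_filter_step(w_prev, X_prev, X_next, Y_next, y_t, eps, delta, sigma=1.0):
    """One predict/update step of a KBR filter, with training transitions
    X_prev[i] -> X_next[i] and observation pairs (X_next[i], Y_next[i]).
    w_prev: posterior weights on X_next from the previous step."""
    # Predict: push the current posterior through the transition (kernel sum rule).
    alpha_pred = kernel_sum_rule(X_prev, X_next, w_prev, eps, sigma)
    # Update: condition on the new observation y_t (Kernel Bayes' Rule).
    return kbr_weights(X_next, Y_next, X_next, alpha_pred, y_t, eps, delta, sigma)
```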
22 Smoothing: noisy oscillation
$(u_t, v_t)^T = \sin(8\theta_t)\, (\cos\theta_t, \sin\theta_t)^T + Z_t$, $\theta_{t+1} = \arctan(v_t / u_t) + 0.4$,
$Y_t = (u_t, v_t)^T + W_t$, with $Z_t, W_t \sim N(0, 0.04\, I_2)$ i.i.d.
Note: KBR does not know the dynamics, while the EKF and UKF use it.
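A small simulation sketch of these dynamics as reconstructed above (the training data for the KBR filter would be generated this way; names are mine):

```python
import numpy as np

def simulate_oscillation(T, seed=0):
    """Simulate the noisy oscillation: returns hidden states X and observations Y."""
    rng = np.random.default_rng(seed)
    theta, X, Y = 0.0, [], []
    for _ in range(T):
        Z = rng.normal(0.0, 0.2, size=2)   # Z_t ~ N(0, 0.04 I_2)
        W = rng.normal(0.0, 0.2, size=2)   # W_t ~ N(0, 0.04 I_2)
        u, v = np.sin(8 * theta) * np.array([np.cos(theta), np.sin(theta)]) + Z
        X.append([u, v])
        Y.append([u + W[0], v + W[1]])
        theta = np.arctan2(v, u) + 0.4     # theta_{t+1} = arctan(v_t/u_t) + 0.4
    return np.array(X), np.array(Y)

X_train, Y_train = simulate_oscillation(1000)
```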
23 Rotation angle of camera
Hidden $X_t$: angles of a video camera located at a corner of a room.
Observed $Y_t$: movie frames of the room + additive Gaussian noise; 3600 downsampled frames of 20 x 20 RGB pixels (1200 dim.).
The first 1800 frames are used for training, and the second half for testing.
Average MSE for the camera angles (10 runs) is compared for KBR and the Kalman filter at two noise levels $\sigma^2$.
* For the rotation matrices, the $\mathrm{Tr}[AB^{-1}]$ kernel is used for KBR, and the quaternion expression for the Kalman filter.
24 Concluding remarks
- Kernel methods: a useful, general tool for nonparametric inference, with efficient linear algebraic computation via Gram matrices.
- Kernel Bayes' rule: inference with the kernel mean of a conditional probability; a completely nonparametric way to carry out general Bayesian inference.
25 Ongoing / future works
- Combination of parametric models and kernel nonparametric methods:
  - Exact integration + kernel nonparametrics (Nishiyama et al. IBIS 2012)
  - Particle filter + kernel nonparametrics (Kanagawa et al. IBIS 2012)
- Theoretical analysis in high-dimensional situations.
- Relation to other recent nonparametric approaches? Gaussian processes, Bayesian nonparametrics.
26 Collaborators
Bernhard Schölkopf (MPI), Arthur Gretton (UCL/MPI), Bharath Sriperumbudur (Cambridge), Le Song (Georgia Tech), Shigeki Nakagome (ISM), Shuhei Mano (ISM), Yu Nishiyama (ISM), Motonobu Kanagawa (NAIST)