An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems


1 An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems. Byron Boots and Geoffrey J. Gordon. AAAI 2011. Select Lab, Carnegie Mellon University.

2 What is out there? ... $o_{t-2}$ $o_{t-1}$ $o_t$ $o_{t+1}$ $o_{t+2}$ ...

3 What is out there? Dynamical systems. Past | Future: ... $o_{t-2}$ $o_{t-1}$ $o_t$ $o_{t+1}$ $o_{t+2}$ ... State. A dynamical system is a recursive rule for updating state based on observations.

4 Learning a Dynamical System. Past | Future: ... $o_{t-2}$ $o_{t-1}$ $o_t$ $o_{t+1}$ $o_{t+2}$ ... State. We would like to learn a model of a dynamical system.

5 Learning a Dynamical System. Past | Future: ... $o_{t-2}$ $o_{t-1}$ $o_t$ $o_{t+1}$ $o_{t+2}$ ... State. We would like to learn a model of a dynamical system; today I will focus on spectral learning algorithms for Predictive State Representations.

6–12 Predictive State Representations (PSRs) are comprised of: a set of actions $A$, a set of observations $O$, an initial state $x_1 \in \mathbb{R}^d$, a set of transition matrices $M_{ao} \in \mathbb{R}^{d \times d}$, and a normalization vector $e \in \mathbb{R}^d$. At each time step we can predict $P(o \mid x_t, \mathrm{do}(a_t)) = e^\top M_{a_t,o}\, x_t$ and update the state: $x_{t+1} = M_{a_t,o_t}\, x_t \,/\, P(o_t \mid x_t, \mathrm{do}(a_t))$.
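To make these two equations concrete, here is a minimal NumPy sketch of PSR prediction and filtering. The parameters below are hypothetical random values (a real PSR would come from the learning algorithm later in the talk, and random matrices do not yield valid probabilities), so the printed numbers are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, n_obs = 3, 2, 2

# Hypothetical toy parameters; a learned PSR would supply these instead.
x = rng.random(d)                          # initial state x_1
e = rng.random(d)                          # normalization vector
M = rng.random((n_actions, n_obs, d, d))   # transition matrices M_ao

def predict(x, a, o):
    """P(o | x_t, do(a_t)) = e^T M_{a,o} x_t"""
    return e @ M[a, o] @ x

def update(x, a, o):
    """x_{t+1} = M_{a,o} x_t / P(o | x_t, do(a_t))"""
    return M[a, o] @ x / predict(x, a, o)

# Filter through a short action-observation sequence.
for a, o in [(0, 1), (1, 0), (0, 0)]:
    print(f"P(o={o} | x, do(a={a})) = {predict(x, a, o):.4f}")
    x = update(x, a, o)
```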

13 Predictive State Representations: the parameters are only determined up to a similarity transform $S \in \mathbb{R}^{d \times d}$. If we replace $M_{ao} \to S^{-1} M_{ao} S$, $x_1 \to S^{-1} x_1$, and $e \to S^\top e$, the resulting PSR makes exactly the same predictions as the original one, e.g. $P(o \mid x_t, \mathrm{do}(a_t)) = e^\top S\, S^{-1} M_{a_t,o}\, S\, S^{-1} x_t$.
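A quick numerical check of this invariance, using arbitrary toy values for the parameters and the transform:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
x1, e = rng.random(d), rng.random(d)
M_ao = rng.random((d, d))
S = rng.random((d, d)) + d * np.eye(d)   # any invertible similarity transform

Si = np.linalg.inv(S)
x1_s, e_s, M_s = Si @ x1, S.T @ e, Si @ M_ao @ S   # transformed PSR parameters

# e^T M_ao x_1 equals (S^T e)^T (S^-1 M_ao S) (S^-1 x_1) up to round-off:
print(e @ M_ao @ x1, e_s @ M_s @ x1_s)
```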

14 PSRs Are Very Expressive. For a fixed latent dimension $d$: HMMs & POMDPs $\subset$ Reduced-Rank HMMs & Reduced-Rank POMDPs $\subset$ Predictive State Representations.

15 Learning PSRs: we can use fast, statistically consistent spectral methods to learn PSR parameters.

16–17 A General Principle: use data about the past (many samples) to predict data about the future (many samples), compressing through a bottleneck, the state, and then expanding. If the bottleneck is a rank constraint, we get a spectral method.
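As a self-contained illustration (with made-up data, and plain reduced-rank regression standing in for the PSR-specific moments on the next slides), the compress-bottleneck-expand picture looks like this:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p_dim, f_dim, d = 1000, 8, 8, 3

P = rng.random((p_dim, m))      # features of the past, one column per sample
F = rng.random((f_dim, m))      # features of the future

W = F @ np.linalg.pinv(P)       # unconstrained regression: past -> future
U, _, _ = np.linalg.svd(W @ P, full_matrices=False)

state = U[:, :d].T @ W @ P      # compress: d-dimensional state (the bottleneck)
F_hat = U[:, :d] @ state        # expand: rank-d prediction of the future
```

Truncating via the SVD of the fitted values is exactly the rank constraint the slide mentions, which is why this kind of bottleneck yields a spectral method.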

18 Why Spectral Methods? There are many ways to learn a dynamical system: maximum likelihood via Expectation Maximization, gradient descent, ...; Bayesian inference via Gibbs sampling, Metropolis-Hastings, .... In contrast to these methods, spectral learning algorithms give: no local optima, a huge gain in computational efficiency, and a slight loss in statistical efficiency.

19–21 Spectral Learning for PSRs: moments of directly observable features. $\Sigma_{T,AO,H}$: trivariance tensor of features of the future, present, and past. $\Sigma_{T,H}$: covariance matrix of features of the future and past. $\Sigma_{AO,AO}$: covariance matrix of features of the present. $U$: left $d$ singular vectors of $\Sigma_{T,H}$. Then

$S^{-1} M_{ao} S := \Sigma_{T,AO,H} \times_1 U^\top \times_2 \varphi(ao)^\top (\Sigma_{AO,AO})^{-1} \times_3 \left(\Sigma_{T,H}^\top U\right)^\dagger$

and the other parameters can be found analogously.

22 Spectral Learning for PSRs. The spectral learning algorithm: estimate $\Sigma_{T,AO,H}$, $\Sigma_{T,H}$, and $\Sigma_{AO,AO}$ from data; find $U$ by SVD; plug in to recover the PSR parameters. Learning is statistically consistent and only requires linear algebra. For details, see: B. Boots, S. M. Siddiqi, and G. Gordon. Closing the learning-planning loop with predictive state representations. RSS, 2010.
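For intuition, here is a schematic NumPy version of these three steps for the simplest special case: no actions and one-step indicator features (essentially the spectral HMM algorithm that this PSR method generalizes; with actions and general features, the trivariance tensor and the $\Sigma_{AO,AO}$ correction enter as on the previous slide). The data and dimensions are toy assumptions.

```python
import numpy as np

def spectral_learn(obs, n_sym, d):
    # Step 1: estimate moments of directly observable quantities.
    # Sigma_TH[i, j]     ~ P(o_{t+1} = i, o_t = j)             (future vs. past)
    # Sigma_ToH[o][i, j] ~ P(o_{t+2} = i, o_{t+1} = o, o_t = j)
    Sigma_TH = np.zeros((n_sym, n_sym))
    Sigma_ToH = np.zeros((n_sym, n_sym, n_sym))
    for t in range(len(obs) - 2):
        Sigma_TH[obs[t + 1], obs[t]] += 1.0
        Sigma_ToH[obs[t + 1]][obs[t + 2], obs[t]] += 1.0
    Sigma_TH /= len(obs) - 2
    Sigma_ToH /= len(obs) - 2

    # Step 2: U = left d singular vectors of Sigma_TH.
    U = np.linalg.svd(Sigma_TH)[0][:, :d]

    # Step 3: plug in to recover the transition operators (up to similarity).
    proj = np.linalg.pinv(U.T @ Sigma_TH)
    B = {o: U.T @ Sigma_ToH[o] @ proj for o in range(n_sym)}
    return U, B   # initial state and normalization follow analogously from P(o_t)

obs = np.random.default_rng(3).integers(0, 2, size=10_000)
U, B = spectral_learn(obs, n_sym=2, d=2)
```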

23 Infinite Features. We can extend the learning algorithm to infinite feature spaces using kernels: the learning algorithm we have seen is just linear algebra, which works fine in an arbitrary RKHS. We can rewrite all of the formulas in terms of Gram matrices, using kernel SVD instead of SVD. Result: Hilbert space embeddings of dynamical systems, which handle nearly arbitrary observation distributions and give good prediction performance. For details, see: L. Song, B. Boots, S. M. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. ICML, 2010.

24 An Experiment: Slot Car. [Figure: (A) the slot car platform with an IMU (top) and the racetrack (bottom); (B) average squared prediction error vs. prediction horizon for HMM, RR-HMM, LDS, Embedded (HSE-HMM), Mean, and Last models. Data thanks to Dieter Fox's lab.] An IMU sensed this data while the slot car circled the track, controlled by a constant policy. The goal of this experiment was to learn a model of the noisy IMU data and, after filtering, to predict future IMU readings.

25 Batch Methods. Bottleneck: SVD of a Gram or covariance matrix. G: (# time steps)$^2$. C: (# features $\times$ window length) $\times$ (# time steps). E.g., for 1 hr of video at 24 fps (roughly $10^5$ time steps), with features of past and future being all pixels in 2 s windows, G is on the order of $10^5 \times 10^5$.

26 Making it Fast. Two techniques: online learning and random projections. Neither one is new, but the combination with spectral learning for PSRs is, and it makes a huge difference in price.

27–28 Online Learning. Recall: $U$ = left $d$ singular vectors of $\Sigma_{T,H}$, and $S^{-1} M_{ao} S := \Sigma_{T,AO,H} \times_1 U^\top \times_2 \varphi(ao)^\top (\Sigma_{AO,AO})^{-1} \times_3 (\Sigma_{T,H}^\top U)^\dagger$. With each new observation, perform a rank-1 update of: the SVD (Brand) and the inverse (Sherman-Morrison). With $n$ features, latent dimension $d$, and $T$ steps: space = $O(nd)$, which may fit in cache; time = $O(nd^2 T)$, i.e., bounded time per example. Problem: there is no rank-1 update of the kernel SVD! We can use random projections [Rahimi & Recht, 2007].
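A sketch of the Sherman-Morrison half of this update (illustrative code, not the paper's implementation; Brand's method plays the analogous bounded-time-per-step role for the SVD of $\Sigma_{T,H}$):

```python
import numpy as np

def sherman_morrison(Ainv, u, v):
    """Update A^-1 in O(n^2) after the rank-1 change A <- A + u v^T."""
    Au = Ainv @ u
    vA = v @ Ainv
    return Ainv - np.outer(Au, vA) / (1.0 + v @ Au)

# Verify against a direct inverse on toy data.
rng = np.random.default_rng(4)
n = 5
A = rng.random((n, n)) + n * np.eye(n)
u, v = rng.random(n), rng.random(n)
Ainv = np.linalg.inv(A)
assert np.allclose(sherman_morrison(Ainv, u, v),
                   np.linalg.inv(A + np.outer(u, v)))
```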

29 Random Projections. Since there is no rank-1 update of the kernel SVD, we approximate the kernel with random projections [Rahimi & Recht, 2007]: explicit random features whose inner products approximate the kernel, so the online linear-algebra updates above apply unchanged.
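A sketch of random Fourier features for a Gaussian RBF kernel, following Rahimi & Recht; the feature count, bandwidth, and data below are illustrative assumptions:

```python
import numpy as np

def rff(X, n_features, sigma, seed=0):
    """Map rows of X (m x p) to explicit features z(x) with
    z(x) . z(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: feature inner products approximate the exact kernel.
rng = np.random.default_rng(5)
X = rng.random((5, 3))
Z = rff(X, n_features=20_000, sigma=1.0)
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
print(np.abs(Z @ Z.T - K_exact).max())   # small approximation error
```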

30 An Experiment, Revisited: Hilbert Space Embeddings. [Figure: average squared prediction error vs. prediction horizon on the slot car IMU data, now comparing Mean, Last, LDS, HMM, RR-HMM, Embedded (HSE-HMM), and Online with Random Features. Data thanks to Dieter Fox's lab.]

31–36 Online Learning Example. [Figure: (A) a conference room with a table, observed with a range sensor; snapshots of the learned model after 100, 350, and 600 steps; (B) the final embedding after 600 samples (colors = 3rd dimension).] online+random: 100k features, 11k frames, limit = available data. offline: 2k frames, compressed & subsampled, compute-limited.

37 Paper Summary. We present spectral learning algorithms for PSR models of partially observable nonlinear dynamical systems. We show how to update the parameters of the estimated PSR model given new data, yielding an efficient online spectral learning algorithm. We show how to use random projections to approximate kernel-based learning algorithms.
