An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems
1 An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems. Byron Boots and Geoffrey J. Gordon. AAAI 2011. Select Lab, Carnegie Mellon University.
2-3 What is out there? Dynamical systems. Past | Future: $\ldots, o_{t-2}, o_{t-1}, o_t, o_{t+1}, o_{t+2}, \ldots$ → State. A dynamical system is a recursive rule for updating state based on observations.
4-5 Learning a Dynamical System. Past | Future: $\ldots, o_{t-2}, o_{t-1}, o_t, o_{t+1}, o_{t+2}, \ldots$ → State. We would like to learn a model of a dynamical system; today I will focus on spectral learning algorithms for Predictive State Representations.
6-12 Predictive State Representations (PSRs), comprised of: a set of actions $A$, a set of observations $O$, an initial state $x_1 \in \mathbb{R}^d$, a set of transition matrices $M_{ao} \in \mathbb{R}^{d \times d}$, and a normalization vector $e \in \mathbb{R}^d$. At each time step we can predict $P(o \mid x_t, \mathrm{do}(a_t)) = e^\top M_{a_t,o}\, x_t$, and update the state: $x_{t+1} = M_{a_t,o_t} x_t / P(o_t \mid x_t, \mathrm{do}(a_t))$.
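To make the prediction and update steps concrete, here is a minimal numpy sketch of PSR filtering with a single action; the toy dimensions and random parameters are assumptions for illustration, not learned values (the random matrices are not a valid PSR, so the prediction is renormalized).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_obs = 4, 3                          # latent dimension, # observations

# Assumed toy parameters (a real PSR would be learned; see later slides).
x = rng.random(d); x /= x.sum()          # initial state x1
M = {(0, o): 0.3 * rng.random((d, d)) for o in range(n_obs)}  # M_{ao}, one action
e = np.ones(d)                           # normalization vector

def predict(x, a):
    # P(o | x_t, do(a_t)) = e^T M_{a,o} x_t; renormalized here because the
    # toy matrices are random rather than a valid PSR.
    p = np.array([e @ M[(a, o)] @ x for o in range(n_obs)])
    return p / p.sum()

def update(x, a, o):
    # x_{t+1} = M_{a,o} x_t / P(o | x_t, do(a)); note e^T M_{a,o} x_t = P.
    num = M[(a, o)] @ x
    return num / (e @ num)

for t in range(5):                       # filter along a short trajectory
    p = predict(x, 0)
    o = rng.choice(n_obs, p=p)           # sample what the world might emit
    x = update(x, 0, o)
```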
13 Predictive State Representations: parameters are only determined up to a similarity transform $S \in \mathbb{R}^{d \times d}$. If we replace $M_{ao} \to S^{-1} M_{ao} S$, $x_1 \to S^{-1} x_1$, and $e \to S^\top e$, the resulting PSR makes exactly the same predictions as the original one, e.g. $P(o \mid x_t, \mathrm{do}(a_t)) = e^\top S S^{-1} M_{a_t,o} S S^{-1} x_t$.
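A quick numerical check of this invariance, a self-contained sketch with one assumed random (almost surely invertible) transform and a single transition matrix for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M_ao = rng.random((d, d))                # one transition matrix, for brevity
x1, e = rng.random(d), np.ones(d)
S = rng.random((d, d)) + np.eye(d)       # assumed invertible transform
S_inv = np.linalg.inv(S)

# Transformed parameters: M -> S^{-1} M S, x -> S^{-1} x, e -> S^T e.
M2, x2, e2 = S_inv @ M_ao @ S, S_inv @ x1, S.T @ e

# Predictions agree: e2^T M2 x2 = e^T S S^{-1} M S S^{-1} x = e^T M x.
assert np.allclose(e2 @ M2 @ x2, e @ M_ao @ x1)
```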
14 PSRs Are Very Expressive. For fixed latent dimension $d$: HMMs & POMDPs ⊂ Reduced-Rank HMMs & Reduced-Rank POMDPs ⊂ Predictive State Representations.
15 Learning PSRs: we can use fast, statistically consistent spectral methods to learn PSR parameters.
16-17 A General Principle: predict data about the future (many samples) from data about the past (many samples), compressing through a bottleneck into state and then expanding. If the bottleneck is a rank constraint, then we get a spectral method.
18 Why Spectral Methods? There are many ways to learn a dynamical system: maximum likelihood via Expectation Maximization, gradient descent, ...; Bayesian inference via Gibbs sampling, Metropolis-Hastings, .... In contrast to these methods, spectral learning algorithms give: no local optima, a huge gain in computational efficiency, and a slight loss in statistical efficiency.
19-21 Spectral Learning for PSRs: moments of directly observable features. $\Sigma_{T,AO,H}$: trivariance tensor of features of the future, present, and past. $\Sigma_{T,H}$: covariance matrix of features of the future and past. $\Sigma_{AO,AO}$: covariance matrix of features of the present. Let $U$ be the left $d$ singular vectors of $\Sigma_{T,H}$; then $S^{-1} M_{ao} S := \Sigma_{T,AO,H} \times_1 U^\top \times_2 \phi(ao)^\top (\Sigma_{AO,AO})^{-1} \times_3 (\Sigma_{T,H}^\top U)^\dagger$, where $\times_k$ denotes the mode-$k$ tensor product; the other parameters can be found analogously.
22 Spectral Learning for PSRs. Spectral Learning Algorithm: (1) estimate $\Sigma_{T,AO,H}$, $\Sigma_{T,H}$, and $\Sigma_{AO,AO}$ from data; (2) find $U$ by SVD; (3) plug in to recover the PSR parameters. Learning is statistically consistent and only requires linear algebra. For details, see: B. Boots, S. M. Siddiqi, and G. Gordon. Closing the learning-planning loop with predictive state representations. RSS, 2010.
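As a concrete instance of these three steps, here is a batch sketch for the simplest special case: an uncontrolled system (no actions) with one-hot indicator features of a single past and a single future observation. The feature choice and helper names are assumptions; the paper's estimator is more general.

```python
import numpy as np

def spectral_learn(obs, n_obs, d):
    """Batch spectral learning sketch: one-hot features of one past and
    one future observation, uncontrolled system (no actions)."""
    T = len(obs)
    phi = np.eye(n_obs)                          # one-hot feature map
    # Step 1: empirical moments over triples (past, present, future).
    Sigma_TH = np.zeros((n_obs, n_obs))          # future x past
    Sigma_TOH = np.zeros((n_obs, n_obs, n_obs))  # future x present x past
    for t in range(1, T - 1):
        h, o, f = obs[t - 1], obs[t], obs[t + 1]
        Sigma_TH += np.outer(phi[f], phi[h])
        Sigma_TOH[:, o, :] += np.outer(phi[f], phi[h])
    Sigma_TH /= T - 2
    Sigma_TOH /= T - 2
    # Step 2: U = left d singular vectors of Sigma_TH.
    U, _, _ = np.linalg.svd(Sigma_TH)
    U = U[:, :d]
    # Step 3: plug in -- one observable operator per observation o:
    # B_o = U^T Sigma_TOH[:, o, :] (U^T Sigma_TH)^+
    pinv = np.linalg.pinv(U.T @ Sigma_TH)
    B = [U.T @ Sigma_TOH[:, o, :] @ pinv for o in range(n_obs)]
    return U, B
```

These $B_o$ play the role of the transformed operators $S^{-1} M_o S$ in the filtering equations above; as the slide notes, the initial state and normalization vector can be recovered from analogous moment equations.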
23 Infinite Features. We can extend the learning algorithm to infinite feature spaces via kernels: the learning algorithm we have seen is linear algebra, so it works just fine in an arbitrary RKHS. We can rewrite all of the formulas in terms of Gram matrices, using kernel SVD instead of SVD. Result: Hilbert space embeddings of dynamical systems, which handle nearly arbitrary observation distributions with good prediction performance. For details, see: L. Song, B. Boots, S. M. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. ICML, 2010.
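For the Gram-matrix rewrite, here is one way the kernel SVD step can be carried out purely from Gram matrices. This is a sketch of the standard duality trick under stated simplifications (no centering or regularization), not the paper's exact formulas:

```python
import numpy as np

def kernel_svd(Gx, Gy, d):
    """Top-d singular values of the empirical covariance operator
    C = (1/T) sum_t phi(x_t) psi(y_t)^T, computed only from the Gram
    matrices Gx = Phi^T Phi and Gy = Psi^T Psi."""
    T = Gx.shape[0]
    # Nonzero eigenvalues of C C^T coincide with those of (1/T^2) Gx Gy.
    vals, vecs = np.linalg.eig(Gx @ Gy / T**2)
    idx = np.argsort(-vals.real)[:d]
    sigmas = np.sqrt(np.maximum(vals.real[idx], 0.0))
    # vecs[:, idx] are expansion coefficients beta of the right singular
    # functions v_k = Psi beta_k in the span of the past feature maps.
    return sigmas, vecs.real[:, idx]
```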
24 An Experiment: Slot Car. [Figure: (A) the slot car platform with the IMU and the racetrack; (B) average squared prediction error vs. prediction horizon for HMM, RR-HMM, LDS, Embedded (Hilbert space embedding), Mean, and Last models.] IMU data were collected while the slot car circled the track controlled by a constant policy. The goal of this experiment was to learn a model of the noisy IMU data and, after filtering, to predict future IMU readings. Data thanks to Dieter Fox's lab. L. Song, B. Boots, S. M. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. ICML, 2010.
25 Batch Methods. Bottleneck: SVD of the Gram or covariance matrix. G: (# time steps)²; C: (# features × window length) × (# time steps). E.g., for 1 hr of video at 24 fps (86,400 frames), where the features of past and future are all pixels in 2 s windows, G has roughly $10^5 \times 10^5 = 10^{10}$ entries.
26 Making it Fast. Two techniques: online learning and random projections. Neither one is new, but the combination with spectral learning for PSRs is, and it makes a huge difference in price.
27-29 Online Learning and Random Projections. Recall $U$ = left $d$ singular vectors of $\Sigma_{T,H}$, and $S^{-1} M_{ao} S := \Sigma_{T,AO,H} \times_1 U^\top \times_2 \phi(ao)^\top (\Sigma_{AO,AO})^{-1} \times_3 (\Sigma_{T,H}^\top U)^\dagger$. With each new observation, perform a rank-1 update of the SVD (Brand) and of the inverse (Sherman-Morrison). With $n$ features, latent dimension $d$, and $T$ time steps: space = $O(nd)$, which may fit in cache; time = $O(nd^2 T)$, i.e., bounded time per example. Problem: there is no rank-1 update of the kernel SVD, but we can use random projections [Rahimi & Recht, 2007].
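Here is a sketch of two of these ingredients, assuming an RBF kernel for the random features; Brand's incremental SVD is only referenced in a comment, and the variable names are illustrative rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating an RBF kernel [Rahimi & Recht, 2007]:
# k(x, y) ~ z(x)^T z(y) with z(x) = sqrt(2/n) cos(W x + b).
def make_rff(dim_in, n_feat, bandwidth=1.0):
    W = rng.normal(scale=1.0 / bandwidth, size=(n_feat, dim_in))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    return lambda x: np.sqrt(2.0 / n_feat) * np.cos(W @ x + b)

# Sherman-Morrison rank-1 update of the inverse of a running average
# covariance: Sigma_t = ((t-1) Sigma_{t-1} + phi phi^T) / t, for t >= 2.
def sm_update(Sigma_inv, phi, t):
    Sigma_inv = Sigma_inv * (t / (t - 1.0))   # inverse of (t-1)/t * Sigma
    u = phi / np.sqrt(t)
    Su = Sigma_inv @ u
    return Sigma_inv - np.outer(Su, Su) / (1.0 + u @ Su)

# Toy usage: a stream of present features, O(n^2) work per step, with no
# SVD of a growing Gram matrix. (Sigma_TH's SVD gets an analogous rank-1
# update via Brand's incremental SVD, omitted here.)
feat = make_rff(dim_in=3, n_feat=200)
Sigma_inv = np.eye(200) / 1e-3               # inverse of a small ridge
for t in range(2, 1000):
    Sigma_inv = sm_update(Sigma_inv, feat(rng.normal(size=3)), t)
```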
30 Hilbert Space Experiment (Revisited). [Figure: slot car inertial measurement data; (A) the slot car platform with the IMU and the racetrack; (B) average squared prediction error vs. prediction horizon for HMM, RR-HMM, LDS, Embedded (HSE-HMM), Online with Random Features, Mean, and Last models.] Data thanks to Dieter Fox's lab.
31-36 Online Learning Example: Conference Room. [Figure: (A) camera view of a conference room with a table; (B) the learned embedding after 100, 350, and 600 steps; (C) the final embedding (colors = 3rd dimension) after 600 samples.] online+random: 100k features, 11k frames, limit = available data. offline: 2k frames, compressed & subsampled, compute-limited.
37 Paper Summary. We present spectral learning algorithms for PSR models of partially observable nonlinear dynamical systems. We show how to update the parameters of the estimated PSR model given new data: an efficient online spectral learning algorithm. We show how to use random projections to approximate kernel-based learning algorithms.