Learning about State. Geoff Gordon, Machine Learning Department, Carnegie Mellon University

Learning about State. Geoff Gordon, Machine Learning Department, Carnegie Mellon University. Joint work with Byron Boots, Sajid Siddiqi, Le Song, and Alex Smola.

What's out there? A stream of observations ..., o_{t-2}, o_{t-1}, o_t, o_{t+1}, o_{t+2}, ... (running example: steam rising from a grate).

What's out there? A dynamical system: past observations ..., o_{t-2}, o_{t-1}, o_t are summarized in a state, which determines the future observations o_{t+1}, o_{t+2}, ... A dynamical system is a recursive rule for updating state based on observations.

Learning a dynamical system: given past observations ..., o_{t-2}, o_{t-1}, o_t from a partially observable system, infer a state and predict the future observations o_{t+1}, o_{t+2}, ...

Examples: the Baum-Welch EM algorithm for HMMs; Tomasi-Kanade structure from motion; the Black-Scholes model of stock prices; SLAM (from lidars, cameras, beacons, ...); system identification for Kalman filters.

A general principle: data about the past (many samples) is compressed through a bottleneck into a state, which is then expanded to predict data about the future (many samples). If the bottleneck is a rank constraint, we get a spectral method.
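To make the compress/bottleneck/expand picture concrete, here is a minimal sketch (not the talk's exact estimator): map past features to future features through a rank constraint by truncating the SVD of their empirical cross-covariance. The function name and the use of the raw, unwhitened cross-covariance are illustrative assumptions.

```python
import numpy as np

def rank_bottleneck(past, future, rank):
    """Sketch: low-rank map from past features to future features.
    past:   (T, p) array, row t = features of the past at time t
    future: (T, f) array, row t = features of the future at time t
    rank:   dimension of the learned state (the bottleneck)
    """
    T = past.shape[0]
    # Empirical cross-covariance between future and past features
    cov = future.T @ past / T                       # (f, p)
    # Rank constraint via SVD: keep the leading singular directions
    U, s, Vt = np.linalg.svd(cov, full_matrices=False)
    compress = Vt[:rank]                            # past features -> state
    expand = U[:, :rank] * s[:rank]                 # state -> future features
    state = past @ compress.T                       # (T, rank)
    predicted_future = state @ expand.T             # (T, f)
    return state, predicted_future
```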

Why spectral methods? There are many ways to learn models of dynamical systems: maximum likelihood via EM or gradient descent, Bayesian inference via Gibbs sampling or Metropolis-Hastings, ... In contrast to these, spectral methods have no local optima: a huge gain in computational efficiency, at a slight loss in statistical efficiency.

Example: SSID for the Kalman filter. Model: x_{t+1} = A x_t + noise, o_t = C x_t + noise, with A an n x n matrix and C an m x n matrix. Past data = the last k observations, future data = the next k observations (both windows must be long enough). Prediction is linear regression: look at the empirical covariance of past and future. Spectral: the bottleneck is an SVD of that covariance. [van Overschee & de Moor, 1996]

Kalman SSID. Model: x_{t+1} = A x_t + noise, o_t = C x_t + noise. Assume for simplicity that m >= n and that both A and C are full rank. Writing P = E[x_t x_t^T], for k >= 1:
E[o_{t+k} o_t^T] = E[ E[o_{t+k} o_t^T | x_t] ] = E[ E[o_{t+k} | x_t] E[o_t | x_t]^T ] = E[ C A^k x_t (C x_t)^T ] = C A^k E[x_t x_t^T] C^T = C A^k P C^T

Kalman SSID. So Σ_k := E[o_{t+k} o_t^T] = C A^k P C^T. Let U = the n leading left singular vectors of Σ_1, and write S = U^T C A. Then
Â := (U^T Σ_2)(U^T Σ_1)^+ = U^T C A^2 P C^T (U^T C A P C^T)^+ = (U^T C A) A P C^T (P C^T)^+ (U^T C A)^{-1} = S A S^{-1}
Ĉ := U Â^{-1} = U (S A S^{-1})^{-1} = U S A^{-1} S^{-1} = (U U^T C A) A^{-1} S^{-1} = C A A^{-1} S^{-1} = C S^{-1}

Kalman SSID. Algorithm: estimate Σ_1 and Σ_2 from data, get Û by SVD, and plug in to the formulas above for Â and Ĉ. Consistency follows from continuity of the formulas for Â and Ĉ together with the law of large numbers for Σ_1 and Σ_2; the one wrinkle is that the SVD giving Û isn't continuous, but range(Û) is. Can also recover the steady-state x.
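A hedged numpy sketch of that algorithm follows. The function name, the single-lag empirical covariances, and the pseudoinverse are illustrative choices; a production SSID implementation (e.g., following van Overschee & De Moor) handles windows and noise models more carefully.

```python
import numpy as np

def kalman_ssid(obs, n_states):
    """Sketch of the spectral SSID recipe from these slides.
    obs: (T, m) array of observations o_t; n_states: assumed latent dim n.
    Returns (A_hat, C_hat), correct only up to an unknown similarity S."""
    T = obs.shape[0]
    # Empirical one- and two-step covariances Sigma_1, Sigma_2
    Sigma1 = obs[1:T-1].T @ obs[0:T-2] / (T - 2)   # ~ E[o_{t+1} o_t^T]
    Sigma2 = obs[2:T].T @ obs[0:T-2] / (T - 2)     # ~ E[o_{t+2} o_t^T]
    # Bottleneck: leading n left singular vectors of Sigma_1
    U, _, _ = np.linalg.svd(Sigma1)
    U = U[:, :n_states]
    # A_hat = (U^T Sigma_2)(U^T Sigma_1)^+ = S A S^{-1}
    A_hat = (U.T @ Sigma2) @ np.linalg.pinv(U.T @ Sigma1)
    # C_hat = U A_hat^{-1} = C S^{-1}  (assumes A_hat is invertible)
    C_hat = U @ np.linalg.inv(A_hat)
    return A_hat, C_hat
```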

Variations: use arbitrary features of a length-k window of past and future observations, and work from the covariance of the past and future features; good features make a big difference in practice. Can also impose constraints on the learned model (e.g., stability). A sketch of building window features appears below.
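As a simple illustration of "features of a length-k window", the sketch below just stacks raw past and future windows; in practice one would substitute richer features. The function name and the choice of raw windows are assumptions, not the talk's prescription.

```python
import numpy as np

def window_features(obs, k):
    """Sketch: stack raw length-k windows as past/future feature vectors.
    obs: (T, m) array. Returns (past, future), one row per valid time t."""
    T, m = obs.shape
    ts = range(k, T - k + 1)
    past = np.stack([obs[t-k:t].reshape(-1) for t in ts])    # o_{t-k..t-1}
    future = np.stack([obs[t:t+k].reshape(-1) for t in ts])  # o_{t..t+k-1}
    return past, future
```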

Kalman SSID: example. Works well for video textures: the steam-grate example above, and a fountain. Observation = raw pixels (a vector of reals at each time step).

Structure from motion. Arrange the tracked image coordinates in a measurement matrix whose row i is (x_{i1}, y_{i1}, x_{i2}, y_{i2}, ..., x_{iT}, y_{iT}); for example, the entry labeled "feature 1, step 2" is (x_{12}, y_{12}). Track N features over T steps. [Tomasi & Kanade, 1992]

Structure from motion. Here x_{it} is the projection of feature i onto the camera's horizontal axis at time t (and y_{it} onto its vertical axis); [u_i, v_i, w_i] are the coordinates of feature i; [h_{1t}, h_{2t}, h_{3t}] is the camera's horizontal axis at time t; and [v_{1t}, v_{2t}, v_{3t}] is its vertical axis.

Structure from motion. The measurement matrix factors as the N x 3 matrix of feature coordinates [u_i, v_i, w_i] times the 3 x 2T matrix whose columns are the camera axes [h_{1t}, h_{2t}, h_{3t}] and [v_{1t}, v_{2t}, v_{3t}]: x_{it} = [u_i, v_i, w_i] · [h_{1t}, h_{2t}, h_{3t}] and y_{it} = [u_i, v_i, w_i] · [v_{1t}, v_{2t}, v_{3t}].

Structure from motion: the factorization is only determined up to an invertible transform.

SfM as SSID: treat the measurement matrix as the covariance of past and future data. Past data: an indicator of the time step and of the horizontal/vertical axis, which means we get to memorize each time step (no attempt to learn dynamics). Future data: the observed screen coordinates (a column of the measurement matrix). A rank-constrained factorization sketch follows.
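The corresponding rank-3 factorization can be sketched with one SVD. This is only the low-rank split (it omits the centering and metric-upgrade steps of the full Tomasi-Kanade method), and the function name is illustrative.

```python
import numpy as np

def factor_measurements(W, rank=3):
    """Sketch: split the N x 2T measurement matrix W into
    (structure) x (motion) under a rank constraint. The split is only
    determined up to an invertible rank x rank transform."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    structure = U[:, :rank] * s[:rank]   # ~ feature coordinates (N x rank)
    motion = Vt[:rank]                   # ~ camera axes over time (rank x 2T)
    return structure, motion
```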

Kalman SSID: failure. [Video panels: HMM (Baum-Welch), Kalman filter (SSID), and a preview; all models have 10 latent dimensions.]

Can we generalize? Model: x_{t+1} = A x_t + noise, o_t = C x_t + noise, with A an n x n matrix and C an m x n matrix. Goal: get rid of the Gaussian noise assumption. An HMM has the same form as the Kalman filter, but with A >= 0 and C >= 0 elementwise and columns summing to one (1^T A = 1^T, 1^T C = 1^T), noise ~ multinomial, and x, o indicator vectors (e.g., state 4 corresponds to [0 0 0 1 0 ...]^T).
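To illustrate the point that an HMM can be written in exactly this "linear model plus noise" form, here is a small sampling sketch with indicator-vector states and observations. The function name and the column-stochastic convention are assumptions, chosen to be consistent with the filter equations later in the talk.

```python
import numpy as np

def sample_hmm(A, C, x0, T, rng=None):
    """Sketch: sample from an HMM written as x' = A x + noise, o = C x + noise.
    A: (n, n) column-stochastic transition matrix; C: (m, n) column-stochastic
    observation matrix; x0: length-n indicator vector of the initial state."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, m = A.shape[0], C.shape[0]
    x, obs = x0, []
    for _ in range(T):
        # o has mean C x; the multinomial "noise" makes it an indicator vector
        o = np.eye(m)[rng.choice(m, p=C @ x)]
        obs.append(o)
        # x' has mean A x; again the noise snaps it back to an indicator
        x = np.eye(n)[rng.choice(n, p=A @ x)]
    return np.array(obs)
```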

Derivations for Kalman vs. HMM. The same chain of equalities holds for both models (assume for simplicity that m >= n and that both A and C are full rank):
E[o_{t+k} o_t^T] = E[ E[o_{t+k} o_t^T | x_t] ] = E[ E[o_{t+k} | x_t] E[o_t | x_t]^T ] = E[ C A^k x_t (C x_t)^T ] = C A^k E[x_t x_t^T] C^T = C A^k P C^T

HMM SSID: first try. As before, recover Â and Ĉ from E[o_{t+1} o_t^T] = C A P C^T and E[o_{t+2} o_t^T] = C A^2 P C^T:
Â = (U^T Σ_2)(U^T Σ_1)^+ = U^T C A^2 P C^T (U^T C A P C^T)^+ = (U^T C A) A P C^T (P C^T)^+ (U^T C A)^{-1} = S A S^{-1}
But the result doesn't satisfy A >= 0, 1^T A = 1^T, C >= 0, 1^T C = 1^T. Is this a problem?

Merging A and C. HMM tracking: write b_t = P[x_t | o_{1:t}]. Then
P[x_t | o_{1:t-1}] = b_{t-1/2} = A b_{t-1}
P[o_t | o_{1:t-1}] = C b_{t-1/2}
and if o_t = o: P(x_t = x | o_{1:t}) = P(o | x_t = x) P(x_t = x | o_{1:t-1}) / Z, i.e. b_t = diag(C_{o,:}) b_{t-1/2} / Z.
Writing A_o = diag(C_{o,:}) A, the whole update is b_t = A_o b_{t-1} / Z, where Z = P(o_t = o | b_{t-1}).
So it's enough to estimate the A_o; in particular, P(o_t = o | b_{t-1}) = 1^T A_o b_{t-1}. (A filtering sketch in code follows.)
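A minimal sketch of this merged-matrix filter, assuming discrete observations indexed 0..m-1 and matrices built as A_o = diag(C[o, :]) A; the function and variable names are illustrative.

```python
import numpy as np

def hmm_filter(A_obs, b0, observations):
    """Sketch: belief update b_t = A_o b_{t-1} / Z with Z = 1^T A_o b_{t-1}.
    A_obs: list/dict mapping each observation o to A_o = diag(C[o, :]) @ A.
    b0: initial belief (length-n probability vector)."""
    b, beliefs = b0, []
    for o in observations:
        unnormalized = A_obs[o] @ b
        Z = unnormalized.sum()      # = P(o_t = o | b_{t-1}) = 1^T A_o b_{t-1}
        b = unnormalized / Z
        beliefs.append(b)
    return beliefs

# Building the merged matrices from (A, C), as on the slide:
# A_obs = {o: np.diag(C[o, :]) @ A for o in range(C.shape[0])}
```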

HMM SSID: try #2. For each observation o, define
Σ_2^o := E[o_{t+2} δ(o_{t+1} = o) o_t^T]
= E[ E[o_{t+2} δ(o_{t+1} = o) o_t^T | x_t] ]
= E[ E[o_{t+2} δ(o_{t+1} = o) | x_t] E[o_t | x_t]^T ]
= E[ E[o_{t+2} | x_t, o_{t+1} = o] P[o_{t+1} = o | x_t] (C x_t)^T ]
= E[ C A (A_o x_t / (1^T A_o x_t)) (1^T A_o x_t) (C x_t)^T ]
= E[ C A A_o x_t (C x_t)^T ]
= C A A_o E[x_t x_t^T] C^T = C A A_o P C^T

HMM SSID: try #2. With S = U^T C A as before,
Â_o := (U^T Σ_2^o)(U^T Σ_1)^+ = U^T C A A_o P C^T (U^T C A P C^T)^+ = (U^T C A) A_o P C^T (P C^T)^+ (U^T C A)^{-1} = S A_o S^{-1}
The learned model is used as x ← A_o x / P(o), o ~ C x, with C_{o,:} = e^T A_o.
Algorithm: estimate Σ_1 and Σ_2^o from data; get Û by SVD of Σ_1; plug in to get Â_o (for each o). Also need e^T = 1^T S^{-1}, the leading left eigenvector of Â_1 + Â_2 + ...
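A hedged numpy sketch of this procedure for a sequence of discrete observations. The function name, the single-lag covariances, and the eigenvector computation of e are assumptions; the recovered Â_o are only correct up to the similarity S, and e is only determined up to scale.

```python
import numpy as np

def hmm_spectral_ssid(obs_idx, m, n_states):
    """Sketch of HMM SSID 'try #2': estimate Sigma_1 and Sigma_2^o,
    then recover A_o-hat ~ S A_o S^{-1} for each observation o.
    obs_idx: 1-D array of observation indices in {0, ..., m-1}."""
    T = len(obs_idx)
    O = np.eye(m)[obs_idx]                        # indicator vectors, (T, m)
    Sigma1 = O[1:T-1].T @ O[0:T-2] / (T - 2)      # ~ E[o_{t+1} o_t^T]
    U, _, _ = np.linalg.svd(Sigma1)
    U = U[:, :n_states]
    pinv = np.linalg.pinv(U.T @ Sigma1)
    A_hat = {}
    for o in range(m):
        delta = (obs_idx[1:T-1] == o).astype(float)           # delta(o_{t+1} = o)
        Sigma2_o = (O[2:T] * delta[:, None]).T @ O[0:T-2] / (T - 2)
        A_hat[o] = (U.T @ Sigma2_o) @ pinv                    # ~ S A_o S^{-1}
    # e: leading left eigenvector of sum_o A_hat[o] (rescale in practice so
    # that the predicted probabilities e^T A_hat[o] b sum to one over o)
    vals, vecs = np.linalg.eig(sum(A_hat.values()).T)
    e = np.real(vecs[:, np.argmax(np.real(vals))])
    return A_hat, e, U
```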

Example: clock. Discrete observations: sampled frames from the training video; when tracking, use nearest neighbor or Parzen windows (i.e., a mixture-of-Gaussians HMM). 10 latent dimensions.

Can we generalize? HMMs had x ∈ Δ (the probability simplex); the intuition is that the number of discrete states equals the number of dimensions. We now have x ∈ SΔ, which is essentially equally restrictive. Can we allow x ∈ X for a general set X, so that the number of states exceeds the number of dimensions?

# states > # dims: the picture. [Figure: random projections of the N-dimensional simplex for N = 3, 15, 100.]

SSID for OOMs (also known as PSRs without actions, multiplicity automata, ...). Both OOMs and HMMs track via x ← A_o x / P(o) and predict via o ~ C x with C_{o,:} = e^T A_o. An OOM is defined by its transition matrices A_o and normalization vector e; it is like an HMM, but lifts the restriction X = SΔ: instead of requiring A_o x >= 0, we only require A_o x ∈ λX for some λ >= 0. OOMs include HMMs as a special case.

OOM SSID: no change!

OOM example: no change; our HMM SSID was actually learning OOMs all along.
