Learning about State. Geoff Gordon Machine Learning Department Carnegie Mellon University


Learning about State. Geoff Gordon, Machine Learning Department, Carnegie Mellon University. Joint work with Byron Boots, Sajid Siddiqi, Le Song, and Alex Smola.

What's out there? A stream of observations ... o_{t-2}, o_{t-1}, o_t, o_{t+1}, o_{t+2}, ... — for example, video of steam rising from a grate.

What's out there? A dynamical system. Past: ... o_{t-2}, o_{t-1}, o_t; future: o_{t+1}, o_{t+2}, ...; with a hidden state in between. A dynamical system is a recursive rule for updating state based on observations.

Learning a dynamical system. Given past observations ... o_{t-2}, o_{t-1}, o_t from a partially observable system (the state is hidden), predict future observations o_{t+1}, o_{t+2}, ...

Examples: the Baum-Welch EM algorithm for HMMs; Tomasi-Kanade structure from motion; the Black-Scholes model of stock prices; SLAM (from lidar, cameras, beacons, ...); system identification for Kalman filters.

A general principle: compress data about the past (many samples) into a state through a bottleneck, then expand the state to predict data about the future (many samples). If the bottleneck is a rank constraint, we get a spectral method.
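To make the bottleneck idea concrete, here is a minimal numpy sketch (not from the talk) of rank-constrained regression from features of the past to features of the future: fit the full linear predictor, then truncate it to rank d with an SVD. The names past_feats, future_feats, and d are illustrative.

import numpy as np

def spectral_bottleneck(past_feats, future_feats, d):
    # past_feats:   (T, p) array of past features, one row per time step
    # future_feats: (T, f) array of future features, aligned with past_feats
    # d:            size of the bottleneck (the latent state dimension)
    T = past_feats.shape[0]
    cov_fp = future_feats.T @ past_feats / T      # empirical cross-covariance (f, p)
    cov_pp = past_feats.T @ past_feats / T        # empirical past covariance (p, p)
    W = cov_fp @ np.linalg.pinv(cov_pp)           # full-rank regression: future ~ W past
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    compress = Vt[:d]                             # (d, p): past features -> state
    expand = U[:, :d] * s[:d]                     # (f, d): state -> predicted future
    return compress, expand

Given these two maps, the predicted future at time t is expand @ (compress @ past_feats[t]); different choices of past/future features and of the bottleneck recover the specific algorithms below.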

Why spectral methods? There are many ways to learn models of dynamical systems: maximum likelihood via EM or gradient descent, Bayesian inference via Gibbs sampling or Metropolis-Hastings, and so on. In contrast to these, spectral methods have no local optima: a huge gain in computational efficiency, at a slight loss in statistical efficiency.

Example: SSID for the Kalman filter. Model: x_{t+1} = A x_t + noise, o_t = C x_t + noise, with x in R^n, o in R^m, A an n×n matrix, and C an m×n matrix. Past data = the last k observations; future data = the next k observations (both windows must be big enough). Prediction = linear regression: look at the empirical covariance of past and future. Spectral: the bottleneck is an SVD of that covariance. [van Overschee & de Moor, 1996]

Kalman SSID. Model as above: x_{t+1} = A x_t + noise, o_t = C x_t + noise. Assume for simplicity m ≥ n and that both A and C are full rank, and let P = E[x_t x_t^T]. For k ≥ 1,

E[o_{t+k} o_t^T] = E[ E[o_{t+k} o_t^T | x_t] ]
                 = E[ E[o_{t+k} | x_t] E[o_t | x_t]^T ]
                 = E[ C A^k x_t (C x_t)^T ]
                 = C A^k E[x_t x_t^T] C^T = C A^k P C^T.

Kalman SSID. Write Σ_k = E[o_{t+k} o_t^T] = C A^k P C^T. Let U = the n leading left singular vectors of Σ_1, and S = U^T C A. Then (with ^+ denoting the pseudoinverse)

Â := U^T Σ_2 (U^T Σ_1)^+
   = U^T C A^2 P C^T (U^T C A P C^T)^+
   = (U^T C A) A P C^T (P C^T)^+ (U^T C A)^{-1} = S A S^{-1}

Ĉ := U Â^{-1} = U (S A S^{-1})^{-1} = U S A^{-1} S^{-1} = (U U^T C A) A^{-1} S^{-1} = C S^{-1}

where the last step uses range(U) = range(C), so that U U^T C = C.

Kalman SSID. Algorithm: estimate Σ_1 and Σ_2 from data, get Û by SVD of the estimated Σ_1, and plug in for Â and Ĉ. This is consistent: the formulas for Â and Ĉ are continuous, and the law of large numbers applies to Σ_1 and Σ_2. (One wrinkle: the SVD giving Û isn't continuous, but range(Û) is.) We can also recover the steady-state x.
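A minimal numpy sketch of this plug-in estimator for a single long observation sequence (an assumption; the window-based variants on the next slide use richer features). obs is a (T, m) array of observations and n the latent dimension; both names are illustrative.

import numpy as np

def kalman_ssid(obs, n):
    # Estimate A and C up to the similarity transform S = U^T C A, using
    # Sigma_1 = E[o_{t+1} o_t^T] and Sigma_2 = E[o_{t+2} o_t^T].
    # Assumes the observations are (approximately) zero mean.
    T = obs.shape[0]
    o0, o1, o2 = obs[:-2], obs[1:-1], obs[2:]
    Sigma1 = o1.T @ o0 / (T - 2)                  # estimates C A   P C^T
    Sigma2 = o2.T @ o0 / (T - 2)                  # estimates C A^2 P C^T

    U, _, _ = np.linalg.svd(Sigma1)
    U = U[:, :n]                                  # n leading left singular vectors

    A_hat = U.T @ Sigma2 @ np.linalg.pinv(U.T @ Sigma1)   # = S A S^{-1}
    C_hat = U @ np.linalg.inv(A_hat)                      # = C S^{-1}
    return A_hat, C_hat

In practice one would regularize the pseudoinverse and, as the next slide says, work with features of whole past/future windows rather than single observations.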

Variations. Use arbitrary features of length-k windows of past and future observations, and work from the covariance of past and future features; good features make a big difference in practice. One can also impose constraints on the learned model (e.g., stability).

Kalman SSID: example. This works well for video textures: the steam-grate example above, and a fountain. Observation = raw pixels (a vector of reals at each time step).

Structure from motion. Build the measurement matrix whose (i, t) entries are the image coordinates (x_it, y_it) of feature i at time step t (the entry labeled "feature 1, step 2", for example, is (x_12, y_12)); track N features over T steps. [Tomasi & Kanade, 1992]

Structure from motion. In this matrix, x_it is the projection of feature i onto the camera's horizontal axis at time t (and y_it onto its vertical axis); [u_i, v_i, w_i] are the coordinates of feature i, [h_1t, h_2t, h_3t] is the camera's horizontal axis, and [v_1t, v_2t, v_3t] is its vertical axis.

Structure from motion. The measurement matrix factors as the product of an N×3 structure matrix, with rows [u_i, v_i, w_i], and a 3×2T camera matrix, whose columns are the camera axes [h_1t, h_2t, h_3t]^T and [v_1t, v_2t, v_3t]^T at each time step: x_it is the projection of feature i onto the camera's horizontal axis at time t, and y_it onto the vertical axis.

Structure from motion. The factorization is only determined up to an invertible (3×3) transform.
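A minimal numpy sketch of the classical rank-3 factorization (assuming every feature is tracked in every frame; the layout with interleaved x, y columns follows the slide, and the names are illustrative):

import numpy as np

def tomasi_kanade(W):
    # W: (N, 2T) measurement matrix; row i is [x_i1, y_i1, x_i2, y_i2, ...].
    # Returns (structure, cameras) with rows of structure ~ [u_i, v_i, w_i]
    # and columns of cameras ~ the per-frame camera axes, so that the
    # centered W is approximately structure @ cameras. The answer is only
    # determined up to an invertible 3x3 transform, as the slide notes.
    W_centered = W - W.mean(axis=0, keepdims=True)   # remove per-frame translation
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    structure = U[:, :3] * np.sqrt(s[:3])            # (N, 3)
    cameras = np.sqrt(s[:3])[:, None] * Vt[:3]       # (3, 2T)
    return structure, cameras

Additional metric constraints (orthonormal camera axes) are what pin down the 3×3 ambiguity in the full Tomasi-Kanade method; they are omitted here.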

SfM as SSID. Treat the measurement matrix above as the covariance of past and future data. Past data: an indicator of the time step and of the horizontal/vertical axis, which means we get to memorize each time step (no attempt to learn dynamics). Future data: the observed screen coordinates (a column of the matrix).

Kalman SSID: failure. [Figure: an HMM learned by Baum-Welch, a Kalman filter learned by SSID, and a preview of the method introduced below; all models have 10 latent dimensions.]

Can we generalize? Model: x_{t+1} = A x_t + noise, o_t = C x_t + noise, with x in R^n, o in R^m, A n×n, C m×n. Goal: get rid of the Gaussian noise assumption. An HMM has the same form as the Kalman filter, but with A ≥ 0 and C ≥ 0 (columns summing to 1), multinomial noise, and x, o restricted to indicator vectors (e.g., state 4 = [0 0 0 1 0 ...]^T).

Derivations for Kalman vs. HMM. The same derivation works for both models (again assuming m ≥ n and A, C full rank):

E[o_{t+k} o_t^T] = E[ E[o_{t+k} o_t^T | x_t] ]
                 = E[ E[o_{t+k} | x_t] E[o_t | x_t]^T ]
                 = E[ C A^k x_t (C x_t)^T ]
                 = C A^k E[x_t x_t^T] C^T = C A^k P C^T.

HMM SSID: first try. As before, recover Â and Ĉ from E[o_{t+1} o_t^T] and E[o_{t+2} o_t^T]:

U^T Σ_2 (U^T Σ_1)^+ = U^T C A^2 P C^T (U^T C A P C^T)^+
                    = (U^T C A) A P C^T (P C^T)^+ (U^T C A)^{-1} = S A S^{-1}.

The result doesn't satisfy A ≥ 0, C ≥ 0 with columns summing to 1 — is this a problem?

Merging A and C. HMM tracking: write b_t = P[x_t | o_{1:t}]. Predict: b_{t-0.5} := P[x_t | o_{1:t-1}] = A b_{t-1}, and P[o_t | o_{1:t-1}] = C b_{t-0.5}. Condition: if o_t = o, then P(x_t = x | o_{1:t}) = P(o | x_t = x) P(x_t = x | o_{1:t-1}) / Z, where Z = P(o_t = o | b_{t-1}); i.e., b_t = diag(C_{o,:}) b_{t-0.5} / Z. Now write A_o = diag(C_{o,:}) A. Then b_t = A_o b_{t-1} / Z and P(o_t = o | b_{t-1}) = 1^T A_o b_{t-1}, so it's enough to estimate the A_o.

HMM SSID: try #2. Define, for each observation value o,

Σ_2^o := E[o_{t+2} δ(o_{t+1} = o) o_t^T]
       = E[ E[o_{t+2} δ(o_{t+1} = o) o_t^T | x_t] ]
       = E[ E[o_{t+2} δ(o_{t+1} = o) | x_t] E[o_t | x_t]^T ]
       = E[ E[o_{t+2} | x_t, o_{t+1} = o] P[o_{t+1} = o | x_t] (C x_t)^T ]
       = E[ C A (A_o x_t / (1^T A_o x_t)) (1^T A_o x_t) (C x_t)^T ]
       = E[ C A A_o x_t (C x_t)^T ]
       = C A A_o E[x_t x_t^T] C^T = C A A_o P C^T.

HMM SSID: try #2. With S = U^T C A as before,

Â_o := U^T Σ_2^o (U^T Σ_1)^+
     = U^T C A A_o P C^T (U^T C A P C^T)^+
     = (U^T C A) A_o P C^T (P C^T)^+ (U^T C A)^{-1} = S A_o S^{-1}.

The model updates are x ← A_o x / P(o), o ~ C x, with C_{o,:} = e^T A_o. Algorithm: estimate Σ_1 and the Σ_2^o from data; get Û from the SVD of Σ_1; plug in to get Â_o for each o. We also need the normalization vector e, with e^T = 1^T S^{-1}; it is the leading left eigenvector of Â_1 + Â_2 + ...
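A minimal numpy sketch of this estimator for a sequence of discrete observations coded 0..m-1 (variable names are illustrative; regularization, window features, and a careful choice of initial state are all simplified away):

import numpy as np

def spectral_hmm(obs, m, n):
    # Estimate observable operators B[o] = S A_o S^{-1} and the
    # normalization vector e from one observation sequence.
    T = len(obs)
    O = np.eye(m)[obs]                         # (T, m) indicator vectors
    Sigma1 = O[1:-1].T @ O[:-2] / (T - 2)      # E[o_{t+1} o_t^T]
    Sigma2 = np.zeros((m, m, m))               # Sigma2[o] = E[o_{t+2} [o_{t+1}=o] o_t^T]
    for t in range(T - 2):
        Sigma2[obs[t + 1]] += np.outer(O[t + 2], O[t])
    Sigma2 /= (T - 2)

    U = np.linalg.svd(Sigma1)[0][:, :n]        # n leading left singular vectors
    pinv = np.linalg.pinv(U.T @ Sigma1)
    B = np.stack([U.T @ Sigma2[o] @ pinv for o in range(m)])   # (m, n, n)

    # e^T = 1^T S^{-1}: leading left eigenvector of sum_o B[o] (up to scale).
    vals, vecs = np.linalg.eig(np.sum(B, axis=0).T)
    e = np.real(vecs[:, np.argmax(np.real(vals))])

    # A convenient initial state: leading right eigenvector of sum_o B[o].
    vals, vecs = np.linalg.eig(np.sum(B, axis=0))
    b = np.real(vecs[:, np.argmax(np.real(vals))])
    return B, e, b / (e @ b)                   # scale so that e^T b = 1

def predict_next(B, e, b):
    # Distribution over the next observation, given the current state b.
    p = np.array([e @ B[o] @ b for o in range(B.shape[0])])
    return p / p.sum()

def condition(B, e, b, o):
    # State update after seeing observation o (the slide's x <- A_o x / P(o)).
    return B[o] @ b / (e @ B[o] @ b)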

Example: clock. Discrete observations: sampled frames from the training video; when tracking, use nearest neighbor or Parzen windows (i.e., a mixture-of-Gaussians HMM). 10 latent dimensions.

Can we generalize? HMMs had x in Δ (the probability simplex); the intuition is that the number of discrete states equals the number of dimensions. We now have x in SΔ, which is essentially equally restrictive. Can we allow x in X for a general set X, so that # states > # dims?

# states > # dims: the picture. [Random projections of an N-dimensional simplex, for N = 3, 15, 100.]

SSID for OOMs (also known as PSRs without actions, multiplicity automata, ...). An OOM uses the same update as the HMM above: x ← A_o x / P(o), o ~ C x, C_{o,:} = e^T A_o. It is defined by its transition matrices A_o and normalization vector e; it is like an HMM, but lifts the restriction that X = SΔ: instead of requiring A_o x ≥ 0, we only require that A_o x = λ x' for some valid state x' and scale λ ≥ 0. OOMs include HMMs as a special case.

OOM SSID: no change!

OOM example: no change! Our HMM SSID was actually learning OOMs all along.

Can we generalize? So far we've only allowed a finer discretization of the observation space. Can we allow continuous observations? Yes: featurize! Let ϕ(o) be a feature function of the observation.

Featurize. Define

Σ_2^φ := E[o_{t+2} φ(o_{t+1}) o_t^T] = Σ_o φ(o) E[o_{t+2} δ(o_{t+1} = o) o_t^T] = Σ_o φ(o) C A A_o P C^T

Â_φ := U^T Σ_2^φ (U^T Σ_1)^+ = Σ_o φ(o) S A_o S^{-1}.

Store Â_φ for many different ϕ, and recover Â_o as needed.

Example: range-only SLAM. A robot measures distances to L landmarks as it moves, and wants to reconstruct its path and the landmark locations. T = 1000 time steps, L = 20 landmarks, window = 1 observation, latent dimension = 15. Features: exp(-d^2 / 2σ^2) of the measured ranges d.
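A minimal sketch of this kind of squared-exponential range feature (the bandwidth sigma and the array layout are assumptions for illustration):

import numpy as np

def range_slam_features(ranges, sigma):
    # ranges: (T, L) measured distances to the L landmarks at each time step.
    # Returns (T, L) features exp(-d^2 / (2 sigma^2)), one per landmark.
    return np.exp(-ranges ** 2 / (2.0 * sigma ** 2))

These per-landmark features would then play the role of ϕ(o) in the featurized SSID above; in practice one might use several bandwidths.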

Can we generalize? If some features are good, more must be better: use kernels. Everything above is linear algebra, so it works just fine in an arbitrary RKHS, and it can be rewritten in terms of the Gram matrix, so no infinite-dimensional computations are required. Caveat: regularization is now more important.

[Figure: prediction performance on slot car inertial measurement data (thanks to Dieter Fox's lab): the slot car platform with its IMU and the racetrack, and average squared prediction error vs. prediction horizon for the Hilbert space embedding model against HMM, RR-HMM, LDS, Mean, and Last baselines.]

Learning in the loop: option pricing. Price a financial derivative, a "psychic call": the holder gets to say "I bought a call 100 days ago" (the payoff depends on p_t and p_{t-100}), and the underlying stock follows Black-Scholes with unknown parameters. One solution: identify the Black-Scholes parameters and then plan — but the planning is itself hard.

Option pricing. A better solution [Van Roy et al.]: use reinforcement learning. Take 16 hand-picked features (e.g., polynomials of the price history), initialize the policy arbitrarily, use least-squares temporal differences (LSTD) to estimate the value function, set the policy to the greedy one, and repeat.
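Van Roy et al.'s exact setup isn't spelled out here, but the LSTD step is standard; here is a generic sketch for fitting a linear value function from sampled transitions (the names and the ridge term are illustrative):

import numpy as np

def lstd(phi, phi_next, rewards, gamma, ridge=1e-6):
    # phi:      (T, k) features of the visited states
    # phi_next: (T, k) features of the successor states (zero rows at episode ends)
    # rewards:  (T,) one-step rewards; gamma: discount factor
    # Returns w such that phi @ w approximates the value function of the
    # policy that generated the data.
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + ridge * np.eye(phi.shape[1]), b)

Policy iteration then alternates this evaluation step with acting greedily against the estimated values.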

Option pricing. Still better: use SSID inside policy iteration. Keep the 16 original features from Van Roy et al. and add 204 additional low-originality features (e.g., linear functions of the price history of the underlying); let SSID pick the best 16-dimensional dynamics to explain the feature evolution, and solve for the value function in closed form.

Policy iteration with spectral learning. [Plot: expected payoff per $ invested vs. iterations of policy iteration, for Threshold, LSTD (16), LSTD, LARS-TD, and PSTD.] PSTD is 0.82¢/$ better than the best competitor and 1.33¢/$ better than the best previously published result. Data: 1,000,000 time steps.

Making it fast. Bottleneck: the SVD of the Gram or Hankel matrix. G is (# time steps) × (# time steps); H is (# observation dims × window length) × (# time steps). E.g., for 1 hour of video at 24 fps (86,400 frames) with a 2 s window, G is 86,400 × 86,400 and H is (# pixels × 48) × 86,400. Even a much smaller dense SVD is slow:

>> k = 50; n = 2000;  % n^2 = 4,000,000
>> tic; x = randn(n,n); [u,s,v] = svds(x,k); toc
Elapsed time is ... seconds.

Making it fast. Two techniques: online learning and random embeddings. Neither one is new, but the combination with PSR SSID is, and it makes a huge difference in practice.

Online learning. With each new observation, perform a rank-1 update of the SVD (Brand's method) and of the relevant inverse (Sherman-Morrison). With n features, latent dimension d, and T steps: space is O(nd), which may fit in cache, and time is O(nd^2 T), i.e., bounded time per example. There is a small loss in statistical efficiency (the estimated subspace rotates), but we can deal with it. Problem: there is no rank-1 update for kernel SVD!
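A minimal sketch of the Sherman-Morrison half of this update: keep the inverse of a (regularized) covariance current as feature vectors stream in, in O(n^2) per step (the ridge parameter and names are illustrative):

import numpy as np

def sherman_morrison(M_inv, x):
    # Given M_inv = M^{-1}, return (M + x x^T)^{-1} without refactorizing.
    Mx = M_inv @ x
    return M_inv - np.outer(Mx, Mx) / (1.0 + x @ Mx)

# Usage: start from the inverse of the ridge term and fold in one sample per step.
n, lam = 50, 1e-3
M_inv = np.eye(n) / lam
for x in np.random.randn(1000, n):
    M_inv = sherman_morrison(M_inv, x)

Brand's incremental SVD plays the analogous role for the rank-constrained part of the model and is omitted here.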

Random embedding. Often k(x, y) = k(x − y), and for a positive-definite kernel the Fourier transform of k(z) satisfies p(ω) ≥ 0 (e.g., Gaussian, Laplacian, Cauchy). Then

k(z) = ∫_{R^d} p(ω) e^{jω·z} dω = ∫_{R^d} p(ω) cos(ω·z) dω = (1/Z) E_{ω ~ Zp}[cos(ω·z)]

for a normalizing constant Z, so the kernel looks like an expectation. [Rahimi & Recht, 2007]

Random embedding. Since cos(ω·(x − y)) = cos(ω·x) cos(ω·y) + sin(ω·x) sin(ω·y), use the features cos(ω_i·x)/√k and sin(ω_i·x)/√k for k random draws ω_i ~ p. Then

φ(x)·φ(y) = (1/k) Σ_{i=1}^k cos(ω_i·(x − y)) ≈ E[cos(ω·(x − y))],

and the convergence is uniform in z = x − y as k → ∞.
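A minimal sketch of this random Fourier feature map for the Gaussian kernel exp(-||x - y||^2 / (2 σ^2)), whose spectral density p(ω) is itself Gaussian (σ and the names are illustrative):

import numpy as np

def random_fourier_features(X, k, sigma, seed=0):
    # X: (T, d) data; returns (T, 2k) features whose inner products
    # approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)).
    rng = np.random.default_rng(seed)
    Omega = rng.normal(scale=1.0 / sigma, size=(X.shape[1], k))  # omega_i ~ p
    proj = X @ Omega
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(k)

Replacing kernel evaluations with these explicit features is what makes the rank-1 online updates above applicable in the kernelized setting.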

Random embedding example. [Plot: prediction performance vs. basis size.]

Results: closing the loop. Online + random features: 100k features, 11k frames; the limit is the available data. Offline: 2k frames, compressed and subsampled; compute-limited. [Figure: the learned embedding after 100 samples, after 600 samples, the first few steps, and the final embedding (colors = 3rd dimension).]

Planning. (Observations: images; actions: wheel velocities.) Suppose we can predict future state; now choose actions to maximize reward.

Planning. Value iteration: exactly the same math as POMDP value iteration. Point-based methods are fast and accurate, but they need a number of points exponential in the latent dimension — a possible big win for learned (low-dimensional) models.

Value iteration: data. [Figure: bird's-eye view of the environment, the goal image, 16×16 RGB observations, and 3D views with the observations at t = 1 and t = 10.] Actions: 6 different noisy translations and rotations. Data: 10,000 random start positions, with traces of 7 random actions, observations, and rewards (+1 at the goal, ϵ otherwise).

VI: learned subspace. [Figure: 2 dimensions of the learned 5-D subspace; each point takes the average color of the next image.]

VI: plans. Near-optimal plans to the goal, with only a 5-dimensional latent space (compare to a 5-state POMDP).

Summary. We can learn dynamical-system models with no local optima and fast online computation. The nonparametric (kernel-based) version handles near-arbitrary observation distributions. One general principle yields algorithms for HMMs, OOMs, SfM, range-only SLAM (with known correspondences), Kalman-filter system ID, RL, and more. We get good results from a general-purpose algorithm on problems typically tackled with lots of problem-specific engineering.

Papers.
B. Boots and G. Gordon. An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems. AAAI 2011.
B. Boots and G. Gordon. Predictive state temporal difference learning. NIPS 2010.
B. Boots, S. M. Siddiqi, and G. Gordon. Closing the learning-planning loop with predictive state representations. RSS 2010.
L. Song, B. Boots, S. M. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. ICML 2010. (Best paper.)
