Kernels for Dynamic Textures
1 Kernels for Dynamic Textures
S.V.N. Vishwanathan
National ICT Australia and Australian National University
Joint work with Alex Smola and René Vidal
S.V.N. Vishwanathan: Kernels for Dynamic Textures, Page 1
2 Roadmap
- Introduction to Kernel Methods: Why kernels?
- Kernels on Dynamical Systems: Trajectories, Noise Models, Computation
- Dynamical Textures: ARMA Models, Approximate Solutions, Kernel Computation, Experiments
- Outlook and Conclusion
3 Classification
Data: pairs of observations (x_i, y_i) drawn from an underlying distribution P(x, y)
Examples: (blood status, cancer), (transactions, fraud)
Task: find a function f(x) which predicts y given x
The function f(x) must generalize well
4 Optimal Separating Hyperplane
Minimize (1/2)‖w‖² subject to y_i(⟨w, x_i⟩ + b) ≥ 1 for all i
5 Kernels and Nonlinearity
Problem: Linear functions are often too simple to provide good estimators
Idea 1: Map to a higher-dimensional feature space via Φ: x ↦ Φ(x) and solve the problem there; replace every ⟨x, x′⟩ by ⟨Φ(x), Φ(x′)⟩
Idea 2: Instead of computing Φ(x) explicitly, use a kernel function k(x, x′) := ⟨Φ(x), Φ(x′)⟩
A large class of functions is admissible as kernels
Non-vectorial data can be handled if we can compute a meaningful k(x, x′)
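The kernel trick can be checked numerically. Below is a small sketch (mine, not from the talk) for the homogeneous polynomial kernel k(x, x′) = ⟨x, x′⟩², whose explicit feature map Φ(x) collects all degree-2 products x_i x_j:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map: all products x_i * x_j, flattened.
    return np.outer(x, x).ravel()

def k(x, xp):
    # Kernel trick: the same value without ever forming phi explicitly.
    return np.dot(x, xp) ** 2

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)

# <phi(x), phi(x')> equals k(x, x') up to floating-point error.
assert np.isclose(np.dot(phi(x), phi(xp)), k(x, xp))
```

For high-degree polynomials or infinite-dimensional maps, only the kernel-trick route remains feasible, which is the point of Idea 2.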
6 Roadmap
- Introduction to Kernel Methods: Why kernels?
- Kernels on Dynamical Systems: Trajectories, Noise Models, Computation
- Dynamical Textures: ARMA Models, Approximate Solutions, Kernel Computation, Experiments
- Outlook and Conclusion
7 The Basic Idea
Key Observation:
- Trajectories are easily observable
- Similar trajectories ⇒ similar systems
- Restrict attention to interesting cases
- Average over noise models
Kernels Using Dynamical Systems:
- Simulate the system for both inputs
- Similar time evolution ⇒ similar inputs
Kernels on Dynamical Systems:
- Restrict to interesting initial conditions
- Simulate both systems
- Similar time evolution ⇒ similar systems
8 Notation
X - state space (Hilbert space)
A - time evolution operators
T - time of measurement
µ - nice probability measure on T
Discounting Factors: For some λ > 0 we study
µ(t) = λ⁻¹ e^{−λt} for T = ℝ₀⁺
µ(t) = e^{−λt}(1 − e^{−λ}) for T = ℕ₀
Time Evolution: x_A(t) := A(t)x for A ∈ A
9 Trajectories and Kernels
Comparing Trajectories: Using the dot product on X we define a dot product on X^T:
⟨θ, θ′⟩ := E_µ[⟨θ(t), θ′(t)⟩] for θ, θ′ ∈ X^T
Extending to Dynamical Systems: Identify a dynamical system with its trajectory and define
k((x, A), (x̃, Ã)) := E_µ[⟨A(t)x, Ã(t)x̃⟩]
Other Ideas:
- A nicely decaying measure is required for convergence
- Modify the dot product in X
- Covariance matrices?
- Rational kernels and transducers
10 Special Cases
Kernels on Dynamical Systems:
- Restrict attention to x = x̃, i.e., compare trajectories for identical initial conditions
- Take an expectation if interested in a range of x:
k(A, Ã) := E_x[k((x, A), (x, Ã))]
- More generally k(A, Ã) := E_A E_Ã E_x[k((x, A), (x, Ã))]
Kernels Using Dynamical Systems:
- Restrict attention to a particular dynamical system
- As before we can take expectations over A:
k(x, x̃) := E_x E_x̃ E_A[k((x, A), (x̃, A))]
11 Discrete Linear Systems
Linear Systems: We assume time propagation occurs as
x_A(t + 1) = A x_A(t) + a_t + ξ_t
In closed form
x_A(t) = A^t x_0 + Σ_{i=0}^{t−1} A^{t−1−i} (a_i + ξ_i)
To avoid messy math assume a_t = 0 and hence
x_A(t) = A^t x_0 + Σ_{i=0}^{t−1} A^{t−1−i} ξ_i
Contribution to the kernel due to A as well as the noise
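The closed form can be verified against the step-by-step recursion. A small numpy sketch (sizes illustrative, drift a_t = 0 as assumed on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 4, 10
A = 0.5 * rng.standard_normal((n, n))   # contractive, so trajectories stay bounded
x0 = rng.standard_normal(n)
xi = rng.standard_normal((T, n))        # noise terms xi_0, ..., xi_{T-1}

# Step-by-step recursion x(t+1) = A x(t) + xi_t.
x = x0.copy()
for t in range(T):
    x = A @ x + xi[t]

# Closed form: x(T) = A^T x0 + sum_{i=0}^{T-1} A^{T-1-i} xi_i.
closed = np.linalg.matrix_power(A, T) @ x0 + sum(
    np.linalg.matrix_power(A, T - 1 - i) @ xi[i] for i in range(T)
)
assert np.allclose(x, closed)
```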
12 Continuous Linear Systems
Linear Systems: System dynamics here are described by
(d/dt) x_A(t) = A x_A(t) + a(t) + ξ(t)
Here ξ(t) with E[ξ(t)] = 0 is a stochastic process, and
x_A(t) = exp(At) x_0 + ∫_0^t exp(A(t − τ)) (a(τ) + ξ(τ)) dτ
As before we assume a(t) = 0
We even assume ξ(τ) = 0 (avoids messy math again!), so that
x_A(t) = exp(At) x_0
Kernel contribution only due to A
13 Convergence Criterion
Discrete Case: Let A, B, and W be linear operators whose matrix norms obey ‖A‖, ‖B‖ ≤ Λ. For suitable λ with e^λ > Λ² and W ≥ 0,
M := Σ_{t=0}^∞ e^{−λt} A^t W B^t
satisfies the Sylvester equation e^{−λ} A M B + W = M
Continuous Case: We define
M := ∫_0^∞ e^{−λt} exp(At)^⊤ W exp(Bt) dt
which satisfies the Sylvester equation (A^⊤ − (λ/2)·1) M + M (B − (λ/2)·1) = −W
14 Gory Details
Contribution due to A:
p Σ_{t=0}^∞ e^{−λt} ⟨A^t x, W Ã^t x̃⟩ = p x^⊤ [Σ_{t=0}^∞ e^{−λt} (A^t)^⊤ W Ã^t] x̃ =: p x^⊤ M x̃
Contribution due to noise:
p E_ξ [Σ_{t=0}^∞ e^{−λt} Σ_{j,j′=0}^t ⟨A^{t−j} ξ_j, Ã^{t−j′} ξ̃_{j′}⟩] = p tr(C_ξ Σ_{t=0}^∞ e^{−λt} (A^t)^⊤ M Ã^t) =: p tr(C_ξ M̃)
In the above equations p is a normalizing term
15 Delving Deeper
More on M and M̃: The matrices M and M̃ look like
M := Σ_{t=0}^∞ e^{−λt} (A^t)^⊤ W Ã^t
and
M̃ := Σ_{t=0}^∞ e^{−λt} (A^t)^⊤ M Ã^t
Sylvester Equation: Both M and M̃ satisfy a Sylvester equation:
e^{−λ} A^⊤ M Ã + W = M and e^{−λ} A^⊤ M̃ Ã + M = M̃
These can be solved for in cubic time
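These fixed-point (Sylvester/Stein-type) equations can be solved with plain linear algebra by vectorizing, using vec(A^⊤MÃ) = (Ã^⊤ ⊗ A^⊤)vec(M) in column-major convention. A minimal numpy sketch, mine rather than the talk's, with the solution checked against the truncated series:

```python
import numpy as np

def stein(A, At, W, lam):
    # Solve M = e^{-lam} A^T M At + W by vectorization:
    # vec(A^T M At) = kron(At.T, A.T) vec(M)  (column-major vec).
    n = W.shape[0]
    K = np.exp(-lam) * np.kron(At.T, A.T)
    vecM = np.linalg.solve(np.eye(n * n) - K, W.reshape(-1, order="F"))
    return vecM.reshape(n, n, order="F")

rng = np.random.default_rng(2)
n, lam = 3, 0.5
A  = 0.4 * rng.standard_normal((n, n))   # small norms guarantee convergence
At = 0.4 * rng.standard_normal((n, n))
W  = rng.standard_normal((n, n))

M  = stein(A, At, W, lam)   # M  = sum_t e^{-lam t} (A^t)^T W At^t
Mt = stein(A, At, M, lam)   # M~ = sum_t e^{-lam t} (A^t)^T M At^t

# Check M against its truncated series definition.
S = sum(np.exp(-lam * t)
        * np.linalg.matrix_power(A, t).T @ W @ np.linalg.matrix_power(At, t)
        for t in range(100))
assert np.allclose(M, S)
# Check M~ against its fixed-point equation e^{-lam} A^T M~ At + M = M~.
assert np.allclose(np.exp(-lam) * A.T @ Mt @ At + M, Mt)
```

The Kronecker route costs O(n⁶) and is only for clarity; a Bartels-Stewart-style Sylvester solver gives the cubic running time quoted on the slide.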
16 Discrete Kernel
Discrete Case: Putting it all together,
k((A, x), (Ã, x̃)) = p [x^⊤ M x̃ + tr(C_ξ M̃)]
Note that C_ξ is the covariance matrix of ξ_t
We can assume a different noise model per time step
Initial Conditions: Let C be the covariance matrix of the initial conditions. If we set x = x̃ and average over it, then
k(A, Ã) = p [tr(C M) + tr(C_ξ M̃)]
17 Continuous Kernel
Contribution due to A: Since we assumed a(t) = ξ(t) = 0 we get
k((x, A), (x̃, Ã)) = λ⁻¹ ∫_0^∞ e^{−λt} ⟨exp(At)x, W exp(Ãt)x̃⟩ dt
The Final Form: The kernel can be expressed as
k((x, A), (x̃, Ã)) = λ⁻¹ x^⊤ M x̃
where (A^⊤ − (λ/2)·1) M + M (Ã − (λ/2)·1) = −W
Solution in cubic time by solving the Sylvester equation
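The continuous-time Sylvester equation can likewise be solved by vectorization and checked against a numerical quadrature of the defining integral. A sketch under illustrative assumptions (small stable matrices, A diagonalizable, W arbitrary):

```python
import numpy as np

def sylvester(P, Q, R):
    # Solve P M + M Q = R by vectorization: (I (x) P + Q^T (x) I) vec(M) = vec(R).
    n = R.shape[0]
    L = np.kron(np.eye(n), P) + np.kron(Q.T, np.eye(n))
    return np.linalg.solve(L, R.reshape(-1, order="F")).reshape(n, n, order="F")

def expm(A, t):
    # Matrix exponential via eigendecomposition (assumes A is diagonalizable).
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)).real

rng = np.random.default_rng(3)
n, lam = 3, 1.0
A  = rng.standard_normal((n, n)) - 3 * np.eye(n)   # shift spectrum left: stable
At = rng.standard_normal((n, n)) - 3 * np.eye(n)
W  = rng.standard_normal((n, n))

# (A^T - (lam/2) 1) M + M (At - (lam/2) 1) = -W
M = sylvester(A.T - lam / 2 * np.eye(n), At - lam / 2 * np.eye(n), -W)

# Compare with int_0^inf e^{-lam t} exp(At)^T W exp(At~) dt via a midpoint sum.
dt = 0.01
ts = np.arange(0, 25, dt) + dt / 2
quad = sum(np.exp(-lam * t) * expm(A, t).T @ W @ expm(At, t) * dt for t in ts)
assert np.allclose(M, quad, atol=1e-3)
```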
18 Special Cases
Snapshot: If we consider only the snapshot at time instance T,
k((x, A), (x̃, Ã)) = λ⁻¹ x^⊤ exp(AT)^⊤ W exp(ÃT) x̃
Initial Conditions: Fix A = Ã. Now we just solve
M = ½ ((λ/2)·1 − A^⊤)⁻¹ W
Dynamical Systems: Fix x = x̃ to get
k(A, Ã) = λ⁻¹ tr(M C)
Here C is the covariance matrix of the initial conditions
19 Graph Kernels
Graph Laplacian: Let E be the adjacency matrix and D := diag(E1). Define
L := E − D and L̃ := D^{−1/2} L D^{−1/2}
Diffusion Process: We can define a diffusion process by
(d/dt) x(t) = L x(t)
Diffusion Kernel (Kondor and Lafferty, 2002): If we measure the overlap at time instance T we get
K = exp(LT)^⊤ exp(LT)
K_ij is the probability that some state l is reached from both i and j
20 Graph Kernels
Undirected Graphs (Kondor and Lafferty, 2002): Here L is symmetric and hence yields
K = exp(2LT)
Labeled Graphs (Gärtner, 2002): W acts as an indicator for node labels, say W_ij = 1 if two nodes have the same label. For other fancy weights see (Kashima et al., 2003)
Averaged Graph Laplacian: If we average over a range of T values,
K = ((λ/2)·1 − L)⁻¹
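For a concrete undirected example, the diffusion kernel K = exp(2L̃T) can be computed from the eigendecomposition of the symmetric normalized Laplacian. The path graph below is purely illustrative:

```python
import numpy as np

# Path graph on 4 nodes: adjacency E, degrees D, normalized Laplacian.
E = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(E.sum(axis=1))
Dm = np.diag(1.0 / np.sqrt(np.diag(D)))
Lt = Dm @ (E - D) @ Dm          # L := E - D, then L~ := D^{-1/2} L D^{-1/2}

# Undirected graph => L~ is symmetric, so K = exp(2 L~ T) via eigendecomposition.
T = 1.0
w, V = np.linalg.eigh(Lt)
K = V @ np.diag(np.exp(2 * w * T)) @ V.T

assert np.allclose(K, K.T)                # symmetric
assert np.all(np.linalg.eigvalsh(K) > 0)  # positive definite kernel matrix
```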
21 Roadmap
- Introduction to Kernel Methods: Why kernels?
- Kernels on Dynamical Systems: Trajectories, Noise Models, Computation
- Dynamical Textures: ARMA Models, Approximate Solutions, Kernel Computation, Experiments
- Outlook and Conclusion
22 ARMA Models
ARMA Model: An auto-regressive moving average model is
x(t + 1) = A x(t) + B v(t)
y(t) = φ(x(t)) + w(t)
x(t) is a hidden variable; v(t) and w(t) are IID random noise
Linear Gaussian Model: If φ is linear and the noise is white Gaussian:
x(t + 1) = A x(t) + v(t),  v(t) ∼ N(0, Q)
y(t) = C x(t) + w(t),  w(t) ∼ N(0, R)
Fix the scaling by demanding that C^⊤ C = 1
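A linear Gaussian model of this form is easy to simulate. The sketch below (all sizes and covariances illustrative) enforces the scaling convention C^⊤C = 1 by taking C from a reduced QR factorization:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, T = 2, 5, 200
A = np.array([[0.9, 0.1],
              [-0.1, 0.9]])                        # stable transition (|eigs| < 1)
C, _ = np.linalg.qr(rng.standard_normal((m, n)))   # reduced QR gives C^T C = 1
Q, R = 0.01 * np.eye(n), 0.05 * np.eye(m)          # process / observation noise

x, ys = np.zeros(n), []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)         # v(t) ~ N(0, Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(m), R))  # w(t) ~ N(0, R)
Y = np.stack(ys, axis=1)    # m x T matrix of noisy observations

assert np.allclose(C.T @ C, np.eye(n))   # the scaling convention from the slide
assert Y.shape == (m, T)
```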
23 Dynamic Textures
Image Model:
y(t) ∈ ℝ^m are the observed noisy images
x(t) ∈ ℝ^n (n < m) are hidden variables
Modeling: A sequence of images {y(1), …, y(τ)} is observed. Ideally we want to solve
A(τ), C(τ), Q(τ), R(τ) = argmax_{A,C,Q,R} p(y(1), …, y(τ))
Exact Solution: n4sid in MATLAB solves the above problem
Does not scale well if m is large; impractical for images, where m ≈ 10^5
24 Approximate Solution
Problem To Solve: For any variable z(t) define Z_i^τ := [z(i), …, z(τ)]. We are solving
Y_1^τ = C X_1^τ + W_1^τ with C^⊤ C = 1
Solving By SVD: Let Y_1^τ = U Σ V^⊤. Solving argmin_{C, X_1^τ} ‖W_1^τ‖ yields
C(τ) = U and X(τ) = Σ V^⊤
Solving argmin_A ‖X_2^τ − A X_1^{τ−1}‖ yields
A(τ) = Σ V^⊤ D_1 V (V^⊤ D_2 V)⁻¹ Σ⁻¹
Here D_1 = [0 0; 1_{(τ−1)} 0] and D_2 = [1_{(τ−1)} 0; 0 0]
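The closed-form estimate of A(τ) can be checked against the plain least-squares solution X_2^τ (X_1^{τ−1})^†. A numpy sketch with illustrative dimensions, using a truncated SVD for C and X:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, tau = 30, 4, 50                       # observed dims, hidden dims, frames
Y = rng.standard_normal((m, tau))

# Rank-n SVD: Y ~ U S V^T with C = U and X = S V^T.
U, s, VT = np.linalg.svd(Y, full_matrices=False)
U, s, VT = U[:, :n], s[:n], VT[:n, :]
S, V = np.diag(s), VT.T
X = S @ VT                                  # hidden trajectory estimate, n x tau

# Least-squares estimate: A = X_2^tau (X_1^{tau-1})^+
A_ls = X[:, 1:] @ np.linalg.pinv(X[:, :-1])

# Closed form from the slide with shift / projection matrices D1, D2.
D1 = np.zeros((tau, tau)); D1[1:, :-1] = np.eye(tau - 1)   # subdiagonal ones
D2 = np.zeros((tau, tau)); D2[:-1, :-1] = np.eye(tau - 1)  # project on first tau-1
A_cf = S @ VT @ D1 @ V @ np.linalg.inv(VT @ D2 @ V) @ np.linalg.inv(S)

assert np.allclose(A_ls, A_cf)
```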
25 Dynamic Texture Kernel
Kernel Definition: Estimate the models and compute kernels between the models. If we average out the noise then, for some W ≥ 0,
k((x_0, A, C), (x_0′, A′, C′)) := E_{v,w} [Σ_{t=1}^∞ e^{−λt} y_t^⊤ W y_t′]
Kernel Computation: The kernel can be computed as
k = x_0^⊤ M x_0′ + (e^λ − 1)⁻¹ tr[Q M̃ + W R]
The matrices M and M̃ satisfy
M = e^{−λ} A^⊤ C^⊤ W C′ A′ + e^{−λ} A^⊤ M A′
M̃ = C^⊤ W C′ + e^{−λ} A^⊤ M̃ A′
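Putting the pieces together, the noise-free part x_0^⊤ M x_0′ of this kernel can be computed by solving the first fixed-point equation and checked against the truncated series. A sketch with random illustrative models; Q = R = 0, so only the initial-condition term is exercised:

```python
import numpy as np

def stein(A, At, RHS, lam):
    # Solve M = e^{-lam} A^T M A' + RHS by vectorization.
    n = RHS.shape[0]
    K = np.exp(-lam) * np.kron(At.T, A.T)
    vecM = np.linalg.solve(np.eye(n * n) - K, RHS.reshape(-1, order="F"))
    return vecM.reshape(n, n, order="F")

rng = np.random.default_rng(6)
n, m, lam = 3, 5, 0.5
A,  At = 0.4 * rng.standard_normal((n, n)), 0.4 * rng.standard_normal((n, n))
C,  Ct = rng.standard_normal((m, n)), rng.standard_normal((m, n))
W = np.eye(m)
x0, x0t = rng.standard_normal(n), rng.standard_normal(n)

# M = e^{-lam} A^T C^T W C' A' + e^{-lam} A^T M A'
M = stein(A, At, np.exp(-lam) * A.T @ C.T @ W @ Ct @ At, lam)
k = x0 @ M @ x0t    # noise-free part of the kernel (Q = R = 0)

# Check against the truncated series sum_{t>=1} e^{-lam t} y_t^T W y'_t
# for the noise-free trajectories y_t = C A^t x_0.
series, x, xt = 0.0, x0.copy(), x0t.copy()
for t in range(1, 100):
    x, xt = A @ x, At @ xt
    series += np.exp(-lam * t) * (C @ x) @ W @ (Ct @ xt)
assert np.allclose(k, series)
```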
26 Experimental Setup
Typical Textures: Some sample textures. A long clip was cut into shorter clips of 120 frames each
Freak Textures: We also collected some freak textures
27 Results
Kernel Induced Metric: Clips closer on an axis are from the same master clip
We plot the kernel-induced metric for λ = 0.9 and 0.1
Results are fairly independent of the choice of λ
Notice the block-diagonal structure of the metric matrix
28 Roadmap
- Introduction to Kernel Methods: Why kernels?
- Kernels on Dynamical Systems: Trajectories, Noise Models, Computation
- Dynamical Textures: ARMA Models, Approximate Solutions, Kernel Computation, Experiments
- Outlook and Conclusion
29 Conclusion
- A new method to embed dynamical systems
- Analytical solutions for linear systems
- Many graph kernels are special cases
- Analytical solutions require cubic time
- Are better solutions possible for special cases?
- Extensions to nonlinear systems?
- Application to dynamical textures
- Works with approximate model parameters
- Picks out clips from the same master clip
- Close relations to the rational kernels of Cortes et al.
- More information at
30 Questions?
More informationErgodicity in data assimilation methods
Ergodicity in data assimilation methods David Kelly Andy Majda Xin Tong Courant Institute New York University New York NY www.dtbkelly.com April 15, 2016 ETH Zurich David Kelly (CIMS) Data assimilation
More informationLinear Dynamical Systems (Kalman filter)
Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete
More informationLast lecture: Recurrence relations and differential equations. The solution to the differential equation dx
Last lecture: Recurrence relations and differential equations The solution to the differential equation dx = ax is x(t) = ce ax, where c = x() is determined by the initial conditions x(t) Let X(t) = and
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationDIMENSION REDUCTION. min. j=1
DIMENSION REDUCTION 1 Principal Component Analysis (PCA) Principal components analysis (PCA) finds low dimensional approximations to the data by projecting the data onto linear subspaces. Let X R d and
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationElements of Positive Definite Kernel and Reproducing Kernel Hilbert Space
Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space Statistical Inference with Reproducing Kernel Hilbert Space Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department
More informationLecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian
Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Radu Balan February 5, 2018 Datasets diversity: Social Networks: Set of individuals ( agents, actors ) interacting with each other (e.g.,
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,
Sparse Kernel Canonical Correlation Analysis Lili Tan and Colin Fyfe 2, Λ. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 2. School of Information and Communication
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationKernel methods for comparing distributions, measuring dependence
Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations
More informationMatrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance
Date: Mar. 3rd, 2017 Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance Presenter: Songtao Lu Department of Electrical and Computer Engineering Iowa
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More information