Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes


Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes. Vishwanathan S.V.N., Smola Alexander J., Vidal René. Presented by Tron Roberto, July 31, 2006.

Support Vector Machines (SVMs)

Classification problem: given $m$ observations $(x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$, with $\mathcal{X} \subseteq \mathbb{R}^d$ and $\mathcal{Y} = \{-1, 1\}$, find $f : \mathcal{X} \to \mathcal{Y}$. Support Vector Machines use

$$f = \operatorname{sign}(\langle w, x \rangle + b) \quad (1)$$

The hyperplane $H(w, b) = \{x \mid \langle w, x \rangle + b = 0\}$ splits $\mathcal{X}$ into two half spaces and is given by

$$\min_{w, b, \xi_i} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m \quad (2)$$

$H(w, b)$ depends only on some of the points (called support vectors), which allows efficient computations during training.

Kernel methods

SVMs rely on the hypothesis of linearly separable clusters. What to do if the separating surface is not linear? Idea: use a map $\Phi : \mathcal{X} \to \mathcal{H}$ to send the points $x_i$ into a higher (possibly infinite) dimensional feature space where the $\Phi(x_i)$ are linearly separable. Both $f$ and $\|w\|^2$ can be expressed in terms of the dot product $\langle \Phi(x), \Phi(x') \rangle$, which avoids explicit computation of the mapping $\Phi(x)$. Kernel trick: the classification can be done by replacing $\langle \Phi(x), \Phi(x') \rangle$ with the kernel function $k(x, x')$.
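As a minimal illustration of the kernel trick (not from the talk; the Gaussian kernel, the toy data, and all names below are illustrative choices), one can hand a classifier the precomputed Gram matrix $K_{ij} = k(x_i, x_j)$ instead of explicit features:

```python
# Sketch of the kernel trick: train an SVM on a precomputed Gram matrix
# instead of mapping points through Phi explicitly.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)  # not linearly separable in X

def rbf_gram(A, B, gamma=1.0):
    # k(x, x') = exp(-gamma * ||x - x'||^2), evaluated for all pairs
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

clf = SVC(kernel="precomputed").fit(rbf_gram(X, X), y)
print(clf.predict(rbf_gram(X, X))[:5])
```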

Compound matrices

Let $A \in \mathbb{R}^{m \times n}$. Given $q \le \min(m, n)$, define $I_q^n = \{\mathbf{i} = (i_1, i_2, \dots, i_q) : 1 \le i_1 < \dots < i_q \le n,\ i_k \in \mathbb{N}\}$ and likewise $I_q^m$. The compound matrix of order $q$, $C_q(A)$, is defined as

$$[C_q(A)]_{\mathbf{i}, \mathbf{j}} = \det\big(A(i_k, j_l)\big)_{k,l=1}^{q} \quad \text{where } \mathbf{i} \in I_q^n \text{ and } \mathbf{j} \in I_q^m \quad (3)$$

The size of $C_q(A)$ is $\binom{n}{q} \times \binom{m}{q}$. Two particular cases: if $q = m = n$, then $C_q(A) = \det(A)$; if $q = 1$, then $C_q(A) = A$.
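A direct (inefficient but transparent) implementation of Eq. (3) is sketched below. This is my own illustration, not the authors' code; note that it indexes entries by (row subset, column subset), so its output has shape $\binom{m}{q} \times \binom{n}{q}$, the transpose of the convention written on the slide.

```python
# Order-q compound matrix: determinants of all q x q minors of A,
# with index tuples enumerated in lexicographic order.
import itertools
import numpy as np

def compound(A, q):
    rows = list(itertools.combinations(range(A.shape[0]), q))
    cols = list(itertools.combinations(range(A.shape[1]), q))
    C = np.empty((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            C[a, b] = np.linalg.det(A[np.ix_(i, j)])
    return C

A = np.arange(1.0, 10.0).reshape(3, 3)
print(np.allclose(compound(A, 1), A))                 # q = 1 recovers A
print(np.isclose(compound(A, 3), np.linalg.det(A)))   # q = m = n gives det(A)
```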

Binet-Cauchy

Theorem (Binet-Cauchy). Let $A \in \mathbb{R}^{l \times m}$ and $B \in \mathbb{R}^{l \times n}$. For all $1 \le q \le \min(m, n, l)$ we have $C_q(A^\top B) = C_q(A)^\top C_q(B)$.

Theorem (Binet-Cauchy kernels). Let $A, B \in \mathbb{R}^{n \times k}$. For all $1 \le q \le \min(n, k)$ the two kernels

$$k(A, B) = \operatorname{tr}\big(C_q(A^\top B)\big) \quad (4)$$
$$k(A, B) = \det\big(C_q(A^\top B)\big) \quad (5)$$

are well defined and positive semi-definite.
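The theorem is easy to check numerically. The self-contained sketch below (illustrative, with random test matrices and the same row-subset indexing convention as the previous sketch) verifies $C_q(A^\top B) = C_q(A)^\top C_q(B)$ and evaluates the trace kernel of Eq. (4):

```python
import itertools
import numpy as np

def compound(A, q):
    rows = list(itertools.combinations(range(A.shape[0]), q))
    cols = list(itertools.combinations(range(A.shape[1]), q))
    return np.array([[np.linalg.det(A[np.ix_(i, j)]) for j in cols] for i in rows])

rng = np.random.default_rng(1)
A, B, q = rng.normal(size=(5, 3)), rng.normal(size=(5, 4)), 2

# Binet-Cauchy: compound of a product equals product of compounds
print(np.allclose(compound(A.T @ B, q), compound(A, q).T @ compound(B, q)))

# Trace kernel of Eq. (4) for two matrices of matching shape
X, Y = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
print(np.trace(compound(X.T @ Y, q)))
```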

Dynamical systems and trajectories

Definitions: the state of a dynamical system is defined by some $x \in \mathcal{H}$, where $\mathcal{H}$ is an RKHS with kernel $\kappa(\cdot, \cdot)$. The evolution of a linear dynamical system is denoted as

$$x_A(t) = A(x, t) \quad \text{for } t \in \mathcal{T} \quad (6)$$

where $A : \mathcal{H} \times \mathcal{T} \to \mathcal{H}$ is a set of linear operators indexed by $\mathcal{T}$, the set of times of measurement. The pair $(x, A)$ identifies the trajectory of a dynamical system, defined as the map

$$\operatorname{Traj}_A : \mathcal{H} \to \mathcal{H}^{\mathcal{T}}, \quad \operatorname{Traj}_A(x)(t) = x_A(t) \quad (7)$$

Dynamical systems and trajectories

Comparison approaches: two options are available.

Compare parameters: useful when a suitable parametrization exists, but systems with different parameters can behave almost identically. Example: the two systems

$$a(x) = |x|^p \quad \text{and} \quad b(x) = \min(|x|^p, |x|) \quad (8)$$

give identical trajectories as long as $p > 1$ and $|x| < 1$.

Compare trajectories: independent from the parametrization and can take initial conditions into account, but requires a similarity measure.

Trace kernels on dynamical systems

Applying the trace kernel to trajectories involves the computation of

$$\operatorname{tr}\big(\operatorname{Traj}_A(x)^\top \operatorname{Traj}_{A'}(x')\big) = \sum_{t \in \mathcal{T}} \kappa\big(x_A(t), x'_{A'}(t)\big) \quad (9)$$

To assure convergence, a popular solution is to adopt discounting schemes such as

$$\mu(t) = c\, e^{-\lambda t} \quad (10)$$
$$\mu(t) = \delta_\tau(t) \quad (11)$$

Definition (Trace kernel for trajectories). The trace kernel on trajectories is defined as

$$k\big((x, A), (x', A')\big) = \mathbf{E}_{t \sim \mu(t)}\big[\kappa\big(x_A(t), x'_{A'}(t)\big)\big] \quad (12)$$
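For finitely sampled trajectories the expectation in Eq. (12) becomes a weighted sum. Below is a small sketch under assumptions of my own (a Gaussian $\kappa$, normalized exponential weights as in Eq. (10), and toy sinusoidal trajectories):

```python
# Discounted trace kernel between two finite sampled trajectories.
import numpy as np

def trace_kernel(traj1, traj2, lam=0.5, kappa=None):
    # traj1, traj2: arrays of shape (T, d) holding x_A(t) and x'_A'(t)
    if kappa is None:
        kappa = lambda u, v: np.exp(-np.sum((u - v) ** 2))
    t = np.arange(len(traj1))
    mu = np.exp(-lam * t)
    mu /= mu.sum()  # normalize so the weights form a distribution over t
    return sum(m * kappa(u, v) for m, u, v in zip(mu, traj1, traj2))

T = 50
t = np.arange(T)[:, None]
print(trace_kernel(np.cos(0.10 * t) * [1, 0],
                   np.cos(0.12 * t) * [1, 0]))
```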

Trace kernels on dynamical systems

This kernel can be further specialized: on model parameters,

$$k(A, A') = \mathbf{E}_{x, x'}\big[k\big((x, A), (x', A')\big)\big] \quad (13)$$

or on initial conditions,

$$k(x, x') = \mathbf{E}_{A, A'}\big[k\big((x, A), (x', A')\big)\big] \quad (14)$$

Determinant kernels on dynamical systems

Definition (Determinant kernel for trajectories). As in the previous case, the determinant kernel is defined as

$$k\big((x, A), (x', A')\big) = \mathbf{E}_{t \sim \mu(t)}\big[\det\big(\operatorname{Traj}_A(x)^\top \operatorname{Traj}_{A'}(x')\big)\big] \quad (15)$$

For the determinant to exist, $\mu(t)$ must have finite support. Then the kernel reduces to

$$k\big((x, A), (x', A')\big) = \det(K) \quad (16)$$

where $K_{i,j} = \kappa\big(x_A(t_i), x'_{A'}(t_j)\big)$. Following an information-theoretic interpretation, this kernel measures the statistical dependence between the two sequences.
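With $\mu$ supported on finitely many times $t_1, \dots, t_k$, Eq. (16) is simply the determinant of the cross-Gram matrix of the two sampled trajectories. A minimal sketch, with an illustrative Gaussian $\kappa$ and toy trajectories:

```python
import numpy as np

def det_kernel(traj1, traj2, kappa=lambda u, v: np.exp(-np.sum((u - v) ** 2))):
    # K[i, j] = kappa(x_A(t_i), x'_A'(t_j)), then k = det(K) as in Eq. (16)
    K = np.array([[kappa(u, v) for v in traj2] for u in traj1])
    return np.linalg.det(K)

t = np.arange(5)[:, None].astype(float)
print(det_kernel(np.cos(0.3 * t), np.sin(0.3 * t)))
```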

Linear dynamical systems - Discrete time

ARMA model:

$$y_t = C x_t + w_t \quad \text{where } w_t \sim \mathcal{N}(0, R),\ y \in \mathbb{R}^m \quad (17)$$
$$x_{t+1} = A x_t + v_t \quad \text{where } v_t \sim \mathcal{N}(0, Q),\ x \in \mathbb{R}^n \quad (18)$$

In this case, closed form solutions can be found for both the trace and the determinant kernels.
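A quick simulation of the model (17)-(18); the dimensions, covariances, and the stabilization trick are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 3, 2, 100
A = 0.9 * np.linalg.qr(rng.normal(size=(n, n)))[0]  # scaled orthogonal: stable
C = rng.normal(size=(m, n))
Q, R = 0.01 * np.eye(n), 0.01 * np.eye(m)

x = rng.normal(size=n)
ys = []
for _ in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(m), R)   # y_t = C x_t + w_t
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)   # x_{t+1} = A x_t + v_t
    ys.append(y)
print(np.array(ys).shape)  # (100, 2)
```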

Trace kernel on linear dynamical systems

With the assumptions $\mu(t) = e^{-\lambda t}$ and $\kappa(y_t, y'_t) = y_t^\top W y'_t$, with $W$ positive semi-definite, the trace kernel can be written in closed form. With the same realization of the noise for both systems:

$$k\big((x_0, A, C), (x'_0, A', C')\big) = x_0^\top M x'_0 + \frac{1}{1 - e^{-\lambda}} \operatorname{tr}(Q M + W R) \quad (19)$$

With different noise realizations:

$$k\big((x_0, A, C), (x'_0, A', C')\big) = x_0^\top M x'_0 \quad (20)$$

where $M$ satisfies the Sylvester equation

$$M = e^{-\lambda} A^\top M A' + C^\top W C' \quad (21)$$

Other closed forms exist when independence from the initial conditions is desired and when the model is fully observable ($C = I$, $R = 0$).
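Eq. (21) is a discrete-time Sylvester (Stein) equation; one direct way to solve it is vectorization, using the identity $\operatorname{vec}(XYZ) = (Z^\top \otimes X)\operatorname{vec}(Y)$. The sketch below is my own, with arbitrary test matrices, and then evaluates Eq. (20):

```python
import numpy as np

def trace_kernel_arma(x0, A, C, x0p, Ap, Cp, W, lam):
    # Solve M = exp(-lam) * A^T M A' + C^T W C' by vectorization (Eq. 21),
    # then return k = x0^T M x0' (Eq. 20).
    n = A.shape[0]
    lhs = np.eye(n * n) - np.exp(-lam) * np.kron(Ap.T, A.T)
    rhs = (C.T @ W @ Cp).reshape(-1, order="F")        # column-major vec()
    M = np.linalg.solve(lhs, rhs).reshape(n, n, order="F")
    return x0 @ M @ x0p

rng = np.random.default_rng(3)
n, m = 3, 2
A, Ap = 0.5 * rng.normal(size=(n, n)), 0.5 * rng.normal(size=(n, n))
C, Cp = rng.normal(size=(m, n)), rng.normal(size=(m, n))
print(trace_kernel_arma(rng.normal(size=n), A, C,
                        rng.normal(size=n), Ap, Cp, np.eye(m), lam=0.1))
```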

Determinant kernel on linear dynamical systems

With the same assumptions used for the trace kernel, the determinant kernel can be expressed in the closed form

$$k\big((x_0, A, C), (x'_0, A', C')\big) = \det(W) \det\big(C M C'^\top\big) \quad (22)$$

where

$$M = e^{-\lambda} A M (A')^\top + x_0 (x'_0)^\top \quad (23)$$

N.B.:
1. $k$ is uniformly zero if $C$ or $M$ do not have full rank (e.g. $m > n$);
2. if we consider $\mathbf{E}_{x_0, x'_0}[k]$ to obtain independence from the initial conditions, a closed form is possible only for $m = n = 1$.

Linear dynamical systems - Continuous time

The evolution of an LTI system can be described with the model

$$\frac{d}{dt} x(t) = A x(t) \quad (24)$$

which has solution

$$x(t) = \exp(A t)\, x(0) \quad (25)$$

Using the exponential discount $\mu(t) = e^{-\lambda t}$, the trace kernel becomes

$$k\big((x_0, A), (x'_0, A')\big) = x_0^\top M x'_0 \quad (26)$$

where $M$ satisfies

$$\Big(A^\top - \frac{\lambda}{4} I\Big) M + M \Big(A' - \frac{\lambda}{4} I\Big) = -W \quad (27)$$
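Eq. (27) is a standard Sylvester equation $AX + XB = Q$, which `scipy.linalg.solve_sylvester` handles directly. A sketch under the reconstruction above (the $\lambda/4$ shift and the sign of $W$ follow the slide as transcribed, so treat them as assumptions; the matrices are arbitrary stable test data):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(4)
n, lam = 3, 0.5
A = rng.normal(size=(n, n)) - 2 * np.eye(n)    # shifted left: roughly stable
Ap = rng.normal(size=(n, n)) - 2 * np.eye(n)
W = np.eye(n)

# Solve (A^T - lam/4 I) M + M (A' - lam/4 I) = -W  (Eq. 27)
M = solve_sylvester(A.T - (lam / 4) * np.eye(n),
                    Ap - (lam / 4) * np.eye(n),
                    -W)
x0, x0p = rng.normal(size=n), rng.normal(size=n)
print(x0 @ M @ x0p)  # k((x0, A), (x0', A')) as in Eq. (26)
```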

Subspace angles

Given an ARMA model without noise, the Hankel matrix

$$Z = \begin{bmatrix} y_0 & y_1 & y_2 & \dots \\ y_1 & y_2 & \dots & \\ \vdots & & & \end{bmatrix} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \end{bmatrix} \begin{bmatrix} x_0 & x_1 & x_2 & \dots \end{bmatrix} = O \begin{bmatrix} x_0 & x_1 & x_2 & \dots \end{bmatrix} \quad (28)$$

lives in a subspace spanned by the columns of the extended observability matrix $O$. The kernel

$$k\big((A, C), (A', C')\big) = \prod_{i=1}^{n} \cos^2(\theta_i) = \det(Q^\top Q')^2 \quad (29)$$

measures the angles $\theta_i$ between the two subspaces, with $Q$ and $Q'$ orthonormal bases of the respective subspaces.
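Numerically, Eq. (29) can be approximated by truncating the observability matrices and orthonormalizing their columns with QR; the horizon length below is an illustrative choice of mine:

```python
import numpy as np

def martin_kernel(A, C, Ap, Cp, horizon=20):
    # Truncated extended observability matrices O = [C; CA; CA^2; ...]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(horizon)])
    Op = np.vstack([Cp @ np.linalg.matrix_power(Ap, k) for k in range(horizon)])
    Q, _ = np.linalg.qr(O)    # orthonormal bases of the two subspaces
    Qp, _ = np.linalg.qr(Op)
    return np.linalg.det(Q.T @ Qp) ** 2   # prod_i cos^2(theta_i)

rng = np.random.default_rng(5)
n, m = 3, 2
A = 0.5 * rng.normal(size=(n, n))
C = rng.normal(size=(m, n))
print(martin_kernel(A, C, A, C))  # self-similarity: exactly 1
```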

Cepstral coefficients

Let $H(z)$ be the transfer function of an ARMA model without noise. The cepstrum is defined by

$$\log\big(H(z) H^*(z^{-1})\big) = \sum_{n \in \mathbb{Z}} c_n z^{-n} \quad (30)$$

The Martin kernel

$$k\big((A, C), (A', C')\big) = \sum_{n=1}^{\infty} n\, c_n c'^*_n \quad (31)$$

is equivalent to the subspace angles approach (De Cock and De Moor, 2002).

Random walks

A random walk on a graph can be seen as an ARMA model without noise:

$$y_t = C x_t \quad (32)$$
$$x_{t+1} = A x_t \quad (33)$$

Assuming an equiprobable distribution over the starting nodes,

$$k(G_1, G_2) = \frac{1}{|V_1| |V_2|}\, \mathbf{1}^\top M \mathbf{1} \quad (34)$$

where

$$M = e^{-\lambda} A_1^\top M A_2 + C_1^\top W C_2 \quad (35)$$

This is equivalent to the geometric graph kernel of Gärtner et al. (2003).
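A compact sketch of Eqs. (34)-(35), again solving the Stein equation by vectorization; taking $C_1^\top W C_2$ to be the all-ones matrix is a simplifying assumption of mine:

```python
import numpy as np

def rw_kernel(T1, T2, lam=0.5):
    # T1, T2: column-stochastic transition matrices of the two graphs
    n1, n2 = T1.shape[0], T2.shape[0]
    # Solve M = exp(-lam) * T1^T M T2 + J, with J the all-ones matrix (Eq. 35)
    lhs = np.eye(n1 * n2) - np.exp(-lam) * np.kron(T2.T, T1.T)
    M = np.linalg.solve(lhs, np.ones(n1 * n2)).reshape(n1, n2, order="F")
    return M.sum() / (n1 * n2)   # Eq. (34): (1/|V1||V2|) 1^T M 1

triangle = (np.ones((3, 3)) - np.eye(3)) / 2.0   # 3-cycle random walk
print(rw_kernel(triangle, triangle))
```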

Diffusion on graphs

A continuous diffusion process can be seen as the continuous-time LTI system

$$\frac{d}{dt} x(t) = L x(t) \quad (36)$$

Assuming the same Laplacian matrix, different initial conditions, and $\mu(t) = \delta_\tau(t)$:

$$[K]_{i,j} = k\big((L, e_i), (L, e_j)\big) = \big[\exp(L\tau)^\top \exp(L\tau)\big]_{i,j} \quad (37)$$

If the graph is undirected, $L = L^\top$ and

$$K = \exp(2 L \tau) \quad (38)$$

This is the same as the diffusion kernel of Kondor and Lafferty (2002).
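Eq. (38) is a one-liner given a graph Laplacian. The sketch below uses a small path graph and the sign convention $L = \mathrm{Adj} - D$ (so that $\exp(L\tau)$ acts as a diffusion operator); both are illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

# Path graph on 4 nodes: adjacency, degrees, Laplacian L = Adj - D
Adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
L = Adj - np.diag(Adj.sum(axis=1))
tau = 0.5
K = expm(2 * tau * L)   # Eq. (38): K = exp(2 L tau)
print(np.allclose(K, K.T), np.all(np.linalg.eigvalsh(K) > 0))  # symmetric, PD
```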

Polygon comparison

Burkhardt (2004) uses feature functions $\Phi(x, p)$, where $p$ is a polygon and $x$ is a vertex index. The dynamics of the system move from $x$ to $(x \bmod n) + 1$. Two polygons are compared through their trajectories $(\Phi(1, p), \dots, \Phi(n, p))$. Assuming an equiprobable distribution on the initial conditions (starting points on each polygon),

$$k(p, p') = \frac{1}{n n'} \sum_{x, x' = 1}^{n, n'} \big\langle \Phi(x, p), \Phi(x', p') \big\rangle \quad (39)$$

This is equivalent to Burkhardt (2004) but can be generalized to any kernel.
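A toy instance of Eq. (39), with an illustrative feature function $\Phi$ of my own choosing (vertex coordinates relative to the centroid, not Burkhardt's invariant features):

```python
import numpy as np

def polygon_kernel(P, Pp):
    # P, Pp: vertex coordinate arrays, shapes (n, 2) and (n', 2)
    F = P - P.mean(axis=0)       # Phi(x, p) for every vertex index x
    Fp = Pp - Pp.mean(axis=0)
    return (F @ Fp.T).mean()     # (1 / (n n')) * sum of all inner products

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(polygon_kernel(square, square + 3.0))  # invariant to translation
```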

Non-linear dynamical systems

The time evolution of a fully observable non-linear dynamical system is governed by

$$x_{t+1} = f(x_t) \quad (40)$$

The approach is to find a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ which linearizes the system,

$$\Phi(x_{t+1}) = A \Phi(x_t) \quad (41)$$

and then apply the same tools. N.B.: the space of possible solutions has to be restricted by imposing constraints on the structure of $A$; the same constraints appear on $M$ when solving the Sylvester equations.

Dynamic textures - Setup

The proposed kernels have been applied to the problem of comparing dynamic textures, using ARMA models. System identification was performed using the sub-optimal closed form solution of Doretto et al. (2003). The dataset consists of 150 sequences (grayscale 8 bit, 115x170 px, 120 frames each). Sequences 65 to 80 cannot be modeled as dynamic textures.

Dynamic textures - Results

[Figure: Binet-Cauchy kernel] [Figure: Martin kernel]

Darker areas correspond to low kernel values, i.e. similar sequences. The value of $\lambda$ does not affect the clustering. Sequences 65-80 are recognized as novel.

Dynamic textures - Parameter interpretation

Initial condition $x_0$: discriminates between sequences with the same foreground (the dynamic behaviour) but different background (contained in the initial condition).

Exponential decay $\lambda$: sets the relative weight of short versus long range interactions.

Dynamics $(A, C)$: systems with $A$ and $A'$ of different dimensionality can be compared as long as the outputs $y_t$, $y'_t$ belong to the same space. This makes it possible to compare various levels of detail in the parametrization of the same system.

Video clip clustering

Clustering of random video clips (120 frames each). The kernel has been used to compute the k-nearest neighborhoods, and LLE (Locally Linear Embedding) has been used for clustering and embedding in the 2D space.

Conclusions

A general framework which includes previous kernels as particular cases. It can be applied to a wide class of dynamical systems (linear/non-linear, discrete/continuous time). Computationally efficient closed form solutions exist for linear dynamical systems. Good experimental results.