Kernels for Dynamic Textures


Slide 1: Kernels for Dynamic Textures
S.V.N. Vishwanathan, National ICT Australia and Australian National University
Joint work with Alex Smola and René Vidal

Slide 2: Roadmap
- Introduction to Kernel Methods: why kernels?
- Kernels on Dynamical Systems: trajectories, noise models, computation
- Dynamic Textures: ARMA models, approximate solutions, kernel computation, experiments
- Outlook and Conclusion

Slide 3: Classification
Data: pairs of observations $(x_i, y_i)$ drawn from an underlying distribution $P(x, y)$.
Examples: (blood status, cancer), (transactions, fraud).
Task: find a function $f(x)$ which predicts $y$ given $x$. The function $f(x)$ must generalize well.

Slide 4: Optimal Separating Hyperplane
Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$.
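As an illustration not found on the original slides, a minimal sketch of this hard-margin problem using scikit-learn, where a very large C makes margin violations prohibitively expensive and so emulates the hard constraints:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Large C approximates the hard-margin problem:
# minimize (1/2)||w||^2 subject to y_i(<w, x_i> + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
```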

Slide 5: Kernels and Nonlinearity
Problem: linear functions are often too simple to provide good estimators.
Idea 1: map to a higher-dimensional feature space via $\Phi: x \mapsto \Phi(x)$ and solve the problem there, replacing every $\langle x, x' \rangle$ by $\langle \Phi(x), \Phi(x') \rangle$.
Idea 2: instead of computing $\Phi(x)$ explicitly, use a kernel function $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$.
A large class of functions is admissible as kernels, and non-vectorial data can be handled whenever we can compute a meaningful $k(x, x')$; a sketch of Idea 2 follows.
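A minimal sketch of Idea 2 (the RBF kernel and toy data are assumptions for illustration): training on a precomputed Gram matrix, so the learner only ever sees inner products, never $\Phi(x)$ itself.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X1, X2, gamma=0.5):
    """k(x, x') = exp(-gamma ||x - x'||^2): an implicit, infinite-dimensional feature map."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.sign(X[:, 0] * X[:, 1])  # XOR-like labels: not linearly separable in input space

K = rbf_kernel(X, X)                       # Gram matrix K_ij = k(x_i, x_j)
clf = SVC(kernel="precomputed").fit(K, y)  # learner only touches inner products
print("train accuracy:", clf.score(K, y))
```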

Slide 6: Roadmap
- Introduction to Kernel Methods: why kernels?
- Kernels on Dynamical Systems: trajectories, noise models, computation
- Dynamic Textures: ARMA models, approximate solutions, kernel computation, experiments
- Outlook and Conclusion

Slide 7: The Basic Idea
Key observation: trajectories are easily observable, and similar trajectories suggest similar systems. We restrict attention to interesting cases and average over noise models.
Kernels using dynamical systems: simulate the system for both inputs; similar time evolution implies similar inputs.
Kernels on dynamical systems: restrict to interesting initial conditions and simulate both systems; similar time evolution implies similar systems.

Slide 8: Notation
$X$ — state space (a Hilbert space); $\mathcal{A}$ — time evolution operators; $T$ — time of measurement; $\mu$ — a nice probability measure on $T$.
Discounting factors: for some $\lambda > 0$ we study
$\mu(t) = \lambda^{-1} e^{-\lambda t}$ for $T = \mathbb{R}_0^+$ and $\mu(t) = e^{-\lambda t}(1 - e^{-\lambda})$ for $T = \mathbb{N}_0$.
Time evolution: $x_A(t) := A(t)x$ for $A \in \mathcal{A}$.

Slide 9: Trajectories and Kernels
Comparing trajectories: using the dot product on $X$ we define a dot product on $X^T$,
$\langle \theta, \theta' \rangle := \mathbf{E}_\mu[\langle \theta(t), \theta'(t) \rangle]$ for $\theta, \theta' \in X^T$.
Extending to dynamical systems: identify a dynamical system with its trajectory and define
$k((x, A), (\tilde{x}, \tilde{A})) := \mathbf{E}_\mu[\langle A(t)x, \tilde{A}(t)\tilde{x} \rangle]$.
Other ideas: a nicely decaying measure is required for convergence; modify the dot product in $X$; covariance matrices? rational kernels and transducers.

Slide 10: Special Cases
Kernels on dynamical systems: restrict attention to $x = \tilde{x}$, i.e. compare trajectories for identical initial conditions, and take an expectation if interested in a range of $x$:
$k(A, \tilde{A}) := \mathbf{E}_x[k((x, A), (x, \tilde{A}))]$,
or more generally $k(A, \tilde{A}) := \mathbf{E}_A \mathbf{E}_{\tilde{A}} \mathbf{E}_x[k((x, A), (x, \tilde{A}))]$.
Kernels using dynamical systems: restrict attention to a particular dynamical system; as before, we can take expectations over $A$:
$k(x, \tilde{x}) := \mathbf{E}_x \mathbf{E}_{\tilde{x}} \mathbf{E}_A[k((x, A), (\tilde{x}, A))]$.

Slide 11: Discrete Linear Systems
We assume time propagation occurs as
$x_A(t+1) = A\, x_A(t) + a_t + \xi_t$,
which in closed form gives
$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i} (\xi_i + a_i)$.
To avoid messy math, assume $a_t = 0$ and hence
$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i} \xi_i$.
The kernel thus receives contributions from $A$ as well as from the noise.

Slide 12: Continuous Linear Systems
System dynamics here are described by
$\frac{d}{dt} x_A(t) = A\, x_A(t) + a(t) + \xi(t)$,
where $\xi(t)$ with $\mathbf{E}[\xi(t)] = 0$ is a stochastic process, so that
$x_A(t) = \exp(At)x_0 + \int_0^t \exp(A(t - \tau))\,(a(\tau) + \xi(\tau))\, d\tau$.
As before we assume $a(t) = 0$, and we even assume $\xi(\tau) = 0$ (this avoids messy math again), leaving
$x_A(t) = \exp(At)x_0$.
The kernel contribution is then due to $A$ only.

Slide 13: Convergence Criterion
Discrete case: let $A$, $B$, and $W$ be linear operators whose matrix norms obey $0 \leq \|A\|, \|B\| \leq \Lambda$. For suitable $\lambda$ with $e^{\lambda} > \Lambda^2$ and $W \succeq 0$,
$M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W B^t$
converges and satisfies the Sylvester equation $e^{-\lambda} A^\top M B + W = M$.
Continuous case: we define
$M := \int_0^{\infty} e^{-\lambda t} \exp(At)^\top W \exp(Bt)\, dt$,
which satisfies the Sylvester equation $(A + \frac{\lambda}{2}\mathbf{1})^\top M + M (B + \frac{\lambda}{2}\mathbf{1}) = W$.
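Since the discrete equation is linear in $M$, it can be solved numerically by vectorization. A minimal sketch (an illustration, not from the talk) that also checks the result against the defining series:

```python
import numpy as np

def solve_discrete_sylvester(A, B, W, lam):
    """Solve M = exp(-lam) * A.T @ M @ B + W for M.

    Uses vec(A^T M B) = kron(B^T, A^T) vec(M) (column-major vec), giving one
    dense linear solve in n^2 unknowns. This naive route costs O(n^6);
    Bartels-Stewart-style solvers achieve the cubic time quoted on the slide."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.exp(-lam) * np.kron(B.T, A.T)
    vecM = np.linalg.solve(lhs, W.flatten(order="F"))
    return vecM.reshape((n, n), order="F")

# Sanity check against the series sum_t e^{-lam t} (A^t)^T W B^t.
rng = np.random.default_rng(0)
A, B = 0.2 * rng.normal(size=(4, 4)), 0.2 * rng.normal(size=(4, 4))
W, lam = np.eye(4), 0.1

M = solve_discrete_sylvester(A, B, W, lam)
S, At, Bt = np.zeros((4, 4)), np.eye(4), np.eye(4)
for t in range(100):
    S += np.exp(-lam * t) * At.T @ W @ Bt
    At, Bt = At @ A, Bt @ B
print(np.allclose(M, S))  # True when e^lam > ||A|| ||B||
```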

Slide 14: Gory Details
Contribution due to $A$:
$p \sum_{t=0}^{\infty} e^{-\lambda t} \langle A^t x, \tilde{A}^t \tilde{x} \rangle := p\, x^\top \left[ \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W \tilde{A}^t \right] \tilde{x} = p\, x^\top M \tilde{x}$.
Contribution due to noise:
$p\, \mathbf{E}_\xi \left[ \sum_{t=0}^{\infty} e^{-\lambda t} \sum_{j,j'=0}^{t} \langle A^{t-j} \xi_j, \tilde{A}^{t-j'} \tilde{\xi}_{j'} \rangle \right] = p\, \operatorname{tr} \left( C_\xi \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde{A}^t \right) =: p\, \operatorname{tr}(C_\xi \tilde{M})$.
In the above equations $p$ is a normalizing term.

Slide 15: Delving Deeper
More on $M$ and $\tilde{M}$: the matrices $M$ and $\tilde{M}$ look like
$M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W \tilde{A}^t$ and $\tilde{M} := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde{A}^t$.
Sylvester equation: both $M$ and $\tilde{M}$ satisfy a Sylvester equation,
$e^{-\lambda} A^\top M \tilde{A} + W = M$ and $e^{-\lambda} A^\top \tilde{M} \tilde{A} + M = \tilde{M}$,
each of which can be solved in cubic time.

Slide 16: Discrete Kernel
Putting it all together,
$k((A, x), (\tilde{A}, \tilde{x})) = p \left[ x^\top M \tilde{x} + \operatorname{tr}(C_\xi \tilde{M}) \right]$,
where $C_\xi$ is the covariance matrix of $\xi_t$; one can assume a different noise model per time step.
Initial conditions: let $C$ be the covariance matrix of the initial conditions. If we set $x = \tilde{x}$ and average over it, then
$k((A, x), (\tilde{A}, x)) = p \left[ \operatorname{tr}(C M) + \operatorname{tr}(C_\xi \tilde{M}) \right]$.
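A sketch assembling the discrete kernel from the pieces above; it assumes the hypothetical solve_discrete_sylvester helper from the earlier sketch, defaults to W = 1, and leaves the normalization p as a parameter:

```python
import numpy as np

def discrete_kernel(A, A2, x, x2, C_xi, lam, p=1.0, W=None):
    """k((A, x), (A~, x~)) = p [ x^T M x~ + tr(C_xi M~) ], where
    M  solves e^{-lam} A^T M  A~ + W = M   (trajectory contribution)
    M~ solves e^{-lam} A^T M~ A~ + M = M~  (noise contribution)."""
    if W is None:
        W = np.eye(A.shape[0])
    M = solve_discrete_sylvester(A, A2, W, lam)
    Mt = solve_discrete_sylvester(A, A2, M, lam)
    return p * (x @ M @ x2 + np.trace(C_xi @ Mt))
```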

Slide 17: Continuous Kernel
Contribution due to $A$: since we assumed $a(t) = \xi(t) = 0$, we get
$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} \int_0^{\infty} e^{-\lambda t} \langle \exp(At)x, \exp(\tilde{A}t)\tilde{x} \rangle\, dt$.
The final form: the kernel can be expressed as
$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} x^\top M \tilde{x}$, where $(A + \frac{\lambda}{2}\mathbf{1})^\top M + M (\tilde{A} + \frac{\lambda}{2}\mathbf{1}) = W$.
The solution takes cubic time by solving the Sylvester equation.
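SciPy's solve_sylvester implements a cubic-time Bartels-Stewart solver for exactly this form. A minimal sketch following the equation as stated on the slide, with W = 1 and a toy stable matrix (both assumptions):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def continuous_kernel(A, A2, x, x2, lam, W=None):
    """k((x, A), (x~, A~)) = lam^{-1} x^T M x~, with M from the slide's
    Sylvester equation (A + lam/2 I)^T M + M (A~ + lam/2 I) = W."""
    n = A.shape[0]
    if W is None:
        W = np.eye(n)
    I = np.eye(n)
    # scipy.linalg.solve_sylvester solves a X + X b = q directly.
    M = solve_sylvester((A + lam / 2 * I).T, A2 + lam / 2 * I, W)
    return (x @ M @ x2) / lam

A = np.array([[-1.0, 0.2], [0.0, -0.5]])
print(continuous_kernel(A, A, np.ones(2), np.ones(2), lam=0.1))
```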

Slide 18: Special Cases
Snapshot: if we consider only the snapshot at time instance $T$,
$k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} x^\top \exp(AT)^\top W \exp(\tilde{A}T)\, \tilde{x}$.
Initial conditions: fix $A = \tilde{A}$; now we just solve $M = \frac{1}{2}(A + \frac{\lambda}{2}\mathbf{1})^{-1} W$.
Dynamical systems: fix $x = \tilde{x}$ to get $k(A, \tilde{A}) = \lambda^{-1} \operatorname{tr}(MC)$, where $C$ is the covariance matrix of the initial conditions.

Slide 19: Graph Kernels
Graph Laplacian: let $E$ be the adjacency matrix and $D := \operatorname{diag}(E \mathbf{1})$; then
$L := E - D$ and $\tilde{L} := D^{-1/2} L D^{-1/2}$.
Diffusion process: we can define a diffusion process by $\frac{d}{dt} x(t) = \tilde{L} x(t)$.
Diffusion kernel (Kondor and Lafferty, 2002): if we measure the overlap at time instance $T$ we get
$K = \exp(\tilde{L} T)^\top \exp(\tilde{L} T)$;
$K_{ij}$ is the probability that some state $l$ is reached from both $i$ and $j$.

Slide 20: Graph Kernels
Undirected graphs (Kondor and Lafferty, 2002): here $L$ is symmetric and hence yields $K = \exp(2LT)$.
Labeled graphs (Gärtner, 2002): $W$ acts as an indicator for node labels, say $W_{ij} = 1$ if two nodes have the same label; for other fancy weights see (Kashima et al., 2003).
Averaged graph Laplacian: if we average over a range of $T$ values,
$K = \frac{1}{2}\left( L + \frac{\lambda}{2}\mathbf{1} \right)^{-1}$.
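A minimal sketch computing the normalized Laplacian and the undirected diffusion kernel $K = \exp(2\tilde{L}T)$ on a toy graph (the 4-cycle and $T = 1$ are assumptions for illustration):

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a small undirected graph: the 4-cycle.
E = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

D = np.diag(E.sum(axis=1))                  # degree matrix, D = diag(E 1)
L = E - D                                   # graph Laplacian, L = E - D
Dinv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = Dinv_sqrt @ L @ Dinv_sqrt          # normalized Laplacian

T = 1.0
K = expm(2 * L_norm * T)                    # diffusion kernel for undirected graphs
print(np.allclose(K, K.T))                  # K is symmetric positive definite
```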

Slide 21: Roadmap
- Introduction to Kernel Methods: why kernels?
- Kernels on Dynamical Systems: trajectories, noise models, computation
- Dynamic Textures: ARMA models, approximate solutions, kernel computation, experiments
- Outlook and Conclusion

Slide 22: ARMA Models
An auto-regressive moving average (ARMA) model is
$x(t+1) = A\, x(t) + B\, v(t)$, $\quad y(t) = \phi(x(t)) + w(t)$,
where $x(t)$ is a hidden variable and $v(t)$ and $w(t)$ are IID random noise.
Linear Gaussian model: if $\phi$ is linear and the noise is white Gaussian,
$x(t+1) = A\, x(t) + v(t)$ with $v(t) \sim \mathcal{N}(0, Q)$,
$y(t) = C\, x(t) + w(t)$ with $w(t) \sim \mathcal{N}(0, R)$.
We fix the scaling by demanding that $C^\top C = \mathbf{1}$.
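A minimal simulation of the linear Gaussian model; all dimensions and parameter values below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, tau = 3, 5, 100                                # hidden dim, observed dim, horizon

A = 0.9 * np.linalg.qr(rng.normal(size=(n, n)))[0]   # stable transition (spectral radius 0.9)
C = np.linalg.qr(rng.normal(size=(m, n)))[0]         # orthonormal columns, so C^T C = 1
Q, R = 0.01 * np.eye(n), 0.01 * np.eye(m)

x = rng.normal(size=n)
Y = np.zeros((m, tau))
for t in range(tau):
    Y[:, t] = C @ x + rng.multivariate_normal(np.zeros(m), R)  # y(t) = C x(t) + w(t)
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)        # x(t+1) = A x(t) + v(t)
```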

Slide 23: Dynamic Textures
Image model: $y(t) \in \mathbb{R}^m$ are the observed noisy images; $x(t) \in \mathbb{R}^n$ ($n < m$) are hidden variables.
Modeling: a sequence of images $\{y(1), \ldots, y(\tau)\}$ is observed; ideally we want to solve
$A(\tau), C(\tau), Q(\tau), R(\tau) = \arg\max_{A,C,Q,R} p(y(1), \ldots, y(\tau))$.
Exact solution: n4sid in MATLAB solves the above problem, but it does not scale well when $m$ is large and is impractical for images, where $m \approx 10^5$.

Slide 24: Approximate Solution
Problem to solve: for any variable $z(t)$ define $Z_i^\tau := [z(i), \ldots, z(\tau)]$. We are solving
$Y_1^\tau = C X_1^\tau + W_1^\tau$ with $C^\top C = \mathbf{1}$.
Solving by SVD: let $Y_1^\tau = U \Sigma V^\top$. Solving $\arg\min_{C, X_1^\tau} \|W\|$ yields $C(\tau) = U$ and $X(\tau) = \Sigma V^\top$.
Solving $\arg\min_A \|X_2^\tau - A X_1^{\tau-1}\|$ yields
$A(\tau) = \Sigma V^\top D_1 V (V^\top D_2 V)^{-1} \Sigma^{-1}$, where
$D_1 = \begin{bmatrix} 0 & 0 \\ \mathbf{1}_{(\tau-1)} & 0 \end{bmatrix}$ and $D_2 = \begin{bmatrix} \mathbf{1}_{(\tau-1)} & 0 \\ 0 & 0 \end{bmatrix}$.
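The SVD-based estimate fits in a few lines. This sketch replaces the slide's closed form for $A(\tau)$ with the equivalent least-squares fit of $X_2^\tau$ onto $X_1^{\tau-1}$ via the pseudoinverse (a simplification of my own; the function name is hypothetical):

```python
import numpy as np

def estimate_arma(Y, n):
    """Suboptimal ARMA fit from frames Y (m x tau): C by truncated SVD, A by least squares."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                       # C(tau) = U, hence C^T C = 1
    X = np.diag(s[:n]) @ Vt[:n]        # X(tau) = Sigma V^T: hidden state trajectory
    # argmin_A ||X_2^tau - A X_1^{tau-1}||_F, solved via the pseudoinverse.
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, X

A_hat, C_hat, X_hat = estimate_arma(Y, n=3)   # Y from the previous simulation sketch
```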

Slide 25: Dynamic Texture Kernel
Kernel definition: estimate the models and compute kernels between the models. If we average out the noise, then for some $W \succeq 0$,
$k((x_0, A, C), (x_0', A', C')) := \mathbf{E}_{v,w} \left[ \sum_{t=1}^{\infty} e^{-\lambda t}\, y_t^\top W y_t' \right]$.
Kernel computation: the kernel can be computed as
$k = x_0^\top M x_0' + (e^{\lambda} - 1)^{-1} \operatorname{tr} \left[ Q \tilde{M} + W R \right]$,
where the matrices $M$ and $\tilde{M}$ satisfy
$M = e^{-\lambda} A^\top C^\top W C' A' + e^{-\lambda} A^\top M A'$ and $\tilde{M} = C^\top W C' + e^{-\lambda} A^\top \tilde{M} A'$.
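A sketch of the dynamic texture kernel following the slide's formulas; it again assumes the hypothetical solve_discrete_sylvester helper from the earlier sketch, and the values in the usage comment are illustrative:

```python
import numpy as np

def dt_kernel(A, C, x0, A2, C2, x02, Q, R, lam, W=None):
    """Dynamic texture kernel per the slide:
    M  = e^{-lam} A^T (C^T W C') A' + e^{-lam} A^T M  A'
    M~ =           C^T W C'         + e^{-lam} A^T M~ A'
    k  = x0^T M x0' + (e^lam - 1)^{-1} tr(Q M~ + W R)"""
    if W is None:
        W = np.eye(C.shape[0])
    S = C.T @ W @ C2                                               # C^T W C'
    M = solve_discrete_sylvester(A, A2, np.exp(-lam) * A.T @ S @ A2, lam)
    Mt = solve_discrete_sylvester(A, A2, S, lam)
    return x0 @ M @ x02 + np.trace(Q @ Mt + W @ R) / (np.exp(lam) - 1)

# e.g. comparing the estimated model from the previous sketch with itself:
# print(dt_kernel(A_hat, C_hat, X_hat[:, 0], A_hat, C_hat, X_hat[:, 0],
#                 Q=0.01 * np.eye(3), R=0.01 * np.eye(5), lam=0.9))
```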

Slide 26: Experimental Setup
Typical textures: sample textures were collected; each long clip was cut into shorter clips of 120 frames each.
Freak textures: we also collected some freak textures.

Slide 27: Results
Kernel-induced metric: we plot the kernel-induced metric for $\lambda = 0.9$ and $0.1$. Clips that are close on an axis come from the same master clip; notice the block-diagonal structure of the metric matrix. The results are fairly independent of the choice of $\lambda$.

Slide 28: Roadmap
- Introduction to Kernel Methods: why kernels?
- Kernels on Dynamical Systems: trajectories, noise models, computation
- Dynamic Textures: ARMA models, approximate solutions, kernel computation, experiments
- Outlook and Conclusion

Slide 29: Conclusion
- A new method to embed dynamical systems: analytical solutions for linear systems; many graph kernels are special cases.
- Analytical solutions require cubic time. Are better solutions possible for special cases? Extensions to nonlinear systems?
- Application to dynamic textures: works with approximate model parameters and picks out clips from the same master clip.
- Close relations to the rational kernels of Cortes et al.
More information at

Slide 30: Questions?
