Multiple kernel learning for multiple sources


1 Multiple kernel learning for multiple sources Francis Bach INRIA - Ecole Normale Supérieure NIPS Workshop - December 2008

2 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators

3 Machine learning for computer vision Learning tasks on images Proliferation of digital media Many different tasks to be solved, associated with different machine learning problems

4-5 Image retrieval Classification, ranking, outlier detection

6 Personal photos Classification, clustering, visualisation

7 Machine learning for computer vision Learning tasks on images Proliferation of digital media Many different tasks to be solved, associated with different machine learning problems Application: retrieval/indexing of images Common issues: complex tasks; heterogeneous data; links with other media (text and sound); massive data

8 Machine learning for computer vision Learning tasks on images Proliferation of digital media Many different tasks to be solved, associated with different machine learning problems Application: retrieval/indexing of images Common issues: complex tasks; heterogeneous data; links with other media (text and sound); massive data Kernel methods

9 Multiple sources in computer vision Many different cues: shape, color, texture, segments, interest points Kernel design is easier for one source at a time Links with bioinformatics

10 Kernels for interest points SIFT + pyramid match (Grauman and Darrell, 2007)

11 Kernels for texture Histograms of filter responses
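As a concrete illustration of such a texture kernel, the sketch below compares two images through normalized histograms of filter responses using the histogram intersection kernel (a valid positive-definite kernel on histograms). It is a minimal toy example, not the exact kernel of the talk; the random "filter responses", bin count, and value range are illustrative assumptions.

```python
import numpy as np

def filter_histograms(responses, n_bins=16, lim=(-1.0, 1.0)):
    """Summarize a bank of filter responses as one normalized histogram per filter."""
    hists = [np.histogram(r, bins=n_bins, range=lim)[0] for r in responses]
    hists = np.concatenate(hists).astype(float)
    return hists / hists.sum()

def intersection_kernel(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

# toy usage: two "images", each seen through three random filters
rng = np.random.default_rng(0)
img_a = [rng.normal(size=1000) * s for s in (0.1, 0.3, 0.5)]
img_b = [rng.normal(size=1000) * s for s in (0.1, 0.3, 0.6)]
print(intersection_kernel(filter_histograms(img_a), filter_histograms(img_b)))
```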

12 Kernels from segmentation graphs (Harchaoui and Bach, 2007) Goal of segmentation: extract objects of interest Many methods available, but they rarely find the object of interest entirely Segmentation graphs allow one to work on a more reliable over-segmentation, going from a large square grid (millions of pixels) to a small graph (dozens or hundreds of regions)

13-14 Segmentation by watershed transform (Meyer, 2001) [figure: original image, its gradient, and watershed segmentations at 287, 64, and 10 segments]
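A watershed segmentation along these lines can be reproduced with scikit-image; the minimal sketch below runs on a stock image and uses an illustrative marker-selection heuristic (well-separated local minima of the gradient) rather than Meyer's hierarchical construction. Fewer markers yield the coarser segmentations shown on the slide.

```python
import numpy as np
from skimage import data, feature, filters, segmentation

image = data.coins()                      # stock grayscale image
gradient = filters.sobel(image)           # gradient magnitude drives the flooding

# illustrative markers: well-separated local minima of the gradient
coords = feature.peak_local_max(-gradient, min_distance=20)
markers = np.zeros(image.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

labels = segmentation.watershed(gradient, markers)
print(labels.max(), "segments")           # fewer markers -> coarser segmentation
```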

15 Image as a segmentation graph Labelled undirected graph Vertices: connected segmented regions Edges: between spatially neighboring regions Labels: region pixels Graph kernels (Gärtner et al., 2003; Kashima et al., 2004; Harchaoui and Bach, 2007) provide an elegant and efficient solution
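A minimal sketch of how such a segmentation graph can be built from a label image, assuming 4-connectivity and using networkx as the graph container; in practice each vertex would also carry region statistics (e.g., its pixel colors) as labels.

```python
import numpy as np
import networkx as nx

def segmentation_graph(labels):
    """Build a labelled undirected graph: one vertex per region,
    one edge per pair of 4-adjacent regions."""
    g = nx.Graph()
    g.add_nodes_from(np.unique(labels))
    # horizontally and vertically adjacent pixel pairs with different labels
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        edges = np.stack([a[a != b], b[a != b]], axis=1)
        g.add_edges_from(map(tuple, edges))
    return g

labels = np.array([[1, 1, 2],
                   [1, 3, 2],
                   [3, 3, 2]])
print(sorted(segmentation_graph(labels).edges()))  # [(1, 2), (1, 3), (2, 3)]
```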

16 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators

17 Multiple sources by combining kernels Learning combinations of kernels: $K(\eta) = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$ Summing kernels $\Leftrightarrow$ concatenating feature spaces: assume $k_1(x,y) = \langle \Phi_1(x), \Phi_1(y) \rangle$ and $k_2(x,y) = \langle \Phi_2(x), \Phi_2(y) \rangle$; then $k_1(x,y) + k_2(x,y) = \left\langle \begin{pmatrix} \Phi_1(x) \\ \Phi_2(x) \end{pmatrix}, \begin{pmatrix} \Phi_1(y) \\ \Phi_2(y) \end{pmatrix} \right\rangle$

18 Multiple sources by combining kernels Learning combinations of kernels: $K(\eta) = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$ Summing kernels $\Leftrightarrow$ concatenating feature spaces Two natural (equivalent) settings: 1. Single input space, multiple feature spaces: $x \in \mathcal{X}$, $m$ different kernels on $\mathcal{X}$; example: learning hyperparameters of kernels 2. Multiple pairs of input/feature spaces: $x_j \in \mathcal{X}_j$, $j = 1, \dots, m$, with one kernel on each input space; multiple sources; generalized additive models (Hastie and Tibshirani, 1990)
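The identity "summing kernels = concatenating feature spaces" is easy to check numerically; the sketch below uses two explicit, illustrative feature maps and verifies that the Gram matrix of the concatenated features equals the sum of the two Gram matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

# two explicit feature maps (illustrative: raw features and their squares)
phi1 = lambda x: x
phi2 = lambda x: x ** 2

K1 = phi1(X) @ phi1(X).T
K2 = phi2(X) @ phi2(X).T

# the concatenated feature map gives exactly the sum of the two Gram matrices
phi = np.concatenate([phi1(X), phi2(X)], axis=1)
assert np.allclose(phi @ phi.T, K1 + K2)
```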

19 Multiple kernel learning (Lanckriet et al., 2004; Bach et al., 2004a) Learning kernels $K = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$ Summing kernels is equivalent to concatenating feature spaces: $m$ feature maps $\Phi_j : \mathcal{X} \to \mathcal{F}_j$, $j = 1, \dots, m$ Minimization with respect to $f_1 \in \mathcal{F}_1, \dots, f_m \in \mathcal{F}_m$ Predictor: $f(x) = f_1^\top \Phi_1(x) + \cdots + f_m^\top \Phi_m(x)$ [diagram: $x$ fans out to $\Phi_1(x), \dots, \Phi_m(x)$, each paired with $f_j$ and summed] Which regularization for $f_1^\top \Phi_1(x) + \cdots + f_m^\top \Phi_m(x)$?

20 Regularization for multiple kernels Summing kernels is equivalent to concatenating feature spaces: $m$ feature maps $\Phi_j : \mathcal{X} \to \mathcal{F}_j$, $j = 1, \dots, m$ Minimization with respect to $f_1 \in \mathcal{F}_1, \dots, f_m \in \mathcal{F}_m$ Predictor: $f(x) = f_1^\top \Phi_1(x) + \cdots + f_m^\top \Phi_m(x)$ Regularization by $\sum_{j=1}^m \|f_j\|^2$ is equivalent to using $K = \sum_{j=1}^m K_j$

21 Regularization for multiple kernels Summing kernels is equivalent to concatenating feature spaces: $m$ feature maps $\Phi_j : \mathcal{X} \to \mathcal{F}_j$, $j = 1, \dots, m$ Minimization with respect to $f_1 \in \mathcal{F}_1, \dots, f_m \in \mathcal{F}_m$ Predictor: $f(x) = f_1^\top \Phi_1(x) + \cdots + f_m^\top \Phi_m(x)$ Regularization by $\sum_{j=1}^m \|f_j\|^2$ is equivalent to using $K = \sum_{j=1}^m K_j$ Regularization by $\sum_{j=1}^m \|f_j\|$ should impose sparsity at the group level Main questions when regularizing by the block $\ell_1$-norm: 1. Equivalence with other kernel learning formulations 2. Algorithms 3. Analysis of sparsity-inducing properties (Bach, 2008)
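The group-level sparsity induced by $\sum_j \|f_j\|$ can be seen in its proximal operator, which rescales or zeroes entire blocks at once. A minimal finite-dimensional sketch, where the threshold $\tau$ plays the role of the regularization strength:

```python
import numpy as np

def block_soft_threshold(f_blocks, tau):
    """Proximal operator of tau * sum_j ||f_j||_2: shrinks each block's norm,
    zeroing whole blocks at once -- sparsity at the group level."""
    out = []
    for f in f_blocks:
        norm = np.linalg.norm(f)
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append(scale * f)
    return out

blocks = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
print(block_soft_threshold(blocks, tau=1.0))
# first block is shrunk (norm 5 -> 4), second block (norm 0.5) is zeroed out
```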

22 General kernel learning Proposition (Lanckriet et al., 2004; Bach et al., 2005; Micchelli and Pontil, 2005): $G(K) = \min_{f \in \mathcal{F}} \sum_{i=1}^n \varphi_i(f^\top \Phi(x_i)) + \frac{\lambda}{2} \|f\|^2 = \max_{\alpha \in \mathbb{R}^n} -\sum_{i=1}^n \psi_i(\lambda \alpha_i) - \frac{\lambda}{2} \alpha^\top K \alpha$ is a convex function of the Gram matrix $K$ Theoretical learning bounds (Lanckriet et al., 2004; Srebro and Ben-David, 2006)

23 MKL - equivalence with other kernel learning formulations (Bach et al., 2004a) Block $\ell_1$-norm problem: $\min \sum_{i=1}^n \varphi_i(f_1^\top \Phi_1(x_i) + \cdots + f_m^\top \Phi_m(x_i)) + \frac{\lambda}{2} (\|f_1\| + \cdots + \|f_m\|)^2$ Kernel learning formulation: minimize with respect to $\eta$ in the simplex: $G(K(\eta)) = \max_{\alpha \in \mathbb{R}^n} -\sum_{i=1}^n \psi_i(\lambda \alpha_i) - \frac{\lambda}{2} \alpha^\top \big( \sum_{j=1}^m \eta_j K_j \big) \alpha$ Proposition: block $\ell_1$-norm regularization is equivalent to minimizing the optimal value $G(K(\eta))$ with respect to $\eta$ Weights $\eta$ obtained from optimality conditions Single optimization problem for learning both $\eta$ and $\alpha$

24 Algorithms for MKL (Very) costly optimization with SDP, QCQP or SOCP: for $n \approx 1{,}000$-$10{,}000$ and $m \approx 100$, not possible; loosen the required precision and use first-order methods (see, e.g., Bottou and Bousquet (2008)) Dual coordinate ascent (SMO) with smoothing (Bach et al., 2004a) Optimization of $G(K)$ by cutting planes (Sonnenburg et al., 2006) Optimization of $G(K)$ by steepest descent with smoothing (Rakotomamonjy et al., 2008) Regularization path (Bach et al., 2004b) etc.
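As an illustration of the first-order approach, here is a minimal SimpleMKL-style sketch for the squared loss, where the conventions above give the closed form $G(K) = \frac{\lambda}{2} y^\top (K + \lambda I)^{-1} y$ and $\partial G / \partial \eta_j = -\frac{\lambda}{2} \alpha^\top K_j \alpha$ with $\alpha = (K + \lambda I)^{-1} y$. Projected gradient descent on the simplex stands in for the reduced-gradient method of Rakotomamonjy et al. (2008); the step size and toy data are illustrative.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def mkl_ridge(Ks, y, lam=0.1, lr=0.01, iters=500):
    """Minimize eta -> G(K(eta)) over the simplex for the squared loss,
    where G(K) = (lam/2) y^T (K + lam I)^{-1} y and
    dG/deta_j = -(lam/2) alpha^T K_j alpha with alpha = (K + lam I)^{-1} y."""
    m, n = len(Ks), len(y)
    eta = np.ones(m) / m
    for _ in range(iters):
        K = sum(e * Kj for e, Kj in zip(eta, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        grad = np.array([-(lam / 2.0) * alpha @ Kj @ alpha for Kj in Ks])
        eta = project_simplex(eta - lr * grad)  # small constant step for this toy problem
    return eta

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0])                                # only source 0 is informative
Ks = [np.exp(-(X[:, [j]] - X[:, [j]].T) ** 2) for j in range(3)]
print(mkl_ridge(Ks, y).round(3))                   # weight should concentrate on kernel 0
```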

25 Summing kernels vs. optimizing weights Different regularizations: Regularization by $\sum_{j=1}^m \|f_j\|^2$ is equivalent to using $K = \sum_{j=1}^m K_j$ Regularization by $\sum_{j=1}^m \|f_j\|$ should impose sparsity at the group level and learn sparse weights $\eta$, with $K = \sum_{j=1}^m \eta_j K_j$ If sparsity is not expected, $\ell_1$ has no reason to be better

26 Performance on Corel14 (Harchaoui and Bach, 2007) Corel14: 1400 natural images with 14 classes

27 Performance on Corel14 (Harchaoui and Bach, 2007) [bar plot: test error rates on Corel14 for histogram kernels (H), walk kernels (W), tree-walk kernels (TW), weighted tree-walks (wTW), and MKL (M)]

28 Caltech101 database (Fei-Fei et al., 2006)

29 Kernel combination for Caltech101 (Varma and Ray, 2007) [table: classification accuracies (mean ± std) for 1-NN, SVM (1 vs. 1), and SVM (1 vs. rest), for individual kernels (Shape GB, two variants; Self Similarity; PHOG, two variants; PHOWColour; PHOWGray) and for combinations: MKL with the block $\ell_1$-norm and the method of Varma and Ray (2007); numerical values lost in extraction] See also Bosch et al. (2008)

30 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators

31 Analysis of MKL as nonparametric group Lasso Assume $m$ Hilbert spaces $\mathcal{F}_j$, $j = 1, \dots, m$, on $m$ different input spaces: $\min_{f_j \in \mathcal{F}_j,\, j=1,\dots,m} \frac{1}{2n} \sum_{i=1}^n \Big( y_i - \sum_{j=1}^m f_j(x_{ji}) \Big)^2 + \frac{\mu_n}{2} \sum_{j=1}^m \|f_j\|$ NB: $f_j(x_{ji}) = f_j^\top \Phi_j(x_{ji})$ Sparse generalized additive models (Hastie and Tibshirani, 1990; Ravikumar et al., 2007) Algorithms: use the parametrization with $\alpha$ Analysis: do not use $\alpha$; use covariance operators (i.e., stay in the primal/input space)

32 (Non-centered) covariance operators Single random variable $X$: $\Sigma_{XX}$ is a bounded linear operator from $\mathcal{F}$ to $\mathcal{F}$ such that for all $(f,g) \in \mathcal{F} \times \mathcal{F}$, $\langle f, \Sigma_{XX} g \rangle = \mathbb{E}[f(X) g(X)]$ Under minor assumptions, the operator $\Sigma_{XX}$ is self-adjoint, nonnegative and Hilbert-Schmidt Tool of choice for the analysis of least-squares nonparametric methods (Blanchard, 2006; Fukumizu et al., 2005, 2006; Gretton et al., 2006; Harchaoui et al., 2007, 2008) Natural empirical estimate $\langle f, \hat{\Sigma}_{XX} g \rangle = \frac{1}{n} \sum_{i=1}^n f(x_i) g(x_i)$, which converges in probability to $\Sigma_{XX}$ in Hilbert-Schmidt norm

33 Cross-covariance operators Several random variables: cross-covariance operators $\Sigma_{X_i X_j}$ from $\mathcal{F}_j$ to $\mathcal{F}_i$ such that for all $(f_i, f_j) \in \mathcal{F}_i \times \mathcal{F}_j$, $\langle f_i, \Sigma_{X_i X_j} f_j \rangle = \mathbb{E}[f_i(X_i) f_j(X_j)]$ Similar convergence properties for the empirical estimates Joint covariance operator $\Sigma_{XX}$ defined by blocks We can define the bounded correlation operators through $\Sigma_{X_i X_j} = \Sigma_{X_i X_i}^{1/2} C_{X_i X_j} \Sigma_{X_j X_j}^{1/2}$ NB: the joint covariance operator is never invertible, but the correlation operator may be
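The empirical pairing $\langle f, \hat\Sigma_{XY} g \rangle = \frac{1}{n} \sum_{i=1}^n f(x_i) g(y_i)$ reduces to plain Gram-matrix algebra when $f$ and $g$ are kernel expansions: with $f = \sum_p a_p k_X(x_p, \cdot)$ and $g = \sum_q b_q k_Y(y_q, \cdot)$ it equals $\frac{1}{n} a^\top K_X K_Y b$. A minimal sketch checking this identity (the Gaussian kernels, data, and random coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = x + 0.1 * rng.normal(size=n)               # two dependent sources

k = lambda u, v: np.exp(-0.5 * (u[:, None] - v[None, :]) ** 2)  # Gaussian kernel
Kx, Ky = k(x, x), k(y, y)
a, b = rng.normal(size=n), rng.normal(size=n)  # expansion coefficients of f and g

# f, g as kernel expansions: f(.) = sum_p a_p k(x_p, .), g(.) = sum_q b_q k(y_q, .)
f = lambda z: a @ k(x, z)
g = lambda z: b @ k(y, z)

# empirical pairing <f, Sigma_hat_{XY} g> = (1/n) sum_i f(x_i) g(y_i)
pointwise = np.mean(f(x) * g(y))
# the same quantity through Gram matrices: (1/n) a^T Kx Ky b
gram_form = a @ Kx @ Ky @ b / n
assert np.isclose(pointwise, gram_form)
```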

34 Covariance operators for multiple sources Simple tool for characterizing relationships between sources Formally equivalent to the finite-dimensional feature space setting Allows proper asymptotic and non-asymptotic analysis, e.g., limit distributions of nonparametric test statistics (Gretton et al., 2006; Harchaoui, Bach and Moulines, 2007, 2008)

35 Analysis of MKL as nonparametric group Lasso Assumptions: 1. Generalized additive model: there exist functions $f = (f_1, \dots, f_m) \in \mathcal{F} = \mathcal{F}_1 \times \cdots \times \mathcal{F}_m$ such that $Y = \sum_{j=1}^m f_j(X_j) + \varepsilon$ 2. Compactness and invertibility: all cross-correlation operators are compact and the joint correlation operator is invertible 3. Additional technical assumptions

36 Compactness and invertibility of the joint correlation operator Sufficient condition for compactness when distributions have densities: $\mathbb{E}\Big[ \frac{p_{X_i X_j}(X_i, X_j)}{p_{X_i}(X_i)\, p_{X_j}(X_j)} \Big] - 1 < \infty$, i.e., the dependence between variables is not too strong Sufficient condition for invertibility: no exact correlation using functions in the RKHSs, i.e., the empty concurvity space assumption (Hastie and Tibshirani, 1990)

37 Group Lasso - consistency conditions Strict condition: $\max_{i \in J^c} \big\| \Sigma_{X_i X_i}^{1/2} C_{X_i X_J} C_{X_J X_J}^{-1} \mathrm{Diag}(1/\|f_j\|)\, g_J \big\| < 1$ Weak condition: $\max_{i \in J^c} \big\| \Sigma_{X_i X_i}^{1/2} C_{X_i X_J} C_{X_J X_J}^{-1} \mathrm{Diag}(1/\|f_j\|)\, g_J \big\| \leq 1$ Theorem 1: the strict condition is sufficient for joint regular and sparsity consistency of the group Lasso. Theorem 2: the weak condition is necessary for joint regular and sparsity consistency of the group Lasso.

38 Conclusion - Interesting problems/issues Multiple kernel learning for supervised learning with multiple sources Vision (and bioinformatics?) Equivalent formulations Learning from exponentially many sources Theory: good estimation as long as log p = o(n) Structure is needed! (Bach, NIPS 2008) Choosing well-behaved sources Different sources or similar sources? Characterizing when using multiple sources helps

39 References F. R. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179-1225, 2008. F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the International Conference on Machine Learning (ICML), 2004a. F. R. Bach, R. Thibaux, and M. I. Jordan. Computing regularization paths for learning multiple kernels. In Advances in Neural Information Processing Systems 17, 2004b. A. Bosch, A. Zisserman, and X. Munoz. Image classification using ROIs and multiple kernel learning. International Journal of Computer Vision, submitted. L. Bottou and O. Bousquet. Learning using large datasets. In Mining Massive DataSets for Security, NATO ASI Workshop Series. IOS Press, Amsterdam, 2008. To appear. L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models for 101 object categories. Computer Vision and Image Understanding, 2006. T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In COLT, 2003. K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 8:725-760, 2007. Z. Harchaoui and F. R. Bach. Image classification with segmentation graph kernels. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

40 T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990. H. Kashima, K. Tsuda, and A. Inokuchi. Kernels for graphs. In Kernel Methods in Computational Biology. MIT Press, 2004. G. R. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, 2004. F. Meyer. Hierarchies of partitions and morphological segmentation. In Scale-Space and Morphology in Computer Vision. Springer-Verlag, 2001. A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, to appear, 2008. S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531-1565, 2006. M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In Proc. ICCV, 2007.
