Multiple kernel learning for multiple sources
1 Multiple kernel learning for multiple sources Francis Bach INRIA - Ecole Normale Supérieure NIPS Workshop - December 2008
2 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators
3 Machine learning for computer vision Learning tasks on images: proliferation of digital media, many different tasks to be solved, each associated with different machine learning problems
4 Image retrieval Classification, ranking, outlier detection
6 Personal photos Classification, clustering, visualisation
8 Machine learning for computer vision Learning tasks on images: proliferation of digital media, many different tasks to be solved, each associated with different machine learning problems. Application: retrieval/indexing of images. Common issues: complex tasks, heterogeneous data, links with other media (text and sound), massive data. ⇒ Kernel methods
9 Multiple sources in computer vision Many different cues: shape, color, texture, segments, interest points. Kernel design is easier for one source at a time. Links with bioinformatics
10 Kernels for interest points SIFT + pyramid match (Grauman and Darrell, 2007)
11 Kernels for texture Histograms of filters
12 Kernels from segmentation graphs (Harchaoui and Bach, 2007) Goal of segmentation: extract objects of interest. Many methods are available, but they rarely find the object of interest entirely. Segmentation graphs allow one to work on a more reliable over-segmentation, going from a large square grid (millions of pixels) to a small graph (dozens or hundreds of regions)
13 Segmentation by watershed transform (Meyer, 2001) [figure: image, gradient, and watershed segmentations with 287, 64, and 10 segments]
15 Image as a segmentation graph Labelled undirected graph Vertices: connected segmented regions Edges: between spatially neighboring regions Labels: region pixels Graph kernels (Gärtner et al., 2003; Kashima et al., 2004; Harchaoui and Bach, 2007) provide an elegant and efficient solution
16 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators
17 Multiple sources by combining kernels Learning combinations of kernels: $K(\eta) = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$. Summing kernels is equivalent to concatenating feature spaces: assume $k_1(x,y) = \langle \Phi_1(x), \Phi_1(y)\rangle$ and $k_2(x,y) = \langle \Phi_2(x), \Phi_2(y)\rangle$; then $k_1(x,y) + k_2(x,y) = \left\langle \begin{pmatrix}\Phi_1(x)\\ \Phi_2(x)\end{pmatrix}, \begin{pmatrix}\Phi_1(y)\\ \Phi_2(y)\end{pmatrix} \right\rangle$
18 Multiple sources by combining kernels Learning combinations of kernels: $K(\eta) = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$. Summing kernels is equivalent to concatenating feature spaces. Two natural (equivalent) settings: 1. Single input space, multiple feature spaces: $x \in \mathcal{X}$, $m$ different kernels on $\mathcal{X}$. Example: learning hyperparameters of kernels. 2. Multiple pairs of input/feature spaces: $x_j \in \mathcal{X}_j$, $j = 1,\dots,m$, with a kernel on each input space. Multiple sources; generalized additive models (Hastie and Tibshirani, 1990)
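The "summing kernels = concatenating feature spaces" identity can be checked numerically. A minimal sketch with toy, hypothetical feature maps ($\Phi_1(x) = x$ and $\Phi_2(x) = x^2$, chosen only for illustration):

```python
# Toy feature maps (hypothetical, for illustration only).
def phi1(x):
    return [x]          # Phi_1(x) = x

def phi2(x):
    return [x * x]      # Phi_2(x) = x^2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k1(x, y):
    return dot(phi1(x), phi1(y))

def k2(x, y):
    return dot(phi2(x), phi2(y))

x, y = 2.0, 3.0
lhs = k1(x, y) + k2(x, y)                        # sum of the two kernels
rhs = dot(phi1(x) + phi2(x), phi1(y) + phi2(y))  # kernel of concatenated features
# lhs and rhs agree: summing kernels concatenates feature spaces
```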
19 Multiple kernel learning (Lanckriet et al., 2004; Bach et al., 2004a) Learning kernels $K = \sum_{j=1}^m \eta_j K_j$, $\eta \geq 0$. Summing kernels is equivalent to concatenating feature spaces: $m$ feature maps $\Phi_j : \mathcal{X} \to \mathcal{F}_j$, $j = 1,\dots,m$. Minimization with respect to $f_1 \in \mathcal{F}_1, \dots, f_m \in \mathcal{F}_m$. Predictor: $f(x) = \langle f_1, \Phi_1(x)\rangle + \cdots + \langle f_m, \Phi_m(x)\rangle$. [diagram: $x$ mapped through $\Phi_1(x),\dots,\Phi_m(x)$, each paired with $f_1,\dots,f_m$] Which regularization for $\langle f_1, \Phi_1(x)\rangle + \cdots + \langle f_m, \Phi_m(x)\rangle$?
21 Regularization for multiple kernels Summing kernels is equivalent to concatenating feature spaces: $m$ feature maps $\Phi_j : \mathcal{X} \to \mathcal{F}_j$, $j = 1,\dots,m$. Minimization with respect to $f_1 \in \mathcal{F}_1, \dots, f_m \in \mathcal{F}_m$. Predictor: $f(x) = \langle f_1, \Phi_1(x)\rangle + \cdots + \langle f_m, \Phi_m(x)\rangle$. Regularization by $\sum_{j=1}^m \|f_j\|^2$ is equivalent to using $K = \sum_{j=1}^m K_j$. Regularization by $\sum_{j=1}^m \|f_j\|$ should impose sparsity at the group level. Main questions when regularizing by the block $\ell_1$-norm: 1. Equivalence with other kernel learning formulations 2. Algorithms 3. Analysis of sparsity-inducing properties (Bach, 2008)
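The group-level sparsity induced by the block $\ell_1$-norm $\sum_j \|f_j\|$ can be illustrated through its proximal operator, which shrinks each group's norm and zeroes any group whose norm falls below the threshold. This is a generic sketch of the block-soft-thresholding mechanism, not an algorithm from the talk:

```python
import math

def block_l1_prox(groups, lam):
    """Proximal operator of lam * sum_j ||f_j||_2: each group is shrunk
    toward zero, and groups whose norm is <= lam are zeroed entirely."""
    out = []
    for g in groups:
        norm = math.sqrt(sum(v * v for v in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in g])
    return out

f = [[3.0, 4.0], [0.3, 0.4]]   # two "feature groups" (toy values)
shrunk = block_l1_prox(f, 1.0)  # the small second group is zeroed
```

The squared regularizer $\sum_j \|f_j\|^2$, by contrast, only shrinks coefficients smoothly and never sets a whole group exactly to zero.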
22 General kernel learning Proposition (Lanckriet et al., 2004; Bach et al., 2005; Micchelli and Pontil, 2005): $G(K) = \min_{f \in \mathcal{F}} \sum_{i=1}^n \varphi_i(\langle f, \Phi(x_i)\rangle) + \frac{\lambda}{2}\|f\|^2 = \max_{\alpha \in \mathbb{R}^n} -\sum_{i=1}^n \psi_i(\lambda \alpha_i) - \frac{\lambda}{2}\, \alpha^\top K \alpha$ is a convex function of the Gram matrix $K$. Theoretical learning bounds (Lanckriet et al., 2004; Srebro and Ben-David, 2006)
23 MKL - equivalence with other kernel learning formulations (Bach et al., 2004a) Block $\ell_1$-norm problem: $\sum_{i=1}^n \varphi_i(\langle f_1, \Phi_1(x_i)\rangle + \cdots + \langle f_m, \Phi_m(x_i)\rangle) + \frac{\lambda}{2}(\|f_1\| + \cdots + \|f_m\|)^2$. Kernel learning formulation: minimize with respect to $\eta$ in the simplex $G(K(\eta)) = \max_{\alpha \in \mathbb{R}^n} -\sum_{i=1}^n \psi_i(\lambda \alpha_i) - \frac{\lambda}{2}\, \alpha^\top \big(\sum_{j=1}^m \eta_j K_j\big)\, \alpha$. Proposition: block $\ell_1$-norm regularization is equivalent to minimizing the optimal value $G(K(\eta))$ with respect to $\eta$. Weights $\eta$ obtained from optimality conditions. Single optimization problem for learning both $\eta$ and $\alpha$
24 Algorithms for MKL (Very) costly optimization with SDP, QCQP or SOCP: for $n$ between 1,000 and 10,000 and $m$ around 100, not possible. Low required precision ⇒ first-order methods (see, e.g., Bottou and Bousquet (2008)). Dual coordinate ascent (SMO) with smoothing (Bach et al., 2004a). Optimization of $G(K)$ by cutting planes (Sonnenburg et al., 2006). Optimization of $G(K)$ by steepest descent with smoothing (Rakotomamonjy et al., 2008). Regularization path (Bach et al., 2004b). Etc.
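To make the "single optimization problem for learning both $\eta$ and $\alpha$" concrete, here is a tiny alternating scheme for multiple-kernel *ridge regression*: solve for $\alpha$ with $K(\eta)$ fixed, then rescale $\eta_j \propto \|f_j\|$ (with $\|f_j\|^2 = \eta_j^2\, \alpha^\top K_j \alpha$), as suggested by the optimality conditions. This is a didactic sketch on hypothetical toy data, not one of the cited solvers:

```python
def solve(A, b):
    """Gauss-Jordan solve of A x = b with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b_ for a, b_ in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def mkl_ridge(Ks, y, lam=0.1, iters=20):
    """Alternate between the ridge dual alpha and simplex weights eta."""
    m, n = len(Ks), len(y)
    eta = [1.0 / m] * m                       # start at uniform weights
    alpha = [0.0] * n
    for _ in range(iters):
        # Combined Gram matrix K(eta) = sum_j eta_j K_j
        K = [[sum(eta[j] * Ks[j][a][b] for j in range(m))
              for b in range(n)] for a in range(n)]
        A = [[K[a][b] + (lam if a == b else 0.0) for b in range(n)]
             for a in range(n)]
        alpha = solve(A, y)                   # alpha for fixed eta
        # ||f_j||^2 = eta_j^2 * alpha' K_j alpha; update eta_j prop. ||f_j||
        norms = [eta[j] * max(0.0, sum(alpha[a] * Ks[j][a][b] * alpha[b]
                 for a in range(n) for b in range(n))) ** 0.5
                 for j in range(m)]
        s = sum(norms)
        if s > 0:
            eta = [v / s for v in norms]
    return eta, alpha

# Hypothetical toy data: two kernels on four points.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 3.0]
K1 = [[a * b for b in x] for a in x]          # linear kernel
K2 = [[(a * b) ** 2 for b in x] for a in x]   # squared (quadratic) kernel
eta, alpha = mkl_ridge([K1, K2], y)           # eta stays on the simplex
```

The real algorithms above replace this naive update with SMO, cutting planes, or gradient descent on $G(K(\eta))$, and handle general loss functions rather than squared loss.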
25 Summing kernels vs. optimizing weights Different regularizations: regularization by $\sum_{j=1}^m \|f_j\|^2$ is equivalent to using $K = \sum_{j=1}^m K_j$; regularization by $\sum_{j=1}^m \|f_j\|$ should impose sparsity at the group level and learn sparse weights $\eta$, with $K = \sum_{j=1}^m \eta_j K_j$. If sparsity is not expected, $\ell_1$ has no reason to be better
26 Performance on Corel14 (Harchaoui and Bach, 2007) Corel14: 1400 natural images with 14 classes
27 Performance on Corel14 (Harchaoui and Bach, 2007) [figure: test error rates on Corel14 comparing histogram kernels (H), walk kernels (W), tree-walk kernels (TW), weighted tree-walks (wTW), and MKL (M)]
28 Caltech101 database (Fei-Fei et al., 2006)
29 Kernel combination for Caltech101 (Varma and Ray, 2007) [table: classification accuracies (mean ± std.) under 1-NN, SVM (1 vs. 1), and SVM (1 vs. rest), for individual kernels (Shape GB, two variants; Self Similarity; PHOG, two variants; PHOW Colour; PHOW Gray) and for combinations (MKL block $\ell_1$; Varma and Ray, 2007); numeric entries not recovered in this transcription] See also Bosch et al. (2008)
30 Talk outline Multiple sources in computer vision Multiple kernel learning (MKL) Equivalent formulations Theoretical analysis and open problems Covariance operators
31 Analysis of MKL as non parametric group Lasso Assume $m$ Hilbert spaces $\mathcal{F}_j$, $j = 1,\dots,m$, on $m$ different input spaces: $\min_{f_j \in \mathcal{F}_j,\, j=1,\dots,m}\ \frac{1}{2n} \sum_{i=1}^n \Big(y_i - \sum_{j=1}^m f_j(x_{ji})\Big)^2 + \frac{\mu_n}{2} \Big(\sum_{j=1}^m \|f_j\|\Big)^2$. NB: $f_j(x_{ji}) = \langle f_j, \Phi_j(x_{ji})\rangle$. Sparse generalized additive models (Hastie and Tibshirani, 1990; Ravikumar et al., 2007). Algorithms: use the parametrization with $\alpha$. Analysis: do not use $\alpha$; use covariance operators (i.e., stay in the primal/input space)
32 (Non centered) covariance operators Single random variable $X$: $\Sigma_{XX}$ is a bounded linear operator from $\mathcal{F}$ to $\mathcal{F}$ such that for all $(f,g) \in \mathcal{F} \times \mathcal{F}$, $\langle f, \Sigma_{XX}\, g\rangle = \mathbb{E}(f(X)g(X))$. Under minor assumptions, the operator $\Sigma_{XX}$ is self-adjoint, nonnegative and Hilbert-Schmidt. Tool of choice for the analysis of least-squares non parametric methods (Blanchard, 2006; Fukumizu et al., 2005, 2006; Gretton et al., 2006; Harchaoui et al., 2007, 2008; etc.). Natural empirical estimate $\langle f, \hat\Sigma_{XX}\, g\rangle = \frac{1}{n} \sum_{i=1}^n f(x_i) g(x_i)$; the estimate $\hat\Sigma_{XX}$ converges in probability to $\Sigma_{XX}$ in Hilbert-Schmidt norm
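The empirical estimate $\langle f, \hat\Sigma_{XX}\, g\rangle = \frac{1}{n}\sum_i f(x_i)g(x_i)$ is straightforward to compute from a sample; a minimal sketch in plain Python (toy functions and sample, not from the talk):

```python
def empirical_cov(f, g, sample):
    """Non-centred empirical covariance:
    <f, Sigma_hat g> = (1/n) * sum_i f(x_i) * g(x_i)."""
    n = len(sample)
    return sum(f(x) * g(x) for x in sample) / n

sample = [1.0, 2.0, 3.0]
# With f = g = identity this is the empirical second moment: (1 + 4 + 9) / 3
val = empirical_cov(lambda x: x, lambda x: x, sample)
```

The cross-covariance estimate of the next slide is obtained the same way, evaluating $f_i$ and $f_j$ on paired samples $(x_{i,k}, x_{j,k})$.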
33 Cross-covariance operators Several random variables: cross-covariance operators $\Sigma_{X_iX_j}$ from $\mathcal{F}_j$ to $\mathcal{F}_i$ such that for all $(f_i, f_j) \in \mathcal{F}_i \times \mathcal{F}_j$, $\langle f_i, \Sigma_{X_iX_j} f_j\rangle = \mathbb{E}(f_i(X_i) f_j(X_j))$. Similar convergence properties of empirical estimates. Joint covariance operator $\Sigma_{XX}$ defined by blocks. We can define the bounded correlation operators through $\Sigma_{X_iX_j} = \Sigma_{X_iX_i}^{1/2}\, C_{X_iX_j}\, \Sigma_{X_jX_j}^{1/2}$. NB: the joint covariance operator is never invertible, but the correlation operator may be
34 Covariance operators for multiple sources Simple tool for characterizing relationship between sources Formally equivalent to finite feature space setting Allows proper asymptotic and non asymptotic analysis, e.g., Limit distributions of non parametric test statistics (Gretton et al., 2006, Harchaoui, Bach and Moulines, 2007, 2008)
35 Analysis of MKL as non parametric group Lasso Assumptions 1. Generalized additive model: there exist functions $f = (f_1,\dots,f_m) \in \mathcal{F} = \mathcal{F}_1 \times \cdots \times \mathcal{F}_m$ such that $Y = \sum_{j=1}^m f_j(X_j) + \varepsilon$. 2. Compactness and invertibility: all cross-correlation operators are compact and the joint correlation operator is invertible. 3. Additional technical assumptions
36 Compactness and invertibility of the joint correlation operator Sufficient condition for compactness when the distributions have densities: $\mathbb{E}\left\{\dfrac{p_{X_iX_j}(X_i,X_j)}{p_{X_i}(X_i)\, p_{X_j}(X_j)}\right\} - 1 < \infty$, i.e., the dependence between variables is not too strong. Sufficient condition for invertibility: no exact correlation using functions in the RKHS. Empty concurvity space assumption (Hastie and Tibshirani, 1990)
37 Group Lasso - consistency conditions Strict condition: $\max_{i \in J^c} \big\| \Sigma_{X_iX_i}^{1/2}\, C_{X_iX_J}\, C_{X_JX_J}^{-1}\, \mathrm{Diag}(1/\|f_j\|)\, g_J \big\| < 1$. Weak condition: $\max_{i \in J^c} \big\| \Sigma_{X_iX_i}^{1/2}\, C_{X_iX_J}\, C_{X_JX_J}^{-1}\, \mathrm{Diag}(1/\|f_j\|)\, g_J \big\| \leq 1$. Theorem 1: the strict condition is sufficient for joint regular and sparsity consistency of the group Lasso. Theorem 2: the weak condition is necessary for joint regular and sparsity consistency of the group Lasso
38 Conclusion - Interesting problems/issues Multiple kernel learning for supervised learning with multiple sources Vision (and bioinformatics?) Equivalent formulations Learning from exponentially many sources Theory: good estimation as long as log p = o(n) Structure is needed! (Bach, NIPS 2008) Choosing well-behaved sources Different sources or similar sources? Characterizing when using multiple sources helps
39 References
F. R. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 2008.
F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the International Conference on Machine Learning (ICML), 2004a.
F. R. Bach, R. Thibaux, and M. I. Jordan. Computing regularization paths for learning multiple kernels. In Advances in Neural Information Processing Systems 17, 2004b.
A. Bosch, A. Zisserman, and X. Munoz. Image classification using ROIs and multiple kernel learning. International Journal of Computer Vision, submitted.
L. Bottou and O. Bousquet. Learning using large datasets. In Mining Massive DataSets for Security, NATO ASI Workshop Series. IOS Press, Amsterdam, 2008.
L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models for 101 object categories. Computer Vision and Image Understanding, 2006.
T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In COLT, 2003.
K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 8, 2007.
Z. Harchaoui and F. R. Bach. Image classification with segmentation graph kernels. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
40 T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990.
H. Kashima, K. Tsuda, and A. Inokuchi. Kernels for graphs. In Kernel Methods in Computational Biology. MIT Press, 2004.
G. R. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, 2004.
F. Meyer. Hierarchies of partitions and morphological segmentation. In Scale-Space and Morphology in Computer Vision. Springer-Verlag, 2001.
A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, to appear.
S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7, 2006.
M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In Proc. ICCV, 2007.
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationGraph Partitioning Using Random Walks
Graph Partitioning Using Random Walks A Convex Optimization Perspective Lorenzo Orecchia Computer Science Why Spectral Algorithms for Graph Problems in practice? Simple to implement Can exploit very efficient
More informationExclusive Lasso for Multi-task Feature Selection
Yang Zhou 1 Rong Jin 1 Steven C.H. Hoi 1 Department of Computer Science and Engineering Michigan State University East Lansing, MI 48910 USA {zhouyang,rongjin}@msu.edu School of Computer Engineering Nanyang
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationKernel Measures of Conditional Dependence
Kernel Measures of Conditional Dependence Kenji Fukumizu Institute of Statistical Mathematics 4-6-7 Minami-Azabu, Minato-ku Tokyo 6-8569 Japan fukumizu@ism.ac.jp Arthur Gretton Max-Planck Institute for
More informationSupport Vector Machines
Support Vector Machines Tobias Pohlen Selected Topics in Human Language Technology and Pattern Recognition February 10, 2014 Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6
More informationMachine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Classification, Discriminative learning Structured output, structured input, discriminative function, joint input-output features, Likelihood Maximization, Logistic regression, binary
More informationSample-adaptive Multiple Kernel Learning
Sample-adaptive Multiple Kernel Learning Xinwang Liu, Lei Wang, Jian Zhang, Jianping Yin Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense Technology
More informationMetric Embedding for Kernel Classification Rules
Metric Embedding for Kernel Classification Rules Bharath K. Sriperumbudur University of California, San Diego (Joint work with Omer Lang & Gert Lanckriet) Bharath K. Sriperumbudur (UCSD) Metric Embedding
More informationAn Adaptive Test of Independence with Analytic Kernel Embeddings
An Adaptive Test of Independence with Analytic Kernel Embeddings Wittawat Jitkrittum Gatsby Unit, University College London wittawat@gatsby.ucl.ac.uk Probabilistic Graphical Model Workshop 2017 Institute
More informationMultiple Similarities Based Kernel Subspace Learning for Image Classification
Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationSemi-Supervised Learning
Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationMachine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.1. Linear Regression Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationMTTTS16 Learning from Multiple Sources
MTTTS16 Learning from Multiple Sources 5 ECTS credits Autumn 2018, University of Tampere Lecturer: Jaakko Peltonen Lecture 6: Multitask learning with kernel methods and nonparametric models On this lecture:
More informationLearning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from
More informationESANN'2003 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2003, d-side publi., ISBN X, pp.
On different ensembles of kernel machines Michiko Yamana, Hiroyuki Nakahara, Massimiliano Pontil, and Shun-ichi Amari Λ Abstract. We study some ensembles of kernel machines. Each machine is first trained
More informationHow to learn from very few examples?
How to learn from very few examples? Dengyong Zhou Department of Empirical Inference Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany Outline Introduction Part A
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More informationGlobal Scene Representations. Tilke Judd
Global Scene Representations Tilke Judd Papers Oliva and Torralba [2001] Fei Fei and Perona [2005] Labzebnik, Schmid and Ponce [2006] Commonalities Goal: Recognize natural scene categories Extract features
More informationDiversity Regularization of Latent Variable Models: Theory, Algorithm and Applications
Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications Pengtao Xie, Machine Learning Department, Carnegie Mellon University 1. Background Latent Variable Models (LVMs) are
More informationGraph-Based Semi-Supervised Learning
Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005 Graph-Based Semi-Supervised Learning Yoshua Bengio, Olivier
More information