Fantope Regularization in Metric Learning
1 Fantope Regularization in Metric Learning. CVPR 2014. Marc T. Law, Nicolas Thome, Matthieu Cord (LIP6, UPMC Sorbonne Universités, Paris, France)
2 Outline: Introduction; Notations & Related work; Metric learning: Fantope regularization; Metric learning: optimization algorithm; Experiments; Conclusion
3 Introduction. Metric learning algorithms produce a linear transformation of data that is optimized to fit semantic relationships between training samples. Different aspects of the learning procedure have recently been investigated: how the dataset is annotated and used in the learning process; design choices for the distance parameterization; extensions to the large-scale context; etc. Surprisingly, few attempts have been made to derive a proper regularization scheme. Regularization in metric learning is however a critical issue, as it limits model complexity (the number of independent parameters to learn) and thus overfitting. Models learned with regularization usually better exploit correlations between features and often have improved predictive accuracy.
4 Introduction. In this paper, we propose a novel regularization approach for metric learning that explicitly controls the rank of the learned distance matrix. [Figure: qualitative comparison of our method vs. LMNN on the PubFig and OSR datasets, illustrating the relevance of our approach]
5 Outline (recap): Introduction; Notations & Related work; Metric learning: Fantope regularization; Metric learning: optimization algorithm; Experiments; Conclusion
6 Notations
- $S^d$: set of $d \times d$ real-valued symmetric matrices; $S_+^d$: set of $d \times d$ real-valued symmetric positive semidefinite (PSD) matrices.
- For matrices $A \in S^d$ and $B \in S^d$, we denote the Frobenius inner product by $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
- $\Pi_{S_+^d}(A)$ is the orthogonal projection of the matrix $A \in S^d$ onto the positive semidefinite cone $S_+^d$.
- For a given $a = (a_1, \ldots, a_d)^T \in \mathbb{R}^d$, $\mathrm{Diag}(a) = A \in S^d$ is the square diagonal matrix such that $\forall i, A_{i,i} = a_i$.
- $\lambda(A)$ is the vector of eigenvalues of matrix $A$ arranged in non-increasing order; $\lambda(A)_i$ is the $i$-th largest eigenvalue of $A$.
- $x_i \in \mathbb{R}^d$ (resp. $x_j \in \mathbb{R}^d$) is the vector representation of image $p_i$ (resp. $p_j$); we note $x_{ij} = x_i - x_j$.
- For $x \in \mathbb{R}$, let $(x)_+ = \max(0, x)$.
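As an aside, the projection $\Pi_{S_+^d}$ admits a closed form via eigendecomposition: keep the eigenvectors and clip negative eigenvalues to zero. A minimal numpy sketch (our own, not the authors' code; the function name is hypothetical):

```python
import numpy as np

def project_psd(A):
    """Orthogonal projection of a symmetric matrix A onto the PSD cone S+^d:
    keep the eigenvectors of A and clip negative eigenvalues to zero."""
    eigvals, eigvecs = np.linalg.eigh(A)  # eigh assumes A is symmetric
    return eigvecs @ np.diag(np.maximum(eigvals, 0.0)) @ eigvecs.T
```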
7 Related work. We focus in this work on supervised distance metric learning methods, trained either from similar and dissimilar pairs of images or from triplets of images. In this paper, we consider the widely used Mahalanobis distance metric $D_M$ that is parameterized by the PSD matrix $M \in S_+^d$ such that: $D_M^2(p_i, p_j) = (x_i - x_j)^T M (x_i - x_j) = x_{ij}^T M x_{ij}$. It can also be rewritten: $D_M^2(p_i, p_j) = \langle M, x_{ij} x_{ij}^T \rangle$. (J. V. Davis et al., ICML 2007; A. Mignon and F. Jurie, CVPR 2012; E. Xing et al., NIPS 2002; G. Chechik et al., JMLR 2010; A. Frome et al., ICCV 2007; M. Schultz and T. Joachims, NIPS 2003; K. Weinberger and L. Saul, JMLR 2009)
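For concreteness, both forms of the squared Mahalanobis distance can be computed as follows (a sketch of the formulas above, with hypothetical function names):

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, M):
    """D_M^2(p_i, p_j) = x_ij^T M x_ij."""
    x_ij = x_i - x_j
    return x_ij @ M @ x_ij

def mahalanobis_sq_inner(x_i, x_j, M):
    """Equivalent form <M, x_ij x_ij^T> (Frobenius inner product)."""
    x_ij = x_i - x_j
    return np.sum(M * np.outer(x_ij, x_ij))  # tr(M^T x_ij x_ij^T)
```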
8 Related work. Many approaches prefer working on a specific matrix decomposition, i.e. $M = L^T L$ where $L \in \mathbb{R}^{e \times d}$ and $d$ is the data dimension: the resulting optimization is very fast, but it is not convex w.r.t. $L$ and suffers from local minima. In addition, an explicit regularization term is rarely introduced in the learning scheme. For instance, that lack of regularization makes LMNN prone to overfitting. To limit this shortcoming, many approaches perform early stopping, which stops an iterative optimization process before convergence. However, this method needs to be carefully tuned for each dataset. (T. Mensink et al., PAMI 2013; A. Mignon and F. Jurie, CVPR 2012; K. Weinberger and L. Saul, JMLR 2009)
9 Related work. Schultz and Joachims use the squared Frobenius norm $\|M\|_F^2$, following the SVM framework, to learn a diagonal PSD distance matrix. The ITML method (Information-Theoretic Metric Learning) uses a LogDet regularizer that constrains the distance matrix to be strictly positive definite. Another powerful way to regularize is to control the rank of $M$. Some methods add the trace $\mathrm{tr}(M)$ as a regularization term, because it is a convex surrogate for $\mathrm{rank}(M)$. (M. Schultz and T. Joachims, NIPS 2003; J. V. Davis et al., ICML 2007; D. Lim et al., ICML 2013; B. McFee and G. Lanckriet, ICML 2010; C. Shen et al., NIPS 2009)
10 Related work. In this paper, we investigate a new optimization scheme with a regularization term that explicitly controls the rank of $M$. Such a scheme avoids overfitting without any trick such as early stopping. The main contributions of this paper are: 1) We introduce a new regularization strategy based on the convex hull of rank-$k$ projection matrices, called a Fantope, which allows us to explicitly control the rank of distance matrices. 2) We propose an efficient algorithm to solve the new optimization scheme. 3) Our framework outperforms state-of-the-art metric learning methods on synthetic and challenging real Computer Vision datasets.
11 Metric Learning: Fantope regularization. Objective function. A metric learning algorithm aims at determining $M$ such that the metric satisfies most of the constraints defined by the training information. It is generally formulated as an optimization problem of the form: $\min_M \mu R(M) + \ell(M, \mathcal{A})$, where $\mu \geq 0$ is the regularization parameter, $R(M)$ is a regularization term on the parameter $M$, and $\ell(M, \mathcal{A})$ is a loss function.
12 Metric Learning: Fantope regularization. Motivation for the proposed regularization: controlling the rank of the PSD distance matrix $M$. A standard way is to use the nuclear norm $\|M\|_*$ as a regularization term; in the case of PSD matrices $M \in S_+^d$, $\|M\|_* = \mathrm{tr}(M)$. However, minimizing it seeks a rank-0 matrix (i.e. $M = 0$). We instead formulate the regularization term $R(M)$ as the sum of the $k$ smallest eigenvalues of $M \in S_+^d$: $R(M) = \sum_{i=d-k+1}^{d} \lambda(M)_i$. Goals: limit overfitting and exploit correlations between features.
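Computing $R(M)$ is a one-liner given an eigendecomposition; a sketch (note that numpy returns eigenvalues in non-decreasing order, so the $k$ smallest come first):

```python
import numpy as np

def fantope_reg(M, k):
    """R(M): sum of the k smallest eigenvalues of the symmetric matrix M."""
    eigvals = np.linalg.eigvalsh(M)  # non-decreasing order
    return eigvals[:k].sum()
```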
13 Metric Learning: Fantope regularization. Motivation for the proposed regularization: $R(M) = \sum_{i=d-k+1}^{d} \lambda(M)_i$. Such a minimization of $R(M)$ will naturally converge to a subspace corresponding to the $(d-k)$ most significant eigenvalues. As the rank of the PSD matrix $M \in S_+^d$ is the number of its non-zero eigenvalues, and all the eigenvalues of $M \in S_+^d$ are non-negative, the proposed regularization term $R(M)$ allows an explicit control over the rank of $M$: $R(M) = 0$ iff $\mathrm{rank}(M) \leq d - k$.
14 Metric Learning: Fantope regularization. Explicit rank control regularization. Using Ky Fan's theorem, we can rewrite the sum of the $k$ smallest eigenvalues of any symmetric matrix $M$ as the trace $\mathrm{tr}(WM)$, where $W$ lies in the convex hull of the set of rank-$k$ projection matrices (outer products of orthonormal matrices). This convex hull is called a Fantope. Our regularization term may be expressed as: $R(M) = \mathrm{tr}(WM) = \langle W, M \rangle$, where the matrix $W \in S_+^d$ projects the matrix $M$ onto the target $k$-dimensional subspace. (K. Fan. On a theorem of Weyl concerning eigenvalues of linear transformations. Proceedings of the National Academy of Sciences of the United States of America, 1949)
15 Metric Learning: Fantope regularization. Explicit rank control regularization. A simple way to construct such a matrix $W \in S_+^d$ is to use the eigendecomposition of $M$: $M = V_M \mathrm{Diag}(\lambda(M)) V_M^T$ (eigenvalues in non-increasing order). Construct $w = (w_1, \ldots, w_d)^T \in \mathbb{R}^d$ such that $w_i = 0$ if $1 \leq i \leq d-k$ (the first $d-k$ elements) and $w_i = 1$ if $d-k+1 \leq i \leq d$ (the last $k$ elements). Then express $W$ as: $W = V_M \mathrm{Diag}(w) V_M^T$.
16 Metric Learning: Fantope regularization. Explicit rank control regularization. With $M = V_M \mathrm{Diag}(\lambda(M)) V_M^T$ and $W = V_M \mathrm{Diag}(w) V_M^T$ as constructed above, we obtain: $R(M) = \mathrm{tr}(WM) = \mathrm{tr}(V_M \mathrm{Diag}(w) V_M^T V_M \mathrm{Diag}(\lambda(M)) V_M^T) = \mathrm{tr}(\mathrm{Diag}(w) \mathrm{Diag}(\lambda(M))) = w^T \lambda(M) = \sum_{i=d-k+1}^{d} \lambda(M)_i$, and $R(M) = 0$ iff $\mathrm{rank}(M) \leq d - k$.
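A sketch of this construction of $W$ (again exploiting numpy's non-decreasing eigenvalue order, so the $k$ smallest eigenvalues come first), with a sanity check against the fantope_reg helper from the earlier sketch:

```python
import numpy as np

def fantope_W(M, k):
    """W = V_M Diag(w) V_M^T, with w selecting the k smallest eigenvalues of M,
    so that <W, M> = tr(WM) is the sum of the k smallest eigenvalues."""
    d = M.shape[0]
    eigvals, V = np.linalg.eigh(M)  # eigenvalues in non-decreasing order
    w = np.zeros(d)
    w[:k] = 1.0                     # select the k smallest eigenvalues
    return V @ np.diag(w) @ V.T

# sanity check: tr(W M) recovers R(M)
# assert np.isclose(np.trace(fantope_W(M, k) @ M), fantope_reg(M, k))
```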
17 Metric Learning: Fantope regularization. Explicit rank control regularization. Fantope regularization is a generalization of trace regularization. Indeed, for every matrix $M \in S_+^d$, $\mathrm{tr}(M) = \mathrm{tr}(I_d M)$. Trace regularization is thus equivalent to a Fantope regularization where $\mathrm{tr}(WM)$ is the sum of the $d$ smallest eigenvalues of $M$, i.e. $W = V_M \mathrm{Diag}(\mathbf{1}) V_M^T = I_d$.
18 Metric Learning: Optimization algorithm. Optimization problem. Constraints: quadruplet-wise constraints. For any quadruplet of images $q = (p_i, p_j, p_k, p_l)$: $\forall q \in \mathcal{A}, \; D_M^2(p_k, p_l) \geq \delta_q + D_M^2(p_i, p_j)$, where $\delta_q$ is a safety margin. The triplet-wise constraint $D_M^2(p_i, p_k) \geq 1 + D_M^2(p_i, p_j)$ is recovered with $q = (p_i, p_j, p_i, p_k)$ and $\delta_q = 1$. The pairwise constraints: for a dissimilar pair $(p_i, p_j) \in \mathcal{D}$, $D_M^2(p_i, p_j) \geq l$ (a minimum value), with $q = (p_i, p_i, p_i, p_j)$ and $\delta_q = l$; for a similar pair $(p_i, p_j) \in \mathcal{S}$, $u \geq D_M^2(p_i, p_j)$ (an upper bound), with $q = (p_i, p_j, p_i, p_i)$ and $\delta_q = -u$ (the sign follows from plugging $q$ into the quadruplet constraint, since $D_M^2(p_i, p_i) = 0$).
19 Metric Learning: Optimization algorithm. Optimization problem. Constraints: quadruplet-wise constraints. Using $D_M^2(p_i, p_j) = \langle M, x_{ij} x_{ij}^T \rangle$, the quadruplet-wise constraints over $q = (p_i, p_j, p_k, p_l) \in \mathcal{A}$ can be rewritten: $\forall q \in \mathcal{A}, \; \langle M, x_{kl} x_{kl}^T - x_{ij} x_{ij}^T \rangle \geq \delta_q$.
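Written as code, checking one quadruplet constraint is a single Frobenius inner product (a sketch; quadruplets are passed as raw feature vectors):

```python
import numpy as np

def constraint_satisfied(M, x_i, x_j, x_k, x_l, delta_q):
    """Check <M, x_kl x_kl^T - x_ij x_ij^T> >= delta_q for one quadruplet."""
    x_ij, x_kl = x_i - x_j, x_k - x_l
    C_q = np.outer(x_kl, x_kl) - np.outer(x_ij, x_ij)
    return np.sum(M * C_q) >= delta_q
```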
20 Metric Learning: Optimization algorithm. Optimization problem. Optimization: define a global loss $\ell(M, \mathcal{A}) = \sum_{q \in \mathcal{A}} \ell_M(q)$ and design the loss for a single quadruplet as $\ell_M(q) = \max(0, \delta_q + \langle M, x_{ij} x_{ij}^T - x_{kl} x_{kl}^T \rangle)$. By including our regularization term and $\ell(M, \mathcal{A})$, the optimization problem becomes: $\min_{M \in S_+^d} f_W(M)$ with $f_W(M) = \mu \langle W, M \rangle + \sum_{q \in \mathcal{A}} \left( \delta_q + \langle M, x_{ij} x_{ij}^T - x_{kl} x_{kl}^T \rangle \right)_+$, where $\mu \geq 0$ is a regularization parameter and $\langle W, M \rangle$ is the sum of the $k$ smallest eigenvalues of $M$.
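A direct transcription of $f_W(M)$, assuming each quadruplet is stored as a tuple (x_i, x_j, x_k, x_l, delta_q) (our own data layout, not the authors'):

```python
import numpy as np

def objective(M, W, quadruplets, mu):
    """f_W(M) = mu <W, M> + sum_q (delta_q + <M, x_ij x_ij^T - x_kl x_kl^T>)_+"""
    hinge = 0.0
    for x_i, x_j, x_k, x_l, delta_q in quadruplets:
        x_ij, x_kl = x_i - x_j, x_k - x_l
        viol = delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl
        hinge += max(0.0, viol)  # hinge loss of one quadruplet
    return mu * np.sum(W * M) + hinge
```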
21 Metric Learning: Optimization algorithm. Solving the optimization problem: $\min_{M \in S_+^d} f_W(M)$ with $f_W(M) = \mu \langle W, M \rangle + \sum_{q \in \mathcal{A}} \left( \delta_q + \langle M, x_{ij} x_{ij}^T - x_{kl} x_{kl}^T \rangle \right)_+$. $f_W(M)$ is not globally convex, but it is convex w.r.t. $M$ when $W$ is fixed. The subgradient w.r.t. $M$ is: $\nabla_M = \mu W + \sum_{q \in \mathcal{A}^+} (x_{ij} x_{ij}^T - x_{kl} x_{kl}^T)$, where $\mathcal{A}^+$ is the subset of constraints in $\mathcal{A}$ with non-zero loss and $\mu \geq 0$. $W$ is updated by construction as explained before, so that $\langle W, M \rangle$ is the sum of the $k$ smallest eigenvalues of $M$. The process stops when the objective value stops decreasing.
22 Metric Learning: Optimization algorithm. Solving the optimization problem. The global learning scheme is described in Algorithm 1: alternate between refreshing $W = V_M \mathrm{Diag}(w) V_M^T$ and taking subgradient steps $\nabla_M = \mu W + \sum_{q \in \mathcal{A}^+} (x_{ij} x_{ij}^T - x_{kl} x_{kl}^T)$ to solve $\min_{M \in S_+^d} f_W(M)$.
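A compressed, hypothetical sketch of this alternating scheme, reusing project_psd and fantope_W from the earlier snippets (the learning rate, iteration budget, and identity initialization are placeholder choices, not the paper's):

```python
import numpy as np

def learn_metric(quadruplets, d, k, mu, lr=1e-3, n_iter=200):
    """Alternate a projected subgradient step on M (W fixed) with a refresh of W."""
    M = np.eye(d)
    for _ in range(n_iter):
        W = fantope_W(M, k)              # <W, M> = sum of k smallest eigenvalues
        grad = mu * W
        for x_i, x_j, x_k, x_l, delta_q in quadruplets:
            x_ij, x_kl = x_i - x_j, x_k - x_l
            if delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl > 0:  # q in A+
                grad += np.outer(x_ij, x_ij) - np.outer(x_kl, x_kl)
        M = project_psd(M - lr * grad)   # stay in the PSD cone
    return M
```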
23 Metric Learning: Optimization algorithm. Efficiency discussion. An alternative method to solve the optimization problem is to switch the update between $M$ and $W$ only after a full subgradient descent over $M$, which is computationally demanding. When the input space dimension $d$ is large, the eigendecomposition required at each iteration of the subgradient descent also becomes computationally expensive.
24 Metric Learning: Optimization algorithm. Efficiency discussion. We propose an adaptation of the Alternating Direction Method of Multipliers (ADMM) [S. Boyd et al.] to learn a metric. We adapt the optimization problem in this way: $\min_{M \in S^d, Z \in S^d} f_W(M) + g(Z)$, where $g(Z) = 0$ if $Z \in S_+^d$ and $g(Z) = +\infty$ if $Z \notin S_+^d$. Introducing a Lagrange multiplier $\Lambda \in S_+^d$, we obtain the augmented Lagrangian: $L_\rho(M, Z, \Lambda) = f_W(M) + g(Z) + \langle \Lambda, M - Z \rangle + \frac{\rho}{2} \|M - Z\|_F^2$, where $\rho > 0$ is a scaling parameter. (S. Boyd et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011)
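To make the splitting concrete, here is a generic scaled-form ADMM iteration for this problem (a sketch, not the paper's Algorithm 2; in particular the M-update is shown as a single subgradient step rather than a full minimization, and project_psd comes from the earlier snippet):

```python
import numpy as np

def admm_iteration(M, Z, U, W, quadruplets, mu, rho, lr=1e-3):
    """One scaled-form ADMM step for min f_W(M) + g(Z) s.t. M = Z, with U = Lambda/rho."""
    # M-step: subgradient step on f_W(M) + (rho/2)||M - Z + U||_F^2 (M free in S^d)
    grad = mu * W + rho * (M - Z + U)
    for x_i, x_j, x_k, x_l, delta_q in quadruplets:
        x_ij, x_kl = x_i - x_j, x_k - x_l
        if delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl > 0:
            grad += np.outer(x_ij, x_ij) - np.outer(x_kl, x_kl)
    M = M - lr * grad
    # Z-step: the proximal map of the indicator g is the projection onto the PSD cone
    Z = project_psd(M + U)
    # dual update
    U = U + M - Z
    return M, Z, U
```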
25 Metric Learning: Optimization algorithm. Efficiency discussion. Algorithm 2 finds the optimal $M$ before updating $W$, as previously proposed. The approximation and speed-up in Algorithm 2 come from the constraint $M \in S_+^d$ being replaced by the constraint $M \in S^d$, while $g(Z)$ promotes a PSD solution matrix. (Scaled dual variable: $U = \frac{1}{\rho} \Lambda$.)
26 Experiments Face verification task Image classification with relative attributes
27 Experiments Face verification task In the face verification task, we are provided with pairs of face images. The goal is to learn a classifier that determines whether image pairs are similar (represent the same person) or dissimilar (represent two different persons).
28 Face verification: LFW. Experiment setup. Dataset and evaluation metric. Labeled Faces in the Wild (LFW) dataset: more than 13,000 images of faces. We follow the restricted paradigm, which only provides two sets of pairs of images: a set $\mathcal{S}$ of similar pairs and a set $\mathcal{D}$ of dissimilar pairs. We follow the standard evaluation protocol that uses View 2 data for training and testing and View 1 for validation.
29 Face verification: LFW. Experiment setup. Dataset and evaluation metric. To generate our constraints, we set the upper bound $u = 0.5$ and the lower bound $l = 1.5$. The squared distance of a test pair is compared to the threshold $\frac{l+u}{2} = 1$ to determine whether the pair is similar or dissimilar. The pairwise constraints are those defined earlier: for a dissimilar pair $(p_i, p_j) \in \mathcal{D}$, $D_M^2(p_i, p_j) \geq l$; for a similar pair $(p_i, p_j) \in \mathcal{S}$, $D_M^2(p_i, p_j) \leq u$.
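The resulting decision rule is a simple threshold on the squared distance (a sketch with the u and l values above, reusing the Mahalanobis form from earlier):

```python
import numpy as np

def predict_pair(x_i, x_j, M, l=1.5, u=0.5):
    """Declare a test pair similar iff D_M^2(p_i, p_j) < (l + u) / 2 = 1."""
    x_ij = x_i - x_j
    return (x_ij @ M @ x_ij) < (l + u) / 2.0
```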
30 Face verification: LFW. Experiment setup. Image representation: we use the same input features as popular metric learning methods [ITML, LDML, PCCA], namely the SIFT descriptors computed by [LDML], available on their website. Initialization of the distance matrix $M \in S_+^d$: first compute the matrix $L \in \mathbb{R}^{e \times d}$ composed of the coefficients of the $e$ most dominant principal components of the training data; then set $M = L^T L$. (J. V. Davis et al. Information-theoretic metric learning. ICML, 2007; M. Guillaumin et al. Is that you? Metric learning approaches for face identification. ICCV, 2009; A. Mignon and F. Jurie. PCCA: A new approach for distance learning from sparse pairwise constraints. CVPR, 2012)
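A possible numpy sketch of this PCA-based initialization (the helper name and the SVD route are our choices, not the authors'):

```python
import numpy as np

def init_metric_pca(X, e):
    """M = L^T L, where the rows of L (e x d) are the e most dominant
    principal components of the training data X (n x d)."""
    Xc = X - X.mean(axis=0)                          # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    L = Vt[:e]                                       # top-e principal directions
    return L.T @ L
```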
31 Face verification: LFW. Results. Impact of regularization: we compare the impact of Fantope regularization with that of trace regularization. The table shows classification accuracies (mean and standard error) when solving the optimization problem with both regularization methods. This illustrates the importance of having an explicit control on the rank of the distance matrix.
32 Face verification: LFW. Results. State-of-the-art results: we compare Fantope regularization to other popular metric learning algorithms. The table shows the performance of ITML, LDML, and PCCA. Fantope regularization outperforms ITML and LDML and is comparable to PCCA.
33 Face verification: LFW. Results. Impact of early stopping: the table reports the accuracies we obtained on LFW by testing the code of PCCA provided by its authors, as a function of the number of iterations of gradient descent. Applying the early stopping criterion to our method yields 83.5 ± 0.5%. Conclusion: our regularization scheme makes our method much more robust than PCCA to early stopping.
34 Face verification: LFW. Results. Impact of the hyper-parameter $\mu$. [Figure: accuracy as a function of $\mu$: 81.2% with $\mu = 0$, up to 82.3%; the learned matrix reaches the expected rank $e = 40$ for high values of $\mu$]
35 Experiments: Image classification with relative attributes. In the image classification task with attributes, we are provided with images described by attributes. The goal is to assign an image to a predefined class. In particular, we focus on the case where classes are described with attributes. Image $p_i$ is represented by $x_i \in \mathbb{R}^d$, whose $j$-th element is the score (degree) of presence of the $j$-th attribute in $p_i$.
36 Metric learning in attribute space. Experiment setup. Datasets: Outdoor Scene Recognition (OSR), containing 2688 images from 8 scene categories, and a subset of Public Figure Face (PubFig), containing 771 images from 8 face categories. We use the image features made publicly available by [Parikh and Grauman. ICCV, 2011]: a 512-dimensional GIST descriptor [Oliva and Torralba. IJCV, 2001] for OSR, and a concatenation of the GIST descriptor and a 45-dimensional Lab color histogram for PubFig. (D. Parikh and K. Grauman. Relative attributes. ICCV, 2011)
37 Metric learning in attribute space. Experiment setup. Baselines: 1) The relative attribute learning problem described in [Parikh and Grauman. ICCV, 2011] uses relative attribute annotations on classes to compute high-level representations of images $x_i \in \mathbb{R}^d$; a Gaussian distribution is learned for each class. 2) Large Margin Nearest Neighbor (LMNN) [Weinberger and Saul. JMLR, 2009] is a popular metric learning method used for image classification; the high-level representations $x_i \in \mathbb{R}^d$ are used as input features of the LMNN classifier. (D. Parikh and K. Grauman. Relative attributes. ICCV, 2011; K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009)
38 Metric learning in attribute space. Experiment setup. Integration of regularization: we modify the code of LMNN [Weinberger and Saul. JMLR, 2009] to integrate trace and Fantope regularization; the stopping criterion is the convergence of the algorithm (i.e. the objective function stops decreasing). Learning setup: we use the same experimental setup as [Parikh and Grauman. ICCV, 2011]: N = 30 training images, the rest being used for testing.
39 Metric learning in attribute space. Results. The table reports the accuracies of the baselines and of our proposed regularization method on both the OSR and PubFig datasets, with gains of about 2% to 3%. These results validate the importance of a proper regularization for predictive accuracy.
40 Metric learning in attribute space. Results. The figure illustrates on some examples how our scheme is effective at learning semantics. [Figure: qualitative comparisons, our method vs. LMNN]
42 Conclusion. We proposed a new regularization scheme for metric learning that explicitly controls the rank of the learned distance matrix. Our method generalizes trace regularization and can be applied in various optimization frameworks to impose a meaningful structure on the learned PSD matrix. We derived an efficient metric learning algorithm that combines the regularization term with a loss function that can incorporate constraints between pairs or triplets of images. We demonstrated that this regularization greatly improves recognition on real datasets, showing its relevance for limiting overfitting. Future work includes designing a better ADMM formulation that takes into account the fact that the objective function is not globally convex.
43 Thank You!