Fantope Regularization in Metric Learning

1 Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France

2 Introduction; Notations & Related work; Metric learning: Fantope regularization; Metric learning: optimization algorithm; Experiments; Conclusion

3 Introduction Metric learning algorithms produce a linear transformation of data which is optimized to fit semantic relationships between training samples. Different aspects of the learning procedure have recently been investigated: how the dataset is annotated and used in the learning process; design choices for the distance parameterization; extensions to large-scale settings, etc. Surprisingly, few attempts have been made to derive a proper regularization scheme. Regularization is however a critical issue in metric learning, as it often limits model complexity and the number of independent parameters to learn, and thus overfitting. Models learned with regularization usually better exploit correlations between features and often have improved predictive accuracy.

4 Introduction In this paper, we propose a novel regularization approach for metric learning that explicitly controls the rank of the learned distance matrix. The figure illustrates the relevance of our approach: qualitative examples on PubFig and OSR comparing our method with LMNN.

5 Introduction; Notations & Related work; Metric learning: Fantope regularization; Metric learning: optimization algorithm; Experiments; Conclusion

6 Notations
S^d: the set of d×d real-valued symmetric matrices; S_+^d: the set of d×d real-valued symmetric positive semidefinite (PSD) matrices.
For matrices A ∈ S^d and B ∈ S^d, we denote the Frobenius inner product by ⟨A, B⟩ = tr(A^T B).
Π_{S_+^d}(A) is the orthogonal projection of the matrix A ∈ S^d onto the positive semidefinite cone S_+^d.
For a given a = (a_1, ..., a_d)^T ∈ R^d, Diag(a) = A ∈ S^d is the square diagonal matrix such that A_{i,i} = a_i for all i.
λ(A) is the vector of eigenvalues of matrix A arranged in non-increasing order; λ(A)_i is the i-th largest eigenvalue of A.
x_i ∈ R^d (resp. x_j ∈ R^d) is the vector representation of image p_i (resp. p_j); we write x_ij = x_i - x_j.
For x ∈ R, let [x]_+ = max(0, x).

7 Related work We focus in this work on supervised distance metric learning methods, which learn from similar and dissimilar pairs of images or from triplets of images. In this paper, we consider the widely used Mahalanobis distance metric D_M that is parameterized by the PSD matrix M ∈ S_+^d such that:
D_M^2(p_i, p_j) = (x_i - x_j)^T M (x_i - x_j) = x_ij^T M x_ij
It can also be rewritten as:
D_M^2(p_i, p_j) = ⟨M, x_ij x_ij^T⟩
References: J. V. Davis, et al. ICML, 2007; A. Mignon and F. Jurie. CVPR, 2012; E. Xing, et al. NIPS, 2002; G. Chechik, et al. JMLR, 2010; A. Frome, et al. ICCV, 2007; M. Schultz and T. Joachims. NIPS, 2003; K. Weinberger and L. Saul. JMLR, 2009
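
As an illustration (not part of the original slides), here is a minimal NumPy sketch of the two equivalent forms of the squared Mahalanobis distance; the matrix M and the feature vectors are placeholders:

    import numpy as np

    def mahalanobis_sq(M, x_i, x_j):
        # D_M^2(p_i, p_j) = (x_i - x_j)^T M (x_i - x_j) = x_ij^T M x_ij
        x_ij = x_i - x_j
        return float(x_ij @ M @ x_ij)

    def mahalanobis_sq_inner(M, x_i, x_j):
        # Equivalent form <M, x_ij x_ij^T> using the Frobenius inner product
        x_ij = x_i - x_j
        return float(np.sum(M * np.outer(x_ij, x_ij)))

    # Example: a matrix built as M = L^T L is PSD, so both forms agree and are non-negative.
    L = np.random.randn(3, 5)
    M = L.T @ L
    x_i, x_j = np.random.randn(5), np.random.randn(5)
    assert np.isclose(mahalanobis_sq(M, x_i, x_j), mahalanobis_sq_inner(M, x_i, x_j))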

8 Related work Many approaches prefer working on a specific matrix decomposition, i.e. M = L^T L where L ∈ R^{e×d} and d is the data dimension. The resulting optimization is very fast, but it is not convex w.r.t. L and suffers from local minima. In addition, an explicit regularization term is rarely introduced in the learning scheme. For instance, this lack of regularization makes LMNN prone to overfitting. To limit this shortcoming, many approaches perform early stopping, which stops an iterative optimization process before convergence. However, this method needs to be carefully tuned for each dataset.
References: T. Mensink, et al. PAMI, 2013; A. Mignon and F. Jurie. CVPR, 2012; K. Weinberger and L. Saul. JMLR, 2009

9 Related work Schultz and Joachims use the squared Frobenius norm ‖M‖_F^2, following the SVM framework, to learn a diagonal PSD distance matrix. The ITML method (Information-Theoretic Metric Learning) uses a LogDet regularizer that constrains the distance matrix to be strictly positive definite. Another powerful way to regularize is to control the rank of M. Some methods add the trace tr(M) as a regularization term, because it is a convex surrogate for rank(M).
References: M. Schultz and T. Joachims. NIPS, 2003; J. V. Davis, et al. ICML, 2007; D. Lim, et al. ICML, 2013; B. McFee and G. Lanckriet. ICML, 2010; C. Shen, et al. NIPS, 2009

10 Related work In this paper, we investigate a new optimization scheme with a regularization term that explicitly controls the rank of M. Such a scheme avoids overfitting without any trick such as early stopping. The main contributions of this paper are: 1) We introduce a new regularization strategy based on the convex hull of rank-k projection matrices, called a Fantope, which allows us to explicitly control the rank of distance matrices. 2) We propose an efficient algorithm to solve the new optimization scheme. 3) Our framework outperforms state-of-the-art metric learning methods on synthetic and challenging real computer vision datasets.

11 Metric Learning Fantope regularization Objective function A metric learning algorithm aims at determining M such that the metric satisfies most of the constraints defined by the training information. It is generally formulated as an optimization problem of the form:
min_M μ R(M) + ℓ(M, A)
where μ ≥ 0 is the regularization parameter, R(M) is a regularization term on the parameter M, and ℓ(M, A) is a loss function.

12 Metric Learning Fantope regularization Motivation for the proposed regularization Controlling the rank of the PSD distance matrix M. A standard way is to use the nuclear norm ‖M‖_* as a regularization term; in the case of PSD matrices M ∈ S_+^d, ‖M‖_* = tr(M). However, minimizing it seeks a rank-0 matrix (i.e. M = 0). Instead, we formulate the regularization term R(M) as the sum of the k smallest eigenvalues of M ∈ S_+^d:
R(M) = Σ_{i=d-k+1}^{d} λ(M)_i
This limits overfitting and exploits correlations between features.

13 Metric Learning Fantope regularization Motivation for the proposed regularization R(M) = Σ_{i=d-k+1}^{d} λ(M)_i. Such a minimization of R(M) will naturally converge to a subspace corresponding to the (d - k) most significant eigenvalues. As the rank of the PSD matrix M ∈ S_+^d is the number of its non-zero eigenvalues and all the eigenvalues of M ∈ S_+^d are non-negative, the proposed regularization term R(M) allows an explicit control over the rank of M: R(M) = 0 iff rank(M) ≤ d - k.
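
A small sketch (not from the slides) of the regularizer R(M) as the sum of the k smallest eigenvalues, assuming M is symmetric and using NumPy's eigvalsh, which returns eigenvalues in ascending order:

    import numpy as np

    def fantope_reg(M, k):
        # R(M) = sum of the k smallest eigenvalues of the symmetric matrix M.
        # For PSD M, R(M) = 0 iff rank(M) <= d - k.
        eigvals = np.linalg.eigvalsh(M)   # ascending order
        return float(np.sum(eigvals[:k]))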

14 Metric Learning Fantope regularization Explicit rank control regularization Using Ky Fan's theorem, we can rewrite the sum of the k smallest eigenvalues of any symmetric matrix M as the trace tr(WM), where W belongs to the convex hull of the set of rank-k projection matrices (outer products of orthonormal matrices). This convex hull is called a Fantope. Our regularization term may be expressed as:
R(M) = tr(WM) = ⟨W, M⟩
where the matrix W ∈ S_+^d projects the matrix M onto the target k-dimensional subspace.
K. Fan. On a theorem of Weyl concerning eigenvalues of linear transformations. Proceedings of the National Academy of Sciences of the United States of America, 1949

15 Metric Learning Fantope regularization Explicit rank control regularization A simple way to construct such a matrix W ∈ S_+^d is to use the eigendecomposition of M (eigenvalues in non-increasing order):
M = V_M Diag(λ(M)) V_M^T
Construct w = (w_1, ..., w_d)^T ∈ R^d such that:
w_i = 0 if 1 ≤ i ≤ d - k (the first d - k elements)
w_i = 1 if d - k + 1 ≤ i ≤ d (the last k elements)
then express W as: W = V_M Diag(w) V_M^T

16 Metric Learning Fantope regularization Explicit rank control regularization With the same construction of W from the eigendecomposition M = V_M Diag(λ(M)) V_M^T (eigenvalues in non-increasing order) and w_i = 0 for 1 ≤ i ≤ d - k, w_i = 1 for d - k + 1 ≤ i ≤ d, we get:
R(M) = tr(WM) = tr(V_M Diag(w) V_M^T V_M Diag(λ(M)) V_M^T) = tr(Diag(w) Diag(λ(M))) = w^T λ(M) = Σ_{i=d-k+1}^{d} λ(M)_i
and R(M) = 0 iff rank(M) ≤ d - k.

17 Metric Learning Fantope regularization Explicit rank control regularization Fantope regularization is a generalization of trace regularization. Indeed, for every matrix M ∈ S_+^d, tr(M) = tr(I_d M). Trace regularization is thus equivalent to a Fantope regularization where tr(WM) is the sum of the d smallest eigenvalues of M, i.e. W = V_M Diag(1) V_M^T = I_d.
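
The construction of W can be sketched as follows (a NumPy illustration consistent with the slides, not the authors' code). np.linalg.eigh returns eigenvalues in ascending order, so selecting the first k entries corresponds to the k smallest eigenvalues; with k = d the sketch recovers W = I_d, i.e. trace regularization:

    import numpy as np

    def fantope_W(M, k):
        # W = V_M Diag(w) V_M^T with w_i = 1 on the eigenvectors of the k smallest eigenvalues
        eigvals, V = np.linalg.eigh(M)     # ascending eigenvalues, eigenvectors as columns
        w = np.zeros_like(eigvals)
        w[:k] = 1.0
        return V @ np.diag(w) @ V.T

    # Sanity check: <W, M> = tr(WM) equals the sum of the k smallest eigenvalues of M,
    # and the special case k = d gives W = I_d (trace regularization).
    d, k = 6, 2
    L = np.random.randn(d, d)
    M = L.T @ L
    W = fantope_W(M, k)
    assert np.isclose(np.trace(W @ M), np.sum(np.linalg.eigvalsh(M)[:k]))
    assert np.allclose(fantope_W(M, d), np.eye(d))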

18 Metric Learning Optimization algorithm Optimization problem Constraints: quadruplet-wise constraints. For any quadruplet of images q = (p_i, p_j, p_k, p_l):
∀q ∈ A, D_M^2(p_k, p_l) ≥ δ_q + D_M^2(p_i, p_j)
where δ_q is a safety margin.
The triplet-wise constraint D_M^2(p_i, p_k) ≥ 1 + D_M^2(p_i, p_j) corresponds to q = (p_i, p_j, p_i, p_k) and δ_q = 1.
The pairwise constraints: for a dissimilar pair (p_i, p_j) ∈ D, D_M^2(p_i, p_j) ≥ l (a minimum value) corresponds to q = (p_i, p_i, p_i, p_j) and δ_q = l; for a similar pair (p_i, p_j) ∈ S, D_M^2(p_i, p_j) ≤ u (an upper bound) corresponds to q = (p_i, p_j, p_i, p_i) and δ_q = -u.
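
The following helpers (an illustrative sketch, not from the paper) encode triplet and pairwise constraints as quadruplets (q, δ_q) under the generic inequality D_M^2(p_k, p_l) ≥ δ_q + D_M^2(p_i, p_j); in particular, setting δ_q = -u for a similar pair is an assumption consistent with the fact that D_M^2(p_i, p_i) = 0, which makes the inequality reduce to D_M^2(p_i, p_j) ≤ u:

    def triplet_constraint(p_i, p_j, p_k):
        # D_M^2(p_i, p_k) >= 1 + D_M^2(p_i, p_j)
        return (p_i, p_j, p_i, p_k), 1.0

    def dissimilar_pair_constraint(p_i, p_j, l):
        # D_M^2(p_i, p_j) >= l
        return (p_i, p_i, p_i, p_j), l

    def similar_pair_constraint(p_i, p_j, u):
        # D_M^2(p_i, p_j) <= u  (delta_q = -u, since the left-hand side D_M^2(p_i, p_i) is 0)
        return (p_i, p_j, p_i, p_i), -u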

19 Metric Learning Optimization algorithm Optimization problem Constraints: quadruplet-wise constraints. For any quadruplet of images q = (p_i, p_j, p_k, p_l): ∀q ∈ A, D_M^2(p_k, p_l) ≥ δ_q + D_M^2(p_i, p_j). Using D_M^2(p_i, p_j) = ⟨M, x_ij x_ij^T⟩, the quadruplet-wise constraints for q = (p_i, p_j, p_k, p_l) ∈ A can be rewritten:
∀q ∈ A, ⟨M, x_kl x_kl^T - x_ij x_ij^T⟩ ≥ δ_q

20 Metric Learning Optimization algorithm Optimization problem Optimization Define a global loss: ℓ(M, A) = Σ_{q∈A} ℓ_M(q). Design the loss for a single quadruplet:
ℓ_M(q) = max(0, δ_q + ⟨M, x_ij x_ij^T - x_kl x_kl^T⟩)
By including our regularization term and ℓ(M, A), the optimization problem becomes:
min_{M ∈ S_+^d} f_W(M) = min_M μ R(M) + ℓ(M, A)
f_W(M) = μ ⟨W, M⟩ + Σ_{q∈A} [δ_q + ⟨M, x_ij x_ij^T - x_kl x_kl^T⟩]_+
where μ ≥ 0 is a regularization parameter and ⟨W, M⟩ is the sum of the k smallest eigenvalues of M.
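
A direct NumPy transcription of this objective (a sketch under the assumption that each constraint is stored as a tuple (x_i, x_j, x_k, x_l, delta_q) of feature vectors and margin):

    import numpy as np

    def quadruplet_loss(M, x_i, x_j, x_k, x_l, delta_q):
        # l_M(q) = max(0, delta_q + <M, x_ij x_ij^T - x_kl x_kl^T>)
        x_ij, x_kl = x_i - x_j, x_k - x_l
        return max(0.0, float(delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl))

    def objective(M, W, constraints, mu):
        # f_W(M) = mu <W, M> + sum over q in A of the hinge losses
        reg = mu * float(np.trace(W @ M))
        return reg + sum(quadruplet_loss(M, *q) for q in constraints)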

21 Metric Learning Optimization algorithm Solving the optimization problem
min_{M ∈ S_+^d} f_W(M) = min_M μ R(M) + ℓ(M, A)
f_W(M) = μ ⟨W, M⟩ + Σ_{q∈A} [δ_q + ⟨M, x_ij x_ij^T - x_kl x_kl^T⟩]_+
f_W(M) is not globally convex, but it is convex w.r.t. M when W is fixed. The subgradient w.r.t. M is:
∇_M = μ W + Σ_{q∈A^+} (x_ij x_ij^T - x_kl x_kl^T)
where A^+ is the subset of constraints in A with a non-zero loss and μ ≥ 0. W is updated by construction as explained before so that ⟨W, M⟩ is the sum of the k smallest eigenvalues of M. The process stops when the objective value stops decreasing.

22 Metric Learning Optimization algorithm Solving the optimization problem The global learning scheme is described in Algorithm 1: it alternates between recomputing W = V_M Diag(w) V_M^T from the current M and taking subgradient steps on M with ∇_M = μ W + Σ_{q∈A^+} (x_ij x_ij^T - x_kl x_kl^T), so as to minimize min_{M ∈ S_+^d} f_W(M) = min_M μ R(M) + ℓ(M, A).
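
A simplified sketch of this alternating scheme (illustrative, not the authors' Algorithm 1): it reuses fantope_W from the earlier sketch, uses a fixed number of iterations and step size instead of the objective-based stopping criterion, and adds a projection onto the PSD cone (the Π_{S_+^d} operator of the notation slide) after each subgradient step; these choices are assumptions:

    import numpy as np

    def project_psd(A):
        # Pi_{S_+^d}: clip the negative eigenvalues of the symmetrized matrix
        eigvals, V = np.linalg.eigh((A + A.T) / 2.0)
        return V @ np.diag(np.maximum(eigvals, 0.0)) @ V.T

    def learn_metric(constraints, d, k, mu, lr=1e-3, n_iter=500):
        M = np.eye(d)
        for _ in range(n_iter):
            W = fantope_W(M, k)                   # recompute W from the current M
            grad = mu * W                         # subgradient of mu <W, M>
            for (x_i, x_j, x_k, x_l, delta_q) in constraints:
                x_ij, x_kl = x_i - x_j, x_k - x_l
                if delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl > 0:   # q in A^+
                    grad += np.outer(x_ij, x_ij) - np.outer(x_kl, x_kl)
            M = project_psd(M - lr * grad)        # subgradient step + projection onto S_+^d
        return M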

23 Metric Learning Optimization algorithm Efficiency discussion An alternative method to solve the optimization problem is to switch the update between M and W only after a full subgradient descent over M, which is computationally demanding. When the input space dimension d is large, the eigendecomposition required at each iteration of the subgradient descent also becomes computationally expensive.

24 Metric Learning Optimization algorithm Efficiency discussion We propose an adaptation of the Alternating Direction Method of Multipliers (ADMM) [S. Boyd, et al.] to learn a metric. We adapt the optimization problem in this way:
min_{M ∈ S^d, Z ∈ S^d} f_W(M) + g(Z)
where g(Z) = 0 if Z ∈ S_+^d, and g(Z) = +∞ if Z ∉ S_+^d.
Introducing a Lagrange multiplier Λ ∈ S_+^d, we obtain the augmented Lagrangian:
L_ρ(M, Z, Λ) = f_W(M) + g(Z) + ⟨Λ, M - Z⟩ + (ρ/2) ‖M - Z‖_F^2
where ρ > 0 is a scaling parameter.
S. Boyd, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011

25 Metric Learning Optimization algorithm Efficiency discussion Algorithm 2 finds the optimal M before updating W, as previously proposed. However, the approximation and speed-up in Algorithm 2 come from the constraint M ∈ S_+^d being replaced by the constraint M ∈ S^d, whereas g(Z) promotes a PSD solution matrix. The scaled dual variable is U = (1/ρ) Λ.
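
The scaled-form ADMM updates for this splitting can be sketched as below (a generic sketch consistent with the augmented Lagrangian above, not the paper's Algorithm 2; it reuses fantope_W and project_psd from the earlier sketches, and the number of iterations, step size and ρ are placeholder assumptions). M is updated without the PSD constraint, Z absorbs that constraint through the indicator g(Z), and U = (1/ρ)Λ is the scaled dual variable:

    import numpy as np

    def admm_metric(constraints, d, k, mu, rho=1.0, lr=1e-3, n_admm=50, n_grad=20):
        M, Z, U = np.eye(d), np.eye(d), np.zeros((d, d))
        for _ in range(n_admm):
            W = fantope_W(M, k)
            # M-update: a few subgradient steps on f_W(M) + (rho/2) ||M - Z + U||_F^2
            for _ in range(n_grad):
                grad = mu * W + rho * (M - Z + U)
                for (x_i, x_j, x_k, x_l, delta_q) in constraints:
                    x_ij, x_kl = x_i - x_j, x_k - x_l
                    if delta_q + x_ij @ M @ x_ij - x_kl @ M @ x_kl > 0:
                        grad += np.outer(x_ij, x_ij) - np.outer(x_kl, x_kl)
                M = M - lr * grad
            Z = project_psd(M + U)    # Z-update: proximal step of the indicator g = PSD projection
            U = U + M - Z             # scaled dual update
        return Z                      # Z is PSD by construction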

26 Experiments Face verification task Image classification with relative attributes

27 Experiments Face verification task In the face verification task, we are provided with pairs of face images. The goal is to learn a classifier that determines whether image pairs are similar (represent the same person) or dissimilar (represent two different persons).

28 Face verification: LFW Experiment setup Dataset and evaluation metric Labeled Faces in the Wild (LFW) dataset: more than 13,000 images of faces; restricted paradigm, which only provides two sets of pairs of images: a set S of similar pairs and a set D of dissimilar pairs. We follow the standard evaluation protocol that uses View 2 data for training and testing and View 1 for validation.

29 Face verification: LFW Experiment setup Dataset and evaluation metric Labeled Faces in the Wild (LFW) dataset: more than 13,000 images of faces; restricted paradigm, which only provides two sets of pairs of images (a set S of similar pairs and a set D of dissimilar pairs). To generate our constraints, we set the upper bound u = 0.5 and the lower bound l = 1.5. The distance of a test pair is compared to the threshold (l + u)/2 = 1 to determine whether the pair is similar or dissimilar. The pairwise constraints: for a dissimilar pair (p_i, p_j) ∈ D, D_M^2(p_i, p_j) ≥ l corresponds to q = (p_i, p_i, p_i, p_j) and δ_q = l; for a similar pair (p_i, p_j) ∈ S, D_M^2(p_i, p_j) ≤ u corresponds to q = (p_i, p_j, p_i, p_i) and δ_q = -u.

30 Face verification: LFW Experiment setup Image representation We use the same input features as popular metric learning methods [ITML, LDML, PCCA], namely the SIFT descriptors computed by [LDML] and available on their website.
J. V. Davis, et al. Information-theoretic metric learning. ICML, 2007; M. Guillaumin, et al. Is that you? Metric learning approaches for face identification. ICCV, 2009; A. Mignon and F. Jurie. PCCA: A new approach for distance learning from sparse pairwise constraints. CVPR, 2012
Initialization of the distance matrix M ∈ S_+^d: first compute the matrix L ∈ R^{e×d} composed of the coefficients of the e most dominant principal components of the training data, then set M = L^T L.
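
The PCA-based initialization can be sketched as follows (an illustrative NumPy sketch: the slides do not specify whether the components are scaled, so this version uses unscaled principal directions):

    import numpy as np

    def pca_init(X, e):
        # X: (n_samples, d) training matrix; returns M = L^T L with L the top-e principal components
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        L = Vt[:e]            # rows are the e most dominant principal components, shape (e, d)
        return L.T @ L        # (d, d), PSD and of rank at most e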

31 Face verification: LFW Results Impact of regularization We compare the impact of Fantope regularization over trace regularization. The table shows classification accuracies (mean and standard error) when solving the optimization problem with both regularization methods. This illustrates the importance of having an explicit control on the rank of the distance matrix.

32 Face verification: LFW Results State-of-the-art results compare Fantope regularization to other popular metric learning algorithms The table shows performances of ITML, LDML and PCCA. Fantope regularization outperforms ITML and LDML and is comparable to PCCA.

33 Face verification: LFW Results Impact of early stopping The table reports the accuracies we obtained on LFW by testing the code of PCCA provided by its authors, as a function of the number of iterations of gradient descent. Using the early stopping criterion in our method yields 83.5 ± 0.5%. Conclusion: our regularization scheme makes our method much more robust than PCCA to early stopping.

34 Face verification: LFW Results Impact of the hyper-parameter μ The figure reports accuracy as a function of μ: 82.3% at the best μ, compared to 81.2% with μ = 0; for high values of μ, the rank of M converges to the expected rank e = 40.

35 Experiments Image classification with relative attributes In the image classification task with attributes, we are provided with images described by attributes. The goal is to assign an image to a predefined class. In particular, we focus on the case where classes are described with attributes. Image p_i is represented by x_i ∈ R^d, where the j-th element of x_i is the score (degree of presence) of the j-th attribute in p_i.

36 Metric learning in attribute space Experiment setup Datasets Outdoor Scene Recognition (OSR), containing 2688 images from 8 scene categories, and a subset of Public Figure Face (PubFig), containing 771 images from 8 face categories. We use the image features made publicly available by [Parikh and Grauman. ICCV, 2011]: a 512-dimensional GIST [Oliva and Torralba. IJCV, 2001] descriptor for OSR, and a concatenation of the GIST descriptor and a 45-dimensional Lab color histogram for PubFig.
D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011.

37 Metric learning in attribute space Experiment setup Baselines 1) The relative attribute learning problem described in [Parikh and Grauman. ICCV, 2011] uses relative attribute annotations on classes to compute high-level representations of images x_i ∈ R^d; a Gaussian distribution is learned for each class. 2) The Large Margin Nearest Neighbor (LMNN) [Weinberger and Saul. JMLR, 2009] is a popular metric learning method used for image classification. The high-level representations x_i ∈ R^d are used as input features of the LMNN classifier.
D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011. K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009

38 Metric learning in attribute space Experiment setup Integration of regularization We modify the code of LMNN [Weinberger and Saul. JMLR, 2009] to integrate trace and Fantope regularization; the stopping criterion is the convergence of the algorithm (i.e. the objective function stops decreasing). Learning setup: we use the same experimental setup as [Parikh and Grauman. ICCV, 2011], with N = 30 training images and the rest used for testing.

39 Metric learning in attribute space Results The table reports accuracies of the baselines and of our proposed regularization method on both the OSR and PubFig datasets, with gains of about 2% to 3%. These results validate the importance of a proper regularization for predictive accuracy.

40 Metric learning in attribute space Results The figure illustrates on some examples how effective our scheme is at learning semantics (qualitative comparison of our method against LMNN).

41 Metric learning in attribute space Results Additional qualitative examples comparing our method against LMNN.

42 Conclusion We proposed a new regularization scheme for metric learning that explicitly controls the rank of the learned distance matrix. Our method generalizes trace regularization and can be applied in various optimization frameworks to impose a meaningful structure on the learned PSD matrix. We derived an efficient metric learning algorithm that combines the regularization term with a loss function able to incorporate constraints between pairs or triplets of images. We demonstrated that this regularization greatly improves recognition on real datasets, showing its relevance for limiting overfitting. Future work includes a better-designed ADMM formulation that takes into account the fact that the objective function is not globally convex.

43 Thank You!
