Learning from permutations. Jean-Philippe Vert
1 Learning from permutations Jean-Philippe Vert
3 Perception
4 Communication
5 Mobility
6 Health
7 A common process: learning from data. Given examples (training data), make a machine learn how to predict on new samples, or discover patterns in the data. Statistics + optimization + computer science. Gets better with more training examples and bigger computers.
8-11 It (more or less) boils down to... (repeated over four figure-only build-up slides)
12 In practice (e.g., linear ridge logistic regression). Input $X_1, \dots, X_n \in \mathbb{R}^p$, output $Y_1, \dots, Y_n \in \{-1, 1\}$. Classifier: $f_\beta(X) = \mathrm{sign}(\beta^\top X)$ for $\beta \in \mathbb{R}^p$. Training: $$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top X_i\right) + \lambda \|\beta\|^2 \right\} \quad \text{with } \ell_i\left(\beta^\top X_i\right) = \ln\left(1 + e^{-Y_i \beta^\top X_i}\right).$$ A convex optimization problem, scalable via SGD.
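To make the training step concrete, here is a minimal NumPy sketch of full-batch gradient descent on the regularized logistic loss above (SGD would subsample rows at each step); the function name and hyperparameter values are illustrative, not from the talk.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_ridge_logistic(X, Y, lam=0.1, lr=0.1, epochs=200):
    """Minimize (1/n) sum_i ln(1 + exp(-Y_i beta^T X_i)) + lam * ||beta||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        margins = Y * (X @ beta)                       # Y_i * beta^T X_i
        # d/dbeta of ln(1 + e^{-m_i}) is -sigmoid(-m_i) * Y_i * X_i
        grad = -(X * (Y * sigmoid(-margins))[:, None]).mean(axis=0) + 2 * lam * beta
        beta -= lr * grad
    return beta

# toy usage on random linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = np.sign(X @ rng.normal(size=5))
beta_hat = train_ridge_logistic(X, Y)
print(np.mean(np.sign(X @ beta_hat) == Y))             # training accuracy
```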
13-14 Questions $$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^n \ln\left(1 + e^{-Y_i \beta^\top X_i}\right) + \lambda \|\beta\|^2 \right\}$$ How to compute $\hat\beta$ in practice? Will $f_{\hat\beta}$ make good predictions? How to train nonlinear models? What if inputs are not vectors? What if outputs are not binary? ...
15 What if inputs are permutations? A permutation is a bijection $\sigma : [1, N] \to [1, N]$, where $\sigma(i)$ is the rank of item $i$. Composition: $(\sigma_1 \sigma_2)(i) = \sigma_1(\sigma_2(i))$. $S_N$ denotes the symmetric group, with $|S_N| = N!$.
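As a quick illustration of these definitions, a permutation can be stored as an integer array of ranks, and composition is then just array indexing. This toy snippet uses 0-indexing (my convention, versus the slide's $[1, N]$):

```python
import math
import numpy as np

# sigma[i] = rank of item i (0-indexed here, [1, N] on the slide)
sigma1 = np.array([2, 0, 1])       # 0 -> 2, 1 -> 0, 2 -> 1
sigma2 = np.array([1, 2, 0])       # 0 -> 1, 1 -> 2, 2 -> 0

# composition (sigma1 sigma2)(i) = sigma1(sigma2(i))
print(sigma1[sigma2])              # [0 1 2]: sigma2 is the inverse of sigma1

# |S_N| = N! grows very fast, so algorithms cannot enumerate S_N
print(math.factorial(20))          # 2432902008176640000
```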
16 Examples: ranking data; ranks extracted from data (histogram equalization, quantile normalization, ...).
17 Learning from permutations. Assume your data are permutations and you want to learn $f : S_N \to \mathbb{R}$. One solution: embed $S_N$ into a Euclidean (or Hilbert) space via $\Phi : S_N \to \mathbb{R}^p$ and learn a linear function $f_\beta(\sigma) = \beta^\top \Phi(\sigma)$. The corresponding kernel is $K(\sigma_1, \sigma_2) = \Phi(\sigma_1)^\top \Phi(\sigma_2)$.
18 How to define the embedding $\Phi : S_N \to \mathbb{R}^p$? It should encode interesting features, lead to efficient algorithms, and be invariant to renaming of the items, i.e., the kernel should be right-invariant: $\forall \sigma_1, \sigma_2, \pi \in S_N$, $K(\sigma_1 \pi, \sigma_2 \pi) = K(\sigma_1, \sigma_2)$.
19 Harmonic analysis on $S_N$. A representation of $S_N$ is a matrix-valued function $\rho : S_N \to \mathbb{C}^{d_\rho \times d_\rho}$ such that $\forall \sigma_1, \sigma_2 \in S_N$, $\rho(\sigma_1 \sigma_2) = \rho(\sigma_1) \rho(\sigma_2)$. A representation is irreducible (an irrep) if it is not equivalent to the direct sum of two other representations. $S_N$ has a finite number of irreps $\{\rho_\lambda : \lambda \in \Lambda\}$, where $\Lambda = \{\lambda \vdash N\}$ is the set of partitions of $N$ ($\lambda \vdash N$ iff $\lambda = (\lambda_1, \dots, \lambda_r)$ with $\lambda_1 \geq \dots \geq \lambda_r$ and $\sum_{i=1}^r \lambda_i = N$). For any $f : S_N \to \mathbb{R}$, the Fourier transform of $f$ is $$\forall \lambda \in \Lambda, \quad \hat{f}(\rho_\lambda) = \sum_{\sigma \in S_N} f(\sigma) \rho_\lambda(\sigma).$$
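As a hedged toy illustration of the Fourier-transform definition (not of the irreducible decomposition itself), one can evaluate $\hat{f}$ at the permutation representation of $S_3$, which is reducible but easy to write down; the choice of $f$ below is arbitrary:

```python
import numpy as np
from itertools import permutations

N = 3

def perm_rep(sigma):
    """Permutation representation rho(sigma): [rho]_{ij} = 1 if sigma(j) = i."""
    P = np.zeros((N, N))
    for j in range(N):
        P[sigma[j], j] = 1.0
    return P

# an arbitrary f : S_3 -> R, here the number of fixed points of sigma
f = {s: sum(s[i] == i for i in range(N)) for s in permutations(range(N))}

# Fourier coefficient: f_hat(rho) = sum over sigma of f(sigma) * rho(sigma)
f_hat = sum(f[s] * perm_rep(s) for s in permutations(range(N)))
print(f_hat)
```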
20 Right-invariant kernels (Bochner's theorem). An embedding $\Phi : S_N \to \mathbb{R}^p$ defines a right-invariant kernel $K(\sigma_1, \sigma_2) = \Phi(\sigma_1)^\top \Phi(\sigma_2)$ if and only if there exists $\phi : S_N \to \mathbb{R}$ such that $\forall \sigma_1, \sigma_2 \in S_N$, $K(\sigma_1, \sigma_2) = \phi(\sigma_2^{-1} \sigma_1)$ and $\forall \lambda \in \Lambda$, $\hat{\phi}(\rho_\lambda) \succeq 0$.
21 Some attempts: Kendall, SUQUAN (Jiao and Vert, 2015, 2017, 2018; Le Morvan and Vert, 2017)
22 SUQUAN embedding (Le Morvan and Vert, 2017). Let $\Phi(\sigma) = \Pi_\sigma$, the permutation representation (Serre, 1977): $[\Pi_\sigma]_{ij} = 1$ if $\sigma(j) = i$, 0 otherwise. Leads to new approaches for supervised quantile normalization (SUQUAN) and vector quantization.
23 Example: CIFAR-10. Discriminate images of horse vs. plane. Different methods learn different quantile functions. [Figure: learned quantile functions for original, median, SVD and SUQUAN-BND.]
24 Limits of the SUQUAN embedding. A linear model on $\Phi(\sigma) = \Pi_\sigma \in \mathbb{R}^{N \times N}$ captures first-order information of the form "i-th feature ranked at the j-th position". What about higher-order information such as "feature i larger than feature j"?
25 The Kendall embedding (Jiao and Vert, 2015, 2017): $\Phi_{i,j}(\sigma) = 1$ if $\sigma(i) < \sigma(j)$, 0 otherwise.
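This embedding has one coordinate per ordered pair, so it lives in dimension $O(N^2)$ and costs $O(N^2)$ to build explicitly; a small sketch to fix ideas (0-indexed):

```python
import numpy as np

def kendall_embedding(sigma):
    """Phi_{i,j}(sigma) = 1 if sigma(i) < sigma(j), else 0; O(N^2) entries."""
    sigma = np.asarray(sigma)
    return (sigma[:, None] < sigma[None, :]).astype(float)

print(kendall_embedding([2, 0, 1]))
```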
26 Kendall and Mallows kernels. The Kendall kernel is $K_\tau(\sigma, \sigma') = \Phi(\sigma)^\top \Phi(\sigma')$. The Mallows kernel is, for $\lambda \geq 0$, $K_M^\lambda(\sigma, \sigma') = e^{-\lambda \|\Phi(\sigma) - \Phi(\sigma')\|^2}$. Theorem (Jiao and Vert, 2015, 2017): the Kendall and Mallows kernels are positive definite and can be evaluated in $O(N \log N)$ time. The kernel trick is useful with few samples in large dimensions.
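In practice one never builds $\Phi$ explicitly. Here is a minimal sketch of both kernels via SciPy's kendalltau, which runs in $O(N \log N)$ in the spirit of Knight (1966); it assumes tie-free inputs, and the factor 2 comes from $\|\Phi(\sigma) - \Phi(\sigma')\|^2 = 2 n_d$ (see the proof slide below):

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_kernel(x, y):
    """Normalized Kendall kernel: (n_c - n_d) / (N choose 2) = Kendall's tau."""
    return kendalltau(x, y)[0]

def mallows_kernel(x, y, lam=1.0):
    """exp(-lam * ||Phi(x) - Phi(y)||^2) = exp(-2 * lam * n_d), no ties assumed."""
    N = len(x)
    n_pairs = N * (N - 1) / 2
    n_d = n_pairs * (1 - kendalltau(x, y)[0]) / 2   # recover n_d from tau
    return np.exp(-2 * lam * n_d)

x = np.array([1.2, 0.4, 3.3, 2.0])
y = np.array([0.9, 0.1, 2.5, 2.6])
print(kendall_kernel(x, y), mallows_kernel(x, y, lam=0.1))
```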
27 Remark. Kondor and Barbosa (2010) proposed the diffusion kernel on the Cayley graph of the symmetric group generated by adjacent transpositions, which is computationally intensive ($O(N^{2N})$). The Mallows kernel can be written as $K_M^\lambda(\sigma, \sigma') = e^{-\lambda n_d(\sigma, \sigma')}$, where $n_d(\sigma, \sigma')$ is the shortest-path distance on that Cayley graph; it can be computed in $O(N \log N)$. [Figure: Cayley graph of $S_4$.]
28 Higher-order kernels (Jiao and Vert, 2018). Take $\Phi(\sigma) = \Pi_\sigma^d$, a $d$-th order analogue of the permutation representation. For $d = 1$ this is the SUQUAN embedding; for $d = 2$ it leads to a new weighted Kendall kernel, whose weights can be optimized during training.
29 Conclusion. Machine learning beyond vectors, strings and graphs: different embeddings of the symmetric group (Kendall, SUQUAN). Open questions: scalability? robustness to adversarial attacks? THANK YOU!
30 References
R. E. Barlow, D. Bartholomew, J. M. Bremner, and H. D. Brunk. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley, New York, 1972.
Y. Jiao and J.-P. Vert. The Kendall and Mallows kernels for permutations. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of JMLR: W&CP, 2015.
Y. Jiao and J.-P. Vert. The Kendall and Mallows kernels for permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
Y. Jiao and J.-P. Vert. The weighted Kendall and high-order kernels for permutations. Technical report, arXiv, 2018.
W. R. Knight. A computer method for calculating Kendall's tau with ungrouped data. Journal of the American Statistical Association, 61(314):436-439, 1966.
M. Le Morvan and J.-P. Vert. Supervised quantile normalisation. Technical report, arXiv:1706.00244, 2017.
J.-P. Serre. Linear Representations of Finite Groups. Graduate Texts in Mathematics. Springer-Verlag, New York, 1977.
O. Sysoev and O. Burdakov. A smoothed monotonic regression via L2 regularization. Technical Report LiTH-MAT-R-2016/01-SE, Department of Mathematics, Linköping University, 2016.
31 The quantile normalization (QN) embedding. Data: a permutation $\sigma \in S_N$, where $\sigma(i)$ is the rank of item/feature $i$. Fix a target quantile vector $q \in \mathbb{R}^N$ and define $\Phi_q : S_N \to \mathbb{R}^N$ by $\forall \sigma \in S_N$, $[\Phi_q(\sigma)]_i = q_{\sigma(i)}$. "Keep the order, change the values."
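A minimal NumPy sketch of this embedding: sorting twice gives the ranks $\sigma(i)$ (0-indexed below), and indexing the sorted target quantiles with them keeps the order while replacing the values.

```python
import numpy as np

def qn_embed(x, q):
    """[Phi_q(sigma)]_i = q_{sigma(i)}: keep the order of x, impose the values q."""
    q = np.sort(q)                      # target quantiles, non-decreasing
    ranks = np.argsort(np.argsort(x))   # sigma(i) = rank of feature i, 0-indexed
    return q[ranks]

x = np.array([5.0, -1.0, 2.5])
q = np.array([0.0, 1.0, 10.0])
print(qn_embed(x, q))                   # [10.  0.  1.]: order kept, values changed
```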
32 How to choose a "good" target distribution?
33-34 SUQUAN (Le Morvan and Vert, 2017). Learning after standard QN: (1) fix $q$ arbitrarily; (2) QN all samples to get $\Phi_q(\sigma_1), \dots, \Phi_q(\sigma_n)$; (3) learn a model on the normalized data, e.g.: $$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 \right\}.$$ Supervised QN (SUQUAN) instead jointly learns $q$ and the model: $$(\hat\beta, \hat{q}) = \operatorname*{argmin}_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}.$$
35 Computing $\Phi_q(\sigma)$. For $\sigma \in S_N$, let $\Pi_\sigma$ be the permutation representation (Serre, 1977): $[\Pi_\sigma]_{ij} = 1$ if $\sigma(j) = i$, 0 otherwise. Then $\Phi_q(\sigma) = \Pi_\sigma q$.
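A small numerical check that the matrix form agrees with direct indexing. The convention below puts the 1 at $[\Pi_\sigma]_{i,\sigma(i)}$ so that $(\Pi_\sigma q)_i = q_{\sigma(i)}$ matches $[\Phi_q(\sigma)]_i$ exactly; the slide's index convention differs by a transpose.

```python
import numpy as np

def perm_matrix(ranks):
    """Pi_sigma with [Pi]_{i, sigma(i)} = 1, so (Pi_sigma q)_i = q_{sigma(i)}."""
    N = len(ranks)
    P = np.zeros((N, N))
    P[np.arange(N), ranks] = 1.0
    return P

x = np.array([5.0, -1.0, 2.5])
ranks = np.argsort(np.argsort(x))   # sigma, 0-indexed
q = np.array([0.0, 1.0, 10.0])
print(perm_matrix(ranks) @ q)       # [10.  0.  1.], same as q[ranks]
```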
36 Linear SUQUAN as rank-1 matrix regression. Linear SUQUAN therefore solves
$$\min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}
= \min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Pi_{\sigma_i} q\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}
= \min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\langle q \beta^\top, \Pi_{\sigma_i} \rangle_{\mathrm{Frobenius}}\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}.$$
This is a particular linear model estimating a rank-1 matrix $M = q \beta^\top$, where each sample $\sigma \in S_N$ is represented by the matrix $\Pi_\sigma \in \mathbb{R}^{N \times N}$. The problem is non-convex, but alternating optimization over $\beta$ and $q$ is easy.
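A sketch of that alternating scheme, assuming scikit-learn's LogisticRegression for both linear subproblems; it exploits that $\beta^\top \Pi_\sigma q$ is also linear in $q$ for fixed $\beta$ (with coefficient vector $\beta$ reindexed by $\sigma^{-1}$). The $\Omega(q)$ penalty is replaced here by a crude sort-based monotonicity projection, and all names are illustrative, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def suquan_alternate(R, y, q0, n_iter=5, C=1.0):
    """Alternating-minimization sketch for the bilinear SUQUAN objective.
    R[k] holds the 0-indexed ranks sigma_k(i) of sample k."""
    q = np.sort(q0.astype(float))
    beta = None
    for _ in range(n_iter):
        # beta-step: with q fixed, the features are Phi_q(sigma_k) = q[ranks]
        clf_b = LogisticRegression(C=C).fit(q[R], y)
        beta = clf_b.coef_.ravel()
        # q-step: beta^T Phi_q(sigma) = sum_j q_j * beta[sigma^{-1}(j)],
        # so with beta fixed the problem is again linear, now in q
        Xq = np.take(beta, np.argsort(R, axis=1))
        clf_q = LogisticRegression(C=C).fit(Xq, y)
        q = np.sort(clf_q.coef_.ravel())   # crude projection: keep q non-decreasing
    return beta, q

# toy usage on random data
rng = np.random.default_rng(0)
n, N = 200, 10
X = rng.normal(size=(n, N))
R = np.argsort(np.argsort(X, axis=1), axis=1)
y = (X[:, 0] > X[:, 1]).astype(int)
beta, q = suquan_alternate(R, y, q0=np.linspace(0.0, 1.0, N))
```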
37 Experiments: CIFAR-10. Image classification into 10 classes (45 binary problems), $N = 5{,}000$ per class, $p = 1{,}024$ pixels. [Figure: test-set AUC of SUQUAN-SVD, SUQUAN-BND and SUQUAN-SPAV against baseline target quantiles (cauchy, exponential, uniform, gaussian, median); scatter plots of test AUC, SUQUAN-BND vs. median.]
38 Experiments: CIFAR-10. Example: horse vs. plane. Different methods learn different quantile functions. [Figure: learned quantile functions for original, median, SVD and SUQUAN-BND.]
39 Outline 1 The Kendall embedding
40 Limits of the QN embedding. A linear model on $\Phi(\sigma) = \Pi_\sigma \in \mathbb{R}^{N \times N}$ captures first-order information of the form "i-th feature ranked at the j-th position". What about higher-order information such as "feature i larger than feature j"?
41 Another representation: $\Phi_{i,j}(\sigma) = 1$ if $\sigma(i) < \sigma(j)$, 0 otherwise.
42 Kendall and Mallows kernels. The Kendall kernel is $K_\tau(\sigma, \sigma') = \Phi(\sigma)^\top \Phi(\sigma')$. The Mallows kernel is, for $\lambda \geq 0$, $K_M^\lambda(\sigma, \sigma') = e^{-\lambda \|\Phi(\sigma) - \Phi(\sigma')\|^2}$. Theorem (Jiao and Vert, 2015, 2017): the Kendall and Mallows kernels are positive definite and can be evaluated in $O(N \log N)$ time. The kernel trick is useful with few samples in large dimensions.
43 Proof. For any two permutations $\sigma, \sigma' \in S_N$: Inner product: $$\Phi(\sigma)^\top \Phi(\sigma') = \sum_{1 \leq i \neq j \leq N} \mathbb{1}_{\sigma(i) < \sigma(j)} \, \mathbb{1}_{\sigma'(i) < \sigma'(j)} = n_c(\sigma, \sigma'),$$ the number of concordant pairs. Distance: $$\|\Phi(\sigma) - \Phi(\sigma')\|^2 = \sum_{1 \leq i, j \leq N} \left(\mathbb{1}_{\sigma(i) < \sigma(j)} - \mathbb{1}_{\sigma'(i) < \sigma'(j)}\right)^2 = 2\, n_d(\sigma, \sigma'),$$ where $n_d$ is the number of discordant pairs. Both $n_c$ and $n_d$ can be computed in $O(N \log N)$ (Knight, 1966).
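For completeness, a self-contained sketch of the $O(N \log N)$ count of discordant pairs by merge-sort inversion counting, in the spirit of Knight (1966): relabel items so that $\sigma$ becomes the identity, then count inversions of the relabeled $\sigma'$.

```python
def count_discordant(sigma1, sigma2):
    """n_d(sigma1, sigma2) via merge-sort inversion counting, O(N log N)."""
    # relabel so that sigma1 becomes the identity, then count inversions
    order = sorted(range(len(sigma1)), key=lambda i: sigma1[i])
    a = [sigma2[i] for i in order]

    def sort_count(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, nl = sort_count(xs[:mid])
        right, nr = sort_count(xs[mid:])
        merged, inv, i, j = [], nl + nr, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i     # every remaining left element is larger
        merged += left[i:] + right[j:]
        return merged, inv

    return sort_count(a)[1]

print(count_discordant([0, 1, 2, 3], [1, 0, 3, 2]))  # 2 discordant pairs
```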
44 Related work. Kondor and Barbosa (2010) proposed the diffusion kernel on the Cayley graph of the symmetric group generated by adjacent transpositions, which is computationally intensive ($O(N^{2N})$). The Mallows kernel can be written as $K_M^\lambda(\sigma, \sigma') = e^{-\lambda n_d(\sigma, \sigma')}$, where $n_d(\sigma, \sigma')$ is the shortest-path distance on that Cayley graph; it can be computed in $O(N \log N)$. [Figure: Cayley graph of $S_4$.]
45 Applications. [Figure: average accuracy on 10 microarray classification problems for SVM and KFD classifiers with Kendall, linear, polynomial and RBF kernels (ALL or TOP features), plus (k)TSP and APMV baselines (Jiao and Vert, 2017).]
46 Constraints on $f$. Ridge: $F_0 = \left\{ f \in \mathbb{R}^p : \frac{1}{p} \sum_{i=1}^p f_i^2 \leq 1 \right\}$. Non-decreasing: $F_{BND} = F_0 \cap I_0$, where $I_0 = \{ f \in \mathbb{R}^p : f_1 \leq f_2 \leq \dots \leq f_p \}$. Non-decreasing and smooth: $F_{SPAV} = \left\{ f \in I_0 : \sum_{j=1}^{p-1} (f_{j+1} - f_j)^2 \leq 1 \right\}$.
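Since the algorithms below need the Euclidean projection onto $I_0$, here is a short check of the key fact: projecting onto the monotone cone is exactly isotonic regression, which scikit-learn's IsotonicRegression solves by PAVA.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Euclidean projection of f onto I_0 = {f_1 <= ... <= f_p} via PAVA
f = np.array([0.2, 1.5, 1.1, 2.0, 1.8])
proj = IsotonicRegression().fit_transform(np.arange(len(f)), f)
print(proj)   # [0.2 1.3 1.3 1.9 1.9]: non-decreasing, closest to f in L2
```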
47 SUQUAN-BND and SUQUAN-SPAV

Alternate optimization in $w$ and $f$, with a monotonicity constraint on $f$: proximal gradient steps on $f$ use the Pool Adjacent Violators Algorithm (PAVA, Barlow et al., 1972) or the Smoothed Pool Adjacent Violators algorithm (SPAV, Sysoev and Burdakov, 2016) as proximal operator. Iterating the two steps produces a sequence of target quantiles, although in experiments the performance did not change significantly after the first iteration. Note also that, contrary to SUQUAN-SVD, this algorithm requires an initial non-decreasing target quantile; by default, use the median of the data quantile functions, which is often the default in standard QN normalisation.

Algorithm 2: SUQUAN-BND and SUQUAN-SPAV
Input: $(x_1, y_1), \dots, (x_n, y_n)$, $f_{init} \in I_0$, $\lambda, \gamma \in \mathbb{R}$
Output: target quantile $f \in I_0$
1: for $i = 1$ to $n$ do
2: $(rank_i, order_i) \leftarrow \mathrm{sort}(x_i)$
3: end for
4: $(w, b) \leftarrow \operatorname{argmin}_{w, b} \frac{1}{n} \sum_{i=1}^n \ell_i\left(w^\top f_{init}[rank_i] + b\right) + \lambda \|w\|^2$ (standard linear model optimisation)
5: $f \leftarrow \operatorname{argmin}_{f \in F_{BND}} \frac{1}{n} \sum_{i=1}^n \ell_i\left(f^\top w[order_i] + b\right)$ (isotonic optimisation problem using PAVA as prox), OR $f \leftarrow \operatorname{argmin}_{f \in F_{SPAV}} \frac{1}{n} \sum_{i=1}^n \ell_i\left(f^\top w[order_i] + b\right)$ (smoothed isotonic optimisation problem using SPAV as prox)
48 A variant: SUQUAN-SVD

With a ridge penalty only (no monotonicity constraint, i.e., $f$ restricted to $F_0$), SUQUAN amounts to a linear regression or classification problem over rank-1 matrices $M = f w^\top$. In the binary classification setting, a simple linear classifier (without covariance) is given by linear discriminant analysis: $M_{LDA} = \mu_+ - \mu_-$, where $\mu_+$ and $\mu_-$ are the means of the matrices $\Pi_{x_i}$ over the two classes: $$M_{LDA} = \frac{1}{n_+} \sum_{i : y_i = +1} \Pi_{x_i} - \frac{1}{n_-} \sum_{i : y_i = -1} \Pi_{x_i}.$$ Consequently, a good rank-1 candidate is the closest rank-1 matrix to $M_{LDA}$, given by its top singular vectors; keeping only the corresponding singular vector of $M_{LDA}$ yields a target quantile that can then be used to quantile normalise the data before running any linear classification method. Algorithm 1 summarises the method.

Algorithm 1: SUQUAN-SVD
Input: $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^p \times \{-1, 1\}$
Output: target quantile $f \in F_0$
1: $M_{LDA} \leftarrow 0 \in \mathbb{R}^{p \times p}$
2: $n_{+1} \leftarrow |\{i : y_i = +1\}|$
3: $n_{-1} \leftarrow |\{i : y_i = -1\}|$
4: for $i = 1$ to $n$ do
5: Compute $\Pi_{x_i}$ (by sorting $x_i$)
6: $M_{LDA} \leftarrow M_{LDA} + \frac{y_i}{n_{y_i}} \Pi_{x_i}$
7: end for
8: $(\delta, w, f) \leftarrow \mathrm{SVD}(M_{LDA}, 1)$

Each $\Pi_{x_i}$ involves an $O(p \ln p)$ sort of the entries of $x_i$, so computing $M_{LDA}$, a sum of $n$ permutation matrices, requires $O(np \ln p)$ operations. A naive SVD of $M_{LDA}$ would cost another $O(p^2)$ per power iteration; however, since multiplying a vector by a permutation matrix is just a reordering, the leading singular pair can be computed in $O(np)$ per iteration without forming $M_{LDA}$. Hence the complexity of SUQUAN-SVD is $O(np \ln p)$, the same as quantile normalisation alone.
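A compact NumPy sketch of Algorithm 1 under stated assumptions: it reuses the permutation-matrix convention from the earlier snippet and calls np.linalg.svd instead of a rank-1 power iteration, so this version is $O(p^3)$ rather than the slide's $O(np \ln p)$; names are illustrative.

```python
import numpy as np

def suquan_svd(X, y):
    """SUQUAN-SVD sketch: rank-1 SVD of
    M_LDA = mean(Pi_x | y = +1) - mean(Pi_x | y = -1)."""
    n, p = X.shape
    M = np.zeros((p, p))
    for xi, yi in zip(X, y):
        ranks = np.argsort(np.argsort(xi))      # O(p log p) sort per sample
        M[np.arange(p), ranks] += yi / np.sum(y == yi)
    U, S, Vt = np.linalg.svd(M)                 # full SVD for simplicity
    w, f = U[:, 0], Vt[0]                       # leading singular pair
    return w, f                                 # f: candidate target quantile

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = np.where(X[:, 0] > 0, 1, -1)
w, f = suquan_svd(X, y)
print(f)
```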
49 Experiments: Simulations. The true distribution of the entries of $X$ is normal; the data are then corrupted with a Cauchy, exponential, uniform or bimodal Gaussian distribution. $p = 1000$, $n$ varies, logistic regression. [Figure: AUC and Euclidean distance to the original target quantile as functions of the number of samples, for logistic regression on original and corrupted data, SUQUAN-BND and SUQUAN-SPAV.]