Learning from permutations. Jean-Philippe Vert


1 Learning from permutations Jean-Philippe Vert

2

3 Perception

4 Communication

5 Mobility

6 Health

7 A common process: learning from data. Given examples (training data), make a machine learn how to predict on new samples, or discover patterns in the data. Statistics + optimization + computer science. Gets better with more training examples and bigger computers.

8 It (more or less) boils down to

12 In practice (e.g., linear ridge logistic regression). Input: $X_1, \dots, X_n \in \mathbb{R}^p$. Output: $Y_1, \dots, Y_n \in \{-1, 1\}$. Classifier: $f_\beta(X) = \mathrm{sign}(\beta^\top X)$ for $\beta \in \mathbb{R}^p$. Training: $$\hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\mathrm{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i(\beta^\top X_i) + \lambda \|\beta\|^2 \right\} \quad \text{with} \quad \ell_i(\beta^\top X_i) = \ln\left(1 + e^{-Y_i \beta^\top X_i}\right)$$ Convex optimization problem, scalable (SGD).
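To make the training step concrete, here is a minimal full-batch gradient-descent sketch of this objective in numpy (our own illustration, not from the slides); the SGD mentioned above would replace the mean over all samples by a sampled minibatch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ridge_logistic(X, Y, lam=0.1, lr=0.1, n_iter=1000):
    """Full-batch gradient descent on
    (1/n) * sum_i ln(1 + exp(-Y_i * beta^T X_i)) + lam * ||beta||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        margins = Y * (X @ beta)               # m_i = Y_i * beta^T X_i
        # d/dm ln(1 + e^{-m}) = -sigmoid(-m), chain rule through m_i
        grad = -(X * (Y * sigmoid(-margins))[:, None]).mean(axis=0) + 2 * lam * beta
        beta -= lr * grad
    return beta
```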

13 Questions $$\hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\mathrm{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n \ln\left(1 + e^{-Y_i \beta^\top X_i}\right) + \lambda \|\beta\|^2 \right\}$$ How to compute $\hat{\beta}$ in practice? Will $f_{\hat{\beta}}$ make good predictions? How to train nonlinear models? What if inputs are not vectors? What if outputs are not binary? ...

15 What if inputs are permutations? A permutation is a bijection $\sigma : [1, N] \to [1, N]$, with $\sigma(i)$ = rank of item $i$. Composition: $(\sigma_1 \sigma_2)(i) = \sigma_1(\sigma_2(i))$. $S_N$ denotes the symmetric group; $|S_N| = N!$
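A minimal sketch (ours, not from the slides) of this representation: permutations stored as 0-indexed numpy arrays, with composition by fancy indexing:

```python
import numpy as np

def compose(sigma1, sigma2):
    """(sigma1 sigma2)(i) = sigma1(sigma2(i))."""
    return sigma1[sigma2]

def rank_vector(x):
    """Permutation induced by a data vector x: sigma(i) = rank of x_i."""
    return np.argsort(np.argsort(x))

sigma = rank_vector(np.array([0.3, 2.1, 1.4]))   # -> [0, 2, 1]
identity = np.arange(3)
assert np.array_equal(compose(sigma, identity), sigma)
```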

16 Examples: ranking data; ranks extracted from data (histogram equalization, quantile normalization, ...)

17 Learning from permutations. Assume your data are permutations and you want to learn $f : S_N \to \mathbb{R}$. One solution: embed $S_N$ into a Euclidean (or Hilbert) space via $\Phi : S_N \to \mathbb{R}^p$ and learn a linear function: $f_\beta(\sigma) = \beta^\top \Phi(\sigma)$. The corresponding kernel is $K(\sigma_1, \sigma_2) = \Phi(\sigma_1)^\top \Phi(\sigma_2)$.

18 How to define the embedding $\Phi : S_N \to \mathbb{R}^p$? It should encode interesting features, lead to efficient algorithms, and be invariant to renaming of the items, i.e., the kernel should be right-invariant: $\forall \sigma_1, \sigma_2, \pi \in S_N$, $K(\sigma_1 \pi, \sigma_2 \pi) = K(\sigma_1, \sigma_2)$.

19 Harmonic analysis on $S_N$. A representation of $S_N$ is a matrix-valued function $\rho : S_N \to \mathbb{C}^{d_\rho \times d_\rho}$ such that $\forall \sigma_1, \sigma_2 \in S_N$, $\rho(\sigma_1 \sigma_2) = \rho(\sigma_1) \rho(\sigma_2)$. A representation is irreducible (an irrep) if it is not equivalent to the direct sum of two other representations. $S_N$ has a finite number of irreps $\{\rho_\lambda : \lambda \in \Lambda\}$, where $\Lambda = \{\lambda \vdash N\}$ is the set of partitions of $N$ ($\lambda \vdash N$ iff $\lambda = (\lambda_1, \dots, \lambda_r)$ with $\lambda_1 \geq \dots \geq \lambda_r$ and $\sum_{i=1}^r \lambda_i = N$). For any $f : S_N \to \mathbb{R}$, the Fourier transform of $f$ is $$\forall \lambda \in \Lambda, \quad \hat{f}(\rho_\lambda) = \sum_{\sigma \in S_N} f(\sigma) \rho_\lambda(\sigma)$$

20 Right-invariant kernels (Bochner's theorem). An embedding $\Phi : S_N \to \mathbb{R}^p$ defines a right-invariant kernel $K(\sigma_1, \sigma_2) = \Phi(\sigma_1)^\top \Phi(\sigma_2)$ if and only if there exists $\phi : S_N \to \mathbb{R}$ such that $$\forall \sigma_1, \sigma_2 \in S_N, \quad K(\sigma_1, \sigma_2) = \phi(\sigma_2^{-1} \sigma_1) \quad \text{and} \quad \forall \lambda \in \Lambda, \ \hat{\phi}(\rho_\lambda) \succeq 0$$

21 Some attempts: the Kendall and SUQUAN embeddings (Jiao and Vert, 2015, 2017, 2018; Le Morvan and Vert, 2017)

22 SUQUAN embedding (Le Morvan and Vert, 2017). Let $\Phi(\sigma) = \Pi_\sigma$, the permutation representation (Serre, 1977): $$[\Pi_\sigma]_{ij} = \begin{cases} 1 & \text{if } \sigma(j) = i, \\ 0 & \text{otherwise.} \end{cases}$$ Leads to new approaches for supervised quantile normalization (SUQUAN) and vector quantization.
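A small numpy sketch (ours) of this matrix, following the slide's convention that column $j$ has its 1 in row $\sigma(j)$:

```python
import numpy as np

def permutation_matrix(sigma):
    """Pi_sigma with [Pi]_{ij} = 1 iff sigma(j) = i (sigma 0-indexed)."""
    N = len(sigma)
    Pi = np.zeros((N, N))
    Pi[sigma, np.arange(N)] = 1.0   # one entry per column j, in row sigma(j)
    return Pi
```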

23 Example: CIFAR-10. Discriminate images of horse vs. plane. Different methods learn different quantile functions. [Figure: quantile functions learned by the original, median, SUQUAN-SVD and SUQUAN-BND methods.]

24 Limits of the SUQUAN embedding. Linear model on $\Phi(\sigma) = \Pi_\sigma \in \mathbb{R}^{N \times N}$. Captures first-order information of the form "i-th feature ranked at the j-th position". What about higher-order information such as "feature i larger than feature j"?

25 The Kendall embedding (Jiao and Vert, 2015, 2017): $$\Phi_{i,j}(\sigma) = \begin{cases} 1 & \text{if } \sigma(i) < \sigma(j), \\ 0 & \text{otherwise.} \end{cases}$$

26 Kendall and Mallows kernels. The Kendall kernel is $K_\tau(\sigma, \sigma') = \Phi(\sigma)^\top \Phi(\sigma')$. The Mallows kernel is, for $\lambda \geq 0$, $K_M^\lambda(\sigma, \sigma') = e^{-\lambda \|\Phi(\sigma) - \Phi(\sigma')\|^2}$. Theorem (Jiao and Vert, 2015, 2017): the Kendall and Mallows kernels are positive definite and can be evaluated in $O(N \log N)$ time. Kernel trick: useful with few samples in large dimensions.
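A sketch (ours) of both kernels built on scipy.stats.kendalltau, which uses an $O(N \log N)$ Knight-style algorithm. Since permutations have no ties, $\tau = (n_c - n_d) / \binom{N}{2}$, so $n_c$ and $n_d$ can be recovered from $\tau$; the factor 2 in the Mallows exponent is the identity $\|\Phi(\sigma) - \Phi(\sigma')\|^2 = 2 n_d$ proved on slide 43:

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_kernel(s1, s2):
    """K_tau(s1, s2) = n_c, the number of concordant pairs."""
    pairs = len(s1) * (len(s1) - 1) // 2
    tau, _ = kendalltau(s1, s2)
    return round((1 + tau) / 2 * pairs)

def mallows_kernel(s1, s2, lam=1.0):
    """K_M(s1, s2) = exp(-lam * ||Phi(s1) - Phi(s2)||^2) = exp(-2 * lam * n_d)."""
    pairs = len(s1) * (len(s1) - 1) // 2
    tau, _ = kendalltau(s1, s2)
    return np.exp(-2.0 * lam * round((1 - tau) / 2 * pairs))

# Right-invariance (slide 18): relabeling the items by pi leaves K unchanged.
rng = np.random.default_rng(0)
s1, s2, pi = rng.permutation(8), rng.permutation(8), rng.permutation(8)
assert kendall_kernel(s1[pi], s2[pi]) == kendall_kernel(s1, s2)
```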

27 Remark. Kondor and Barbosa (2010) proposed the diffusion kernel on the Cayley graph of the symmetric group generated by adjacent transpositions; it is computationally intensive ($O(N^{2N})$). The Mallows kernel can be written as $K_M^\lambda(\sigma, \sigma') = e^{-\lambda n_d(\sigma, \sigma')}$, where $n_d(\sigma, \sigma')$ is the shortest-path distance on that Cayley graph, and it can be computed in $O(N \log N)$. [Figure: Cayley graph of $S_4$.]

28 Higher-order kernels (Jiao and Vert, 2018): $\Phi(\sigma) = \Pi_\sigma^d$. For $d = 1$, this is the SUQUAN embedding. For $d = 2$, this leads to a new weighted Kendall kernel, where the weights can be optimized during training.

29 Conclusion. Machine learning beyond vectors, strings and graphs: different embeddings (Kendall, SUQUAN) of the symmetric group. Scalability? Robustness to adversarial attacks? Thank you!

30 References
R. E. Barlow, D. Bartholomew, J. M. Bremner, and H. D. Brunk. Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley, New York, 1972.
Y. Jiao and J.-P. Vert. The Kendall and Mallows kernels for permutations. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of JMLR: W&CP, 2015.
Y. Jiao and J.-P. Vert. The Kendall and Mallows kernels for permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
Y. Jiao and J.-P. Vert. The weighted Kendall and high-order kernels for permutations. Technical report, arXiv, 2018.
W. R. Knight. A computer method for calculating Kendall's tau with ungrouped data. Journal of the American Statistical Association, 61(314), 1966.
M. Le Morvan and J.-P. Vert. Supervised quantile normalisation. Technical report, arXiv:1706.00244, 2017.
J.-P. Serre. Linear Representations of Finite Groups. Graduate Texts in Mathematics. Springer-Verlag, New York, 1977.
O. Sysoev and O. Burdakov. A smoothed monotonic regression via L2 regularization. Technical Report LiTH-MAT-R-2016/01-SE, Department of Mathematics, Linköping University, 2016.

31 The quantile normalization (QN) embedding. Data: a permutation $\sigma \in S_N$, where $\sigma(i)$ = rank of item/feature $i$. Fix a target quantile $q \in \mathbb{R}^N$. Define $\Phi_q : S_N \to \mathbb{R}^N$ by $$\forall \sigma \in S_N, \quad [\Phi_q(\sigma)]_i = q_{\sigma(i)}$$ "Keep the order, change the values."
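A direct numpy sketch (ours) of the QN embedding applied to a raw sample:

```python
import numpy as np

def qn_embed(x, q):
    """[Phi_q(sigma)]_i = q_{sigma(i)}, where sigma(i) is the
    (0-indexed) rank of x_i among the entries of x."""
    sigma = np.argsort(np.argsort(x))
    return np.asarray(q)[sigma]

x = np.array([5.0, -1.0, 2.0])
q = np.array([0.0, 1.0, 2.0])     # non-decreasing target quantile
print(qn_embed(x, q))             # [2. 0. 1.]: order of x kept, values changed
```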

32 How to choose a "good" target distribution?

33 SUQUAN (Le Morvan and Vert, 2017). Learning after standard QN: (1) fix $q$ arbitrarily; (2) QN all samples to get $\Phi_q(\sigma_1), \dots, \Phi_q(\sigma_n)$; (3) learn a model on the normalized data, e.g.: $$\hat{\beta} = \underset{\beta \in \mathbb{R}^N}{\mathrm{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 \right\}$$ Supervised QN (SUQUAN): jointly learn $q$ and the model: $$(\hat{\beta}, \hat{q}) = \underset{\beta, q \in \mathbb{R}^N}{\mathrm{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}$$

35 Computing $\Phi_q(\sigma)$. For $\sigma \in S_N$, let $\Pi_\sigma$ be the permutation representation (Serre, 1977): $[\Pi_\sigma]_{ij} = 1$ if $\sigma(j) = i$, 0 otherwise. Then $\Phi_q(\sigma) = \Pi_\sigma q$.
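A quick numeric check (ours) against the sketches above, with one caveat on conventions: under the indexing $[\Pi_\sigma]_{ij} = 1$ iff $\sigma(j) = i$, the product $\Pi_\sigma q$ has entries $q_{\sigma^{-1}(i)}$, and it is $\Pi_\sigma^\top q$ that reproduces $[\Phi_q(\sigma)]_i = q_{\sigma(i)}$; which of the two appears simply depends on whether $\sigma$ or $\sigma^{-1}$ is taken as the rank map:

```python
import numpy as np

# permutation_matrix as sketched earlier (slide 22)
rng = np.random.default_rng(1)
sigma = rng.permutation(6)                  # ranks of 6 items
q = np.sort(rng.normal(size=6))             # non-decreasing target quantile
Pi = permutation_matrix(sigma)
assert np.allclose(Pi.T @ q, q[sigma])              # (Pi^T q)_i = q_{sigma(i)}
assert np.allclose(Pi @ q, q[np.argsort(sigma)])    # (Pi q)_i  = q_{sigma^{-1}(i)}
```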

36 Linear SUQUAN as rank-1 matrix regression. Linear SUQUAN therefore solves $$\min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Phi_q(\sigma_i)\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\} = \min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\beta^\top \Pi_{\sigma_i} q\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\} = \min_{\beta, q \in \mathbb{R}^N} \left\{ \frac{1}{n} \sum_{i=1}^n \ell_i\left(\langle q \beta^\top, \Pi_{\sigma_i} \rangle_{\mathrm{Frobenius}}\right) + \lambda \|\beta\|^2 + \gamma \Omega(q) \right\}$$ This is a particular linear model that estimates a rank-1 matrix $M = q \beta^\top$, where each sample $\sigma \in S_N$ is represented by the matrix $\Pi_\sigma \in \mathbb{R}^{N \times N}$. Non-convex, but alternating optimization in $\beta$ and $q$ is easy.

37 Experiments: CIFAR-10. Image classification into 10 classes (45 binary problems). N = 5,000 per class, p = 1,024 pixels. [Figure: test-set AUC of SUQUAN-BND against the median baseline, and AUC for cauchy, exponential, uniform, gaussian, median, SUQUAN-SVD, SUQUAN-BND and SUQUAN-SPAV target quantiles.]

38 Experiments: CIFAR-10. Example: horse vs. plane. Different methods learn different quantile functions. [Figure: quantile functions learned by the original, median, SUQUAN-SVD and SUQUAN-BND methods.]

39 Outline: 1. The Kendall embedding

40 Limits of the QN embedding. Linear model on $\Phi(\sigma) = \Pi_\sigma \in \mathbb{R}^{N \times N}$. Captures first-order information of the form "i-th feature ranked at the j-th position". What about higher-order information such as "feature i larger than feature j"?

41 Another representation: $$\Phi_{i,j}(\sigma) = \begin{cases} 1 & \text{if } \sigma(i) < \sigma(j), \\ 0 & \text{otherwise.} \end{cases}$$

42 Kendall and Mallows kernels. The Kendall kernel is $K_\tau(\sigma, \sigma') = \Phi(\sigma)^\top \Phi(\sigma')$. The Mallows kernel is, for $\lambda \geq 0$, $K_M^\lambda(\sigma, \sigma') = e^{-\lambda \|\Phi(\sigma) - \Phi(\sigma')\|^2}$. Theorem (Jiao and Vert, 2015, 2017): the Kendall and Mallows kernels are positive definite and can be evaluated in $O(N \log N)$ time. Kernel trick: useful with few samples in large dimensions.

43 Proof. For any two permutations $\sigma, \sigma' \in S_N$: Inner product: $$\Phi(\sigma)^\top \Phi(\sigma') = \sum_{1 \leq i \neq j \leq N} 1_{\sigma(i) < \sigma(j)}\, 1_{\sigma'(i) < \sigma'(j)} = n_c(\sigma, \sigma'),$$ the number of concordant pairs. Distance: $$\|\Phi(\sigma) - \Phi(\sigma')\|^2 = \sum_{1 \leq i, j \leq N} \left(1_{\sigma(i) < \sigma(j)} - 1_{\sigma'(i) < \sigma'(j)}\right)^2 = 2\, n_d(\sigma, \sigma'),$$ the number of discordant pairs. Both $n_c$ and $n_d$ can be computed in $O(N \log N)$ (Knight, 1966).
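A self-contained sketch (ours) of the $O(N \log N)$ idea in the spirit of Knight (1966): relabel the items so that $\sigma$ becomes the identity; the discordant pairs are then exactly the inversions of $\sigma'$ in that order, which merge sort counts:

```python
import numpy as np

def n_discordant(s1, s2):
    """n_d(s1, s2) in O(N log N): pairs ordered differently by s1 and s2."""
    order = np.argsort(s1)               # items sorted by their s1-rank
    b = list(np.asarray(s2)[order])      # s2-ranks in that order
    return count_inversions(b)

def count_inversions(a):
    """Counts inversions of list a while sorting it in place (merge sort)."""
    if len(a) <= 1:
        return 0
    mid = len(a) // 2
    left, right = a[:mid], a[mid:]
    inv = count_inversions(left) + count_inversions(right)
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1
            inv += len(left) - i         # all remaining left items exceed right[j]
        k += 1
    a[k:] = left[i:] + right[j:]
    return inv
```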

44 Related work. Kondor and Barbosa (2010) proposed the diffusion kernel on the Cayley graph of the symmetric group generated by adjacent transpositions; it is computationally intensive ($O(N^{2N})$). The Mallows kernel can be written as $K_M^\lambda(\sigma, \sigma') = e^{-\lambda n_d(\sigma, \sigma')}$, where $n_d(\sigma, \sigma')$ is the shortest-path distance on that Cayley graph, and it can be computed in $O(N \log N)$. [Figure: Cayley graph of $S_4$.]

45 Applications. Average performance on 10 microarray classification problems (Jiao and Vert, 2017). [Figure: accuracy of SVM and KFD classifiers with Kendall (kdt), linear, polynomial and RBF kernels, on all genes or top genes, together with the TSP, kTSP and APMV baselines.]

46 Constraints on $f$. Ridge: $\mathcal{F}_0 = \left\{ f \in \mathbb{R}^p : \frac{1}{p} \sum_{i=1}^p f_i^2 \leq 1 \right\}$. Non-decreasing: $\mathcal{F}_{BND} = \mathcal{F}_0 \cap \mathcal{I}_0$, where $\mathcal{I}_0 = \{ f \in \mathbb{R}^p : f_1 \leq f_2 \leq \dots \leq f_p \}$. Non-decreasing and smooth: $\mathcal{F}_{SPAV} = \left\{ f \in \mathcal{I}_0 : \sum_{j=1}^{p-1} (f_{j+1} - f_j)^2 \leq 1 \right\}$.

47 SUQUAN-BND and SUQUAN-SPAV. Algorithm 2 (SUQUAN-BND and SUQUAN-SPAV). Input: $(x_1, y_1), \dots, (x_n, y_n)$, $f_{init} \in \mathcal{I}_0$, $\lambda \in \mathbb{R}$. Output: target quantile $f \in \mathcal{I}_0$.
1: for $i = 1$ to $n$: $(rank_i, order_i) \leftarrow \mathrm{sort}(x_i)$
2: $(w, b) \leftarrow \mathrm{argmin}_{w, b} \frac{1}{n} \sum_{i=1}^n \ell_i\left(w^\top f_{init}[rank_i] + b\right) + \lambda \|w\|^2$ (standard linear model optimization)
3: $f \leftarrow \mathrm{argmin}_{f \in \mathcal{F}_{BND}} \frac{1}{n} \sum_{i=1}^n \ell_i\left(f^\top w[order_i] + b\right)$ (isotonic optimization problem using PAVA as prox), or $f \leftarrow \mathrm{argmin}_{f \in \mathcal{F}_{SPAV}} \frac{1}{n} \sum_{i=1}^n \ell_i\left(f^\top w[order_i] + b\right)$ (smoothed isotonic optimization problem using SPAV as prox)
Alternating optimization in $w$ and $f$, with a monotonicity constraint on $f$: proximal gradient optimization for $f$, using the Pool Adjacent Violators Algorithm (PAVA, Barlow et al. (1972)) or the Smoothed Pool Adjacent Violators algorithm (SPAV, Sysoev and Burdakov (2016)) as proximal operator. Contrary to SUQUAN-SVD, this algorithm requires an initial non-decreasing target quantile; by default, use the median of the data quantile functions, which is often the default in standard QN. Iterating produces a sequence of target quantiles, although in the experiments the performance did not change significantly after the first iteration.
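A simplified sketch (ours) of one pass of this alternating scheme, using sklearn's LogisticRegression for the $w$-step and IsotonicRegression (which implements PAVA) as the proximal operator in the $f$-step; it omits the norm-ball part of $\mathcal{F}_{BND}$ and the SPAV variant, and all names and step sizes are our own:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def suquan_bnd_pass(X, y, f, C=1.0, lr=0.01, n_prox_iter=50):
    """One w-step and one f-step; X is (n, p), y in {-1, +1},
    f a non-decreasing target quantile of length p."""
    n, p = X.shape
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)   # rank_i per sample
    orders = np.argsort(X, axis=1)                      # order_i per sample
    # w-step: standard linear model on the quantile-normalized data f[rank_i]
    clf = LogisticRegression(C=C).fit(f[ranks], y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    # f-step: proximal gradient on the logistic loss, with isotonic
    # regression (PAVA) projecting each iterate onto {f_1 <= ... <= f_p}
    Z = w[orders]                                       # row i is w[order_i]
    iso = IsotonicRegression()
    for _ in range(n_prox_iter):
        margins = y * (Z @ f + b)
        grad = -(Z * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        f = iso.fit_transform(np.arange(p), f - lr * grad)
    return w, b, f
```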

48 A variant: SUQUAN-SVD. Algorithm 1 (SUQUAN-SVD). Input: $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^p \times \{-1, 1\}$. Output: target quantile $f \in \mathcal{F}_0$.
1: $M_{LDA} \leftarrow 0 \in \mathbb{R}^{p \times p}$
2: $n_{+1} \leftarrow |\{i : y_i = +1\}|$, $n_{-1} \leftarrow |\{i : y_i = -1\}|$
3: for $i = 1$ to $n$: compute $\Pi_{x_i}$ (by sorting $x_i$) and set $M_{LDA} \leftarrow M_{LDA} + \frac{y_i}{n_{y_i}} \Pi_{x_i}$
4: $(s, w, f) \leftarrow \mathrm{SVD}(M_{LDA}, 1)$
Ridge penalty only (no monotonicity constraint): when $f$ is not constrained beyond $\mathcal{F}_0$, the set of models is exactly the set of rank-1 matrices, and the problem amounts to finding a rank-1 matrix that defines a linear regression or classification problem. In the binary classification setting, a simple linear classifier (ignoring the covariance) is given by linear discriminant analysis: $M_{LDA} = \mu_+ - \mu_-$, where $\mu_\pm$ are the means of the matrices $\Pi_{x_i}$ over the two classes. A good rank-1 candidate is then the closest rank-1 matrix to $M_{LDA}$, whose left and right singular vectors associated with the largest singular value give $w$ and the target quantile $f$; $f$ can then be used to quantile-normalize the data before running any linear classification method. Each sample involves an $O(p \ln p)$ sort of the entries of $x_i$, so computing $M_{LDA}$ requires $O(np \ln p)$ operations; a naive SVD of $M_{LDA}$ would cost another $O(p^2)$, but if $n \ll p$ one can exploit the fact that multiplying a permutation matrix by a vector takes $O(p)$, so the leading singular vectors take only $O(np)$. Hence the complexity of SUQUAN-SVD is $O(np \ln p)$, the same as QN alone.
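A compact numpy sketch (ours) of Algorithm 1, accumulating $M_{LDA}$ through the sparsity of each $\Pi_{x_i}$; for clarity it takes a full SVD rather than the cheaper rank-1 power iteration suggested above for the $n \ll p$ case:

```python
import numpy as np

def suquan_svd(X, y):
    """SUQUAN-SVD sketch: M_LDA = mu_+ - mu_- over permutation matrices,
    then its best rank-1 approximation. Returns (w, f); the signs of the
    singular vectors are arbitrary and may need flipping."""
    n, p = X.shape
    M = np.zeros((p, p))
    n_pos, n_neg = int((y == 1).sum()), int((y == -1).sum())
    cols = np.arange(p)
    for xi, yi in zip(X, y):
        sigma = np.argsort(np.argsort(xi))     # ranks of xi: O(p log p)
        # add (y_i / n_{y_i}) * Pi_{x_i}: one nonzero per column
        M[sigma, cols] += yi / (n_pos if yi == 1 else n_neg)
    U, s, Vt = np.linalg.svd(M)
    return U[:, 0], Vt[0]                      # w, target quantile f
```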

49 Experiments: Simulations. The true distribution of the entries of X is normal; the data are corrupted with a cauchy, exponential, uniform or bimodal gaussian distribution. p = 1000, n varies, logistic regression. [Figure: AUC and Euclidean distance to the original target quantile as a function of the number of samples, for logistic regression on the original data, logistic regression on the corrupted data, SUQUAN-BND and SUQUAN-SPAV.]
