Surrogate regret bounds for generalized classification performance metrics

1 Surrogate regret bounds for generalized classification performance metrics Wojciech Kotłowski Krzysztof Dembczyński Poznań University of Technology PL-SIGML, Częstochowa, / 36

2 Motivation 2 / 36

3 Kaggle Higgs Boson Machine Learning Challenge 3 / 36

4 Data set and rules Classes: signal (h → τ⁺τ⁻) and background. # events # features % signal weight 4 / 36

5 Data set and rules Classes: signal (h → τ⁺τ⁻) and background. # events # features % signal weight Evaluation: Approximate Median Significance (AMS): AMS = √( 2[ (s + b + 10) log(1 + s/(b + 10)) − s ] ), where s, b are the total weights of signal/background events classified as signal. 4 / 36
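
A minimal Python sketch of the AMS computation, following the formula above (the function name and the default b_reg = 10 regularization term follow the challenge definition; this is an illustration, not the organizers' reference code):

import math

def ams(s, b, b_reg=10.0):
    # s, b: total weights of signal/background events classified as signal
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))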

6 How to optimize AMS? Research Problem How to optimize a global function of true/false positives/negatives, not decomposable into individual losses over the observations? 5 / 36

7 How to optimize AMS? Research Problem How to optimize a global function of true/false positives/negatives, not decomposable into individual losses over the observations? Most popular approach: sort the classifier's scores and tune a threshold on them to maximize AMS. [Diagram: scores on a line; everything below the threshold is classified as negative, everything above as positive.] AMS is not used during training, only for tuning the threshold. 5 / 36

8 How to optimize AMS? Research Problem How to optimize a global function of true/false positives/negatives, not decomposable into individual losses over the observations? Most popular approach: sort the classifier's scores and tune a threshold on them to maximize AMS. [Diagram: scores on a line; everything below the threshold is classified as negative, everything above as positive.] AMS is not used during training, only for tuning the threshold. Is this approach theoretically justified? 5 / 36

9 Optimization of generalized performance metrics Wojciech Kotłowski and Krzysztof Dembczyński. Surrogate regret bounds for generalized classification performance metrics. In ACML, volume 45 of JMLR W&C Proc., pages , 2015 [Best Paper Award] Wojciech Kotłowski. Consistent optimization of AMS by logistic loss minimization. In NIPS HEPML Workshop, volume 42 of JMLR W&C Proc., pages , / 36

10 Generalized performance metrics for binary classification Given a binary classifier h: X → {−1, +1}, define Ψ(h) = Ψ(FP(h), FN(h)), where: FP(h) = Pr(h(x) = +1, y = −1), FN(h) = Pr(h(x) = −1, y = +1).
true y \ predicted ŷ = h(x):   −1   +1   total
−1:                            TN   FP   1 − P
+1:                            FN   TP   P
Goal: maximize Ψ(h). We assume Ψ is non-increasing in FP and FN. 7 / 36
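
As an aside, a small sketch of how these (joint) false positive/negative rates could be estimated empirically from labels and predictions in {−1, +1}; the function name is illustrative:

import numpy as np

def empirical_fp_fn(y_true, y_pred):
    # FP(h) = Pr(h(x) = +1, y = -1), FN(h) = Pr(h(x) = -1, y = +1), estimated on a sample
    n = len(y_true)
    fp = np.sum((y_pred == 1) & (y_true == -1)) / n
    fn = np.sum((y_pred == -1) & (y_true == 1)) / n
    return fp, fn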

11 Linear-fractional performance metric Definition Ψ(FP, FN) = (a_0 + a_1 FP + a_2 FN) / (b_0 + b_1 FP + b_2 FN). 8 / 36

12 Linear-fractional performance metric Definition Ψ(FP, FN) = (a_0 + a_1 FP + a_2 FN) / (b_0 + b_1 FP + b_2 FN). Examples:
Accuracy: Acc = 1 − FN − FP
F_β-measure: F_β = (1+β²)(P − FN) / ((1+β²)P − FN + FP)
Jaccard similarity: J = (P − FN) / (P + FP)
AM measure: AM = 1 − FN/(2P) − FP/(2(1−P))
Weighted accuracy: WA = 1 − w₋ FP − w₊ FN
8 / 36
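
For concreteness, a hedged sketch of a few of these metrics written directly in terms of FP, FN and P = Pr(y = +1), matching the formulas above (function names are mine):

def f_beta(FP, FN, P, beta=1.0):
    b2 = beta ** 2
    return (1 + b2) * (P - FN) / ((1 + b2) * P - FN + FP)

def jaccard(FP, FN, P):
    return (P - FN) / (P + FP)

def am_measure(FP, FN, P):
    # arithmetic mean of the true positive rate and the true negative rate
    return 1.0 - FN / (2 * P) - FP / (2 * (1 - P))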

13 Convex performance metrics Definition Ψ(FP, FN) is jointly convex in FP and FN. 9 / 36

14 Convex performance metrics Definition Ψ(FP, FN) is jointly convex in FP and FN. Example: AMS_2 score: AMS_2(TP, FP) = √( 2[ (TP + FP) log(1 + TP/FP) − TP ] ). 9 / 36

15 Example: F_1-measure (surface plot of F_1 as a function of FP and FN). 10 / 36

16 Example: AMS_2 score (surface plot of AMS_2 as a function of FP and FN). 11 / 36

17 A simple approach to optimization of Ψ(h) training data

18 A simple approach to optimization of Ψ(h) training data f(x) learn real-valued f(x) (using standard classification tool)

19 A simple approach to optimization of Ψ(h) training data validation data learn real-valued f(x) (using standard classification tool) f(x)

20 A simple approach to optimization of Ψ(h): on the training data, learn a real-valued f(x) (using a standard classification tool); on the validation data, learn a threshold θ on f(x) by optimizing Ψ(h_{f,θ}). [Diagram: scores f(x) on a line; θ separates h_{f,θ}(x) = −1 from h_{f,θ}(x) = +1.] 12 / 36

21 Our results Algorithm [diagram: training set → f(x), validation set → θ, output h_{f,θ}(x)] 1 Learn f minimizing a surrogate loss on the training sample. 2 Given f, tune a threshold θ on f on the validation sample by direct optimization of Ψ. 13 / 36

22 Our results Algorithm [diagram: training set → f(x), validation set → θ, output h_{f,θ}(x)] 1 Learn f minimizing a surrogate loss on the training sample. 2 Given f, tune a threshold θ on f on the validation sample by direct optimization of Ψ. Our results (informally) Assumptions: the surrogate loss is strongly proper composite (e.g., logistic, exponential, or squared-error loss); Ψ is linear-fractional or jointly convex. Claim: if f is close to the minimizer of the surrogate loss, then h_{f,θ} is close to the maximizer of Ψ. 13 / 36
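
A minimal sketch of the two-step procedure, assuming scikit-learn's LogisticRegression as the surrogate learner and the F-measure as the target metric Ψ (both are interchangeable; sweeping the observed validation scores is just one simple choice of threshold candidates):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def fit_and_tune_threshold(X_train, y_train, X_val, y_val, metric=f1_score):
    # Step 1: minimize a strongly proper composite surrogate (logistic loss).
    clf = LogisticRegression().fit(X_train, y_train)
    scores = clf.decision_function(X_val)  # real-valued f(x)
    # Step 2: sweep candidate thresholds on the validation sample,
    # picking the one that maximizes the target metric Psi.
    best_theta, best_val = 0.0, -np.inf
    for theta in np.unique(scores):
        preds = np.where(scores >= theta, 1, -1)
        val = metric(y_val, preds)
        if val > best_val:
            best_theta, best_val = theta, val
    return clf, best_theta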

23 Ψ-regret and l-regret Ψ-regret of a classifier h: X → {−1, +1}: Reg_Ψ(h) = Ψ(h*) − Ψ(h), where h* = argmax_h Ψ(h). Measures suboptimality. 14 / 36

24 Ψ-regret and l-regret Ψ-regret of a classifier h: X → {−1, +1}: Reg_Ψ(h) = Ψ(h*) − Ψ(h), where h* = argmax_h Ψ(h). Measures suboptimality. Surrogate loss l(y, f(x)) of a real-valued function f: X → R. Used in training: logistic loss, squared loss, hinge loss, ... 14 / 36

25 Ψ-regret and l-regret Ψ-regret of a classifier h: X → {−1, +1}: Reg_Ψ(h) = Ψ(h*) − Ψ(h), where h* = argmax_h Ψ(h). Measures suboptimality. Surrogate loss l(y, f(x)) of a real-valued function f: X → R. Used in training: logistic loss, squared loss, hinge loss, ... Expected loss (l-risk) of f: Risk_l(f) = E_{(x,y)}[l(y, f(x))]. 14 / 36

26 Ψ-regret and l-regret Ψ-regret of a classifier h: X → {−1, +1}: Reg_Ψ(h) = Ψ(h*) − Ψ(h), where h* = argmax_h Ψ(h). Measures suboptimality. Surrogate loss l(y, f(x)) of a real-valued function f: X → R. Used in training: logistic loss, squared loss, hinge loss, ... Expected loss (l-risk) of f: Risk_l(f) = E_{(x,y)}[l(y, f(x))]. l-regret of f: Reg_l(f) = Risk_l(f) − Risk_l(f*), where f* = argmin_f Risk_l(f). 14 / 36

27 Ψ-regret and l-regret Ψ-regret of a classifier h: X → {−1, +1}: Reg_Ψ(h) = Ψ(h*) − Ψ(h), where h* = argmax_h Ψ(h). Measures suboptimality. Surrogate loss l(y, f(x)) of a real-valued function f: X → R. Used in training: logistic loss, squared loss, hinge loss, ... Expected loss (l-risk) of f: Risk_l(f) = E_{(x,y)}[l(y, f(x))]. l-regret of f: Reg_l(f) = Risk_l(f) − Risk_l(f*), where f* = argmin_f Risk_l(f). Goal: relate the Ψ-regret of h_{f,θ} to the l-regret of f. 14 / 36

28 Examples of surrogate losses Logistic loss: l(y, ŷ) = log(1 + e^{−yŷ}). Risk minimizer: f*(x) = log( η(x) / (1 − η(x)) ), where η(x) = Pr(y = 1 | x). An invertible function of the conditional probability η(x). 15 / 36

29 Examples of surrogate losses Logistic loss: l(y, ŷ) = log(1 + e^{−yŷ}). Risk minimizer: f*(x) = log( η(x) / (1 − η(x)) ), where η(x) = Pr(y = 1 | x). An invertible function of the conditional probability η(x). Hinge loss: l(y, ŷ) = (1 − yŷ)_+. Its risk minimizer is non-invertible: f*(x) = sgn(η(x) − 1/2). 15 / 36
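
The logistic link and its inverse (the sigmoid) can be written down explicitly; a tiny illustrative sketch of the pair ψ, ψ⁻¹ used above (function names are mine):

import numpy as np

def logit(eta):
    # link psi(eta): f*(x) = log(eta / (1 - eta))
    return np.log(eta / (1.0 - eta))

def sigmoid(f):
    # inverse link psi^{-1}(f): recovers eta(x) = Pr(y = 1 | x) from the score
    return 1.0 / (1.0 + np.exp(-f))

Thresholding a logistic-loss score f(x) at θ = logit(η) is then the same as thresholding sigmoid(f(x)) at η.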

30 Examples of surrogate losses (plot of the 0/1, squared error, logistic, hinge, and exponential losses as functions of the margin yf(x)). 16 / 36

31 Examples of surrogate losses
loss | f*(η) = ψ(η) | η(f) = ψ⁻¹(f)
squared error | 2η − 1 | (1 + f)/2
logistic | log(η/(1−η)) | 1/(1 + e^{−f})
exponential | (1/2) log(η/(1−η)) | 1/(1 + e^{−2f})
hinge | sgn(η − 1/2) | doesn't exist
17 / 36

32 Proper composite losses l(y, f) is proper composite if there exists a strictly increasing link function ψ such that f*(x) = ψ(η(x)), where η(x) = Pr(y = 1 | x). Minimizing proper composite losses implies probability estimation. 18 / 36

33 Strongly proper composite losses [Agarwal, 2014] l(y, f) is λ-strongly proper composite if it is proper composite and, for any f, x, and distribution of y | x: E_{y|x}[l(y, f(x)) − l(y, f*(x))] ≥ (λ/2) (η(x) − ψ⁻¹(f(x)))². A technical condition. 19 / 36

34 Strongly proper composite losses: Examples
loss | f*(η) = ψ(η) | η(f) = ψ⁻¹(f) | λ
squared error | 2η − 1 | (1 + f)/2 | 8
logistic | log(η/(1−η)) | 1/(1 + e^{−f}) | 4
exponential | (1/2) log(η/(1−η)) | 1/(1 + e^{−2f}) | 4
20 / 36
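
A quick numerical sanity check of the strong properness inequality for logistic loss, using λ = 4 as in the table (the conditional risk expansion below follows from l(y, f) = log(1 + e^{−yf}) and is my own sketch, not code from the paper):

import numpy as np

def logistic_conditional_risk(eta, f):
    # E_{y|x}[l(y, f)] = eta*log(1+e^{-f}) + (1-eta)*log(1+e^{f})
    return eta * np.log1p(np.exp(-f)) + (1 - eta) * np.log1p(np.exp(f))

rng = np.random.default_rng(0)
lam = 4.0
for _ in range(1000):
    eta = rng.uniform(0.05, 0.95)
    f = rng.uniform(-3.0, 3.0)
    f_star = np.log(eta / (1 - eta))      # psi(eta), the conditional risk minimizer
    eta_hat = 1.0 / (1.0 + np.exp(-f))    # psi^{-1}(f)
    regret = logistic_conditional_risk(eta, f) - logistic_conditional_risk(eta, f_star)
    assert regret >= lam / 2 * (eta - eta_hat) ** 2 - 1e-12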

35 Main result Theorem for linear-fractional measures If: Ψ(FP, FN) is linear-fractional and non-increasing in FP and FN, and l is λ-strongly proper composite, then there exists a threshold θ* such that, for any real-valued function f, Reg_Ψ(h_{f,θ*}) ≤ C_Ψ √( (2/λ) Reg_l(f) ). 21 / 36

36 Main result Theorem for linear-fractional measures If: Ψ(FP, FN) is linear-fractional and non-increasing in FP and FN, and l is λ-strongly proper composite, then there exists a threshold θ* such that, for any real-valued function f, Reg_Ψ(h_{f,θ*}) ≤ C_Ψ √( (2/λ) Reg_l(f) ).
metric | C_Ψ
F_β-measure | (1 + β²)/(β² P)
Jaccard similarity | (J* + 1)/P
AM measure | 1/(2P(1 − P))
21 / 36

37 Main result A similar theorem holds for convex performance metrics (such as AMS_2). Theorem for linear-fractional measures If: Ψ(FP, FN) is linear-fractional and non-increasing in FP and FN, and l is λ-strongly proper composite, then there exists a threshold θ* such that, for any real-valued function f, Reg_Ψ(h_{f,θ*}) ≤ C_Ψ √( (2/λ) Reg_l(f) ).
metric | C_Ψ
F_β-measure | (1 + β²)/(β² P)
Jaccard similarity | (J* + 1)/P
AM measure | 1/(2P(1 − P))
21 / 36

38 Explanation of the theorem The maximizer of Ψ is h*(x) = sgn(η(x) − η*) for some η*. 22 / 36

39 Explanation of the theorem The maximizer of Ψ is h*(x) = sgn(η(x) − η*) for some η*. The minimizer of l is f*(x) = ψ(η(x)) for an invertible link ψ. 22 / 36

40 Explanation of the theorem The maximizer of Ψ is h*(x) = sgn(η(x) − η*) for some η*. The minimizer of l is f*(x) = ψ(η(x)) for an invertible link ψ. Thresholding f*(x) at θ* = ψ(η*) ⟺ thresholding η(x) at η*. [Diagram: the threshold θ* = ψ(η*) on the f*(x) axis maps to η* on the η(x) axis via η(x) = 1/(1 + e^{−f*(x)}).] 22 / 36

41 Explanation of the theorem The maximizer of Ψ is h*(x) = sgn(η(x) − η*) for some η*. The minimizer of l is f*(x) = ψ(η(x)) for an invertible link ψ. Thresholding f*(x) at θ* = ψ(η*) ⟺ thresholding η(x) at η*. [Diagram: the threshold θ* = ψ(η*) on the f*(x) axis maps to η* on the η(x) axis via η(x) = 1/(1 + e^{−f*(x)}).] The gradient of Ψ and the constant λ measure the local variations of Ψ and l when f differs from f*. 22 / 36

42 The optimal threshold θ* Ψ = classification accuracy ⇒ θ* = 0 (η* = 1/2). 23 / 36

43 The optimal threshold θ* Ψ = classification accuracy ⇒ θ* = 0 (η* = 1/2). Ψ = weighted accuracy ⇒ η* = w₋ / (w₊ + w₋). 23 / 36

44 The optimal threshold θ* Ψ = classification accuracy ⇒ θ* = 0 (η* = 1/2). Ψ = weighted accuracy ⇒ η* = w₋ / (w₊ + w₋). More complex Ψ ⇒ θ* unknown (depends on Ψ*)... 23 / 36

45 The optimal threshold θ* Ψ = classification accuracy ⇒ θ* = 0 (η* = 1/2). Ψ = weighted accuracy ⇒ η* = w₋ / (w₊ + w₋). More complex Ψ ⇒ θ* unknown (depends on Ψ*)... ⇒ estimate θ* from validation data. 23 / 36
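
When η* is known in closed form (as for weighted accuracy), the score threshold under a logistic surrogate is simply the link applied to η*; a small illustrative sketch (names are mine):

import math

def theta_star_weighted_accuracy(w_pos, w_neg):
    # eta* = w_neg / (w_pos + w_neg); map it through the logistic link psi
    eta_star = w_neg / (w_pos + w_neg)
    return math.log(eta_star / (1.0 - eta_star))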

46 Tuning the threshold Corollary Given a real-valued function f and a validation sample of size m, let θ̂ = argmax_θ Ψ̂(h_{f,θ}) (Ψ̂ = empirical estimate of Ψ). Then, under the same assumptions and notation: Reg_Ψ(h_{f,θ̂}) ≤ C_Ψ ( √( (2/λ) Reg_l(f) ) + O(1/√m) ). 24 / 36

47 Tuning the threshold Corollary Given a real-valued function f and a validation sample of size m, let θ̂ = argmax_θ Ψ̂(h_{f,θ}) (Ψ̂ = empirical estimate of Ψ). Then, under the same assumptions and notation: Reg_Ψ(h_{f,θ̂}) ≤ C_Ψ ( √( (2/λ) Reg_l(f) ) + O(1/√m) ). Learning a standard binary classifier and tuning the threshold afterwards recovers the maximizer of Ψ in the limit. 24 / 36

48 Multilabel classification A vector of m labels y = (y_1, ..., y_m) for each x. Multilabel classifier h(x) = (h_1(x), ..., h_m(x)). False positive/negative rates for each label: FP_i(h_i) = Pr(h_i = +1, y_i = −1), FN_i(h_i) = Pr(h_i = −1, y_i = +1). 25 / 36

49 Multilabel classification A vector of m labels y = (y_1, ..., y_m) for each x. Multilabel classifier h(x) = (h_1(x), ..., h_m(x)). False positive/negative rates for each label: FP_i(h_i) = Pr(h_i = +1, y_i = −1), FN_i(h_i) = Pr(h_i = −1, y_i = +1). How to use binary classification Ψ in the multilabel setting? We extend our bounds to cover micro- and macro-averaging. 25 / 36

50 Micro- and macro-averaging Macro-averaging Average outside Ψ: Ψ_macro(h) = Σ_{i=1}^m Ψ(FP_i(h_i), FN_i(h_i)). Our bound suggests that a separate threshold needs to be tuned for each label. 26 / 36

51 Micro- and macro-averaging Macro-averaging Average outside Ψ: Ψ_macro(h) = Σ_{i=1}^m Ψ(FP_i(h_i), FN_i(h_i)). Our bound suggests that a separate threshold needs to be tuned for each label. Micro-averaging Average inside Ψ: Ψ_micro(h) = Ψ( (1/m) Σ_{i=1}^m FP_i(h_i), (1/m) Σ_{i=1}^m FN_i(h_i) ). Our bound suggests that all labels share a single threshold. 26 / 36
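
A sketch contrasting the two tuning strategies suggested by these bounds, assuming a matrix F of per-label validation scores (shape n × m), a label matrix Y with entries in {−1, +1}, and user-supplied metric callables; all names are placeholders:

import numpy as np

def tune_macro_thresholds(F, Y, metric, candidates):
    # macro-averaging: a separate threshold per label
    m = F.shape[1]
    thetas = np.zeros(m)
    for i in range(m):
        vals = [metric(Y[:, i], np.where(F[:, i] >= t, 1, -1)) for t in candidates]
        thetas[i] = candidates[int(np.argmax(vals))]
    return thetas

def tune_micro_threshold(F, Y, micro_metric, candidates):
    # micro-averaging: a single threshold shared by all labels
    vals = [micro_metric(Y, np.where(F >= t, 1, -1)) for t in candidates]
    return candidates[int(np.argmax(vals))]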

52 Experiments 27 / 36

53 Experimental results Two synthetic and two benchmark data sets. Surrogates: Logistic loss (LR) and hinge loss (SVM). Performance metrics: F-measure and AM measure. Minimize l (logistic or hinge) on training data + tune the threshold θ̂ on validation data (optimizing the metrics). 28 / 36

54 Synthetic experiment I X = {1, 2, ..., 25} with Pr(x) uniform. For each x ∈ X, η(x) ~ Unif[0, 1]. Left plot: logistic loss surrogate (converges as expected); regret of the F-measure, the AM measure, and logistic loss vs. the number of training examples. Right plot: hinge loss surrogate (does not converge); regret of the F-measure, the AM measure, and hinge loss vs. the number of training examples. 29 / 36

55 Synthetic experiment II X = R², x ~ N(0, I). Logistic model: η(x) = (1 + exp(−a_0 − aᵀx))⁻¹. Left plot: convergence of the F-measure; logistic regret (LR), hinge regret (SVM), and F-measure regret (LR, SVM) vs. the number of training examples. Right plot: convergence of the AM measure; the analogous curves for the AM measure. 30 / 36
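
A hedged sketch of a data generator matching this second synthetic setup (the slide does not specify a_0 and a, so they are parameters here):

import numpy as np

def sample_logistic_model(n, a0, a, seed=0):
    # x ~ N(0, I), eta(x) = (1 + exp(-a0 - a.x))^{-1}, labels y in {-1, +1}
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, len(a)))
    eta = 1.0 / (1.0 + np.exp(-(a0 + X @ np.asarray(a))))
    y = np.where(rng.uniform(size=n) < eta, 1, -1)
    return X, y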

56 Benchmark data experiment (a bit of a surprise)
dataset | #examples | #features
covtype.binary | 581,012 | 54
gisette | 7,000 | 5,000
Plots: F-measure (top row) and AM (bottom row) for LR and SVM on covtype.binary and gisette, vs. the number of training examples.
Logistic loss behaves as expected, but hinge loss also performs surprisingly well. 31 / 36

57 Multilabel classification Data sets (with # labels, # training examples, # test examples, # features): scene, yeast, mediamill. Surrogates: Logistic loss (LR) and hinge loss (SVM). Performance metrics: F-measure and AM measure. Macro-averaging (separate threshold for each label) and micro-averaging (single threshold for all labels). Cross-evaluation of algorithms tuned for micro-averaging in terms of macro-averaged metrics, and vice versa. 32 / 36

58 Multilabel classification: F-measure. Plots: macro-averaged F-measure (left column) and micro-averaged F-measure (right column) on scene, yeast, and mediamill, vs. the number of training examples; curves for LR and SVM tuned for the macro-F and for the micro-F criteria. 33 / 36

59 Multilabel classification: AM measure. Plots: macro-averaged AM (left column) and micro-averaged AM (right column) on scene, yeast, and mediamill, vs. the number of training examples; curves for LR and SVM tuned for the macro-AM and for the micro-AM criteria. 34 / 36

60 Open problem Ψ(h*) − Ψ(h) ≤ const · √( Risk_l(f) − Risk_l(f*) ), where f* is the minimizer over all functions. 35 / 36

61 Open problem Ψ(h*) − Ψ(h) ≤ const · √( Risk_l(f) − Risk_l(f*) ), where f* is the minimizer over all functions. We often optimize within some class F (e.g., linear functions). If f* ∉ F, the r.h.s. does not converge to 0. This can be beneficial for non-proper losses (e.g., hinge loss). Most often f* ∉ F, and a theory for this case is needed. 35 / 36

62 Summary Theoretical analysis of the two-step approach to optimizing generalized performance metrics for classification. Regret bounds for linear-fractional and convex functions optimized by means of strongly proper composite surrogates. The theorem relates convergence to the global risk minimizer. Can we say anything about convergence to the risk minimizer within some class of functions? Why does hinge loss perform so well if the risk minimizer is outside the family of classification functions? 36 / 36
