Classification with Reject Option
1 Classification with Reject Option. Bartlett and Wegkamp (2008); Wegkamp and Yuan (2010). February 17, 2012.
2 Outline
- Introduction
  - classification with reject option
  - spirit of the papers
- BW2008
  - infinite-sample consistency
  - bounding excess risk
  - convergence of excess risk
  - results: generalized hinge loss (SVM)
- Further generalization: WY2010
3 Introduction - reject option classification
- observed data: $(X, Y) \in \mathcal{X} \times \{\pm 1\}$
- discriminant function $f : \mathcal{X} \to \mathbb{R}$; typically classify by $\operatorname{sign}(f)$
- reject option: withhold (at cost $d$) when $|f| \le \delta$, and proceed as usual when $|f| > \delta$, for specified $d, \delta$
- motivation: we want to avoid making hard decisions on ambiguous cases
4 Introduction - loss
Loss as a function of $Yf(X)$. Typically (0-1 loss):
\[
\ell(z) = \begin{cases} 1 & \text{if } z \le 0 \\ 0 & \text{if } z > 0 \end{cases}
\]
With the reject option:
\[
\ell_{d,\delta}(z) = \begin{cases} 1 & \text{if } z < -\delta \\ d & \text{if } |z| \le \delta \\ 0 & \text{if } z > \delta \end{cases}
\]
for some $d \in [0, 1/2]$, $\delta \in (0, 1)$.
5 Introduction - loss (continued)
With the reject option, $\ell_{d,\delta}$ is as above, for some $d \in [0, 1/2]$, $\delta \in (0, 1)$. If $d = 0$, always reject (rejection is free); if $d > 1/2$, never reject (guessing costs at most $1/2$ in expectation). Intuition: $d$ is the amount of confidence necessary for classifying.
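As a concrete illustration, here is a minimal Python sketch of $\ell_{d,\delta}$; the function name and the default values $d = 0.3$, $\delta = 0.5$ are mine, not the papers'.

```python
import numpy as np

def reject_loss(z, d=0.3, delta=0.5):
    """Reject-option loss l_{d,delta} at margin z = y*f(x): cost 1 for a
    confident mistake (z < -delta), cost d for a rejection (|z| <= delta),
    and cost 0 for a confident correct prediction (z > delta)."""
    z = np.asarray(z, dtype=float)
    return np.where(z < -delta, 1.0, np.where(z > delta, 0.0, d))

print(reject_loss([-1.0, 0.2, 1.0]))  # [1.  0.3 0. ]
```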
6 Introduction - Bayes rule
Minimize $L_{d,\delta}(f) = E\{\ell_{d,\delta}(Yf(X))\}$. Writing $\eta(x) = P(Y = +1 \mid X = x)$, the minimizer is
\[
f_d^*(x) = \begin{cases} +1 & \text{if } \eta(x) > 1 - d \\ 0 & \text{if } d \le \eta(x) \le 1 - d \\ -1 & \text{if } \eta(x) < d \end{cases}
\]
This follows from the conditional expectation
\[
E_{Y \mid X}\{\ell_{d,\delta}(Yf(X))\} = \eta(x)\,\ell_{d,\delta}(f(x)) + (1 - \eta(x))\,\ell_{d,\delta}(-f(x)).
\]
7 Introduction - Bayes rule (continued)
Minimize $L_{d,\delta}(f) = E\{\ell_{d,\delta}(Yf(X))\}$; the minimizer $f_d^*$ is as above. In practice, minimize the empirical risk
\[
P_n(f) = \frac{1}{n} \sum_{i=1}^n \ell_{d,\delta}(Y_i f(X_i)),
\]
but minimizing $P_n$ directly is NP-hard.
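A minimal sketch of the plug-in Bayes rule and of the empirical risk $P_n$ (names and defaults are mine): the Bayes rule is simple thresholding of $\eta$, while minimizing $P_n$ over a function class is the combinatorial problem the convex surrogate is meant to avoid.

```python
import numpy as np

def bayes_rule(eta, d=0.3):
    """Plug-in Bayes rule f*_d: +1 when eta(x) > 1-d, reject (0) when
    d <= eta(x) <= 1-d, and -1 when eta(x) < d."""
    eta = np.asarray(eta, dtype=float)
    return np.where(eta > 1 - d, 1.0, np.where(eta < d, -1.0, 0.0))

def empirical_risk(f_values, y, d=0.3, delta=0.5):
    """The empirical reject-option risk P_n(f) that ERM would minimize."""
    z = y * f_values
    return np.where(z < -delta, 1.0, np.where(z > delta, 0.0, d)).mean()

print(bayes_rule([0.05, 0.40, 0.95], d=0.3))  # [-1.  0.  1.]
```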
8 General spirit
Propose a convex surrogate loss $\phi_d$; consider the $\phi_d$-risk $Q(f) = E\{\phi_d(Yf(X))\}$; minimize the empirical risk $Q_n(f)$; let $\hat f_n = \operatorname{argmin}_{f \in \mathcal{F}} Q_n(f)$. Study the performance of $\phi_d$:
1. consistency
2. bounding excess risk
3. convergence rates for the empirical minimizer
9 General spirit
1. Consistency: $f_d^* = \operatorname{argmin}_f Q(f)$, i.e., the asymptotic minimizer of the $\phi_d$-risk is the same as for the $\ell_{d,\delta}$-risk, so we are able to recover $f_d^*$.
2. Bounding excess risk: we want some non-decreasing function $\rho(\cdot)$ such that
\[
L_{d,\delta}(f) - L_{d,\delta}(f_d^*) \le \rho\big(Q(f) - Q(f_d^*)\big), \qquad \text{i.e.} \qquad \Delta L_{d,\delta}(f) \le \rho(\Delta Q(f)),
\]
which guarantees the performance of $f$.
3. Convergence rate for the empirical minimizer: compute the rate at which $\Delta Q(\hat f_n) \to 0$ and combine it with the bound on the excess risk.
10 Generalized hinge loss
Convex surrogate of the form
\[
\phi_d(z) = \begin{cases} 1 - \frac{1-d}{d}\,z & \text{if } z \le 0 \\ 1 - z & \text{if } 0 < z \le 1 \\ 0 & \text{if } z > 1 \end{cases}
\]
Consistency (Proposition 1): the Bayes discriminant function $f_d^*$ minimizes the $\phi_d$-risk $Q$, with $d\,Q(f_d^*) = L_{d,\delta}(f_d^*)$.
11 Generalized hinge loss - excess risk bounds
(Convex surrogate $\phi_d$ as above.) Note: for $\delta \le 1 - d$, $\ell_{d,\delta}(z) \le \phi_d(z)$ pointwise.

Theorem 2: for fixed $f$ and $d \in [0, 1/2)$:
- for all $\delta \in (0, 1/2]$: $\Delta L_{d,\delta}(f) \le \frac{d}{\delta}\,\Delta Q(f)$;
- for all $\delta \in [1/2, 1-d]$: $\Delta L_{d,\delta}(f) \le \Delta Q(f)$.

The authors recommend using $\delta = 1/2$.
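A quick numerical sanity check of the pointwise domination $\ell_{d,\delta} \le \phi_d$ for $\delta \le 1 - d$, which is what makes the comparison in Theorem 2 possible; the helper names are mine.

```python
import numpy as np

def gen_hinge(z, d=0.3):
    """Generalized hinge loss phi_d: slope (1-d)/d for z <= 0, the usual
    hinge 1 - z on (0, 1], and 0 beyond z = 1."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 1 - (1 - d) / d * z,
                    np.where(z <= 1, 1 - z, 0.0))

# Check l_{d,delta}(z) <= phi_d(z) on a grid, for delta <= 1 - d.
d, delta = 0.3, 0.5          # delta = 0.5 <= 1 - d = 0.7
z = np.linspace(-3, 3, 1001)
ell = np.where(z < -delta, 1.0, np.where(z > delta, 0.0, d))
assert np.all(ell <= gen_hinge(z, d) + 1e-12)
```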
12 Generalized hinge loss - excess risk bounds (continued)
(Convex surrogate $\phi_d$ as above.) Theorem 2 with $(\delta, d) = (0, 1/2)$ gives, for fixed $f$,
\[
\Delta L(f) \le \Delta Q(f),
\]
recovering the risk bounds for usual classification with the hinge loss.
13 Generalized hinge loss - convergence rates
(Convex surrogate $\phi_d$ as above.) Bounds of the form
\[
E\,Q(\hat f_n) - Q(f_d^*) \le 2 \inf_{f \in \mathcal{F}} \big( Q(f) - Q(f_d^*) \big) + \epsilon_n
\]
1. approximation error for the space $\mathcal{F}$ (method of sieves)
2. estimation error $\epsilon_n$ for a finite sample of size $n$ (the authors' results)
14 Generalized hinge loss - convergence rates
Margin condition: we say that $\eta$ satisfies the margin condition at $d$ with exponent $\alpha > 0$ if there is a $c \ge 1$ such that for all $t > 0$,
\[
P\{|\eta(X) - d| \le t\} \le c\,t^\alpha \qquad \text{and} \qquad P\{|\eta(X) - (1-d)| \le t\} \le c\,t^\alpha.
\]
Bernstein class: we say that $\mathcal{G} \subseteq L_2(P)$ is a $(\beta, B)$-Bernstein class with respect to the probability measure $P$, with $\beta \in (0, 1]$ and $B \ge 1$, if every $g \in \mathcal{G}$ satisfies
\[
P g^2 \le B\,(P g)^\beta.
\]
15 Generalized hinge loss - convergence rates
Lemma 8: if $\eta$ satisfies the margin condition at $d$ with exponent $\alpha$, then for any class $\mathcal{F}$ of measurable, uniformly bounded functions, the class $\mathcal{G} = \{ g_f : f \in \mathcal{F} \}$ is a Bernstein class with exponent
\[
\beta = \frac{\alpha}{1 + \alpha},
\]
where $g_f(x, y) = \phi_d(y f(x)) - \phi_d(y f_d^*(x))$ is the excess $\phi_d$-loss, so that $E\,g_f(X, Y) = \Delta Q(f)$.
16 Generalized hinge loss - convergence rates
Theorem 12: if $\eta$ satisfies the margin condition at $d$ with exponent $\alpha$, $\mathcal{F}$ is a countable class of functions $f : \mathcal{X} \to \mathbb{R}$ satisfying $\|f\|_\infty \le B$, and $\mathcal{F}$ satisfies $\log N(\epsilon, L_\infty, \mathcal{F}) \le C \epsilon^{-p}$ for all $\epsilon > 0$ and some $0 < p < 2$, then there exists a constant $C'$, independent of $n$, such that
\[
E\,Q(\hat f_n) - Q(f_d^*) \le 2 \inf_{f \in \mathcal{F}} \big( Q(f) - Q(f_d^*) \big) + C'\, n^{-\frac{1+\alpha}{2+p+\alpha+p\alpha}}.
\]
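To make the exponent concrete, a small worked instance (the choice of $\alpha$ and $p$ is mine):

```latex
% Rate exponent from Theorem 12 for a sample choice of parameters.
% With margin exponent \alpha = 1 and entropy exponent p = 1:
\[
\frac{1+\alpha}{2+p+\alpha+p\alpha} = \frac{1+1}{2+1+1+1} = \frac{2}{5},
\]
% so the estimation-error term decays like n^{-2/5}. As \alpha \to \infty
% (an increasingly hard margin) the exponent tends to 1/(1+p), while a
% larger p (a richer class F) always slows the rate.
```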
17 Generalized hinge loss - QCQP
The constrained empirical $\phi_d$-risk minimization can be solved as a quadratically constrained quadratic program (QCQP). Theorem 5: for any $x_1, \dots, x_n \in \mathcal{X}$ and $y_1, \dots, y_n \in \{\pm 1\}$, let $\hat f \in \mathcal{H}$ be the solution to
\[
\min_f \; \sum_{i=1}^n \phi_d(y_i f(x_i)) \qquad \text{such that} \qquad \|f\|^2 \le r^2,
\]
where $r > 0$.
18 Generalized hinge loss - QCQP
Theorem 5 (continued): then we can represent $\hat f$ as the finite sum $\hat f(x) = \sum_{i=1}^n \hat\alpha_i K(x_i, x)$, where $\hat\alpha_1, \dots, \hat\alpha_n$ is the solution to the following QCQP:
\[
\min_{\alpha, \xi, \gamma} \; \frac{1}{n} \sum_{i=1}^n \Big( \xi_i + \frac{1 - 2d}{d}\,\gamma_i \Big)
\]
such that
\[
\sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le r^2,
\]
\[
\xi_i \ge 0, \qquad \xi_i \ge 1 - y_i \sum_{j=1}^n \alpha_j K(x_i, x_j),
\]
\[
\gamma_i \ge 0, \qquad \gamma_i \ge -\,y_i \sum_{j=1}^n \alpha_j K(x_i, x_j).
\]
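A sketch of this QCQP in Python using cvxpy, assuming a precomputed kernel matrix $K$; the variable names follow the slide, while everything else (function name, toy data, solver defaults) is my scaffolding rather than the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def fit_gen_hinge_qcqp(K, y, d=0.3, r=1.0):
    """Solve the QCQP of Theorem 5: minimize the empirical generalized hinge
    risk over f(.) = sum_j alpha_j K(x_j, .) subject to ||f||_H^2 <= r^2."""
    n = len(y)
    alpha = cp.Variable(n)
    xi = cp.Variable(n, nonneg=True)     # slack for the hinge part (1 - yf)_+
    gamma = cp.Variable(n, nonneg=True)  # slack for the extra (-yf)_+ part
    margins = cp.multiply(y, K @ alpha)  # y_i * f(x_i)
    objective = cp.Minimize(cp.sum(xi + (1 - 2 * d) / d * gamma) / n)
    constraints = [
        xi >= 1 - margins,
        gamma >= -margins,
        cp.quad_form(alpha, K) <= r ** 2,  # ||f||_H^2 = alpha' K alpha <= r^2
    ]
    cp.Problem(objective, constraints).solve()
    return alpha.value

# Toy usage with a linear kernel; a small ridge keeps K numerically PSD.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=20))
K = X @ X.T + 1e-8 * np.eye(20)
alpha_hat = fit_gen_hinge_qcqp(K, y, d=0.3, r=5.0)
f_hat = K @ alpha_hat  # fitted values f(x_i); reject where |f_hat| <= delta
```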
19 Generalization
Generalize to the following loss:
\[
\ell[g(X), Y] = \begin{cases} 1 & \text{if } g(X) \ne Y \text{ and } g(X) \ne 0 \\ d & \text{if } g(X) = 0 \\ 0 & \text{if } g(X) = Y \end{cases}
\]
with classification rule
\[
C(f(X), \delta) = \begin{cases} +1 & \text{if } f(X) > \delta \\ 0 & \text{if } |f(X)| \le \delta \\ -1 & \text{if } f(X) < -\delta \end{cases}
\]
Equivalence to the previous case can be seen from $\ell[C(f(X), \delta), Y] = \ell_{d,\delta}(Yf(X))$. Since we are now looking at a family of surrogates, we want to separate $\delta$ from the loss function.
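The claimed equivalence is easy to verify numerically; a minimal sketch (helper names mine). The check is exact because both sides apply the same thresholds to the same margins.

```python
import numpy as np

def rule_C(f, delta):
    """Classification rule C(f, delta): +1 / reject (0) / -1 by thresholding f."""
    return np.where(f > delta, 1.0, np.where(f < -delta, -1.0, 0.0))

def loss_g(g, y, d):
    """Loss l[g, y]: 1 for a wrong non-rejected label, d for rejection, 0 otherwise."""
    return np.where(g == 0, d, np.where(g == y, 0.0, 1.0))

def reject_loss(z, d, delta):
    return np.where(z < -delta, 1.0, np.where(z > delta, 0.0, d))

# Check l[C(f, delta), y] == l_{d,delta}(y f) on a grid of scores.
d, delta = 0.3, 0.5
f = np.linspace(-2, 2, 801)
for y in (-1.0, +1.0):
    assert np.allclose(loss_g(rule_C(f, delta), y, d), reject_loss(y * f, d, delta))
```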
20 Generalization - consistency
Theorem 1: assume $\phi$ is convex. The classification rule $C(f_\phi^*, \delta)$ for some $\delta > 0$ is infinite-sample consistent if and only if:
1. $\phi'(\delta)$ and $\phi'(-\delta)$ both exist;
2. $\phi'(0) < 0$;
3. $\dfrac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = d$.
For strictly convex $\phi$, at most one value of $\delta$ satisfies these conditions; for the generalized hinge loss, they hold for every $\delta \in (0, 1)$.
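For the generalized hinge loss the third condition can be checked directly: on $(0, 1)$ we have $\phi_d'(\delta) = -1$ and $\phi_d'(-\delta) = -(1-d)/d$, so $\phi_d'(\delta) / (\phi_d'(\delta) + \phi_d'(-\delta)) = 1/(1 + (1-d)/d) = d$ for every $\delta \in (0, 1)$. A quick numerical confirmation (the scaffolding is mine):

```python
import numpy as np

def gen_hinge(z, d):
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 1 - (1 - d) / d * z,
                    np.where(z <= 1, 1 - z, 0.0))

def num_deriv(phi, z, h=1e-6):
    """Central finite difference, valid away from the kinks at 0 and 1."""
    return (phi(z + h) - phi(z - h)) / (2 * h)

d, delta = 0.3, 0.5
dp = num_deriv(lambda z: gen_hinge(z, d), delta)   # approx -1
dm = num_deriv(lambda z: gen_hinge(z, d), -delta)  # approx -(1-d)/d
print(dp / (dp + dm))  # approx d = 0.3, as the consistency condition requires
```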
21 Generalization - excess risk bounds
Theorem 9: assume $\phi$ is a convex, infinite-sample-consistent surrogate, and suppose there exist constants $C > 0$ and $s \ge 1$ such that
\[
|\eta - d|^s \le C^s\,\Delta Q_\eta(-\delta) \qquad \text{and} \qquad |(1 - \eta) - d|^s \le C^s\,\Delta Q_\eta(\delta).
\]
Then
\[
\Delta R[C(f, \delta)] \le 2C\,[\Delta Q(f)]^{1/s},
\]
where $Q_\eta(z) = \eta(x)\,\phi(z) + (1 - \eta(x))\,\phi(-z)$.
22 Generalization - excess risk bounds
Theorem 10: further assume
\[
2^s (\eta - 1/2)_+^s \le C^s\,\Delta Q_\eta(-\delta) \qquad \text{and} \qquad 2^s \big((1 - \eta) - 1/2\big)_+^s \le C^s\,\Delta Q_\eta(\delta);
\]
then
\[
\Delta R[C(f, \delta)] \le C\,[\Delta Q(f)]^{1/s}.
\]
Theorem 11: if the margin condition holds for some $c \ge 1$ and $\alpha \ge 0$, then
\[
\Delta R[C(f, \delta)] \le K\,[\Delta Q(f)]^{1/(s + \beta - \beta s)},
\]
where $K$ depends on $c$ and $\alpha$, and $\beta = \alpha / (1 + \alpha)$.
23 Generalization - convergence rates
Theorem 18: assume that $\|f\|_\infty \le B$ for all $f \in \mathcal{F}$ and let $\gamma \in (0, 1)$. With probability at least $1 - \gamma$,
\[
Q(\hat f_n) \le \inf_{f \in \mathcal{F}} Q(f) + 3L \left( \frac{L^2}{n} + 8 \Big( 2c + \frac{B}{6} \Big) \frac{\log(n N_n / \gamma)}{n} \right),
\]
where $N_n = N(1/n, L_\infty, \mathcal{F})$. The proof relies on:
1. Lipschitz continuity of $\phi$ (with constant $L$);
2. a constraint on the modulus of convexity of $Q$.
24 Generalization - convergence rates
Corollary 19: under the assumptions of Theorems 9 and 18, we have, with probability at least $1 - \gamma$,
\[
\Delta R(C(\hat f_n, \delta)) \le 2C \left\{ \inf_{f \in \mathcal{F}} \Delta Q(f) + 3L \left( \frac{L^2}{n} + 8 \Big( 2c + \frac{LB}{3} \Big) \frac{\log(n N_n / \gamma)}{n} \right) \right\}^{1/s}.
\]
If the margin condition also holds, then with probability at least $1 - \gamma$,
\[
\Delta R(C(\hat f_n, \delta)) \le K \left\{ \inf_{f \in \mathcal{F}} \Delta Q(f) + 3L \left( \frac{L^2}{n} + 8 \Big( 2c + \frac{LB}{3} \Big) \frac{\log(n N_n / \gamma)}{n} \right) \right\}^{1/(s + \beta - \beta s)}.
\]
25 Miscellanea
Other losses considered: least squares, exponential, logistic, squared hinge, DWD. Further extensions: asymmetric loss; hinge loss with $L_1$ regularization (WY, Bernoulli, 2011).