Large-Margin Thresholded Ensembles for Ordinal Regression

Large-Margin Thresholded Ensembles for Ordinal Regression
Hsuan-Tien Lin (accepted by ALT '06; joint work with Ling Li)
Learning Systems Group, Caltech
Workshop talk at MLSS 2006, Taipei, Taiwan, 07/25/2006

Reduction Method

Algorithmic:
1. identify the type of learning problem (ordinal regression)
2. find a premade reduction (thresholded ensemble) and an oracle learning algorithm (AdaBoost)
3. build an ordinal regression rule (ORBoost) using the reduction + data

Theoretical:
1. identify the type of learning problem (ordinal regression)
2. find a premade reduction (thresholded ensemble) and known generalization bounds (large-margin ensembles)
3. derive a new bound (large-margin thresholded ensembles) using the reduction + the known bound

This work: a concrete instance of reductions.

Ordinal Regression

What is the age-group of the person in the picture? [Figure: a face labeled with rank 2 on the ordered scale 1, 2, 3, 4]

- rank: a finite ordered set of labels Y = {1, 2, ..., K}
- ordinal regression: given a training set {(x_n, y_n)}_{n=1}^N, find a decision function g that predicts the ranks of unseen examples well
- e.g. ranking movies, ranking documents by relevance, etc.
- matching human preferences: applications in social science and information retrieval

Properties of Ordinal Regression

- regression without metric: a metric possibly underlies the ranks (age), but it is not encoded in {1, 2, 3, 4}
- classification with ordered categories: small mistake: classifying a teenager as a child; big mistake: classifying an infant as an adult
- common loss functions:
  - determine the exact category: classification error L_C(g, x, y) = [g(x) ≠ y]
  - or at least make a close prediction: absolute error L_A(g, x, y) = |g(x) − y|
- this talk covers L_A only; similar results hold for L_C
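The two loss functions are simple to state in code. A minimal sketch (not from the talk; the rank values are made up for illustration):

```python
def classification_error(pred: int, y: int) -> float:
    # L_C(g, x, y) = [g(x) != y]: every wrong rank costs 1, regardless of distance
    return float(pred != y)

def absolute_error(pred: int, y: int) -> float:
    # L_A(g, x, y) = |g(x) - y|: cost grows with the rank distance
    return float(abs(pred - y))

# An infant (rank 1) predicted as an adult (rank 4) costs 3 under L_A,
# while a teenager (rank 3) predicted as a child (rank 2) costs only 1;
# L_C charges both mistakes the same.
print(absolute_error(4, 1), classification_error(4, 1))  # 3.0 1.0
print(absolute_error(2, 3), classification_error(2, 3))  # 1.0 1.0
```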

Thresholded Model for Ordinal Regression

naive algorithm for ordinal regression:
1. do general regression on {(x_n, y_n)} to get H(x)
   - general regression performs badly without a metric
2. set g(x) = clip(round(H(x)))
   - the round-off operation (uniform quantization) causes large error

improved and generalized algorithm:
1. estimate a potential function H(x)
2. quantize H(x) by some ordered thresholds θ to get g(x)

[Figure: the real line of H(x) values, cut by thresholds θ_1 < θ_2 < θ_3 into regions where g(x) = 1, 2, 3, 4]

thresholded model: g(x) ≡ g_{H,θ}(x) = min{k : H(x) < θ_k}
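A minimal sketch of the thresholded prediction rule, assuming the usual convention θ_K = +∞ so that every H(x) falls into one of the K rank regions (the threshold values below are made up):

```python
import numpy as np

def thresholded_predict(H_x: float, theta) -> int:
    # g_{H,theta}(x) = min{k : H(x) < theta_k}; appending theta_K = +inf
    # guarantees some k qualifies, giving ranks 1..K for K-1 finite thresholds.
    theta = np.append(np.asarray(theta, dtype=float), np.inf)
    return int(np.argmax(H_x < theta)) + 1  # first k (1-indexed) with H(x) < theta_k

theta = [-1.0, 0.5, 2.0]                 # theta_1 < theta_2 < theta_3, so K = 4
print(thresholded_predict(-2.3, theta))  # 1
print(thresholded_predict(0.0, theta))   # 2
print(thresholded_predict(3.1, theta))   # 4
```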

Thresholded Ensemble Model

the potential function H(x) is a weighted ensemble: H(x) ≡ H_T(x) = Σ_{t=1}^T w_t h_t(x)

- intuition: combine preferences to estimate the overall confidence
- e.g. if many people h_t say a movie x is good, the confidence H(x) of the movie should be high
- h_t can be binary, multi-valued, or continuous
- w_t < 0: allows reversing bad preferences

thresholded ensemble model: ensemble learning for ordinal regression
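A sketch of such a weighted ensemble, with hypothetical decision stumps standing in for the binary preferences h_t (none of these numbers come from the talk):

```python
import numpy as np

stumps = [(0, 1.5), (2, -0.3), (1, 0.0)]  # hypothetical h_t: (feature index, split)
w = np.array([0.8, -0.2, 0.5])            # weights w_t; w_t < 0 reverses a preference

def H(x):
    # H_T(x) = sum_t w_t * h_t(x), with binary preferences h_t(x) in {-1, +1}
    votes = np.array([1.0 if x[f] > s else -1.0 for f, s in stumps])
    return float(w @ votes)

x = np.array([2.0, -1.0, 0.4])
print(H(x))  # overall confidence, to be quantized by the thresholds theta
```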

Margins of Thresholded Ensembles

[Figure: H(x) on the real line with thresholds θ_1, θ_2, θ_3 and the margins ρ_1, ρ_2, ρ_3 of an example to each threshold]

margin: how safely an example sits away from each boundary

normalized margin for a thresholded ensemble:

ρ(x, y, k) = (H_T(x) − θ_k if y > k; θ_k − H_T(x) if y ≤ k) / (Σ_{t=1}^T |w_t| + Σ_{k=1}^{K−1} |θ_k|)

negative margin ⟹ wrong prediction at that threshold:

Σ_{k=1}^{K−1} [ρ(x, y, k) ≤ 0] ≥ |g(x) − y|
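A small sketch computing the normalized margins and checking the relation to the absolute error (thresholds, weights, and ranks are all made up):

```python
import numpy as np

def margins(H_x, y, theta, w):
    # rho(x, y, k) = (H(x) - theta_k) if y > k else (theta_k - H(x)),
    # normalized by sum_t |w_t| + sum_k |theta_k|
    theta = np.asarray(theta, dtype=float)
    k = np.arange(1, len(theta) + 1)
    raw = np.where(y > k, H_x - theta, theta - H_x)
    return raw / (np.abs(w).sum() + np.abs(theta).sum())

theta, w = np.array([-1.0, 0.5, 2.0]), np.array([0.8, 0.2, 0.5])
rho = margins(0.0, 4, theta, w)  # true rank 4, but H(x) = 0 lands in region 2
print(rho)                        # margins at theta_2 and theta_3 are non-positive
print((rho <= 0).sum())           # 2 >= |g(x) - y| = |2 - 4|
```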

Theoretical Reduction

[Figure: each threshold θ_k cuts the training examples into a +/− binary classification problem]

(K − 1) binary classification problems, one w.r.t. each θ_k: ((X)_k, (Y)_k) = ((x, k), ±1)

(Schapire et al., 1998) binary classification: with probability at least 1 − δ, for all Δ > 0 and all binary classifiers g_c,

E_{(X,Y)∼D} [g_c(X) ≠ Y] ≤ (1/N) Σ_{n=1}^N [ρ(X_n, Y_n) ≤ Δ] + O(log N, (1/Δ)√(log N / N), 1/δ)

(Lin and Li, 2006) ordinal regression: with similar settings, for all thresholded ensembles g,

E_{(x,y)∼D} L_A(g, x, y) ≤ (1/N) Σ_{n=1}^N Σ_{k=1}^{K−1} [ρ(x_n, y_n, k) ≤ Δ] + O(K, log N, (1/Δ)√(log N / N), 1/δ)

large-margin thresholded ensembles can generalize
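The reduction itself is mechanical: each ordinal example spawns K − 1 binary examples, one per threshold. A sketch, assuming the extended input (x, k) is encoded by appending the threshold index k as an extra feature:

```python
import numpy as np

def to_binary_problems(X, y, K):
    # For each example (x, rank) and each k = 1..K-1, emit the binary
    # example ((x, k), +1 if rank > k else -1).
    Xb, yb = [], []
    for x, rank in zip(X, y):
        for k in range(1, K):
            Xb.append(np.append(x, k))
            yb.append(+1 if rank > k else -1)
    return np.array(Xb), np.array(yb)

X = np.array([[0.2, 1.0], [1.5, -0.3]])
y = np.array([3, 1])                      # ranks in {1, ..., 4}
Xb, yb = to_binary_problems(X, y, K=4)
print(yb)                                 # [ 1  1 -1 -1 -1 -1]
```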

Algorithmic Reduction

(Freund and Schapire, 1996) AdaBoost: binary classification by operationally optimizing

min Σ_{n=1}^N exp(−ρ(x_n, y_n))  ⟺  max softmin_n ρ(x_n, y_n)

(Lin and Li, 2006):

ORBoost-LR (left-right): min Σ_{n=1}^N Σ_{k=y_n−1}^{y_n} exp(−ρ(x_n, y_n, k))  (only the two thresholds adjacent to y_n, restricted to 1 ≤ k ≤ K − 1)

ORBoost-All: min Σ_{n=1}^N Σ_{k=1}^{K−1} exp(−ρ(x_n, y_n, k))

algorithmic reduction to AdaBoost
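A sketch of the ORBoost-All objective over a training set, using unnormalized margins (the normalizer is a common positive factor at a fixed iteration, so it does not change the minimizer there; all numbers are illustrative):

```python
import numpy as np

def orboost_all_loss(H_vals, y, theta):
    # sum_n sum_{k=1}^{K-1} exp(-rho(x_n, y_n, k)) with unnormalized margins
    theta = np.asarray(theta, dtype=float)
    k = np.arange(1, len(theta) + 1)
    total = 0.0
    for H_x, rank in zip(H_vals, y):
        rho = np.where(rank > k, H_x - theta, theta - H_x)
        total += np.exp(-rho).sum()
    return total

theta = np.array([-1.0, 0.5, 2.0])
H_vals = np.array([-1.5, 0.0, 2.5])        # ensemble outputs for three examples
y = np.array([1, 2, 4])                     # their true ranks
print(orboost_all_loss(H_vals, y, theta))  # small when all margins are large
```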

Advantages of ORBoost

- ensemble learning: combines simple preferences to approximate complex targets
- thresholds: adaptively estimated scales for performing ordinal regression
- inherited from AdaBoost:
  - simple implementation
  - guarantee on minimizing Σ_{n,k} [ρ(x_n, y_n, k) ≤ 0]
  - fast
  - practically less vulnerable to overfitting

useful properties inherited through the reduction

ORBoost Experiments

[Figure: bar chart of test absolute error (0 to 2.5) for RankBoost, ORBoost, and SVM on eight benchmark datasets: py, ma, ho, ab, ba, co, ca, ce]

Results (ORBoost-All):
- ORBoost-All is simpler than, and performs much better than, RankBoost (Freund et al., 2003)
- ORBoost-All is much faster than, and comparable in accuracy to, SVM (Chu and Keerthi, 2005)
- similar findings hold for ORBoost-LR

Conclusion

- thresholded ensemble model: useful for ordinal regression
- theoretical reduction: new large-margin bounds
- algorithmic reduction: new training algorithms (ORBoost)
- ORBoost: simpler than existing boosting algorithms, with comparable performance to state-of-the-art algorithms, fast training, and less vulnerability to overfitting
- ongoing work: similar reduction techniques for other theoretical and algorithmic results with more general loss functions (Li and Lin, 2006)

Questions?