SPRINT Multi-Objective Model Racing

SPRINT Multi-Objective Model Racing
Tiantian Zhang, Michael Georgiopoulos, Georgios C. Anagnostopoulos
Machine Learning Lab, University of Central Florida

Overview
- Racing Algorithms
- Multi-Objective Model Selection
- SPRINT-Race
  - Motivation
  - Sequential Probability Ratio Test
  - (Non-)Dominance Inference via Dual-SPRT
- Experimental Results and Discussion
- Conclusion and Future Directions

Racing Algorithm
Model selection: consider an ensemble of models and stick with the best. A racing algorithm is an elimination-type multi-armed bandit: starting from a set of initial models, it repeatedly samples validation instances, evaluates all surviving models, and abandons bad performers; once the stopping condition is met, it returns the best performers.

Racing Algorithm: Trade-off
- Benefit: identify the best model(s)
- Price to be paid: computational cost
Racing algorithms trade off model optimality vs. computational effort by automatically allocating computational resources, in contrast to a brute-force search over the whole model set.

Racing Algorithms
Name | Year | Statistical Test | Criterion of Goodness | Application
Hoeffding Racing | 1994 | Hoeffding's inequality | Prediction accuracy | Classification & function approximation
BRACE | 1994 | Bayesian statistics | Cross-validation error | Classification
F-Race | 2002 | Friedman test | Function optimization | Optimal parameters of Max-Min Ant System for TSP
Bernstein Racing | 2008 | Bernstein's inequality | Function optimization | Policy selection in CMA-ES
S-Race | 2013 | Sign test | Prediction accuracy | Classification

Multi-Objective Optimization
Pareto dominance (minimization case).
[Figure: objective space (f1, f2) with the Pareto front highlighted]

Multi-Objective Model Selection: Multi-Criteria Recommendation
Example: items rated on two criteria, e.g. Story: 2 / Actors: 5, Story: 3 / Actors: 4, and Story: 4 / Actors: 3; no single option is best on both criteria.

Multi-Objective Model Selection: Multi-Task Learning

Multi-Objective Model Selection: Motivation
Model selection often involves more than one criterion of goodness. Examples:
- Generalization performance vs. model complexity
- A single model addressing multiple tasks
How can racing be adapted to a multi-objective model selection (MOMS) setting?
Approaches:
- Scalarization of the criteria into a single criterion
  - Pros: existing racing algorithms may be applicable
  - Cons: good models may be left undiscovered (due to non-convexity of the relevant Pareto front)
- Vector optimization
  - Pros: selects models based on Pareto optimality
  - Cons: no racing algorithm available

Multi-Objective Racing Algorithms
S-Race: fixed budget, offline, maximizes selection accuracy.
SPRINT-Race: fixed confidence, online, minimizes computational cost.

SPRINT-Race: Racing via Sequential Probability Ratio with INdifference zone Test
- Tries to minimize the computational effort needed to achieve a predefined confidence about the quality of the returned models.
- Adopts a near-optimal, non-parametric, ternary-decision sequential analogue of the sign test to identify pairwise dominance/non-dominance relationships.
- Strictly confines the error probability of returning dominated models and, simultaneously, of abandoning non-dominated models, at a predefined level.
- Able to stop automatically.
- Introduces the concept of an indifference zone.

Non-Sequential vs. Sequential Tests
Non-sequential test:
- The sample size n is fixed.
- Given α and the rejection region, β is a function of the sample size.
- Uniformly most efficient test: given n and α, find the test procedure that minimizes β.
Sequential test:
- The sample size n is a random variable.
- At each step, either accept H0, accept H1, or continue sampling.
- Characterized by the OC function (probability of accepting H0) and the ASN (average sample number).
- Uniformly most efficient test: given α and β, find the test that minimizes the expected sample size.

SPRT: Sequential Probability Ratio Test
A locally most efficient test procedure.
SPRT with a Bernoulli distribution: at the n-th step, assume x_1, x_2, ..., x_n are i.i.d. samples from a Bernoulli distribution with P(x_i = 1) = p, and let d = #{x_i = 1}. Given two simple hypotheses
H0: p = p0
H1: p = p1,
let A = (1 - β)/α and B = β/(1 - α), and compute the likelihood ratio
τ_n = p1^d (1 - p1)^(n-d) / [p0^d (1 - p0)^(n-d)].
If τ_n ≤ B, H0 is accepted; if τ_n ≥ A, H1 is accepted; otherwise, sampling continues.
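The decision rule above can be sketched in Python. This is a minimal illustration only (the function name and default error rates are my own), working in log-space for numerical stability:

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs. H1: p = p1 (p1 > p0) on Bernoulli data.

    Consumes the samples in order and returns 'H0', 'H1', or 'continue'.
    """
    # Acceptance thresholds A = (1 - beta)/alpha and B = beta/(1 - alpha).
    log_A = math.log((1.0 - beta) / alpha)
    log_B = math.log(beta / (1.0 - alpha))
    log_tau = 0.0  # running log-likelihood ratio log(tau_n)
    for x in samples:
        if x == 1:
            log_tau += math.log(p1 / p0)
        else:
            log_tau += math.log((1.0 - p1) / (1.0 - p0))
        if log_tau >= log_A:
            return "H1"
        if log_tau <= log_B:
            return "H0"
    return "continue"
```

With p0 = 0.4, p1 = 0.6 and α = β = 0.05, about eight consecutive successes already cross the H1 boundary, which illustrates why the expected sample size of the SPRT is small when the data clearly favor one hypothesis.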

(Non-)Dominance Inference
Given models C_i and C_j, let
- n_ij be the number of instances on which the performance vector of C_i dominates that of C_j,
- n_ji the number of instances on which the reverse holds,
- p the probability that C_i dominates C_j.
Then n_ij ~ Binomial(n_ij + n_ji, p), and the indifference zone is the interval (1/2 - δ, 1/2 + δ) around p = 1/2.

Dual-SPRT (Sobel-Wald's test)
The three-hypothesis problem is composed of two component two-hypothesis tests:
SPRT1: H0': p ≤ 1/2 - δ vs. H1': p ≥ 1/2
SPRT2: H0': p ≤ 1/2 vs. H1': p ≥ 1/2 + δ
Together they discriminate among H0: p ≤ 1/2 - δ, H1: p = 1/2, and H2: p ≥ 1/2 + δ.
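As a concrete reading of the combination, here is a hedged Python sketch of a count-based dual-SPRT. The function names and the stateless formulation (recomputing each component's state from the current counts) are my own assumptions, not the authors' implementation:

```python
import math

def sprt_state(d, n, p0, p1, alpha, beta):
    """Current state of a simple-vs-simple Bernoulli SPRT given d successes in n trials."""
    log_tau = d * math.log(p1 / p0) + (n - d) * math.log((1 - p1) / (1 - p0))
    if log_tau >= math.log((1 - beta) / alpha):
        return "H1"
    if log_tau <= math.log(beta / (1 - alpha)):
        return "H0"
    return "continue"

def dual_sprt(n_ij, n_ji, delta, alpha=0.05, beta=0.05):
    """Sobel-Wald combination of SPRT1 (1/2 - delta vs. 1/2) and SPRT2 (1/2 vs. 1/2 + delta).

    Returns 'H0' (p <= 1/2 - delta: C_i is dominated), 'H1' (p = 1/2: keep both),
    'H2' (p >= 1/2 + delta: C_j is dominated), or 'continue'.
    """
    n = n_ij + n_ji
    s1 = sprt_state(n_ij, n, 0.5 - delta, 0.5, alpha, beta)
    s2 = sprt_state(n_ij, n, 0.5, 0.5 + delta, alpha, beta)
    if s1 == "continue" or s2 == "continue":
        return "continue"
    if s1 == "H0" and s2 == "H0":
        return "H0"   # both components favor small p: remove C_i
    if s1 == "H1" and s2 == "H1":
        return "H2"   # both components favor large p: remove C_j
    return "H1"       # SPRT1 says p >= 1/2, SPRT2 says p <= 1/2: non-dominance
```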

Dual-SPRT (Sobel-Wald's test)
Assume that common α and β values are shared between SPRT1 and SPRT2.
[Figure: decision regions for accepting H0, H1, or H2 in the (n_ij + n_ji, n_ij) plane]

Dual-SPRT Accuracy Analysis
Interval | Wrong decisions | γ(p)
p ≤ 1/2 - δ | accept H1 or H2 | γ(p) ≤ α
1/2 - δ < p < 1/2 | accept H2 | γ(p) ≤ α
p = 1/2 | accept H0 or H2 | γ(p) ≤ α + β
1/2 < p < 1/2 + δ | accept H0 | γ(p) < β
p ≥ 1/2 + δ | accept H0 or H1 | γ(p) ≤ β
Overall: γ = max_p γ(p) ≈ α + β.

Overall Error Control: Bonferroni Approach
With M models there are M(M-1)/2 pairwise dual-SPRTs, so
γ ≤ Σ_{i=1}^{M(M-1)/2} γ_i ≤ Σ_{i=1}^{M(M-1)/2} (α_i + β_i).
Let α_i = β_i = ε for i = 1, ..., M(M-1)/2, and let α_max denote the maximum error probability allowed for SPRINT-Race; then
ε = α_max / (M(M-1)).
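A one-line helper makes the Bonferroni split explicit (the helper name is illustrative):

```python
def per_test_error(alpha_max, M):
    """Bonferroni split across the M(M-1)/2 dual-SPRTs.

    Each test contributes gamma_i <= alpha_i + beta_i = 2 * eps, so choosing
    eps = alpha_max / (M * (M - 1)) caps the overall error by
    (M * (M - 1) / 2) * 2 * eps = alpha_max.
    """
    return alpha_max / (M * (M - 1))
```

For example, with α_max = 0.05 and M = 5 models there are 10 pairwise tests, ε = 0.0025, and the overall error is bounded by 10 · 2ε = 0.05.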

SPRINT-Race
Initialize Pool = {C_1, C_2, ..., C_M}, M ≥ 2; t = 1
repeat
  Randomly sample a problem instance
  for each pair (C_i, C_j) in Pool s.t. i < j do
    if the corresponding dual-SPRT continues then
      Evaluate C_i and C_j; update n_ij and n_ji
      if H0 is accepted then
        Pool ← Pool \ {C_i}; stop all dual-SPRTs involving C_i
      else if H2 is accepted then
        Pool ← Pool \ {C_j}; stop all dual-SPRTs involving C_j
      else if H1 is accepted then
        stop the dual-SPRT between C_i and C_j
      end if
    end if
  end for
until all dual-SPRTs have terminated
return the Pareto-front models found
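The pseudocode above can be sketched as a Python skeleton. The pair-evaluation and dual-SPRT-update callbacks are left abstract, and all names are illustrative rather than taken from the authors' implementation:

```python
from itertools import combinations

def sprint_race(models, sample_instance, evaluate_pair, dual_sprt_step):
    """Skeleton of the SPRINT-Race loop (assumed interfaces, for illustration).

    `sample_instance()` draws a validation instance; `evaluate_pair(ci, cj, inst)`
    reports which model's performance vector dominates ('i', 'j', or None);
    `dual_sprt_step(pair, outcome)` updates that pair's dual-SPRT and returns
    'H0' (ci dominated), 'H1' (non-dominance), 'H2' (cj dominated), or 'continue'.
    """
    pool = set(models)
    active = {frozenset(p) for p in combinations(models, 2)}
    while active:
        inst = sample_instance()
        for pair in list(active):
            if pair not in active:   # may have been stopped earlier this round
                continue
            ci, cj = sorted(pair)
            outcome = evaluate_pair(ci, cj, inst)
            decision = dual_sprt_step(pair, outcome)
            if decision == "H0":     # ci is dominated: drop it everywhere
                pool.discard(ci)
                active = {q for q in active if ci not in q}
            elif decision == "H2":   # cj is dominated: drop it everywhere
                pool.discard(cj)
                active = {q for q in active if cj not in q}
            elif decision == "H1":   # non-dominance settled for this pair only
                active.discard(pair)
    return pool  # the retained (approximately Pareto-front) models
```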

Experiments: Artificially Constructed MOMS Problems
Given M models, construct (M choose 2) Bernoulli distributions with known p values.
Three performance metrics:
R_retrieval = |P_R ∩ P_PF| / |P_PF|
E_excess = |P_R \ P_PF| / |P_R| = 1 - |P_R ∩ P_PF| / |P_R|
T: sample complexity
where P_PF denotes the true Pareto-front models and P_R the models returned by SPRINT-Race.
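The two set-based metrics translate directly into code; a small sketch (the function name is illustrative):

```python
def retrieval_and_excess(returned, pareto_front):
    """R = |returned ∩ PF| / |PF|: fraction of true Pareto-front models retrieved.
    E = |returned \\ PF| / |returned|: fraction of returned models that are dominated."""
    returned, pareto_front = set(returned), set(pareto_front)
    R = len(returned & pareto_front) / len(pareto_front)
    E = len(returned - pareto_front) / len(returned)
    return R, E
```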

Impact of the Number of Objectives
[Figure: three panels plotting R, E, and the sample complexity T against the number of objectives, from 2 to 14]

Impact of the δ and α_max Values
[Figure: sample complexity T and 1 - R + E plotted against δ (0.01 to 0.1) for 2-objective and 3-objective problems]
Decreasing α_max increases the sample complexity, while the R and E values vary only slightly.

Experiments: ACO Selection for TSPs
Parameter tuning of ACO is time-consuming. A pool of 125 candidate models was initialized with diverse configurations, i.e., different combinations of three parameters:
- α_ACO: the influence of the pheromone trails
- β_ACO: the influence of the heuristic information
- ρ_ACO: the pheromone trail evaporation rate
Two objectives:
- Minimize the TSP tour length
- Minimize the actual computation time needed to find this tour
Setup: ACOTSPJava + the DIMACS TSP instance generator.

Experiments: ACO Selection for TSPs
[Figure: left panel, normalized computation time vs. normalized tour length for the 125 models, distinguishing dominated from non-dominated models; right panel, the minimum probability of dominance for each model, with the level p = 0.45 marked]

Conclusions
- SPRINT-Race is a multi-objective racing algorithm for solving multi-objective model selection.
- The dual-SPRT is adopted for dominance and non-dominance inference.
- The total probability of falsely retaining any dominated model and of removing any non-dominated model is strictly controlled.
- It stops automatically at a fixed confidence level.
- SPRINT-Race returns almost exactly the true Pareto-front models at a reduced cost, both on artificially constructed MOMS problems and when selecting Pareto-optimal ACO parameter configurations (i.e., optimal model configurations) for solving TSPs.

References
[1] O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. Advances in Neural Information Processing Systems, 6:59-66, 1994.
[2] T. Zhang, M. Georgiopoulos, and G. C. Anagnostopoulos. S-Race: A multi-objective racing algorithm. In GECCO '13: Proc. of the Genetic and Evolutionary Computation Conference, pages 1565-1572, 2013.
[3] B. K. Ghosh. Sequential Tests of Statistical Hypotheses. Addison-Wesley Publishing Company, Inc., 1970.
[4] Y. Jin and B. Sendhoff. Pareto-based multi-objective machine learning: An overview and case studies. IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., 38:397-415, 2008.
[5] W. Mendenhall, D. D. Wackerly, and R. L. Scheaffer. Nonparametric statistics. Math. Stat. with App., pages 674-679, 1989.
[6] M. Sobel and A. Wald. A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution. The Ann. of Math. Stat., 20:502-522, 1949.
[7] M. Birattari, Z. Yuan, P. Balaprakash, and T. Stützle. F-race and iterated F-race: An overview. Technical report, IRIDIA, 2011.

Thank you! Any questions?

Backup Slides

Multi-Objective Optimization
Performance vector of a model C with L objectives (minimization):
f(C) = (f_1(C), f_2(C), ..., f_L(C))
C dominates C' if and only if
f_i(C) ≤ f_i(C') for all i ∈ {1, 2, ..., L}, and
f_j(C) < f_j(C') for some j ∈ {1, 2, ..., L}.
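The definition above translates directly into code; a minimal sketch for the minimization case:

```python
def dominates(f_c, f_cprime):
    """Pareto dominance for minimization: C dominates C' iff f_i(C) <= f_i(C')
    for every objective i, and f_j(C) < f_j(C') for at least one objective j."""
    return (all(a <= b for a, b in zip(f_c, f_cprime))
            and any(a < b for a, b in zip(f_c, f_cprime)))
```

For example, (1, 2) dominates (2, 2), but (1, 2) and (2, 1) are mutually non-dominated.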

Multi-Objective Racing Algorithm: S-Race (GECCO '13)
- Tests Pareto dominance non-parametrically via the sign test
- Uses family-wise testing procedures
Limitations of S-Race:
- No inference on the non-dominance relationship
- No control over the overall probability of making any Type II error
- The sign test is not an optimal test procedure in the sequential setting

Dual-SPRT (Sobel-Wald's test)
The three-hypothesis problem is composed of two component two-hypothesis tests:
SPRT1: H0': p ≤ 1/2 - δ vs. H1': p ≥ 1/2
SPRT2: H0': p ≤ 1/2 vs. H1': p ≥ 1/2 + δ
Combining the component decisions:
- If SPRT1 accepts p ≤ 1/2 - δ and SPRT2 accepts p ≤ 1/2, the dual SPRT accepts H0: p ≤ 1/2 - δ (remove C_i).
- If SPRT1 accepts p ≥ 1/2 and SPRT2 accepts p ≤ 1/2, the dual SPRT accepts H1: p = 1/2 (keep both).
- If SPRT1 accepts p ≥ 1/2 and SPRT2 accepts p ≥ 1/2 + δ, the dual SPRT accepts H2: p ≥ 1/2 + δ (remove C_j).