Boosting and Ensemble Methods

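PAC Learning Model. Some distribution $D$ over the domain $X$. Examples: $\langle x, c^*(x)\rangle$, where $c^*$ is the target function. Goal: with high probability ($1-\delta$), find $h \in H$ such that $\mathrm{error}(h, c^*) < \varepsilon$, where $\varepsilon$ and $\delta$ are arbitrarily small. Intro to ML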

Weak Learning - Confidence. Assume we are given a learning algorithm that reaches any accuracy $\varepsilon > 0$, but succeeds only with probability at least $\tfrac12$ (i.e., $\delta = \tfrac12$). Can we boost the confidence? How?!

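BoostConfidence Algorithm. Input: algorithm $A$ and parameter $\delta$. Create $k \ge \log_2(2/\delta)$ independent problems and draw a sample $S_i$ for the $i$-th copy. Run algorithm $A$ on $S_i$ with accuracy $\varepsilon/3$, and let $h_i$ be the hypothesis that $A$ outputs on $S_i$. Take a new sample $S'$ of size $m = O(\varepsilon^{-2}\log(2k/\delta))$. Return the hypothesis $h_i$ with the smallest observed error on $S'$; call it $h^*$.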
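A minimal Python sketch of BoostConfidence, assuming a hypothetical interface that the slides do not specify: `learner(accuracy)` draws its own training sample and returns a hypothesis (a callable), succeeding only with probability at least ½, and `draw_sample(n)` returns `n` labeled pairs drawn from $D$. The size of $S'$ is passed in rather than derived.

```python
import math

def boost_confidence(learner, draw_sample, eps, delta, m_validation):
    """Sketch of BoostConfidence (hypothetical interface, see lead-in).

    learner(accuracy) -> hypothesis h (callable x -> label), assumed to reach
        the requested accuracy with probability at least 1/2.
    draw_sample(n)    -> list of n (x, y) pairs drawn i.i.d. from D.
    """
    # k >= log2(2/delta) independent runs, so all of them fail w.p. <= delta/2.
    k = math.ceil(math.log2(2 / delta))
    hypotheses = [learner(eps / 3) for _ in range(k)]

    # Fresh sample S', used only to compare the k candidates.
    validation = draw_sample(m_validation)

    def observed_error(h):
        return sum(h(x) != y for x, y in validation) / len(validation)

    # h* = the candidate with the smallest observed error on S'.
    return min(hypotheses, key=observed_error), k
```

The selection step needs no knowledge of which run succeeded: one good $h_i$ among $k$ candidates is enough, because the fresh sample $S'$ exposes the bad ones.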

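BoostConfidence Analysis. For each copy $i$: the probability that $\mathrm{error}(h_i) < \varepsilon/3$ is at least $\tfrac12$. Since the copies are independent, the probability that some $h_i$ has $\mathrm{error}(h_i) < \varepsilon/3$ is at least $1 - 2^{-k} \ge 1 - \delta/2$, which holds for $k \ge \log_2(2/\delta)$. Assume this holds! Namely, some $h_i$ has $\mathrm{error}(h_i) < \varepsilon/3$; denote it by $h^+$.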

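BoostConfidence Analysis (cont.). With probability at least $1 - \delta/2$, for every $h_i$: $|\mathrm{error}(h_i) - \text{obs-error}(h_i)| \le \varepsilon/3$ (Chernoff bound + union bound over the $k$ hypotheses, using $m = O(\varepsilon^{-2}\log(2k/\delta))$). Assume this holds! Together (a union bound over the two failure events of probability $\delta/2$ each), with probability at least $1 - \delta$:
$\text{obs-error}(h^+) \le \mathrm{error}(h^+) + \varepsilon/3 < 2\varepsilon/3$,
$\text{obs-error}(h^*) \le \text{obs-error}(h^+) < 2\varepsilon/3$ (since $h^*$ minimizes the observed error),
$\mathrm{error}(h^*) \le \text{obs-error}(h^*) + \varepsilon/3 < \varepsilon$.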
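The slide leaves the constant in $m$ implicit. This sketch makes one choice explicit, assuming the two-sided Hoeffding form of the Chernoff bound, $\Pr[|\text{obs-error} - \text{error}| > t] \le 2e^{-2mt^2}$, with $t = \varepsilon/3$ and the union bound over the $k$ hypotheses required to fail with probability at most $\delta/2$. Solving $2k\,e^{-2m\varepsilon^2/9} \le \delta/2$ gives $m \ge \frac{9}{2\varepsilon^2}\ln\frac{4k}{\delta}$.

```python
import math

def boost_confidence_sizes(eps, delta):
    """Return (k, m) for BoostConfidence under the assumptions in the lead-in."""
    # Enough independent runs that all k fail with probability <= delta/2.
    k = math.ceil(math.log2(2 / delta))
    # Hoeffding + union bound: 2k * exp(-2 m (eps/3)^2) <= delta/2.
    m = math.ceil(9 / (2 * eps ** 2) * math.log(4 * k / delta))
    return k, m
```

The dependence on $k$ is only logarithmic, so the price of boosting the confidence from $\tfrac12$ to $1-\delta$ is small.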

Weak Learning - Accuracy. Assume: $\mathrm{error}(h, c^*) < \tfrac12 - \gamma$, where the parameter $\gamma > 0$ is a small constant. Intuitively, getting slightly better than random should be easy. Question: assume $C$ is weakly learnable; is $C$ PAC (strongly) learnable?

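Weak Learning - Definition. Weak learning: algorithm $A$ weakly learns $C$ using $H$ if there exists $\gamma > 0$ such that for all $c \in C$, for any distribution $D$, and for all $\delta > 0$, it outputs $h \in H$ such that with probability at least $1 - \delta$, $\mathrm{error}(h, c) < \tfrac12 - \gamma$. Strong learning: algorithm $A$ strongly learns $C$ using $H$ if for all $\varepsilon > 0$, for all $c \in C$, for any distribution $D$, and for all $\delta > 0$, it outputs $h \in H$ such that with probability at least $1 - \delta$, $\mathrm{error}(h, c) < \varepsilon$.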

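Weak learning definition: why do we need ANY distribution? Example: consider the following distribution over bits $x_1, x_2, \ldots$: if $x_1 = x_2 = 0$ then $c^*$ is a hard function, otherwise $c^* = 0$. Under the uniform distribution, predicting randomly when $x_1 = x_2 = 0$ and $0$ elsewhere is correct with probability $\tfrac34 + \tfrac14 \cdot \tfrac12 = 87.5\%$. Getting above that is impossible when the hard function cannot be predicted better than random. The hard distribution: sample only from the set where $x_1 = x_2 = 0$.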
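The 87.5% figure can be checked by enumerating the two relevant bits under the uniform distribution; the remaining bits do not affect the outcome. A small sketch, assuming a random guess is correct with probability ½ on the region where $x_1 = x_2 = 0$:

```python
from fractions import Fraction
from itertools import product

def accuracy_under_uniform():
    """Expected accuracy of: guess at random when x1 = x2 = 0, predict 0 elsewhere."""
    accuracy = Fraction(0)
    for x1, x2 in product((0, 1), repeat=2):
        weight = Fraction(1, 4)                  # each (x1, x2) pair is equally likely
        if x1 == 0 and x2 == 0:
            accuracy += weight * Fraction(1, 2)  # c* is the hard function: coin flip
        else:
            accuracy += weight                   # c* = 0 and we predict 0
    return accuracy

print(accuracy_under_uniform())  # 7/8
```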

Three Weak Learners. One weak learner: only one thing to do! Two weak learners: what to do if they disagree? Three weak learners: can we improve accuracy? Example: bit-string inputs $x$ with labels $y$; the weak learners are single bits.

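Three Weak Learners. First weak learner: use the distribution $D$; get $h_1$. Second weak learner: how can we force a new $h$? Set $D_2$ such that $h_1$ has error $\tfrac12$ on it, and get $h_2$. Why is $h_2 \ne h_1$? Because under $D_2$ the hypothesis $h_1$ has error $\tfrac12$, while a weak learner must return error below $\tfrac12 - \gamma$. Third weak learner: what are the interesting inputs? Those where $h_2(x) \ne h_1(x)$. Let $D_3$ be a distribution over such inputs; get $h_3$. How will we predict? If $h_1(x) \ne h_2(x)$, use $h_3(x)$; else use $h_1(x) = h_2(x)$. This is exactly the majority vote of $h_1, h_2, h_3$.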

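3 Weak Learners - Performance. Define $D_2$ and $D_3$. Generating $D_2$: select a random bit $b \in \{0, 1\}$. If $b = 0$, sample from $D$ until $h_1(x) = c^*(x)$; else ($b = 1$) sample from $D$ until $h_1(x) \ne c^*(x)$. Generating $D_3$: sample from $D$ until $h_1(x) \ne h_2(x)$. Formally, with $p = \Pr_D[h_1(x) \ne c^*(x)]$ and $Z = \Pr_D[h_1(x) \ne h_2(x)]$:
$$D_2(x) = \begin{cases} \dfrac{0.5\,D(x)}{1-p} & h_1(x) = c^*(x) \\[4pt] \dfrac{0.5\,D(x)}{p} & h_1(x) \ne c^*(x) \end{cases} \qquad D_3(x) = \begin{cases} \dfrac{D(x)}{Z} & h_1(x) \ne h_2(x) \\[4pt] 0 & h_1(x) = h_2(x) \end{cases}$$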
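The sampling procedures for $D_2$ and $D_3$ can be sketched as rejection samplers over an oracle for $D$. The names `draw_from_d`, `target`, and the helper functions are hypothetical; labels are assumed binary, and the loops only terminate when $0 < p < 1$ and $Z > 0$.

```python
import random

def sample_d2(draw_from_d, h1, target, rng=random):
    """One draw from D2: with probability 1/2 a point where h1 is correct,
    otherwise a point where h1 errs (rejection sampling from D)."""
    want_correct = rng.random() < 0.5          # the random bit b (b = 0 -> correct)
    while True:
        x = draw_from_d()
        if (h1(x) == target(x)) == want_correct:
            return x

def sample_d3(draw_from_d, h1, h2):
    """One draw from D3: D conditioned on h1 and h2 disagreeing."""
    while True:
        x = draw_from_d()
        if h1(x) != h2(x):
            return x

def majority_of_three(h1, h2, h3):
    """Final predictor: h1's label when h1 and h2 agree, otherwise h3's."""
    def predict(x):
        y1 = h1(x)
        return y1 if y1 == h2(x) else h3(x)
    return predict
```

Under $D_2$ half of the mass falls where $h_1$ errs, which is what makes $h_1$ useless to the second weak learner.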

3 Weak Learners - Analysis. Assume, for simplicity, that all weak learners have error rate $p = \tfrac12 - \gamma$. If all errors were independent, the error of the majority would be $3p^2(1-p) + p^3 = 3p^2 - 2p^3$. Goal: show that this holds even for dependent errors.

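3 Weak Learners - Analysis. Assume all weak learners have error $p = \tfrac12 - \gamma$. To convert a probability under $D_2$ into one under $D$, invert the definition of $D_2$ (with $p = \Pr_D[h_1(x) \ne c^*(x)]$): where $h_1$ is correct, $D(x) = 2(1-p)\,D_2(x)$; where $h_1$ errs, $D(x) = 2p\,D_2(x)$.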

3 Weak Learners - Analysis. Let $P_{cc}, P_{ce}, P_{ec}, P_{ee}$ be the probabilities under $D$ that ($h_1$, $h_2$) are (correct, correct), (correct, error), (error, correct), (error, error). Under $D_2$, $h_1$ is correct on mass $\tfrac12$ and wrong on mass $\tfrac12$, and $h_2$ has error $p$. Let $a$ be the $D_2$-mass where $h_1$ is correct and $h_2$ errs. The $D_2$-masses are then: (correct, correct): $\tfrac12 - a$; (correct, error): $a$; (error, correct): $\tfrac12 - p + a$; (error, error): $p - a$. Converting to $D$ with the factors $2(1-p)$ (where $h_1$ is correct) and $2p$ (where $h_1$ errs): $P_{ce} = 2(1-p)\,a$, $P_{ec} = 2p\left(\tfrac12 - p + a\right)$, $P_{ee} = 2p\,(p - a)$.

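3 Weak Learners - Analysis. The majority errs when both $h_1$ and $h_2$ err, or when they disagree and $h_3$ errs; $h_3$ has error $p$ on $D_3$, i.e., conditioned on disagreement. Hence
$$\mathrm{Error} = P_{ee} + p\,(P_{ce} + P_{ec}) = 2p(p-a) + p\left[2(1-p)a + 2p\left(\tfrac12 - p + a\right)\right] = 3p^2 - 2p^3,$$
independent of $a$. With $p = \tfrac12 - \gamma$: $3p^2 - 2p^3 = p(3p - 2p^2) = p(1 - \gamma - 2\gamma^2) \le p(1 - 4\gamma^2)$ for $0 < \gamma \le \tfrac12$.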
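An exact-arithmetic check that the majority error does not depend on the unknown overlap $a$, using the $D_2$-table of joint masses ($a$ on (correct, error), $\tfrac12 - p + a$ on (error, correct), $p - a$ on (error, error)) and the conversion factors $2(1-p)$ and $2p$ back to $D$. The function name is illustrative; `Fraction` avoids any floating-point doubt.

```python
from fractions import Fraction

def majority_error(p, a):
    """Error under D of the majority of h1, h2, h3:
    p - error of each weak learner (h2 on D2, h3 on D3),
    a - D2-mass where h1 is correct and h2 errs (0 <= a <= p)."""
    half = Fraction(1, 2)
    p_ce = 2 * (1 - p) * a              # h1 correct, h2 wrong  (factor 2(1-p))
    p_ec = 2 * p * (half - p + a)       # h1 wrong, h2 correct  (factor 2p)
    p_ee = 2 * p * (p - a)              # both wrong
    # When h1 and h2 disagree, h3 decides and errs with probability p.
    return p_ee + p * (p_ce + p_ec)
```

For example, with $\gamma = 0.1$ ($p = 0.4$) every admissible $a$ gives $3p^2 - 2p^3 = 0.352$, below $p(1 - 4\gamma^2) = 0.384$.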

3 Weak Learners - Analysis. New Error $\le$ Old Error $\cdot\,(1 - 4\gamma^2)$.

What about more hypotheses? The CS way: do it recursively; this can push the error down arbitrarily. We show a more constructive way.
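A sketch of "the CS way", assuming (as in the three-learner analysis) that each level of 3-way majority maps error $p$ to $3p^2 - 2p^3$. It counts the levels needed to push the error below a target and the resulting number of base weak learners, $3^{\text{levels}}$.

```python
def recursive_majority(p0, eps):
    """Levels of recursive 3-way majority needed to push the error below eps.
    Returns (levels, number of base weak learners, final error)."""
    if not 0 <= p0 < 0.5 or eps <= 0:
        raise ValueError("need 0 <= p0 < 1/2 (p = 1/2 is a fixed point) and eps > 0")
    p, levels = p0, 0
    while p >= eps:
        p = 3 * p**2 - 2 * p**3   # one more level of majority voting
        levels += 1
    return levels, 3**levels, p
```

For $p_0 = 0.4$ and a target of $0.01$ this takes 6 levels, i.e., $3^6 = 729$ weak learners. The error falls slowly at first (near $\tfrac12$ the map is almost the identity) and then roughly squares at each level, which is why a more constructive method is attractive.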

ADABOOST: ADAPTIVE BOOSTING

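AdaBoost: Overview. Build a linear classifier whose basic elements are weak learners. An online approach: each time, add one more classifier. Fix the sample $S$. At each time step $t$ we have a hypothesis $f_t$. Select a distribution $D_t$ on $S$. Find a weak learner $h_t$ w.r.t. $D_t$. Add $h_t$ to the hypothesis, deciding on its weight $\alpha_t$: $f_{t+1} = f_t + \alpha_t h_t$. Predict $\mathrm{sign}(f_{t+1})$.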

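AdaBoost: Algorithm. Weak learners $H$, with $h: X \to \{+1, -1\}$. Initialization: a FIXED sample $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ and $D_1(x_i) = 1/m$. For $t = 1, \ldots, T$: receive $h_t$ from the weak learner w.r.t. $D_t$; define $\varepsilon_t = \Pr_{D_t}[h_t(x) \ne c^*(x)]$ and $\alpha_t = \tfrac12 \ln\frac{1 - \varepsilon_t}{\varepsilon_t}$; define
$$D_{t+1}(x_i) = \frac{D_t(x_i)\,e^{-\alpha_t y_i h_t(x_i)}}{Z_t}, \qquad Z_t = \sum_{i=1}^m D_t(x_i)\,e^{-\alpha_t y_i h_t(x_i)}.$$
Prediction: $f(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$.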
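A minimal pure-Python sketch of the AdaBoost loop, assuming 1-D inputs and decision stumps as the weak-learning class $H$ (the slides leave $H$ abstract). The helper names and the clipping of $\varepsilon_t$ away from 0 and 1 are choices made for this sketch, not part of the algorithm as stated.

```python
import math

def best_stump(xs, ys, weights):
    """Weak learner assumed for this sketch: 1-D threshold stumps
    h(x) = s if x > thr else -s, with s in {+1, -1}.
    Returns (weighted error, h) for the stump with the smallest weighted error."""
    thresholds = sorted(set(xs))
    best = None
    for thr in [thresholds[0] - 1] + thresholds:
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (s if x > thr else -s) != y)
            if best is None or err < best[0]:
                best = (err, thr, s)
    err, thr, s = best
    return err, (lambda x, thr=thr, s=s: s if x > thr else -s)

def adaboost(xs, ys, T):
    """AdaBoost on the fixed sample S = zip(xs, ys), labels in {+1, -1}."""
    m = len(xs)
    D = [1.0 / m] * m                                # D_1(x_i) = 1/m
    ensemble = []                                    # list of (alpha_t, h_t)
    for _ in range(T):
        eps, h = best_stump(xs, ys, D)               # eps_t = Pr_{D_t}[h_t(x) != y]
        eps = min(max(eps, 1e-12), 1 - 1e-12)        # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)      # alpha_t
        D = [d * math.exp(-alpha * y * h(x)) for x, y, d in zip(xs, ys, D)]
        Z = sum(D)                                   # normalizer Z_t
        D = [d / Z for d in D]                       # D_{t+1}
        ensemble.append((alpha, h))

    def f(x):
        score = sum(a * h(x) for a, h in ensemble)
        return 1 if score >= 0 else -1               # sign, ties broken toward +1
    return f, ensemble
```

On the toy sample $x = 1, \ldots, 10$ with labels + + + − − − − − + +, the first stump has $\varepsilon_1 = 0.2$, so $\alpha_1 = \tfrac12 \ln 4 = \ln 2 \approx 0.693$; after three rounds the sign of the weighted vote reproduces all ten labels, although no single stump can.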

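AdaBoost: Intuition. How do we change the distribution? An error increases the example's weight; a correct prediction decreases it. This focuses the weak learner on the hard examples. What are the parameters? The weak learning class $H$ and the number of iterations $T$; both are assumed to be inputs.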

Illustrating AdaBoost (figure): ten data points with labels + + + − − − − − + +, each with initial weight 0.1. After boosting round 1 (base classifier $B_1$), the correctly classified points have weight 0.0625 and the misclassified ones 0.25; rounds 2 and 3 add base classifiers $B_2$ and $B_3$. The overall classifier reproduces the original labels + + + − − − − − + +.

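AdaBoost: Analysis. Theorem: given $\varepsilon_1, \ldots, \varepsilon_T$, the training error of $f$ is bounded by $\prod_{t=1}^T 2\sqrt{\varepsilon_t(1 - \varepsilon_t)}$. The proof is based on three claims.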

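AdaBoost: Analysis. Claim 1: $D_{T+1}(x_i) = \dfrac{e^{-y_i f(x_i)}}{m \prod_{t=1}^T Z_t}$, where $f = \sum_{t=1}^T \alpha_t h_t$. Corollary: since $D_{T+1}$ sums to 1, $\prod_{t=1}^T Z_t = \frac1m \sum_{i=1}^m e^{-y_i f(x_i)}$. Proof: for $t+1$ we have $D_{t+1}(x_i) = D_t(x_i)\,e^{-\alpha_t y_i h_t(x_i)}/Z_t$. Unraveling the recurrence,
$$D_{T+1}(x_i) = D_1(x_i)\,\frac{e^{-y_i \sum_{t=1}^T \alpha_t h_t(x_i)}}{\prod_{t=1}^T Z_t} = \frac{e^{-y_i f(x_i)}}{m \prod_{t=1}^T Z_t},$$
since $D_1(x_i) = 1/m$.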

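AdaBoost: Analysis. Claim 2: $\mathrm{error}_S(f) \le \prod_{t=1}^T Z_t$. Proof:
$$\mathrm{error}_S(f) = \frac1m \sum_{i=1}^m I[\mathrm{sign}(f(x_i)) \ne y_i] \le \frac1m \sum_{i=1}^m e^{-y_i f(x_i)} = \prod_{t=1}^T Z_t.$$
The inequality holds because a mistake means $y_i f(x_i) \le 0$, and $z \le 0$ implies $e^{-z} \ge 1$; the last equality is the corollary of Claim 1.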

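AdaBoost: Analysis. Claim 3: $Z_t = 2\sqrt{\varepsilon_t(1 - \varepsilon_t)}$. Proof:
$$Z_t = \sum_{i=1}^m D_t(x_i)\,e^{-\alpha_t y_i h_t(x_i)} = \sum_{i:\, y_i = h_t(x_i)} D_t(x_i)\,e^{-\alpha_t} + \sum_{i:\, y_i \ne h_t(x_i)} D_t(x_i)\,e^{\alpha_t} = (1 - \varepsilon_t)\,e^{-\alpha_t} + \varepsilon_t\,e^{\alpha_t}.$$
Recall $\alpha_t = \tfrac12 \ln\frac{1 - \varepsilon_t}{\varepsilon_t}$, i.e., $e^{\alpha_t} = \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$, and substitute: $Z_t = (1 - \varepsilon_t)\sqrt{\varepsilon_t/(1 - \varepsilon_t)} + \varepsilon_t\sqrt{(1 - \varepsilon_t)/\varepsilon_t} = 2\sqrt{\varepsilon_t(1 - \varepsilon_t)}$.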

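AdaBoost: Analysis. Why this value of $\alpha$? The expression $Z = (1 - \varepsilon)\,e^{-\alpha} + \varepsilon\,e^{\alpha}$ is valid for any $\alpha$, and by Claim 2, minimizing $Z$ reduces the error bound:
$$\frac{dZ}{d\alpha} = -(1 - \varepsilon)\,e^{-\alpha} + \varepsilon\,e^{\alpha} = 0 \;\Longrightarrow\; e^{2\alpha} = \frac{1 - \varepsilon}{\varepsilon} \;\Longrightarrow\; \alpha = \tfrac12 \ln\frac{1 - \varepsilon}{\varepsilon}.$$
Since $d^2Z/d\alpha^2 = Z > 0$, this stationary point is the minimum.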
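A numerical check of Claim 3 and of the choice of $\alpha$: the closed form $\alpha = \tfrac12\ln\frac{1-\varepsilon}{\varepsilon}$ should match a brute-force grid minimizer of $Z(\alpha) = (1-\varepsilon)e^{-\alpha} + \varepsilon e^{\alpha}$, and the minimum value should equal $2\sqrt{\varepsilon(1-\varepsilon)}$. The grid range and resolution are arbitrary choices for this sketch.

```python
import math

def z_value(eps, alpha):
    """Z = (1 - eps) e^{-alpha} + eps e^{alpha}: the normalizer for a weak
    hypothesis with weighted error eps and weight alpha (labels in {+1, -1})."""
    return (1 - eps) * math.exp(-alpha) + eps * math.exp(alpha)

def adaboost_alpha(eps):
    """Closed-form minimizer alpha = 1/2 ln((1 - eps) / eps)."""
    return 0.5 * math.log((1 - eps) / eps)

def grid_argmin_alpha(eps, lo=-3.0, hi=3.0, steps=60001):
    """Brute-force check: the grid point with the smallest Z."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return min(grid, key=lambda a: z_value(eps, a))
```

For $\varepsilon = 0.3$, both approaches give $\alpha \approx 0.424$, and $Z(\alpha) = 2\sqrt{0.21} \approx 0.917 < 1$, so this round shrinks the training-error bound.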

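AdaBoost: Analysis. Theorem: given $\varepsilon_1, \ldots, \varepsilon_T$, write $\varepsilon_t = \tfrac12 - \gamma_t$. The error of $f$ is bounded by
$$\prod_{t=1}^T Z_t = \prod_{t=1}^T 2\sqrt{\varepsilon_t(1 - \varepsilon_t)} = \prod_{t=1}^T 2\sqrt{\left(\tfrac12 - \gamma_t\right)\left(\tfrac12 + \gamma_t\right)} = \prod_{t=1}^T \sqrt{1 - 4\gamma_t^2}.$$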

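AdaBoost: Analysis. Theorem: given $\varepsilon_1, \ldots, \varepsilon_T$ with $\gamma_t \ge \gamma > 0$ for all $t$, the error of $f$ is bounded, using $1 - x \le e^{-x}$, by
$$\prod_{t=1}^T Z_t \le \left(1 - 4\gamma^2\right)^{T/2} \le e^{-2\gamma^2 T}.$$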
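A short sketch of what the bound buys in practice. Since the training error on $m$ points is a multiple of $1/m$, any bound below $1/m$ forces it to be exactly 0; solving $e^{-2\gamma^2 T} < 1/m$ gives $T > \frac{\ln m}{2\gamma^2}$. The function names are illustrative, and $\gamma$ is assumed to be a known lower bound on every $\gamma_t$.

```python
import math

def training_error_bound(gammas):
    """prod_t sqrt(1 - 4 gamma_t^2), the theorem's bound with eps_t = 1/2 - gamma_t."""
    bound = 1.0
    for g in gammas:
        bound *= math.sqrt(1 - 4 * g * g)
    return bound

def rounds_for_zero_training_error(m, gamma):
    """Smallest integer T with exp(-2 gamma^2 T) < 1/m."""
    return math.floor(math.log(m) / (2 * gamma ** 2)) + 1
```

For $m = 1000$ and $\gamma = 0.1$ this gives $T = 346$ rounds: the number of rounds grows only logarithmically in $m$, but as $1/\gamma^2$ in the edge of the weak learner.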

Ensemble Methods

Ensemble Methods. High-level idea: generate multiple hypotheses and combine them into a single classifier. Two important questions: How do we generate multiple hypotheses when we have only one sample? How do we combine the multiple hypotheses (majority, AdaBoost, ...)?

Rationale for Ensemble Methods: statistical, computational, and representational (figure: three panels showing the true hypothesis $f$ and several candidate hypotheses $h$). Source: http://web.engr.oregonstate.edu/~tgd/publications/mcs-ensembles.pdf

Boosting. Boosting is actually an ensemble method. Generating different hypotheses: by changing the sample distribution. Combining hypotheses: a weighted linear predictor, where each weight is determined when its hypothesis is selected.

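Bagging. Input: a single learning algorithm $A$. How do we generate different hypotheses? By sampling with replacement, which maintains the statistics of the sample. Formally, given a sample $S$: draw sub-samples $S_1, \ldots, S_k$ and run $A$ on each $S_i$ to generate $h_i$. Combining: simple majority.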
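A minimal bagging sketch. It assumes each sub-sample $S_i$ is a bootstrap sample of the same size as $S$ (the common choice; the slide only says "sampling with replacement") and that `learner(sub_sample)` returns a callable hypothesis; both names are hypothetical.

```python
import random
from collections import Counter

def bagging(sample, learner, k, rng=random):
    """Bagging sketch: learner(sub_sample) -> hypothesis (callable x -> label)."""
    hypotheses = []
    for _ in range(k):
        # Bootstrap sub-sample S_i: |S| draws from S with replacement.
        sub_sample = [rng.choice(sample) for _ in range(len(sample))]
        hypotheses.append(learner(sub_sample))

    def predict(x):
        # Simple (unweighted) majority over h_1, ..., h_k.
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]

    return predict, hypotheses
```

Any base algorithm fits this interface; for example, a 1-nearest-neighbor learner `lambda sub: (lambda x: min(sub, key=lambda p: abs(p[0] - x))[1])` on labeled pairs. Using an odd $k$ avoids ties for binary labels.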

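Bias vs. Variance. Mean squared error: $\mathrm{MSE} = E_h[(f - h)^2] = \mathrm{Bias}^2 + \mathrm{Var}$, where $\mathrm{Bias} = E[f - h]$ and $\mathrm{Var} = E[h^2] - E^2[h]$. For a fixed target value $f$, this follows from expanding: $E[(f - h)^2] = f^2 - 2f\,E[h] + E[h^2] = (f - E[h])^2 + (E[h^2] - E^2[h])$.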
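A quick check of the decomposition, assuming a fixed target value $f$ and treating a finite list of predictions as the (uniform) distribution of $h$. With `Fraction` inputs the identity holds exactly; the function name is illustrative.

```python
from fractions import Fraction

def bias_variance(f, predictions):
    """Return (MSE, Bias, Var) for target value f and the empirical distribution of h."""
    n = len(predictions)
    mean = sum(predictions) / n                          # E[h]
    mse = sum((f - h) ** 2 for h in predictions) / n     # E[(f - h)^2]
    bias = f - mean                                      # E[f - h]
    var = sum(h * h for h in predictions) / n - mean ** 2  # E[h^2] - E^2[h]
    return mse, bias, var

mse, bias, var = bias_variance(Fraction(3), [Fraction(v) for v in (1, 2, 2, 5)])
print(mse, bias ** 2 + var)  # 5/2 5/2
```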

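Bagging rationale: Bias versus Variance. Why is one hypothesis worse than many?! Expected error of $h_i$: identical for all $h_i$, and worse than training on the whole sample, since each $h_i$ is trained on a smaller sample (BIAS). Variance of the error: a single hypothesis fluctuates considerably, while the majority of many is much more stable (VARIANCE). More stable means better generalization: the training error better reflects the true error.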

Decision Tree vs. Bagging (figure). Source: http://www.slideshare.net/0xdata/gbm-2789077

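Stacking. Input: a sample $S$, $k$ algorithms $A_1, \ldots, A_k$, and a combining algorithm $C$. Run $A_i$ on $S$ to generate $h_i$. Given $h_1, \ldots, h_k$, generate a new sample of pairs $(\langle h_1(x), \ldots, h_k(x)\rangle, y)$. Run $C$ on it to generate $\hbar$, and output $\hbar$. What can the $A_i$ be? What can $C$ be? Bagging: the $A_i$ run $A$ on sub-samples, and $C$ is a majority. AdaBoost: $A_i$ is the weak hypothesis at time $i$, and $C$ is a weighted majority.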

Random Forest: Motivation. Decision trees: bias. Decision tree creation is very noisy and depends on the particular sample. Lowering variance: average over decision trees. How can we generate different decision trees? Sub-sample the sample; force certain attributes.

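Random Forest: Algorithm. Create $K$ different decision trees. Sample: select a random sub-sample (in practice, 66%). GOAL: generate a variety of decision trees, each well correlated with $y$. Combining: majority. Attributes: at each node, select a subset $F$ of the attributes, $|F| = M$ (weak learners), and select the best attribute in $F$. Values of $M$: $M = 1$: random; $M = N$ (all attributes): a regular decision tree; $1 \ll M \ll N$: a subset of the attributes. Popular: $M = \sqrt{N}$.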
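The per-node attribute sampling can be sketched as follows. This assumes binary attributes and 0/1 labels, and scores a split by the mistakes made when each side predicts its majority label; practical implementations usually use impurity measures such as Gini or entropy instead. The function names are hypothetical.

```python
import math
import random

def choose_node_features(n_features, m=None, rng=random):
    """Random subset F of the attributes considered at one node.
    Default M = round(sqrt(N)); m=1 gives a random attribute and
    m=n_features gives a regular decision tree."""
    if m is None:
        m = max(1, round(math.sqrt(n_features)))
    return rng.sample(range(n_features), m)

def best_attribute(rows, labels, features):
    """Best attribute within F only: the binary attribute whose split makes the
    fewest mistakes when each side predicts its majority label (0/1 labels)."""
    def split_errors(j):
        errors = 0
        for value in (0, 1):
            side = [y for row, y in zip(rows, labels) if row[j] == value]
            errors += len(side) - max(side.count(0), side.count(1))
        return errors
    return min(features, key=split_errors)
```

Restricting the search to $F$ is what decorrelates the trees: even the globally best attribute is unavailable at nodes where it was not sampled.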

Random Forest (figure). Source: http://cs.stanford.edu/people/karpathy/randomforestspiral.png

Random Forest: Conclusion. Benefits: fast to run; fairly stable outcome; competitive performance; handles missing/partial data. Weaknesses: loses interpretability (unlike a decision tree); many parameters, but it seems robust; feature selection collides with attribute sampling.

MNIST: Comparative Results
Classifier | Accuracy | Training Time | Testing Time
Neural Net | 97.80% | 8.2654 s | 0.398 s
Linear SVM | 94.6% | 68.6950 s | 58.00 s
Random Forest | 96.4% | 2.359 s | 26.0763 s
k nearest neighbors (k = 3) | 96.95% | 4.6439 s | 26.785 s
Decision Tree (depth 5) | 65.40% | 3.346 s | 0.033 s
AdaBoost | 73.67% | 37.6443 s | .585 s
Source: https://martin-thoma.com/comparing-classifiers/

Conclusion. Boosting: weak vs. strong learning; from theory (3 weak learners) to practice (AdaBoost). Ensemble methods: boosting, bagging, stacking, random forest.