COMP th April, 2007 Clement Pang


1 COMP th April, 2007 Clement Pang

2 Boosting: combining weak classifiers. Boosting fits an additive model; it is essentially Forward Stagewise Additive Modeling (FSAM) with exponential loss. Loss functions for classification: misclassification, exponential, binomial deviance, squared error, support vector. For regression: squared error, absolute error, Huber.
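The FSAM recipe the slide names can be summarized in a few lines. Below is a minimal, generic sketch (not from the slides); `fit_basis` is a hypothetical hook standing in for whichever weak learner and loss are chosen, and the exponential-loss special case recovers AdaBoost-style boosting.

```python
import numpy as np

def forward_stagewise(X, y, fit_basis, M=100):
    """Generic FSAM skeleton: add one basis function per round,
    keeping all previously fitted terms frozen."""
    F = np.zeros(len(y))            # f_0(x_i) = 0 for every training point
    basis_fns = []
    for m in range(M):
        # fit_basis sees the data and the current model output, so it can
        # target whatever the loss leaves unexplained (residuals for squared
        # error, reweighted examples for exponential loss, etc.)
        b = fit_basis(X, y, F)      # hypothetical: returns a callable h(X)
        basis_fns.append(b)
        F = F + b(X)                # f_m(x_i) = f_{m-1}(x_i) + h_m(x_i)
    return basis_fns
```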

3 MART: a generalization of tree boosting. It tries to mitigate the problem of decision trees being less accurate than the best classifier for a particular problem.

4 Let's see MART in action first before going into details. Spam dataset from Chapter 9. Error rates: MART: 4.0%; Additive Logistic Regression: 5.3%; CART (fully grown and pruned by CV): 8.7%; MARS: 5.5% (standard error of the estimates: 0.6%).
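For a hands-on counterpart to these numbers, scikit-learn's GradientBoostingClassifier implements MART-style gradient tree boosting. The sketch below uses a synthetic stand-in dataset (the Chapter 9 spam data is not bundled with scikit-learn), so it illustrates the workflow rather than reproducing the 4.0% figure; all parameter values are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 57 features, echoing the spam data's 57 predictors.
X, y = make_classification(n_samples=4000, n_features=57, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mart = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05,
                                  max_leaf_nodes=6, random_state=0)
mart.fit(X_tr, y_tr)
print("test error: %.1f%%" % (100 * (1 - mart.score(X_te, y_te))))
```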

5 (figure slide)

6 More on this in a later section. Predictor variables. Most relevant: !, $, hp, remove. Least relevant: table, 3d.

7 $f(x) = \log \dfrac{\Pr(\text{spam} \mid x)}{\Pr(\text{email} \mid x)}$

8 (figure slide)

9 One variable: shows the dependence of the log-odds on a predictor. Two variables: shows interactions among the predictor variables. When to run? Running MART with J=2 (a main-effects model) yields a higher error rate when compared to running with larger J.

10 (figure slide)

11 MART demonstration with TreeNet: classification and regression.

12 Decision tree. Formal expression: $T(x; \Theta) = \sum_{j=1}^{J} \gamma_j \, I(x \in R_j)$. Parameters: $\Theta = \{R_j, \gamma_j\}_1^J$. Optimization process: $\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \gamma_j)$.

13 Approximation. Finding $\gamma_j$ given $R_j$ is trivial: the estimate of $\gamma_j$ is often the mean/mode of $y$ in region $R_j$. Finding $R_j$ is difficult: the typical way is a greedy, top-down recursive partitioning algorithm. Can also approximate by a smoother and more convenient criterion (10.26).
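The asymmetry the slide describes (γ trivial, R hard) shows up directly in code. Below is a minimal sketch of one greedy split under squared-error loss, assuming nothing beyond NumPy: each candidate region's γ is just the region mean, while finding the region requires an exhaustive scan.

```python
import numpy as np

def best_split(X, y):
    """Search every (feature, threshold) pair for the split that minimizes
    squared error when each side predicts its own mean of y."""
    best = (np.inf, None, None)              # (loss, feature index, threshold)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:    # drop max so both sides are nonempty
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # gamma for each region is the mean: the "trivial" part
            loss = ((left - left.mean()) ** 2).sum() + \
                   ((right - right.mean()) ** 2).sum()
            if loss < best[0]:
                best = (loss, j, t)
    return best   # recursing on the two sides gives the usual greedy tree
```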

14 Sum of trees, solved using FSAM; the difficult part is finding the $R_j$. $f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m)$, fitting one tree per stage: $\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\big(y_i,\, f_{m-1}(x_i) + T(x_i; \Theta_m)\big)$, and, given the regions, $\hat{\gamma}_{jm} = \arg\min_{\gamma_{jm}} \sum_{x_i \in R_{jm}} L\big(y_i,\, f_{m-1}(x_i) + \gamma_{jm}\big)$.

15 Some special cases are easier. Squared-error loss: find the tree that best predicts the current residual. Two-class with exponential loss: AdaBoost.M1, the tree that minimizes the weighted error rate over $\{-1, +1\}$ outputs. With exponential loss the criterion becomes $\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} w_i^{(m)} \exp[-y_i \, T(x_i; \Theta_m)]$, and $\gamma$ can be found by (10.31): the weighted log-odds in each region.
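The squared-error case is simple enough to write out. A hedged sketch of least-squares boosting, where each new tree is fit to the current residuals; DecisionTreeRegressor and the parameter values are stand-ins, and shrinkage is omitted for clarity.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, M=100, J=6):
    """Least-squares boosting: tree m is fit to the residual y - f_{m-1}(x)."""
    F = np.zeros(len(y))
    trees = []
    for m in range(M):
        tree = DecisionTreeRegressor(max_leaf_nodes=J)
        tree.fit(X, y - F)          # the current residual is the target
        trees.append(tree)
        F += tree.predict(X)
    return trees   # predict with sum(t.predict(X_new) for t in trees)
```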

16 Regression: absolute error, Huber loss. Classification: deviance. These losses will robustify boosting trees; however, they do not give rise to simple, fast boosting algorithms.

17 Solving each step in FSAM by numerical optimization, for any differentiable loss criterion. Total loss: $L(\mathbf{f}) = \sum_{i=1}^{N} L(y_i, f(x_i))$. Goal: $\hat{\mathbf{f}} = \arg\min_{\mathbf{f}} L(\mathbf{f})$.

18 $\mathbf{f}$ is a vector: its parameters are the values of $f$ at each data point, $\mathbf{f} = \{f(x_1), f(x_2), \ldots, f(x_N)\}$. Numerical optimization solves the problem with a sum of component vectors, $\mathbf{f}_M = \sum_{m=0}^{M} \mathbf{h}_m$, $\mathbf{h}_m \in \mathbb{R}^N$, where $\mathbf{h}_0$ is an initial guess.

19 Greedy strategy: steepest descent, with the gradients for common losses given in Table 10.2. $g_{im} = \left[\dfrac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x_i) = f_{m-1}(x_i)}$, step length $\rho_m = \arg\min_{\rho} L(\mathbf{f}_{m-1} - \rho\, \mathbf{g}_m)$, update $\mathbf{f}_m = \mathbf{f}_{m-1} - \rho_m \mathbf{g}_m$.

20 Simplifying: instead of minimizing the exact stagewise loss $\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + T(x_i; \Theta_m)\big)$, fit the tree to the negative gradient by least squares: $\tilde{\Theta}_m = \arg\min_{\Theta} \sum_{i=1}^{N} \big({-g_{im}} - T(x_i; \Theta)\big)^2$. Rationale: minimize loss vs. generalization.
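To make the two-step scheme concrete, here is an assumed sketch (not code from the slides) of a single MART iteration for absolute-error loss, one of the Table 10.2 entries chosen for illustration: fit a least-squares tree to the negative gradient sign(y - f), then replace each leaf's value with the median residual, the optimal γ under the actual loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mart_step_l1(X, y, F, J=6):
    """One boosting round under absolute-error loss."""
    g = np.sign(y - F)                       # negative gradient of L1 loss
    tree = DecisionTreeRegressor(max_leaf_nodes=J).fit(X, g)
    leaf = tree.apply(X)                     # terminal node of each x_i
    # line search per region: the L1-optimal constant is the median residual
    gamma = {r: np.median((y - F)[leaf == r]) for r in np.unique(leaf)}
    return F + np.array([gamma[r] for r in leaf])
```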

21 (figure slide)

22 Size of the tree (J, the number of terminal nodes) for each iteration of boosting. Simple strategy: a constant J. How to find J? Minimize the prediction risk on future data.

23 Analysis of Variance of Predictor Variables

24 Most problems have low-order interaction effects dominating the problem space; thus, models with high-order interactions will suffer in accuracy. Interaction effects are limited by J: no interaction effects of level greater than J-1 are possible. J=2: a decision stump (main effects only, no interactions). J=3: two-variable interaction effects are allowed.

25 (figure slide)

26 Typically J = 2 will be insufficient, and J > 10 is highly unlikely to be needed. Experience suggests 4 <= J <= 8 works well in boosting; J = 6 should be the initial guess.
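One way to act on these rules of thumb is to treat J as an ordinary hyperparameter and score candidates on held-out data; max_leaf_nodes plays the role of J in scikit-learn. The data and parameter values below are illustrative, not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, random_state=0)
for J in (2, 4, 6, 8, 10):
    gbm = GradientBoostingClassifier(max_leaf_nodes=J, n_estimators=200,
                                     random_state=0)
    # 5-fold cross-validated accuracy for each candidate tree size J
    print("J=%2d  CV accuracy=%.3f" % (J, cross_val_score(gbm, X, y, cv=5).mean()))
```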

27 Regularization: prevention of overfitting the data by the model. Example: the parameter M. Increasing M reduces the training risk, but could lead to overfitting. Use a hold-out set to choose M; this is similar to the early-stopping strategy in neural networks.

28 Scale the contribution of each tree by a factor $0 < \nu < 1$: $f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J} \gamma_{jm} I(x \in R_{jm})$. $\nu$ controls the learning rate of the boosting procedure and trades off against M. Empirically, smaller $\nu$ favors better test error but longer training time. The best strategy is to choose a small $\nu$ ($\nu < 0.1$) and find M by early stopping.
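The small-ν-plus-early-stopping strategy can be carried out in a single fit: scikit-learn's staged_predict yields predictions after each boosting round, so M can be chosen wherever held-out error bottoms out. A sketch with placeholder data and an assumed ν of 0.05.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(learning_rate=0.05, n_estimators=1000,
                                 random_state=0).fit(X_tr, y_tr)
# validation error after each of the 1000 boosting rounds
val_err = [np.mean(pred != y_val) for pred in gbm.staged_predict(X_val)]
best_M = int(np.argmin(val_err)) + 1
print("best M:", best_M, "val error: %.3f" % val_err[best_M - 1])
```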

29 (figure slide)

30 (figure slide)

31 Consider the set of all possible J-terminal-node regression trees as basis functions. This gives the linear model $f(x) = \sum_{k=1}^{K} \alpha_k T_k(x)$, where K = card(T) is likely to be much larger than any possible training set; thus, penalized least squares is required to find the alphas.

32 Penalty function: ridge regression or the lasso. $\hat{\alpha}(\lambda) = \arg\min_{\alpha} \Big\{ \sum_{i=1}^{N} \big(y_i - \sum_{k=1}^{K} \alpha_k T_k(x_i)\big)^2 + \lambda J(\alpha) \Big\}$, where $J(\alpha) = \sum_{k} \alpha_k^2$ for ridge and $J(\alpha) = \sum_{k} |\alpha_k|$ for the lasso.

33 Many alphas will be zero when lambda is large: only a fraction of the possible trees are relevant. Problem: we still can't solve over all possible trees. Solution: a forward stagewise strategy, as sketched below. Initialize all alphas to 0 first; running more iterations corresponds to a smaller effective lambda.
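The forward stagewise strategy for the linear model is short enough to sketch directly: start every α at zero and, each round, nudge the coefficient most correlated with the residual by a small ε. The step size and iteration count below are illustrative choices, and the tree basis is replaced by a plain design matrix for readability.

```python
import numpy as np

def forward_stagewise_linear(X, y, eps=0.01, n_steps=5000):
    """Incremental forward stagewise regression (approximates the lasso path)."""
    X = (X - X.mean(0)) / X.std(0)     # standardize predictors
    r = y - y.mean()                   # residual of the centered response
    alpha = np.zeros(X.shape[1])
    for _ in range(n_steps):
        corr = X.T @ r
        j = np.argmax(np.abs(corr))    # basis most correlated with residual
        delta = eps * np.sign(corr[j])
        alpha[j] += delta              # tiny update; most alphas stay at zero
        r -= delta * X[:, j]
    return alpha
```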

34 (figure slide)

35 (figure slide)

36 The approximation works (it approximates the lasso). Tree boosting with shrinkage resembles penalized regression; no shrinkage is analogous to subset selection (which penalizes the number of non-zero coefficients).

37 The superior performance of boosting over procedures such as the SVM may be largely due to its implicit use of an L1 rather than an L2 penalty. The L1 penalty is better suited to sparse situations (Donoho et al., 1995), though minimization of the L1-penalized problem is much more difficult than for L2. The forward stagewise approach provides an approximate, practical way to tackle the problem.

38 Single decision trees are highly interpretable; linear combinations of trees lose this feature. How do we interpret the model, then?

39 Breiman et al. (1984) proposed a measure of relevance for each predictor variable in a single decision tree. Intuition: the most relevant variable is the one that gives the maximum estimated improvement in squared-error risk. For additive models, simply average over the trees. Also works for K-class classifiers (pg. 332).
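This averaged-improvement measure is what scikit-learn exposes as feature_importances_ on a fitted boosted ensemble; a hedged sketch with placeholder data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)
# relevance of each predictor, averaged over all trees in the ensemble
for i, imp in enumerate(gbm.feature_importances_):
    print("x%d: %.3f" % (i, imp))
```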

40 Visualization is a great tool but is limited to low-dimensional views. Instead, take the marginal average of the model as a function of a chosen subset of the input variables, averaging out the complement of that subset within all input variables. Works for k-class problems as well.
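The marginal average described here is the partial dependence function, and the averaging can be written out directly: clamp the variable of interest at each grid value and average the model's predictions over the remaining variables' empirical distribution. A sketch under that reading (scikit-learn's sklearn.inspection.partial_dependence wraps the same computation).

```python
import numpy as np

def partial_dependence_1d(model, X, j, grid):
    """Marginal average of model over all variables except column j."""
    pd = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v                         # clamp variable j at grid value v
        pd.append(model.predict(Xv).mean())  # average over the complement
    return np.array(pd)
```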
