COMP 540, 12th April 2007. Clement Pang.

Boosting: combining weak classifiers. It fits an additive model; it is essentially Forward Stagewise Additive Modeling (FSAM) with exponential loss. Loss functions for classification: misclassification, exponential, binomial deviance, squared error, support vector. For regression: squared error, absolute error, Huber.
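
As a quick illustration of how these criteria compare, here is a minimal NumPy sketch (not from the slides) that evaluates each classification loss as a function of the margin $m = y f(x)$, following the usual ESL conventions (e.g. binomial deviance written as $\log(1 + e^{-2m})$):

```python
import numpy as np

# Margin m = y * f(x), with y in {-1, +1}.
m = np.linspace(-2, 2, 9)

losses = {
    "misclassification": (m < 0).astype(float),    # I(sign(f) != y)
    "exponential":       np.exp(-m),                # AdaBoost criterion
    "binomial deviance": np.log1p(np.exp(-2 * m)),  # log(1 + exp(-2m))
    "squared error":     (1 - m) ** 2,              # (y - f)^2 = (1 - m)^2 for y in {-1,+1}
    "support vector":    np.maximum(0, 1 - m),      # hinge loss
}

for name, values in losses.items():
    print(f"{name:>18}: {np.round(values, 2)}")
```

All of these decrease monotonically in the margin except squared error, which also penalizes margins greater than 1; they differ mainly in how heavily they punish large negative margins.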

MART: a generalization of tree boosting. It tries to mitigate the problem of decision trees being less accurate than the best classifier for a particular problem.

Let's see MART in action first before going into details, on the spam dataset from Chapter 9. Error rates: MART 4.0%; additive logistic regression 5.3%; CART (fully grown and pruned by CV) 8.7%; MARS 5.5% (standard error of the estimates: 0.6%).
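
To make this concrete, here is a hedged scikit-learn sketch of a MART-style fit. GradientBoostingClassifier is used as an assumed stand-in for MART, and the synthetic 57-feature data below only mimics the shape of the spam problem, so it will not reproduce the error rates quoted above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 57-predictor spam data of ESL Chapter 9.
X, y = make_classification(n_samples=4601, n_features=57, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# MART-style boosted trees: small trees, shrinkage, many iterations.
mart = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                  max_leaf_nodes=6, random_state=0)
mart.fit(X_tr, y_tr)
print("test error rate: %.3f" % (1.0 - mart.score(X_te, y_te)))
```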

More on this in Section 10.13. There are 57 predictor variables. Most relevant: !, $, hp, remove. Least relevant: 857, 415, table, 3d.

The fitted function is the log-odds: $f(x) = \log \frac{\Pr(\text{spam} \mid x)}{\Pr(\text{email} \mid x)}$

One-variable plots show the dependence of the log-odds on a predictor; two-variable plots show interactions among the predictor variables. When to run? Running MART with J = 2 (a main-effects model) yields a higher error rate than running with larger J.

MART demonstration with TreeNet: classification and regression.

Decision tree, formal expression:
$T(x; \Theta) = \sum_{j=1}^{J} \gamma_j \, I(x \in R_j)$
Parameters: $\Theta = \{R_j, \gamma_j\}_{j=1}^{J}$
Optimization process: $\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \gamma_j)$
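
A minimal sketch (assumed, not from the slides) of what this expression means operationally: evaluating the tree is just looking up the constant $\gamma_j$ of whichever region $R_j$ contains $x$.

```python
import numpy as np

def tree_predict(x, regions, gammas):
    """Evaluate T(x; Theta) = sum_j gamma_j * I(x in R_j).

    regions: list of (lower, upper) bound arrays defining axis-aligned boxes R_j
    gammas : the constant prediction gamma_j for each region
    """
    for (lo, hi), gamma in zip(regions, gammas):
        if np.all(x >= lo) and np.all(x < hi):
            return gamma
    raise ValueError("x falls in no region; the R_j should partition the input space")

# A toy J = 2 stump on one feature, split at x[0] = 0.5.
regions = [(np.array([-np.inf]), np.array([0.5])),
           (np.array([0.5]), np.array([np.inf]))]
gammas = [-1.3, 0.8]   # e.g. the mean of the y_i in each region
print(tree_predict(np.array([0.2]), regions, gammas))   # -1.3
print(tree_predict(np.array([0.9]), regions, gammas))   #  0.8
```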

Approximation: finding $\gamma_j$ given $R_j$ is trivial; the estimate $\hat{\gamma}_j$ is often the mean/mode of the $y_i$ in region $R_j$. Finding the $R_j$ is difficult; the typical way is a greedy, top-down recursive partitioning algorithm. One can also approximate by a smoother and more convenient criterion (10.26).

Sum of trees: solve using FSAM. The difficult part is finding the $R_j$.
$f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m)$
$\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\big(y_i,\, f_{m-1}(x_i) + T(x_i; \Theta_m)\big)$
$\hat{\gamma}_{jm} = \arg\min_{\gamma_{jm}} \sum_{x_i \in R_{jm}} L\big(y_i,\, f_{m-1}(x_i) + \gamma_{jm}\big)$

Some special cases are easier. Squared-error loss: find the tree that best predicts the current residual. Two-class classification with exponential loss: AdaBoost.M1; find the tree that minimizes the weighted error rate, with $y_i \in \{-1, +1\}$:
$\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} w_i^{(m)} \exp\!\big[-y_i\, T(x_i; \Theta_m)\big]$
The $\gamma_{jm}$ can then be found by (10.31): the weighted log-odds in each region.
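
For the squared-error case the FSAM loop is particularly simple: each new tree is fit to the current residuals. A minimal sketch, assuming scikit-learn's DecisionTreeRegressor as the base learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

M, trees = 100, []
f = np.zeros_like(y)               # f_0 = 0
for m in range(M):
    r = y - f                      # current residual (= negative gradient for squared error)
    tree = DecisionTreeRegressor(max_leaf_nodes=4).fit(X, r)
    trees.append(tree)
    f += tree.predict(X)           # f_m = f_{m-1} + T(x; Theta_m)

print("training MSE:", round(float(np.mean((y - f) ** 2)), 4))
```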

For regression: absolute error, Huber loss. For classification: deviance. These will robustify boosting trees; however, they do not give rise to simple fast boosting algorithms.

Solving each step in FSAM by numerical optimization requires a differentiable loss criterion. Total loss: $L(f) = \sum_{i=1}^{N} L\big(y_i, f(x_i)\big)$. Goal: $\hat{f} = \arg\min_{f} L(f)$.

Here $f$ is treated as a vector: its parameters are its values at each data point, $\mathbf{f} = \{f(x_1), f(x_2), \ldots, f(x_N)\}$. Numerical optimization solves the problem with a sum of component vectors, $\mathbf{f}_M = \sum_{m=0}^{M} \mathbf{h}_m$, $\mathbf{h}_m \in \mathbb{R}^N$, where $\mathbf{h}_0 = \mathbf{f}_0$ is an initial guess.

Greedy strategy (steepest descent); gradients for the common losses are in Table 10.2.
$g_{im} = \left[ \frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)} \right]_{f(x_i) = f_{m-1}(x_i)}$
$\rho_m = \arg\min_{\rho} L(\mathbf{f}_{m-1} - \rho\, \mathbf{g}_m)$, and $\mathbf{f}_m = \mathbf{f}_{m-1} - \rho_m\, \mathbf{g}_m$.

Simplifying $T_m$; the rationale is minimizing loss vs. generalization. Instead of
$\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\big(y_i,\, f_{m-1}(x_i) + T(x_i; \Theta_m)\big)$
fit the tree to the negative gradient by least squares:
$\tilde{\Theta}_m = \arg\min_{\Theta} \sum_{i=1}^{N} \big(-g_{im} - T(x_i; \Theta)\big)^2$
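
Putting the two displays together gives generic gradient tree boosting: compute the negative gradient (pseudo-residuals), fit a small regression tree to it by least squares, then re-estimate the leaf constants under the original loss. A sketch for absolute-error loss, assuming scikit-learn trees; the leaf medians play the role of the $\gamma_{jm}$, and the factor nu anticipates the shrinkage discussed later:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_t(df=2, size=300)   # heavy-tailed noise

f = np.full_like(y, np.median(y))          # f_0: the optimal constant for absolute error
nu, M = 0.1, 200
for m in range(M):
    neg_grad = np.sign(y - f)              # -dL/df for L(y, f) = |y - f|
    tree = DecisionTreeRegressor(max_leaf_nodes=6).fit(X, neg_grad)  # least-squares fit to -g
    leaf = tree.apply(X)                   # which terminal node each x_i lands in
    gamma = {l: np.median((y - f)[leaf == l]) for l in np.unique(leaf)}  # optimal gamma_jm
    f += nu * np.array([gamma[l] for l in leaf])

print("mean absolute training error:", round(float(np.mean(np.abs(y - f))), 4))
```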

Size of tree ($J$, the number of terminal nodes) for each iteration of boosting. Simple strategy: constant $J$. How to find $J$? Minimize the prediction risk on future data.

Analysis of variance of the predictor variables.

Most problems have low-order interaction effects dominating the problem space; thus, models with high-order interactions will suffer in accuracy. Interaction effects are limited by $J$: no interaction effects of level greater than $J - 1$ are possible. $J = 2$: decision stump (only main effects, no interactions). $J = 3$: two-variable interaction effects are allowed.

Typically $J = 2$ will be insufficient, and $J > 10$ is highly unlikely to be needed. By experience, $4 \le J \le 8$ works well in boosting; $J = 6$ is a good initial guess.
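
In practice $J$ is just a knob on the base learner. For instance, with scikit-learn's gradient boosting (used here as an assumed stand-in for MART), $J$ corresponds to max_leaf_nodes:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] * X[:, 1] + X[:, 2]      # one two-variable interaction plus a main effect

# J terminal nodes per tree <-> max_leaf_nodes; J = 6 matches the rule of thumb above.
model = GradientBoostingRegressor(max_leaf_nodes=6, n_estimators=300).fit(X, y)
print(round(model.score(X, y), 3))
```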

Regularization: prevention of overfitting of the data by the model. Example: the parameter $M$. Increasing $M$ reduces the training risk but could lead to overfitting. Use a hold-out set, similar to the early-stopping strategy in neural networks.

Shrinkage: scale the contribution of each tree by a factor $0 < \nu < 1$:
$f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J} \gamma_{jm}\, I(x \in R_{jm})$
This controls the learning rate of the boosting procedure; there is a trade-off between $\nu$ and $M$. Empirically, smaller $\nu$ favors better test error but requires longer training time. The best strategy is to choose a small $\nu$ ($\nu < 0.1$) and find $M$ by early stopping.
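
A sketch of that strategy with scikit-learn (again an assumed stand-in for MART): set a small learning_rate (the $\nu$ above), fit with a generous $M$, and pick the best $M$ from staged predictions on a hold-out set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Small shrinkage nu (learning_rate) and a generous number of trees.
gbm = GradientBoostingRegressor(learning_rate=0.05, n_estimators=2000,
                                max_leaf_nodes=6, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., M trees: early stopping by validation.
val_err = [np.mean((y_val - pred) ** 2) for pred in gbm.staged_predict(X_val)]
best_M = int(np.argmin(val_err)) + 1
print("best M by early stopping:", best_M)
```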

Consider the set of all possible $J$-terminal-node regression trees as basis functions. The linear model is then
$f(x) = \sum_{k=1}^{K} \alpha_k\, T_k(x)$
where $K = \operatorname{card}(\mathcal{T})$ is likely to be much larger than any possible training set. Thus, penalized least squares is required to find the $\alpha_k$.

Penalty function:
$\hat{\alpha}(\lambda) = \arg\min_{\alpha} \left\{ \sum_{i=1}^{N} \Big( y_i - \sum_{k=1}^{K} \alpha_k\, T_k(x_i) \Big)^{2} + \lambda\, J(\alpha) \right\}$
Ridge regression: $J(\alpha) = \sum_{k=1}^{K} \alpha_k^{2}$. Lasso: $J(\alpha) = \sum_{k=1}^{K} |\alpha_k|$.

Many of the $\alpha_k$ will be zero with a large $\lambda$: only a fraction of all possible trees are relevant. Problem: we still can't solve over all possible trees. Solution: a forward stagewise strategy. Initialize all $\alpha_k = 0$ first; more iterations correspond to a smaller effective $\lambda$ (less shrinkage).
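
A minimal sketch (assumed, not from the slides) of this incremental forward stagewise strategy on a generic basis matrix T, where each column stands in for one candidate tree evaluated on the training data. The coefficients start at zero and are nudged by a small step toward the basis function most correlated with the current residual:

```python
import numpy as np

def forward_stagewise(T, y, eps=0.01, n_iter=5000):
    """Incremental forward stagewise linear regression (approximates the lasso path)."""
    N, K = T.shape
    alpha = np.zeros(K)                        # initialize all alpha_k = 0
    r = y.astype(float).copy()                 # current residual
    for _ in range(n_iter):
        corr = T.T @ r                         # correlation of each basis with the residual
        k = int(np.argmax(np.abs(corr)))       # most correlated basis function
        step = eps * np.sign(corr[k])
        alpha[k] += step                       # nudge its coefficient a little
        r -= step * T[:, k]                    # update the residual
    return alpha

# Toy check on a random basis matrix (standing in for tree predictions).
rng = np.random.default_rng(0)
T = rng.normal(size=(100, 10))
y = 2.0 * T[:, 0] - T[:, 3] + 0.1 * rng.normal(size=100)
print(np.round(forward_stagewise(T, y), 2))
```

Stopping after fewer iterations leaves more coefficients at exactly zero, which is the sense in which the iteration count acts like the lasso's $\lambda$.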

The approximation works (it approximates the lasso): tree boosting with shrinkage resembles penalized (lasso) regression, while no shrinkage is analogous to subset selection (penalizing the number of non-zero coefficients).

The superior performance of boosting over procedures such as the SVM may be largely due to the implicit use of an L1 rather than an L2 penalty. The L1 penalty is better suited to sparse situations (Donoho et al., 1995), though minimization of the L1-penalized problem is much more difficult than for L2. The forward stagewise approach provides an approximate, practical way to tackle the problem.

Single decision trees are highly interpretable; linear combinations of trees lose this feature. How do we interpret the model then?

Breiman et al. (1984) proposed a measure of relevance for each predictor variable in a single decision tree. Intuition: the most relevant variable is the one that gives the maximum estimated improvement in squared-error risk. For additive models, simply average over the trees. This also works for K-class classifiers (pg. 332).
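
scikit-learn exposes this averaged squared-error-improvement measure as feature_importances_ on its boosted-tree models (used here as an assumed stand-in for MART); a short sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)
gbm = GradientBoostingClassifier(max_leaf_nodes=6, n_estimators=200,
                                 random_state=0).fit(X, y)

# Breiman-style importances, averaged over all trees and normalized to sum to 1.
for j in np.argsort(gbm.feature_importances_)[::-1][:5]:
    print(f"x{j}: {gbm.feature_importances_[j]:.3f}")
```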

Visualization is a great tool but is limited to low-dimensional views. Partial dependence: the marginal average of the model over a subset of the input variables, averaging out the complement of that subset within all input variables. Works for K-class problems as well.
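
A brute-force sketch of that marginal average (assumed, not from the slides): fix the variable of interest at each grid value, leave the complement variables at their observed values, and average the model's predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=0)
gbm = GradientBoostingRegressor(max_leaf_nodes=6, n_estimators=200,
                                random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """For each grid value v, set X[:, feature] = v for every training point,
    then average the predictions over the complement variables."""
    pd = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd.append(model.predict(Xv).mean())
    return np.array(pd)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 10)
print(np.round(partial_dependence(gbm, X, 0, grid), 2))
```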