Machine Learning & Data Mining CS/CNS/EE 155. Lecture 4: Regularization, Sparsity & Lasso

Hector Gibbs

1 Machine Learning & Data Mining CS/CNS/EE 155 Lecture 4: Regularization, Sparsity & Lasso

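2 Recap: Complete Pipeline
Training Data: $S = \{(x_i, y_i)\}_{i=1}^N$
Model Class(es): $f(x \mid w, b) = w^T x - b$
Loss Function: $L(a, b) = (a - b)^2$
Training: $\operatorname{argmin}_{w,b} \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$, solved with SGD!
Cross Validation & Model Selection
Profit!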

3 Different Model Classes?
Option 1: SVMs vs ANNs vs LR vs LS
Option 2: Regularization
Training: $\operatorname{argmin}_{w,b} \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$, solved with SGD!
Cross Validation & Model Selection

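4 Notation Part 1
L0 Norm (not actually a norm), # of non-zero entries: $\|w\|_0 = \sum_d 1_{[w_d \ne 0]}$
L1 Norm, sum of absolute values: $|w| = \|w\|_1 = \sum_d |w_d|$
L2 Norm, sqrt(sum of squares): $\|w\| = \|w\|_2 = \sqrt{\sum_d w_d^2} = \sqrt{w^T w}$
Squared L2 Norm, sum of squares: $\|w\|^2 = \sum_d w_d^2 = w^T w$
L-infinity Norm, max absolute value: $\|w\|_\infty = \lim_{p \to \infty} \sqrt[p]{\sum_d |w_d|^p} = \max_d |w_d|$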
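The norms above can be checked numerically. The following is a minimal sketch using NumPy; the function name `norms` and the example vector are illustrative choices, not part of the lecture.

```python
import numpy as np

def norms(w):
    """Return the norms from the notation slide for a weight vector w."""
    w = np.asarray(w, dtype=float)
    return {
        "l0": int(np.count_nonzero(w)),    # number of non-zero entries
        "l1": float(np.sum(np.abs(w))),    # sum of absolute values
        "l2": float(np.sqrt(w @ w)),       # sqrt(sum of squares)
        "l2_squared": float(w @ w),        # sum of squares
        "linf": float(np.max(np.abs(w))),  # max absolute value
    }

# Hypothetical example vector with one zero entry.
example = norms([3.0, 0.0, -4.0])
```

For this vector the L0 "norm" counts 2 non-zero entries, while the L1, L2 and L-infinity norms are 7, 5 and 4.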

5 Notation Part 2
Minimizing Squared Loss (Regression, Least-Squares): $\operatorname{argmin}_{w} \sum_{i=1}^N (y_i - w^T x_i + b)^2$
(Unless Otherwise Stated)
E.g., Logistic Regression = Log Loss

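6 Ridge Regression
$\operatorname{argmin}_{w,b} \lambda w^T w + \sum_{i=1}^N (y_i - w^T x_i + b)^2$
First term: Regularization. Second term: Training Loss.
aka L2-Regularized Regression
Trades off model complexity vs training loss
Each choice of λ is a "model class"
Will discuss this further later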
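The trade-off between model complexity and training loss can be seen on synthetic data. This is a minimal sketch that omits b and uses the closed-form solution $w = (\lambda I + X^T X)^{-1} X^T y$, obtained by setting the gradient of the ridge objective to zero; the data, the λ values, and the helper names `ridge_fit` and `training_loss` are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize lam * w^T w + sum_i (y_i - w^T x_i)^2 in closed form (b omitted)."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

def training_loss(X, y, w):
    """Sum of squared residuals on the training set."""
    r = y - X @ w
    return float(r @ r)

# Synthetic data: random linear response plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=30)

lams = [0.0, 1.0, 10.0, 100.0]
fits = [ridge_fit(X, y, lam) for lam in lams]
norms_sq = [float(w @ w) for w in fits]            # model complexity w^T w
losses = [training_loss(X, y, w) for w in fits]    # training loss
```

As λ grows, `norms_sq` shrinks while `losses` grows, which is the trade-off the slide describes.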

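7 $\operatorname{argmin}_{w,b} \lambda w^T w + \sum_i (y_i - w^T x_i + b)^2$
Features: $x = (1_{[\text{age} > 10]},\ 1_{[\text{gender} = \text{male}]})^T$
Label: $y = 1$ if height > 55", $y = 0$ if height ≤ 55"
[Figure: Training Loss and Test Loss plotted against lambda, annotated "Larger Lambda!"]
[Table: Person, Age>10, Male?, Height > 55" for Alice, Bob, Carol, Dave, Erin, Frank, Gena, Harold, Irene, John, Kelly, Larry, split into Train and Test rows; cell values were lost in extraction]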

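8 Updated Pipeline
Training Data: $S = \{(x_i, y_i)\}_{i=1}^N$
Model Class: $f(x \mid w, b) = w^T x - b$
Loss Function: $L(a, b) = (a - b)^2$
Training: $\operatorname{argmin}_{w,b} \lambda w^T w + \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
Choosing λ!
Cross Validation & Model Selection
Profit!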

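9 Model Score w/ Increasing Lambda
[Table: Person, Age>10, Male?, Height > 55", and model scores under increasing lambda for Alice, Bob, Carol, Dave, Erin, Frank, Gena, Harold, Irene, John, Kelly, Larry, split into Train and Test rows, with the best test error marked; cell values were lost in extraction]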

10 Choice of Lambda Depends on Training Size
[Figure: four panels of test loss vs. lambda, one per number of training points; the training set sizes were lost in extraction]
25 dimensional space
Randomly generated linear response function + noise

11 Recap: Ridge Regularization
Ridge Regression: L2 Regularized Least-Squares
$\operatorname{argmin}_{w,b} \lambda w^T w + \sum_{i=1}^N (y_i - w^T x_i + b)^2$
Large λ ⇒ more stable predictions
Less likely to overfit to training data
Too large λ ⇒ underfit
Works with other losses: Hinge Loss, Log Loss, etc.

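12 Aside: Stochastic Gradient Descent
$\operatorname{argmin}_{w,b} \tilde{L}(w, b) = \lambda \|w\|^2 + \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
$\tilde{L}_i(w, b) = \frac{1}{N} \lambda \|w\|^2 + L_i(w, b)$, where $L_i(w, b) = L(y_i, f(x_i \mid w, b))$
$\frac{1}{N} \nabla \tilde{L}(w, b) = E_i\left[\nabla \tilde{L}_i(w, b)\right]$
Do SGD on this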
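The idea of spreading the regularizer evenly across the N examples can be sketched as follows. This assumes squared loss with b omitted; the step size, epoch count, synthetic data, and function names are illustrative assumptions rather than values from the lecture.

```python
import numpy as np

def full_gradient(X, y, w, lam):
    """Gradient of lam * ||w||^2 + sum_i (y_i - w^T x_i)^2."""
    return 2 * lam * w - 2 * X.T @ (y - X @ w)

def example_gradient(x_i, y_i, w, lam, n):
    """Gradient of the per-example objective (lam / n) * ||w||^2 + (y_i - w^T x_i)^2."""
    return 2 * (lam / n) * w - 2 * x_i * (y_i - w @ x_i)

def objective(X, y, w, lam):
    r = y - X @ w
    return float(lam * (w @ w) + r @ r)

def sgd(X, y, lam, step=0.01, epochs=200, seed=0):
    """Plain SGD: one per-example gradient step at a time, reshuffled every epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            w -= step * example_gradient(X[i], y[i], w, lam, n)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1.0

# The average per-example gradient equals the full gradient divided by N.
w0 = np.array([0.3, -0.1, 0.2])
mean_example_grad = np.mean(
    [example_gradient(X[i], y[i], w0, lam, 50) for i in range(50)], axis=0
)
w_sgd = sgd(X, y, lam)
```

Because each per-example gradient is an unbiased estimate of the scaled full gradient, the SGD iterate moves toward the minimizer of the regularized objective.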

13 Model Class Interpretation
$\operatorname{argmin}_{w,b} \lambda w^T w + \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
This is not a model class! At least not what we've discussed...
An optimization procedure
Is there a connection?

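14 Norm Constrained Model Class
$f(x \mid w, b) = w^T x - b$ s.t. $w^T w \le c$, equivalently $\|w\|^2 \le c$
Visualization: c=1, c=2, c=3
Seems to correspond to lambda:
$\operatorname{argmin}_{w,b} \lambda w^T w + \sum_{i=1}^N L(y_i, f(x_i \mid w, b))$
[Figure: L2 norm of the learned w plotted against lambda]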

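15 Lagrange Multipliers
$\operatorname{argmin}_w L(y, w) \equiv (y - w^T x)^2$ s.t. $w^T w \le c$
Optimality Condition: Gradients aligned! (at the Constraint Boundary)
$\exists \lambda \ge 0 : \left(\partial_w L(y, w) = -\lambda\, \partial_w w^T w\right) \wedge \left(w^T w \le c\right)$
Omitting b & 1 training data for simplicity
http://en.wikipedia.org/wiki/Lagrange_multiplier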

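16 Norm Constrained Model Class
Training: $\operatorname{argmin}_w L(y, w) \equiv (y - w^T x)^2$ s.t. $w^T w \le c$
Omitting b & 1 training data for simplicity
Two Conditions Must Be Satisfied At Optimality.
Observation about Optimality: $\exists \lambda \ge 0 : \left(\partial_w L(y, w) = -\lambda\, \partial_w w^T w\right) \wedge \left(w^T w \le c\right)$
Lagrangian: $\min_w \max_{\lambda \ge 0} \Lambda(w, \lambda) = (y - w^T x)^2 + \lambda (w^T w - c)$
Claim: Solving Lagrangian Solves Norm-Constrained Training Problem
Optimality Implication of Lagrangian (Satisfies First Condition!):
$\partial_w \Lambda(w, \lambda) = -2x(y - w^T x)^T + 2\lambda w \equiv 0 \ \Rightarrow\ 2x(y - w^T x)^T = 2\lambda w$
http://en.wikipedia.org/wiki/Lagrange_multiplier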

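18 Norm Constrained Model Class
Training: $\operatorname{argmin}_w L(y, w) \equiv (y - w^T x)^2$ s.t. $w^T w \le c$
Omitting b & 1 training data for simplicity
Two Conditions Must Be Satisfied At Optimality.
Observation about Optimality: $\exists \lambda \ge 0 : \left(\partial_w L(y, w) = -\lambda\, \partial_w w^T w\right) \wedge \left(w^T w \le c\right)$
Lagrangian: $\min_w \max_{\lambda \ge 0} \Lambda(w, \lambda) = (y - w^T x)^2 + \lambda (w^T w - c)$
Claim: Solving Lagrangian Solves Norm-Constrained Training Problem
Optimality Implication of Lagrangian (Satisfies 2nd Condition!):
$\partial_\lambda \Lambda(w, \lambda) = \begin{cases} 0 & \text{if } w^T w < c \\ w^T w - c & \text{if } w^T w \ge c \end{cases} \equiv 0 \ \Rightarrow\ w^T w \le c$
http://en.wikipedia.org/wiki/Lagrange_multiplier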

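19 Norm Constrained Model Class
Training: $\operatorname{argmin}_w L(y, w) \equiv (y - w^T x)^2$ s.t. $w^T w \le c$
L2 Regularized Training: $\operatorname{argmin}_w \lambda w^T w + (y - w^T x)^2$
Lagrangian: $\min_w \max_{\lambda \ge 0} \Lambda(w, \lambda) = (y - w^T x)^2 + \lambda (w^T w - c)$
Lagrangian = Norm Constrained Training: $\exists \lambda \ge 0 : \left(\partial_w L(y, w) = -\lambda\, \partial_w w^T w\right) \wedge \left(w^T w \le c\right)$
Lagrangian = L2 Regularized Training: Hold λ fixed. Equivalent to solving Norm Constrained! (For some c)
Omitting b & 1 training data for simplicity
http://en.wikipedia.org/wiki/Lagrange_multiplier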
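The "for some c" correspondence can be checked numerically. The sketch below, under the assumption of several training points rather than one and with b omitted, solves the L2 regularized problem for a fixed λ, sets $c = w_\lambda^T w_\lambda$, and then samples random points inside the ball $w^T w \le c$ to confirm none achieves a lower training loss. The data and sample counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=40)

def loss(w):
    """Training loss: sum of squared residuals."""
    r = y - X @ w
    return float(r @ r)

# Solve the L2 regularized problem for a fixed lambda.
lam = 5.0
w_lam = np.linalg.solve(lam * np.eye(4) + X.T @ X, X.T @ y)
c = float(w_lam @ w_lam)  # the norm budget this lambda corresponds to

# First optimality condition: the loss gradient is aligned with the norm gradient.
grad_loss = -2 * X.T @ (y - X @ w_lam)
grad_norm = 2 * w_lam

# Sample random feasible points with w^T w <= c and record the best loss found.
directions = rng.normal(size=(2000, 4))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = np.sqrt(c) * rng.uniform(size=(2000, 1)) ** 0.25  # uniform in a 4-D ball
candidates = directions * radii
best_feasible_loss = min(loss(w) for w in candidates)
```

No sampled feasible point beats `w_lam`, consistent with the regularized solution also solving the norm constrained problem for this particular c.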

20 Recap 2: Ridge Regularization
Ridge Regression: L2 Regularized Least-Squares = Norm Constrained Model
$\operatorname{argmin}_{w,b} \lambda w^T w + L(w) \ \equiv\ \operatorname{argmin}_{w,b} L(w)$ s.t. $w^T w \le c$
Large λ ⇒ more stable predictions
Less likely to overfit to training data
Too large λ ⇒ underfit
Works with other losses: Hinge Loss, Log Loss, etc.

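21 Hallucinating Data Points
$\operatorname{argmin}_w \lambda w^T w + \sum_{i=1}^N (y_i - w^T x_i)^2$
$\partial_w = 2\lambda w - 2 \sum_{i=1}^N x_i (y_i - w^T x_i)^T$
Instead hallucinate D data points? $\{(\sqrt{\lambda}\, e_d, 0)\}_{d=1}^D$
$\operatorname{argmin}_w \sum_{d=1}^D (0 - w^T \sqrt{\lambda}\, e_d)^2 + \sum_{i=1}^N (y_i - w^T x_i)^2$
$\partial_w = 2 \sum_{d=1}^D \sqrt{\lambda}\, e_d \left(\sqrt{\lambda}\, e_d^T w\right) - 2 \sum_{i=1}^N x_i (y_i - w^T x_i)^T$
$2 \sum_{d=1}^D \sqrt{\lambda}\, e_d \left(\sqrt{\lambda}\, e_d^T w\right) = 2\lambda \sum_{d=1}^D w_d e_d = 2\lambda w$
Identical to Regularization!
Unit vector along d-th dimension: $e_d = (0, \ldots, 0, 1, 0, \ldots, 0)^T$
Omitting b for simplicity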
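The equivalence can be confirmed numerically: ordinary least squares on the data augmented with the D hallucinated points should return the ridge solution. This is a minimal sketch with b omitted; the random data and the value of λ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
lam = 2.5
D = X.shape[1]

# Ridge solution from setting the regularized gradient to zero.
w_ridge = np.linalg.solve(lam * np.eye(D) + X.T @ X, X.T @ y)

# Hallucinate D extra points (sqrt(lam) * e_d, 0), one per dimension,
# then run unregularized least squares on the augmented data set.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(D)])
y_aug = np.concatenate([y, np.zeros(D)])
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
```

Both weight vectors agree up to floating-point error.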

22 Extension: Multi-task Learning
2 prediction tasks:
- Spam filter for Alice
- Spam filter for Bob
Limited training data for both... but Alice is similar to Bob

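23 Extension: Multi-task Learning
Two Training Sets, relatively small: $S^{(1)} = \{(x_i^{(1)}, y_i^{(1)})\}$, $S^{(2)} = \{(x_i^{(2)}, y_i^{(2)})\}$
Option 1: Train Separately
$\operatorname{argmin}_w \lambda w^T w + \sum_i \left(y_i^{(1)} - w^T x_i^{(1)}\right)^2$
$\operatorname{argmin}_v \lambda v^T v + \sum_i \left(y_i^{(2)} - v^T x_i^{(2)}\right)^2$
Both models have high error.
Omitting b for simplicity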

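24 Extension: Multi-task Learning
Two Training Sets, relatively small: $S^{(1)} = \{(x_i^{(1)}, y_i^{(1)})\}$, $S^{(2)} = \{(x_i^{(2)}, y_i^{(2)})\}$
Option 2: Train Jointly
$\operatorname{argmin}_{w,v} \lambda w^T w + \sum_i \left(y_i^{(1)} - w^T x_i^{(1)}\right)^2 + \lambda v^T v + \sum_i \left(y_i^{(2)} - v^T x_i^{(2)}\right)^2$
Doesn't accomplish anything! (w & v don't depend on each other)
Omitting b for simplicity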

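25 Multi-task Regularization
$\operatorname{argmin}_{w,v} \lambda w^T w + \lambda v^T v + \gamma (w - v)^T (w - v) + \sum_i \left(y_i^{(1)} - w^T x_i^{(1)}\right)^2 + \sum_i \left(y_i^{(2)} - v^T x_i^{(2)}\right)^2$
Terms, in order: Standard Regularization, Multi-task Regularization, Training Loss
Prefer w & v to be "close", controlled by γ
Tasks similar ⇒ larger γ helps!
Tasks not identical ⇒ γ not too large
[Figure: Test Loss (Task 2) plotted against gamma]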
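Since the multi-task objective is quadratic in (w, v), setting both gradients to zero gives one linear system. The sketch below solves it directly with b omitted; the function name `multitask_fit`, the synthetic tasks, and the chosen λ and γ values are assumptions for illustration.

```python
import numpy as np

def multitask_fit(X1, y1, X2, y2, lam, gamma):
    """Jointly minimize the multi-task objective over (w, v) via its normal equations.

    Stationarity in w: (X1^T X1 + (lam + gamma) I) w - gamma v = X1^T y1
    Stationarity in v: -gamma w + (X2^T X2 + (lam + gamma) I) v = X2^T y2
    """
    d = X1.shape[1]
    I = np.eye(d)
    A = np.block([
        [X1.T @ X1 + (lam + gamma) * I, -gamma * I],
        [-gamma * I, X2.T @ X2 + (lam + gamma) * I],
    ])
    rhs = np.concatenate([X1.T @ y1, X2.T @ y2])
    sol = np.linalg.solve(A, rhs)
    return sol[:d], sol[d:]

# Two small, similar (but not identical) synthetic tasks.
rng = np.random.default_rng(0)
w_shared = rng.normal(size=4)
X1 = rng.normal(size=(8, 4))
X2 = rng.normal(size=(8, 4))
y1 = X1 @ w_shared + 0.5 * rng.normal(size=8)
y2 = X2 @ (w_shared + 0.1 * rng.normal(size=4)) + 0.5 * rng.normal(size=8)

lam = 0.1
w_sep, v_sep = multitask_fit(X1, y1, X2, y2, lam, gamma=0.0)   # same as training separately
w_mt, v_mt = multitask_fit(X1, y1, X2, y2, lam, gamma=10.0)    # coupled training
gap_sep = float(np.linalg.norm(w_sep - v_sep))
gap_mt = float(np.linalg.norm(w_mt - v_mt))
```

With γ = 0 the system decouples into two separate ridge problems; increasing γ pulls w and v closer together.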

26 Lasso
L1-Regularized Least-Squares

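27 L1 Regularized Least Squares
L2: $\operatorname{argmin}_w \lambda \|w\|^2 + \sum_{i=1}^N (y_i - w^T x_i)^2$
L1: $\operatorname{argmin}_w \lambda |w| + \sum_{i=1}^N (y_i - w^T x_i)^2$
[Figure: the L2 and L1 penalties compared at $w = 2$ vs. $w = 1$ and at $w = 1$ vs. $w = 0$; the numeric values were lost in extraction]
Omitting b for simplicity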

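28 Aside: Subgradient (sub-differential)
$\nabla_a R(a) = \{c \mid \forall a' : R(a') - R(a) \ge c\,(a' - a)\}$
Differentiable: $\nabla_a R(a) = \partial_a R(a)$
L1: $\nabla_{w_d} |w| = \begin{cases} -1 & \text{if } w_d < 0 \\ +1 & \text{if } w_d > 0 \\ [-1, +1] & \text{if } w_d = 0 \end{cases}$
Continuous range for w=0!
Omitting b for simplicity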
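The subgradient definition can be tested directly on the absolute value function. This is a small sketch; the helper names `l1_subdifferential` and `is_subgradient` and the probe points are illustrative, and checking a finite set of probes only approximates the "for all a'" condition.

```python
def l1_subdifferential(w_d):
    """Subdifferential of |w_d| as a closed interval (low, high)."""
    if w_d < 0:
        return (-1.0, -1.0)
    if w_d > 0:
        return (1.0, 1.0)
    return (-1.0, 1.0)  # continuous range at w_d = 0

def is_subgradient(c, a, R, probes):
    """Check R(a') - R(a) >= c * (a' - a) on a finite set of probe points a'."""
    return all(R(p) - R(a) >= c * (p - a) for p in probes)

probes = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
valid_at_zero = is_subgradient(0.5, 0.0, abs, probes)    # 0.5 lies in [-1, +1]
invalid_at_zero = is_subgradient(1.5, 0.0, abs, probes)  # 1.5 lies outside [-1, +1]
valid_at_two = is_subgradient(1.0, 2.0, abs, probes)     # ordinary derivative at a = 2
```

At a = 0 any slope in [-1, +1] stays below the function, while a slope of 1.5 cuts through it.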

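29 L1 Regularized Least Squares
L2: $\operatorname{argmin}_w \lambda \|w\|^2 + \sum_{i=1}^N (y_i - w^T x_i)^2$, with $\partial_{w_d} \|w\|^2 = 2 w_d$
L1: $\operatorname{argmin}_w \lambda |w| + \sum_{i=1}^N (y_i - w^T x_i)^2$, with $\nabla_{w_d} |w| = \begin{cases} -1 & \text{if } w_d < 0 \\ +1 & \text{if } w_d > 0 \\ [-1, +1] & \text{if } w_d = 0 \end{cases}$
Omitting b for simplicity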

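30 Lagrange Multipliers
$\operatorname{argmin}_w L(y, w) \equiv (y - w^T x)^2$ s.t. $|w| \le c$
$\nabla_{w_d} |w| = \begin{cases} -1 & \text{if } w_d < 0 \\ +1 & \text{if } w_d > 0 \\ [-1, +1] & \text{if } w_d = 0 \end{cases}$
Solutions tend to be at corners!
$\exists \lambda \ge 0 : \left(\partial_w L(y, w) \in -\lambda\, \nabla_w |w|\right) \wedge \left(|w| \le c\right)$
Omitting b & 1 training data for simplicity
http://en.wikipedia.org/wiki/Lagrange_multiplier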

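31 Sparsity
w is sparse if mostly 0's: small L0 Norm, $\|w\|_0 = \sum_d 1_{[w_d \ne 0]}$
Why not L0 Regularization? Not continuous!
$\operatorname{argmin}_w \lambda \|w\|_0 + \sum_{i=1}^N (y_i - w^T x_i)^2$
L1 induces sparsity. And is continuous!
$\operatorname{argmin}_w \lambda |w| + \sum_{i=1}^N (y_i - w^T x_i)^2$
Omitting b for simplicity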
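One way to see L1 producing exact zeros is to solve the L1 regularized objective with proximal gradient descent (ISTA), whose soft-thresholding step sets small coordinates to exactly 0. ISTA is a swapped-in solver chosen for brevity here, not a method stated on the slides; the data, λ values, and iteration count are also assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward 0 by t; entries with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(X, y, w, lam):
    r = y - X @ w
    return float(lam * np.sum(np.abs(w)) + r @ r)

def lasso_ista(X, y, lam, iters=500):
    """Proximal gradient (ISTA) for lam * |w| + sum_i (y_i - w^T x_i)^2, b omitted."""
    n, d = X.shape
    # Step size 1/L, where L = 2 * sigma_max(X)^2 bounds the curvature of the squared loss.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(iters):
        grad = -2.0 * (X.T @ (y - X @ w))  # gradient of the squared loss only
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data whose true weight vector has only 2 non-zero entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[[2, 7]] = [3.0, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=100)

w_lasso = lasso_ista(X, y, lam=20.0)

# Once lam is at least 2 * max_d |x_d^T y|, the all-zero vector is optimal.
lam_big = 4.0 * np.max(np.abs(X.T @ y))
w_all_zero = lasso_ista(X, y, lam=lam_big)
```

Inspecting `w_lasso` typically shows most coordinates at exactly 0, whereas an L2 penalty would only shrink them toward 0.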

32 Why is Sparsity Important?
Computational / Memory Efficiency
- Store 1M numbers in array
- Store 2 numbers per non-zero: (Index, Value) pairs, e.g., [ (50,1), (51,1) ]
- Dot product more efficient: $w^T x$ when w is mostly zeros
Sometimes true w is sparse
- Want to recover non-zero dimensions
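The (Index, Value) representation and its dot product can be sketched in a few lines. The helper names `to_sparse` and `sparse_dot` are illustrative, and a length-1000 vector stands in for the 1M-entry array mentioned above.

```python
def to_sparse(w):
    """Keep only the non-zero entries of w as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(w) if v != 0]

def sparse_dot(sparse_w, x):
    """Compute w^T x by touching only the non-zero entries of w."""
    return sum(v * x[i] for i, v in sparse_w)

# Dense weight vector with two non-zero entries, as in the slide's example.
w = [0.0] * 1000
w[50] = 1.0
w[51] = 1.0
x = [float(i) for i in range(1000)]

sparse_w = to_sparse(w)            # [(50, 1.0), (51, 1.0)]
result = sparse_dot(sparse_w, x)   # x[50] + x[51] = 101.0
dense_result = sum(w_d * x_d for w_d, x_d in zip(w, x))
```

The sparse version performs 2 multiplications instead of 1000 and stores 4 numbers instead of 1000.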

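33 Lasso Guarantee
$\operatorname{argmin}_w \lambda |w| + \sum_{i=1}^N (y_i - w^T x_i + b)^2$
Suppose data generated as: $y_i \sim \text{Normal}(w^{*T} x_i, \sigma^2)$
Then if: $\lambda > \frac{2}{\kappa} \sqrt{\frac{2 \sigma^2 \log D}{N}}$
With high probability (increasing with N):
- $\text{Supp}(w) \subseteq \text{Supp}(w^*)$ (High Precision Parameter Recovery)
- $\forall d \in \text{Supp}(w^*) : |w^*_d| \ge \lambda c \ \Rightarrow\ \text{Supp}(w) = \text{Supp}(w^*)$ (Sometimes High Recall)
where $\text{Supp}(w^*) = \{d \mid w^*_d \ne 0\}$
See also:
https://www.cs.utexas.edu/~pradeepr/courses/395T-LT/filez/highdimII.pdf
http://www.eecs.berkeley.edu/~wainwrig/Papers/Wai_SparseInfo09.pdf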

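34 [Figure: magnitude of the two weights (male? and age > 10?) under L1 and L2, as regularization shrinks]
[Table: Person, Age>10, Male?, Height > 55" for Alice, Bob, Carol, Dave, Erin, Frank, Gena, Harold, Irene, John, Kelly, Larry; cell values were lost in extraction]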

35 Recap: Lasso vs Ridge
Model Assumptions
- Lasso learns sparse weight vector
Predictive Accuracy
- Lasso often not as accurate
- Re-run Least Squares on dimensions selected by Lasso
Ease of Inspection
- Sparse w's easier to inspect
Ease of Optimization
- Lasso somewhat trickier to optimize

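36 Recap: Regularization
L2: $\operatorname{argmin}_w \lambda \|w\|^2 + \sum_{i=1}^N (y_i - w^T x_i)^2$
L1 (Lasso): $\operatorname{argmin}_w \lambda |w| + \sum_{i=1}^N (y_i - w^T x_i)^2$
Multi-task: $\operatorname{argmin}_{w,v} \lambda w^T w + \lambda v^T v + \gamma (w - v)^T (w - v) + \sum_i \left(y_i^{(1)} - w^T x_i^{(1)}\right)^2 + \sum_i \left(y_i^{(2)} - v^T x_i^{(2)}\right)^2$
[Insert Yours Here!]
Omitting b for simplicity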

37 Next Lectures
- Decision Trees
- Bagging
- Random Forests
- Boosting
- Ensemble Selection
No Recitation this Week
