Learning From Data, Lecture 10: Nonlinear Transforms
The Z-space; Polynomial transforms; Be careful
M. Magdon-Ismail, CSCI 4100/6600
Recap: The Linear Model
Linear in w: makes the algorithms work. Linear in x: gives the line/hyperplane separator. The signal is s = w^t x.

Task                                | Model               | Error measure        | Algorithm
Approve or deny credit              | Perceptron          | classification error | PLA, Pocket, ...
Amount of credit                    | Linear regression   | squared error        | pseudo-inverse
Probability of default              | Logistic regression | cross-entropy error  | gradient descent

© AML Creator: Malik Magdon-Ismail, Nonlinear Transforms: 2/17
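The shared signal s = w^t x can be sketched as follows; the three models differ only in how they use it. The numbers here are made up for illustration, not from the slide.

```python
import numpy as np

# All three linear models share the same linear signal s = w^t x.
w = np.array([0.1, 0.5, -0.3])   # weights, w[0] playing the role of the bias w0
x = np.array([1.0, 2.0, 1.0])    # input with x0 = 1 prepended

s = w @ x                        # the linear signal

h_class = np.sign(s)                  # perceptron output: sign(w^t x)
h_reg = s                             # linear regression output: w^t x
h_prob = 1.0 / (1.0 + np.exp(-s))     # logistic regression output: theta(w^t x)
```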
The Linear Model Has Its Limits
[Figure: (a) Linear with outliers; (b) Essentially nonlinear.]
To address (b) we need something more than linear.
Change Your Features
[Figure: Income vs. Years in Residence, Y.]
Y >= 3 years: no additional effect beyond Y = 3. Y <= 0.3 years: no additional effect below Y = 0.3.
Change Your Features Using a Transform
[Figure: Income vs. Y is nonlinear; after transforming Y to a new feature z, Income vs. z is linear.]
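One plausible way to encode the saturation described above is a clamped feature; the exact functional form is hypothetical, the slide only shows the idea graphically.

```python
def residence_feature(Y, lo=0.3, hi=3.0):
    """Saturating transform of years-in-residence Y: the feature stops
    changing below lo = 0.3 years and above hi = 3 years.
    (Hypothetical form; the slide only sketches the shape.)"""
    return min(max(Y, lo), hi)
```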
Mechanics of the Feature Transform I
Transform the data to a Z-space in which the data is separable:
x = (x_1, x_2)  -->  z = Φ(x) = (Φ_1(x), Φ_2(x)) = (x_1^2, x_2^2)
[Figure: data not linearly separable in the X-space becomes separable in the Z-space, with axes z_1 = x_1^2 and z_2 = x_2^2.]
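A minimal sketch of this transform, showing why a circular boundary becomes linear:

```python
import numpy as np

def phi(x):
    """The slide's transform: z = Phi(x) = (x1^2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2])

# A circular boundary x1^2 + x2^2 = r^2 in the X-space becomes the
# straight line z1 + z2 = r^2 in the Z-space, hence linearly separable.
z = phi((0.6, -0.8))   # a point at radius 1
```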
Mechanics of the Feature Transform II
Separate the data in the Z-space with weights w̃:
g̃(z) = sign(w̃^t z)
Mechanics of the Feature Transform III
To classify a new x, first transform x to Φ(x) in the Z-space and classify there with g̃:
g(x) = g̃(Φ(x)) = sign(w̃^t Φ(x))
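Putting the three steps together as code, with a bias coordinate made explicit; the weights here are hypothetical, chosen so the separator is the unit circle:

```python
import numpy as np

def phi(x):
    """(1, x1^2, x2^2): bias coordinate plus the slide's transform."""
    return np.array([1.0, x[0]**2, x[1]**2])

def g(x, w_tilde):
    """Classify a new x: transform to the Z-space, separate there with w_tilde."""
    return int(np.sign(w_tilde @ phi(x)))

# Hypothetical Z-space weights realizing "inside the unit circle -> +1":
# g(x) = sign(1 - x1^2 - x2^2)
w_tilde = np.array([1.0, -1.0, -1.0])
```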
The General Feature Transform
X-space is R^d: x = (x_1, ..., x_d).
Z-space is R^d̃: z = Φ(x) = (Φ_1(x), ..., Φ_d̃(x)) = (z_1, ..., z_d̃).
Data: x_1, x_2, ..., x_N  -->  z_1, z_2, ..., z_N; the labels y_1, y_2, ..., y_N do not change.
There are no weights in the X-space; the weights w̃ = (w̃_0, w̃_1, ..., w̃_d̃) live in the Z-space.
g(x) = sign(w̃^t Φ(x))
Generalization
X-space: d_vc = d + 1. Z-space: d_vc <= d̃ + 1.
Choose the feature transform with smallest d̃.
Many Nonlinear Features May Work
[Figure: data separated by the circle x_1^2 + x_2^2 = 0.6 in the X-space; several Z-space views with axes such as z_2 = x_2 and z_2 = x_2^2.]
Several single features separate the data, for example z_1 = x_1^2 + x_2^2 - 0.6, or z_1 = (x_1 + 0.05)^2, or z_1 = x_1^2.
A rat! A rat! This is called data snooping: looking at your data and tailoring your H.
Must Choose Φ BEFORE You Look at the Data
After constructing features carefully, before seeing the data...
...if you think linear is not enough, try the 2nd-order polynomial transform:
x = (x_1, x_2)  -->  Φ(x) = (Φ_1(x), Φ_2(x), Φ_3(x), Φ_4(x), Φ_5(x)) = (x_1, x_2, x_1^2, x_1 x_2, x_2^2)
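The 2nd-order transform can be sketched directly, with the constant coordinate 1 included explicitly for the bias:

```python
import numpy as np

def phi2(x):
    """2nd-order polynomial transform of x = (x1, x2),
    with the bias coordinate 1 made explicit."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

z = phi2((2.0, 3.0))
```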
The General Polynomial Transform Φ_k
We can get even fancier with the degree-k polynomial transform:
Φ_1(x) = (1, x_1, x_2)
Φ_2(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)
Φ_3(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3)
Φ_4(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4)
...
The dimensionality of the feature space increases rapidly, and with it d_vc. Similar transforms exist for a d-dimensional original space.
Approximation-generalization tradeoff: a higher degree gives lower (even zero) E_in but worse generalization.
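The degree-k transform and its dimensionality can be sketched as follows; the function names are mine, not the lecture's. The number of monomials of degree at most k in d variables is C(d+k, k), which makes the rapid growth concrete.

```python
from itertools import combinations_with_replacement
from math import comb

def phi_k(x, k):
    """Degree-k polynomial transform of a d-dimensional x:
    all monomials of degree 0 through k, lowest degree first."""
    feats = []
    for degree in range(k + 1):
        for idxs in combinations_with_replacement(range(len(x)), degree):
            m = 1.0
            for i in idxs:
                m *= x[i]
            feats.append(m)
    return feats

def dim_phi_k(d, k):
    """Number of features in phi_k for d-dimensional input: C(d+k, k)."""
    return comb(d + k, k)
```

For d = 2 this reproduces the slide's counts: 3, 6, 10, 15 features for degrees 1 through 4, while already at d = 10, k = 3 there are 286 features.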
Be Careful with Feature Transforms
[Figure: data with a highly complex decision boundary produced by a high-order transform.]
Be Careful with Feature Transforms
A high-order polynomial transform leads to nonsense.
Digits Data: 1 Versus All
[Figure: two panels of Symmetry vs. Average Intensity, with the linear and the 3rd-order polynomial decision boundaries.]
Linear model: E_in = 2.13%, E_out = 2.38%. 3rd-order polynomial model: E_in = 1.75%, E_out = 1.87%.
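The fit-and-compare workflow behind numbers like these can be sketched on synthetic data; this is a toy stand-in for the digits data (the labels are generated from a quadratic boundary), so the error values will not match the slide's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 200 points, label depends nonlinearly on two features
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sign(X[:, 0]**2 + X[:, 1]**2 - 0.5)

def fit_and_ein(Z, y):
    """One-shot linear fit by pseudo-inverse, then in-sample classification error."""
    w = np.linalg.pinv(Z) @ y
    return float(np.mean(np.sign(Z @ w) != y))

Z_lin = np.column_stack([np.ones(len(X)), X])            # (1, x1, x2)
Z_poly = np.column_stack([Z_lin,                          # add 2nd-order terms
                          X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2])

ein_lin = fit_and_ein(Z_lin, y)    # linear model: large E_in on this data
ein_poly = fit_and_ein(Z_poly, y)  # polynomial model: much smaller E_in
```

The design choice mirrors the lecture: the same pseudo-inverse algorithm runs in both cases; only the feature matrix Z changes.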
Use the Linear Model!
- First try a linear model: it is simple, robust, and it works. The algorithms can tolerate error, and you have nonlinear feature transforms when linear is not enough.
- Choose a feature transform before seeing the data. Stay simple. Data snooping is hazardous to your E_out.
- Linear models are fundamental in their own right; they are also the building blocks of many more complex models, such as support vector machines.
- Nonlinear transforms also apply to regression and logistic regression.