Learning From Data Lecture 10 Nonlinear Transforms


Learning From Data Lecture 10: Nonlinear Transforms. The Z-space. Polynomial transforms. Be careful. M. Magdon-Ismail, CSCI 4100/6100.

Recap: The Linear Model
Linear in w: makes the algorithms work. Linear in x: gives the line/hyperplane separator. The signal is s = w^t x. The credit-analysis example gives three linear models:
- Approve or deny: Perceptron; classification error; PLA, Pocket, ...
- Amount of credit: Linear regression; squared error; pseudo-inverse.
- Probability of default: Logistic regression; cross-entropy error; gradient descent.
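To make the recap concrete, here is a minimal NumPy sketch of the pseudo-inverse algorithm for linear regression; the toy data matrix and targets are hypothetical, not from the lecture.

```python
import numpy as np

# Hypothetical toy data: N = 4 points in d = 2 dimensions.
# Each row is (1, x1, x2); the leading 1 is the bias coordinate.
X = np.array([[1.0, 0.2, 0.7],
              [1.0, 0.5, 0.1],
              [1.0, 0.9, 0.6],
              [1.0, 0.4, 0.4]])
y = np.array([0.3, 0.8, 0.5, 0.6])  # real-valued targets (e.g., amount of credit)

# One-step linear regression solution: w = X^dagger y.
w = np.linalg.pinv(X) @ y

# The learned signal for a new point x is s = w^t x.
x_new = np.array([1.0, 0.4, 0.3])
print(w, w @ x_new)
```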

The Linear Model Has Its Limits
(a) Linearly separable data with outliers. (b) Essentially nonlinear data. To address (b) we need something more than a linear model.

Change Your Features
[Figure: income versus years in residence, Y.] Y ≥ 3 years: no additional effect beyond Y = 3. Y ≤ 0.3 years: no additional effect below Y = 0.3.

Change Your Features Using a Transform
[Figures: income versus years in residence Y, and income versus the transformed feature z.]

Mechanics of the Feature Transform I
Transform the data to a Z-space in which the data is separable:
x = (x_1, x_2) ↦ z = Φ(x) = (Φ_1(x), Φ_2(x)) = (x_1^2, x_2^2),
that is, z_1 = x_1^2 and z_2 = x_2^2.
[Figure: the data in X-space, separable only by a circle, becomes linearly separable in Z-space.]
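As a sketch of this step (with made-up data, not the lecture's), the transform z = (x_1^2, x_2^2) is one line of NumPy:

```python
import numpy as np

# Hypothetical 2-D data: points inside/outside a circle are not linearly separable in X-space.
X = np.array([[ 0.1, -0.2],
              [ 0.8,  0.7],
              [-0.9,  0.1],
              [ 0.2,  0.1]])

# Z-space transform: z = Phi(x) = (x1^2, x2^2), applied row by row.
Z = X**2
print(Z)  # points near the origin map near (0, 0); far points map far out along the axes
```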

Mechanics of the Feature Transform II
Separate the data in the Z-space with the weights w̃:
g̃(z) = sign(w̃^t z)

Mechanics of the Feature Transform III
To classify a new x, first transform x to Φ(x) in Z-space and classify there with g̃:
g(x) = g̃(Φ(x)) = sign(w̃^t Φ(x))
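Putting the three steps together, here is a minimal end-to-end sketch. It assumes the quadratic Φ from above (with an explicit bias coordinate) and uses the pseudo-inverse from the recap as the Z-space learner; the data and labels are invented for illustration.

```python
import numpy as np

def phi(x):
    """Feature transform Phi(x) = (1, x1^2, x2^2); the leading 1 carries the bias weight."""
    x1, x2 = x
    return np.array([1.0, x1**2, x2**2])

# Hypothetical training set: +1 inside the circle x1^2 + x2^2 < 0.5, else -1.
X = np.array([[0.1, -0.2], [0.3, 0.2], [0.8, 0.7], [-0.9, 0.1]])
y = np.array([1, 1, -1, -1])

# Step I: transform the data to Z-space.
Z = np.array([phi(x) for x in X])

# Step II: learn linear weights w~ in Z-space (pseudo-inverse used for classification).
w_tilde = np.linalg.pinv(Z) @ y

# Step III: classify a new point in X-space via g(x) = sign(w~^t Phi(x)).
x_new = np.array([0.2, 0.1])
print(np.sign(w_tilde @ phi(x_new)))  # expected +1.0: the point lies near the positive cluster
```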

The General Feature Transform
X-space is R^d: x = (x_1, ..., x_d).
Z-space is R^d̃: z = Φ(x) = (Φ_1(x), ..., Φ_d̃(x)) = (z_1, ..., z_d̃).
The inputs transform, x_1, x_2, ..., x_N → z_1, z_2, ..., z_N; the labels are unchanged, y_1, y_2, ..., y_N.
X-space has no weights; the weights w̃ = (w̃_0, w̃_1, ..., w̃_d̃) live in Z-space.
Final hypothesis: g(x) = sign(w̃^t Φ(x)).

Generalization
In X-space, d_VC = d + 1; after the transform, d_VC ≤ d̃ + 1.
Choose the feature transform with the smallest d̃.
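For reference, one standard form of the VC generalization bound (as in the Learning From Data text) makes this explicit: with probability at least 1 − δ,

```latex
% VC bound: a larger d_VC (i.e., a larger d-tilde) loosens the guarantee on E_out.
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) + \sqrt{\frac{8}{N}\,\ln\frac{4\, m_{\mathcal{H}}(2N)}{\delta}},
\qquad m_{\mathcal{H}}(N) \;\le\; N^{d_{\text{VC}}} + 1 .
```

Since the growth function is polynomial of order d_VC, increasing d̃ degrades the bound unless N grows accordingly.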

Many Nonlinear Features May Work
[Figure: data separated by the circle x_1^2 + x_2^2 = 0.6; several tailored one-dimensional features also separate it, e.g. z = x_1^2 + x_2^2 − 0.6, z = (x_1 + 0.05)^2, z = x_1^2.]
A rat! This is called data snooping: looking at your data and tailoring your hypothesis set H.

Must Choose Φ BEFORE You Look at the Data
After constructing features carefully, before seeing the data... if you think linear is not enough, try the 2nd order polynomial transform:
x = (x_1, x_2) ↦ Φ(x) = (Φ_1(x), Φ_2(x), Φ_3(x), Φ_4(x), Φ_5(x)) = (x_1, x_2, x_1^2, x_1 x_2, x_2^2)
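A minimal sketch of this transform (here the constant 1 is included explicitly, so the bias w̃_0 is learned like any other weight):

```python
import numpy as np

def phi2(x):
    """2nd order polynomial transform of x = (x1, x2), bias coordinate included."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

print(phi2((0.5, -0.2)))  # -> [1.0, 0.5, -0.2, 0.25, -0.1, 0.04]
```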

The General Polynomial Transform Φ_k
We can get even fancier with the degree-k polynomial transform:
Φ_1(x) = (1, x_1, x_2)
Φ_2(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)
Φ_3(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3)
Φ_4(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4)
...
The dimensionality of the feature space, and with it d_VC, increases rapidly. Similar transforms exist for d-dimensional original spaces. Approximation-generalization tradeoff: a higher degree gives a lower (even zero) E_in but worse generalization.
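As a sketch (a hypothetical helper, not from the lecture), the following generalizes phi2 to any degree k for 2-D inputs and shows how fast d̃ grows:

```python
import numpy as np

def poly_transform(x, k):
    """Degree-k polynomial transform of a 2-D point x = (x1, x2).

    Returns all monomials x1^i * x2^j with i + j <= k, ordered by total
    degree, matching Phi_k above: (1, x1, x2, x1^2, x1*x2, x2^2, ...).
    """
    x1, x2 = x
    return np.array([x1**(d - j) * x2**j for d in range(k + 1) for j in range(d + 1)])

# Dimensionality grows as (k+1)(k+2)/2 for 2-D inputs:
for k in (1, 2, 3, 4, 10):
    print(k, len(poly_transform((0.5, -0.2), k)))
# -> 3, 6, 10, 15, 66 features respectively
```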

Be Careful with Feature Transforms
[Figure: a dataset on which one might be tempted to insist on E_in = 0.]

Be Careful with Feature Transforms
Insisting on E_in = 0 with a high-order polynomial transform leads to nonsense.

Digits Data: "1" Versus All
[Figure: symmetry versus average intensity, with the linear and the 3rd order polynomial decision boundaries.]
Linear model: E_in = 2.13%, E_out = 2.38%.
3rd order polynomial model: E_in = 1.75%, E_out = 1.87%.

Use the Linear Model!
- First try a linear model: simple, robust, and it works.
- Algorithms can tolerate error, and you have nonlinear feature transforms if you need more.
- Choose a feature transform before seeing the data. Stay simple. Data snooping is hazardous to your E_out.
- Linear models are fundamental in their own right; they are also the building blocks of many more complex models, such as support vector machines.
- Nonlinear transforms also apply to regression and logistic regression.