CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9


Announcements. Final project proposal due Nov. 1: a 1-2 paragraph description; late penalty is 1 point off for each day late. Assignment 3 due November 10. Data for final project due Nov. 15: must be ported in Matlab, send me a .mat file with the data and a short description file of what the data is; late penalty is 1 point off for each day late. Final project progress report: meet with me the week of November 22-26; 5 points off if I see that you have done NOTHING yet. Assignment 4 due December 1. Final project due December 8.

Today: Linear Discriminant Functions. Introduction: 2 classes, multiple classes. Optimization with gradient descent. Perceptron criterion function: batch perceptron rule, single sample perceptron rule.

Linear discriminant functions on the road map. No probability distribution (no shape or parameters are known), only labeled data: little is known. The shape of the discriminant function is known (a linear discriminant function): a lot is known. We need to estimate the parameters of the discriminant function (the parameters of the line, in the case of a linear discriminant). [Figure: salmon vs. bass samples plotted by length and lightness, separated by a linear discriminant function.]

Linear Discriminant Functions: Basic Idea. [Figure: bass vs. salmon samples in the length-lightness plane, showing a bad boundary and a good boundary.] We have samples from 2 classes, x_1, x_2, ..., x_n. Assume the 2 classes can be separated by a linear boundary l(θ) with some unknown parameters θ. Fit the best boundary to the data by optimizing over the parameters θ. What is "best"? Minimize the classification error on the training data? That does not guarantee a small testing error.

Parametric Methods vs. Discriminant Functions. Parametric methods: assume the shape of the density for the classes is known, p_1(x|θ_1), p_2(x|θ_2), ...; estimate θ_1, θ_2, ... from data; use the Bayesian classifier to find the decision regions. Discriminant functions: assume the discriminant functions are of known shape l(θ_1), l(θ_2), ..., with parameters θ_1, θ_2, ...; estimate θ_1, θ_2, ... from data; use the discriminant functions directly for classification. [Figure: decision regions c_1, c_2, c_3 under both approaches.] In theory, the Bayesian classifier minimizes the risk, but in practice we do not have confidence in the assumed model shapes, and in practice we do not really need the actual density functions in the end. Estimating accurate density functions is much harder than estimating accurate discriminant functions, so some argue that estimating densities should be skipped: why solve a harder problem than needed?

LDF: Introduction. Discriminant functions can be more general than linear; for now, we will study linear discriminant functions. They are a simple model (we should try simpler models first) and analytically tractable. Linear discriminant functions are optimal for Gaussian distributions with equal covariance. They may not be optimal for other data distributions, but they are very simple to use. Knowledge of the class densities is not required when using linear discriminant functions, so we can say that this is a non-parametric approach.

LDF: 2 Classes. A discriminant function is linear if it can be written as g(x) = w^t x + w_0, where w is called the weight vector and w_0 is called the bias or threshold. Decision rule: g(x) > 0 ⇒ x in class 1; g(x) < 0 ⇒ x in class 2; g(x) = 0 ⇒ either class. [Figure: the (x^(1), x^(2)) plane split into regions R_1 (g(x) > 0) and R_2 (g(x) < 0) by the decision boundary g(x) = 0.]
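To make the decision rule concrete, here is a minimal Python sketch (not part of the original slides); the weight vector and bias values are made-up placeholders:

```python
import numpy as np

def classify_two_class(x, w, w0):
    """Two-class linear discriminant: g(x) = w^t x + w0."""
    g = np.dot(w, x) + w0
    if g > 0:
        return 1       # g(x) > 0: class 1
    if g < 0:
        return 2       # g(x) < 0: class 2
    return None        # g(x) = 0: on the boundary, either class

# Made-up weights: w determines the boundary orientation, w0 its location
w = np.array([1.0, -2.0])
w0 = 0.5
print(classify_two_class(np.array([3.0, 1.0]), w, w0))   # g = 1.5 > 0 -> 1
```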

LDF: 2 Classes. The decision boundary g(x) = w^t x + w_0 = 0 is a hyperplane: a set of vectors x which, for some scalars α_0, ..., α_d, satisfy α_0 + α_1 x^(1) + ... + α_d x^(d) = 0. A hyperplane is a point in 1D, a line in 2D, a plane in 3D.

LDF: 2 Classes. g(x) = w^t x + w_0: w determines the orientation of the decision hyperplane, and w_0 determines the location of the decision surface. [Figure: the hyperplane g(x) = 0 with normal vector w; the distance from a point x to the hyperplane is g(x)/||w||, and the distance from the origin to the hyperplane is w_0/||w||.]

LDF: 2 Classes.

LDF: Many Classes. Suppose we have m classes. Define m linear discriminant functions g_i(x) = w_i^t x + w_{i0}, i = 1, ..., m. Given x, assign class c_i if g_i(x) ≥ g_j(x) for all j ≠ i. Such a classifier is called a linear machine. A linear machine divides the feature space into m decision regions, with g_i(x) being the largest discriminant if x is in region R_i.
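A linear machine is just an argmax over the m discriminant values; the following short sketch illustrates this, with made-up weights and biases (not from the lecture):

```python
import numpy as np

def linear_machine(x, W, w0):
    """Assign x to the class i with the largest discriminant g_i(x) = W[i]^t x + w0[i]."""
    g = W @ x + w0            # all m discriminant values at once
    return int(np.argmax(g))  # index of the winning class

# Made-up 3-class machine over 2-D features
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])
print(linear_machine(np.array([2.0, 1.0]), W, w0))   # g = [2, 1, -2.5] -> class 0
```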

LDF: Many Classes.

LDF: Many Classes. For two contiguous regions R_i and R_j, the boundary that separates them is a portion of the hyperplane H_ij defined by g_i(x) = g_j(x), that is w_i^t x + w_{i0} = w_j^t x + w_{j0}, or (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0. Thus w_i - w_j is normal to H_ij, and the distance from x to H_ij is given by d(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||.

LDF: Many Classes. Decision regions for a linear machine are convex: if y and z are in R_i, then αy + (1 - α)z is in R_i, because g_i(y) ≥ g_j(y) and g_i(z) ≥ g_j(z) imply g_i(αy + (1 - α)z) ≥ g_j(αy + (1 - α)z). In particular, decision regions must be spatially contiguous. [Figure: a connected region R_j is a valid decision region; a disconnected region R_j is not a valid decision region.]

LDF: Many Classes. Thus the applicability of a linear machine is mostly limited to unimodal conditional densities p(x|θ), even though we did not assume any parametric models. Example: if we need non-contiguous decision regions, a linear machine will fail.

LDF: Augmented feature vector. Linear discriminant function: g(x) = w^t x + w_0. We can rewrite it as g(x) = [w_0 w^t][1; x] = a^t y = g(y), where a = [w_0, w_1, ..., w_d]^t is the new weight vector and y = [1, x^(1), ..., x^(d)]^t is the new feature vector, called the augmented feature vector. We added a dummy dimension to get a completely equivalent, new homogeneous problem: old problem g(x) = w^t x + w_0, new problem g(y) = a^t y.

LDF: Augmented feature vector. Feature augmenting is done for simpler notation. From now on we always assume that we have augmented feature vectors. Given samples x_1, ..., x_n, convert them to augmented samples y_1, ..., y_n by adding a new dimension of value 1: y_i = [1; x_i]. [Figure: the augmented feature space with regions R_1 (g(y) > 0), R_2 (g(y) < 0) and the boundary g(y) = 0.]

LDF: Training Error. For the rest of the lecture, assume we have 2 classes. We have samples y_1, ..., y_n, some in class 1 and some in class 2. We use these samples to determine the weights a in the discriminant function g(y) = a^t y. What should be our criterion for determining a? For now, suppose we want to minimize the training error (that is, the number of misclassified samples among y_1, ..., y_n). Recall that g(y_i) > 0 means y_i is classified as c_1 and g(y_i) < 0 means y_i is classified as c_2. Thus the training error is 0 if a^t y_i > 0 for all y_i in c_1 and a^t y_i < 0 for all y_i in c_2.

LDF: Problem Normalization. The training error is 0 if a^t y_i > 0 for y_i in c_1 and a^t y_i < 0 for y_i in c_2; equivalently, the training error is 0 if a^t y_i > 0 for y_i in c_1 and a^t(-y_i) > 0 for y_i in c_2. This suggests "problem normalization": 1. Replace all examples from class c_2 by their negatives, y_i → -y_i for y_i in c_2. 2. Seek a weight vector a such that a^t y_i > 0 for all samples. If such an a exists, it is called a separating or solution vector, and the original samples x_1, ..., x_n can then indeed be separated by a line.

LDF: Problem Normalization. [Figure: the samples before normalization and after normalization in the (y^(1), y^(2)) plane.] Before normalization we seek a hyperplane that separates patterns from different categories; after normalization we seek a hyperplane that puts the normalized patterns on the same (positive) side.
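The augmentation and normalization steps are easy to code; this is a minimal sketch assuming the samples are rows of a NumPy array and the labels are 1 or 2 (the function name and example values are my own, not from the lecture):

```python
import numpy as np

def augment_and_normalize(X, labels):
    """Augment each sample x with a leading 1 to get y = [1, x], then negate the
    class-2 samples so that a solution vector a only needs a^t y_i > 0 for all i."""
    Y = np.hstack([np.ones((X.shape[0], 1)), X])   # augmented samples y_i
    Y[np.asarray(labels) == 2] *= -1               # problem normalization
    return Y

# Made-up example: one sample from each class
X = np.array([[2.0, 1.0],    # class 1
              [1.0, 3.0]])   # class 2
print(augment_and_normalize(X, [1, 2]))
# [[ 1.  2.  1.]
#  [-1. -1. -3.]]
```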

LDF: Solution Region. Find a weight vector a such that for all samples y_1, ..., y_n: a^t y_i = Σ_{k=0}^{d} a_k y_i^(k) > 0. In general, there are many such solution vectors a. [Figure: several separating lines in the (y^(1), y^(2)) plane, one of them marked "best".]

LDF: Solution Region. The solution region for a is the set of all possible solutions, defined in terms of the normal a to the separating hyperplane. [Figure: the solution region in weight space.]

Optimization. We need to minimize a function of many variables, J(x) = J(x_1, ..., x_d). We know how to minimize J(x): take the partial derivatives and set them to zero, ∇J(x) = [∂J/∂x_1, ..., ∂J/∂x_d]^t = 0 (the gradient). However, solving analytically is not always easy. Would you like to solve a system of nonlinear equations like sin(x_1^2 + x_2^3) + e^{x_4} = 0 and cos(x_2^3 + x_3^3) + log(x_5^{x_2}) - 4 = 0? Sometimes it is not even possible to write down an analytical expression for the derivative; we will see an example later today.

Optimization: Gradient Descent. The gradient ∇J(x) points in the direction of steepest increase of J(x), and -∇J(x) points in the direction of steepest decrease. [Figure: the slope dJ/dx in one dimension and the gradient in two dimensions.]

Optimization: Gradient Descent. [Figure: successive iterates x^(1), x^(2), x^(3), ..., x^(k) with steps s^(1), s^(2), ... descending on J(x) until ∇J(x^(k)) = 0.] Gradient descent for minimizing any function J(x):
set k = 1 and x^(1) to some initial guess for the weight vector
while ||η^(k) ∇J(x^(k))|| > ε
    choose the learning rate η^(k)
    x^(k+1) = x^(k) - η^(k) ∇J(x^(k))   (update rule)
    k = k + 1
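The pseudocode above translates directly into code. A minimal sketch, assuming a simple made-up quadratic objective and a hand-picked fixed learning rate (none of these values come from the lecture):

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, eps=1e-6, max_iter=10000):
    """Repeat x^(k+1) = x^(k) - eta * grad J(x^(k)) until the step is smaller than eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = eta * grad(x)
        if np.linalg.norm(step) < eps:
            break
        x = x - step
    return x

# Made-up objective J(x) = (x1 - 3)^2 + (x2 + 1)^2 with gradient [2(x1 - 3), 2(x2 + 1)]
grad_J = lambda x: np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])
print(gradient_descent(grad_J, x0=[0.0, 0.0]))   # converges near [3, -1]
```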

Optimization: Gradient Descent. Gradient descent is guaranteed to find only a local minimum. [Figure: iterates x^(1), x^(2), x^(3), ..., x^(k) converging to a local minimum of J(x) while the global minimum lies elsewhere.] Nevertheless, gradient descent is very popular because it is simple and applicable to any function.

Optimization: Gradient Descent. The main issue is how to set the parameter η (the learning rate). If η is too small, we need too many iterations. If η is too large, we may overshoot the minimum and possibly never find it (if we keep overshooting). [Figures: J(x) with tiny steps toward the minimum, and J(x) with iterates x^(1), x^(2), ... jumping back and forth across the minimum.]

Today: Continue Linear Discriminant Functions. Perceptron criterion function: batch perceptron rule, single sample perceptron rule.

LDF: Augmented feature vector (review). Linear discriminant function: g(x) = w^t x + w_0; we need to estimate the parameters w and w_0 from data. [Figure: bass vs. salmon samples in the length-lightness plane.] Augment the samples x to get an equivalent homogeneous problem in terms of samples y: g(x) = [w_0 w^t][1; x] = a^t y = g(y). Normalize by replacing all examples from class c_2 by their negatives, y_i → -y_i for y_i in c_2.

LDF (review). Augmented and normalized samples y_1, ..., y_n: seek a weight vector a such that a^t y_i > 0 for all i. [Figure: the samples before normalization and after normalization.] If such an a exists, it is called a separating or solution vector; the original samples x_1, ..., x_n can then indeed be separated by a line.

Optimization: Gradient Descent (review). [Figure: successive iterates x^(1), x^(2), x^(3), ..., x^(k) with steps s^(1), s^(2), ... descending on J(x) until ∇J(x^(k)) = 0.] Gradient descent for minimizing any function J(x):
set k = 1 and x^(1) to some initial guess for the weight vector
while ||η^(k) ∇J(x^(k))|| > ε
    choose the learning rate η^(k)
    x^(k+1) = x^(k) - η^(k) ∇J(x^(k))   (update rule)
    k = k + 1

LDF: Criterion Function. Find a weight vector a such that for all samples y_1, ..., y_n: a^t y_i = Σ_{k=0}^{d} a_k y_i^(k) > 0. We need a criterion function J(a) which is minimized when a is a solution vector. Let Y_M(a) = {y_i : a^t y_i ≤ 0} be the set of examples misclassified by a. First natural choice: the number of misclassified examples, J(a) = |Y_M(a)|. But such a J(a) is piecewise constant, so gradient descent is useless. [Figure: a staircase-shaped J(a).]

LDF: Perceptron Criterion Function. A better choice is the perceptron criterion function J_p(a) = Σ_{y ∈ Y_M} (-a^t y). If y is misclassified, a^t y ≤ 0, thus J_p(a) ≥ 0. J_p(a) is ||a|| times the sum of the distances of the misclassified examples to the decision boundary (the distance from y to the boundary is |a^t y| / ||a||). J_p(a) is piecewise linear and thus suitable for gradient descent. [Figure: a piecewise linear J_p(a).]

LDF: Perceptron Batch Rule. Because J_p(a) = Σ_{y ∈ Y_M} (-a^t y), the gradient of J_p(a) is ∇J_p(a) = Σ_{y ∈ Y_M} (-y), where Y_M is the set of samples misclassified by a^(k). It is not possible to solve ∇J_p(a) = 0 analytically. Substituting into the gradient descent update rule x^(k+1) = x^(k) - η^(k) ∇J(x^(k)) gives the batch update rule for J_p(a): a^(k+1) = a^(k) + η^(k) Σ_{y ∈ Y_M} y. It is called the batch rule because it is based on all misclassified examples.
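A minimal sketch of the batch rule on augmented, normalized samples stored as rows of Y; the data, learning rate, and iteration cap are illustrative choices, not values from the lecture:

```python
import numpy as np

def batch_perceptron(Y, a0, eta=1.0, max_iter=1000):
    """Batch rule: a^(k+1) = a^(k) + eta * (sum of all currently misclassified y_i).
    Y holds augmented, normalized samples as rows; y_i is misclassified if a^t y_i <= 0."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]
        if len(misclassified) == 0:                # J_p(a) = 0: a is a solution vector
            break
        a = a + eta * misclassified.sum(axis=0)
    return a

# Made-up separable data, already augmented and normalized
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  4.0,  3.0],
              [-1.0, -1.0, -3.0]])
a = batch_perceptron(Y, a0=[0.0, 0.0, 0.0])
print(a, Y @ a)   # every entry of Y @ a should now be positive
```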

LDF: Perceptron Single Sample Rule. The gradient descent single sample rule for J_p(a) is a^(k+1) = a^(k) + η^(k) y_M, where y_M is one sample misclassified by a^(k); we must have a consistent way of visiting the samples. Geometric interpretation: y_M is misclassified by a^(k), that is (a^(k))^t y_M ≤ 0, so y_M is on the wrong side of the decision hyperplane; adding η y_M to a moves the new decision hyperplane in the right direction with respect to y_M. [Figure: a^(k), the update η y_M, and the resulting a^(k+1).]

LDF: Perceptron Single Sample Rule. a^(k+1) = a^(k) + η^(k) y_M. If η is too large, a previously correctly classified sample y_k may now be misclassified; if η is too small, y_M is still misclassified. [Figures: the two cases.]
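A corresponding sketch of the single sample rule with a fixed learning rate, visiting the samples cyclically in a fixed order (the data and iteration cap are again illustrative):

```python
import numpy as np

def single_sample_perceptron(Y, a0, eta=1.0, max_epochs=1000):
    """Single sample rule: visit the samples in a fixed order; whenever the current
    sample y is misclassified (a^t y <= 0), update a <- a + eta * y."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_epochs):
        updated = False
        for y in Y:
            if a @ y <= 0:           # y is on the wrong side of the hyperplane
                a = a + eta * y      # move the hyperplane toward classifying y correctly
                updated = True
        if not updated:              # a full pass with no mistakes: solution found
            break
    return a

# Same made-up separable data as in the batch-rule sketch
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  4.0,  3.0],
              [-1.0, -1.0, -3.0]])
print(single_sample_perceptron(Y, a0=[0.0, 0.0, 0.0]))
```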

LDF: Perceptron Example. Class 1: students who get grade A; class 2: students who get grade F. Features: good attendance?, tall?, sleeps in class?, chews gum?

name    good attendance?  tall?     sleeps in class?  chews gum?  grade
Jane    yes (1)           yes (1)   no (-1)           no (-1)     A
Steve   yes (1)           yes (1)   yes (1)           yes (1)     F
Mary    no (-1)           no (-1)   no (-1)           yes (1)     F
Peter   yes (1)           no (-1)   no (-1)           yes (1)     A

LDF Example: Augment feature vector. Convert the samples x_1, ..., x_n to augmented samples y_1, ..., y_n by adding a new dimension of value 1.

name    extra  good attendance?  tall?     sleeps in class?  chews gum?  grade
Jane    1      yes (1)           yes (1)   no (-1)           no (-1)     A
Steve   1      yes (1)           yes (1)   yes (1)           yes (1)     F
Mary    1      no (-1)           no (-1)   no (-1)           yes (1)     F
Peter   1      yes (1)           no (-1)   no (-1)           yes (1)     A

LDF: Perform Normalization. Replace all examples from class c_2 by their negatives. Seek a weight vector a such that a^t y_i > 0.

name    extra  good attendance?  tall?      sleeps in class?  chews gum?  grade
Jane    1      yes (1)           yes (1)    no (-1)           no (-1)     A
Steve   -1     yes (-1)          yes (-1)   yes (-1)          yes (-1)    F
Mary    -1     no (1)            no (1)     no (1)            yes (-1)    F
Peter   1      yes (1)           no (-1)    no (-1)           yes (1)     A

LDF: Use Single Sample Rule. Using the normalized table above: a^t y_i = Σ_{k=0}^{4} a_k y_i^(k), and sample y_i is misclassified if a^t y_i ≤ 0. Gradient descent single sample rule: a^(k+1) = a^(k) + η^(k) y_M. Set a fixed learning rate η^(k) = 1, so the update is a^(k+1) = a^(k) + y_M.

LDF: Gradient Descent Example. Set equal initial weights a^(1) = [0.25, 0.25, 0.25, 0.25, 0.25]. Visit all samples sequentially, modifying the weights after finding a misclassified example.
Jane: a^t y = 0.25*1 + 0.25*1 + 0.25*1 + 0.25*(-1) + 0.25*(-1) > 0, not misclassified.
Steve: a^t y = 0.25*(-1) + 0.25*(-1) + 0.25*(-1) + 0.25*(-1) + 0.25*(-1) < 0, misclassified.
New weights: a^(2) = a^(1) + y_M = [0.25, 0.25, 0.25, 0.25, 0.25] + [-1, -1, -1, -1, -1] = [-0.75, -0.75, -0.75, -0.75, -0.75].

LDF: Gradient Descent Example. a^(2) = [-0.75, -0.75, -0.75, -0.75, -0.75].
Mary: a^t y = -0.75*(-1) - 0.75*1 - 0.75*1 - 0.75*1 - 0.75*(-1) < 0, misclassified.
New weights: a^(3) = a^(2) + y_M = [-0.75, -0.75, -0.75, -0.75, -0.75] + [-1, 1, 1, 1, -1] = [-1.75, 0.25, 0.25, 0.25, -1.75].

LDF: Gradient Descent Example. a^(3) = [-1.75, 0.25, 0.25, 0.25, -1.75].
Peter: a^t y = -1.75*1 + 0.25*1 + 0.25*(-1) + 0.25*(-1) - 1.75*1 < 0, misclassified.
New weights: a^(4) = a^(3) + y_M = [-1.75, 0.25, 0.25, 0.25, -1.75] + [1, 1, -1, -1, 1] = [-0.75, 1.25, -0.75, -0.75, -0.75].

LDF: Gradient Descent Example. a^(4) = [-0.75, 1.25, -0.75, -0.75, -0.75].
Jane: -0.75*1 + 1.25*1 - 0.75*1 - 0.75*(-1) - 0.75*(-1) > 0, not misclassified.
Steve: -0.75*(-1) + 1.25*(-1) - 0.75*(-1) - 0.75*(-1) - 0.75*(-1) > 0, not misclassified.
Mary: -0.75*(-1) + 1.25*1 - 0.75*1 - 0.75*1 - 0.75*(-1) > 0, not misclassified.
Peter: -0.75*1 + 1.25*1 - 0.75*(-1) - 0.75*(-1) - 0.75*1 > 0, not misclassified.
Thus the discriminant function is g(y) = -0.75*y^(0) + 1.25*y^(1) - 0.75*y^(2) - 0.75*y^(3) - 0.75*y^(4). Converting back to the original features x: g(x) = 1.25*x^(1) - 0.75*x^(2) - 0.75*x^(3) - 0.75*x^(4) - 0.75.

LDF: Gradient Descent Example. Converting back to the original features x:
1.25*x^(1) - 0.75*x^(2) - 0.75*x^(3) - 0.75*x^(4) > 0.75 ⇒ grade A
1.25*x^(1) - 0.75*x^(2) - 0.75*x^(3) - 0.75*x^(4) < 0.75 ⇒ grade F
where x^(1) = good attendance, x^(2) = tall, x^(3) = sleeps in class, x^(4) = chews gum.
This is just one possible solution vector. If we had started with weights a^(1) = [0, 0.5, 0.5, 0, 0], the solution would be [-1, 1.5, -0.5, -1, -1]:
1.5*x^(1) - 0.5*x^(2) - x^(3) - x^(4) > 1 ⇒ grade A
1.5*x^(1) - 0.5*x^(2) - x^(3) - x^(4) < 1 ⇒ grade F
In this solution, being tall is the least important feature.
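As a sanity check, the worked example can be reproduced with a few lines of Python; this sketch uses the augmented, normalized table above, initial weights of 0.25, and a fixed learning rate of 1, and should finish with a = [-0.75, 1.25, -0.75, -0.75, -0.75]:

```python
import numpy as np

# Augmented, normalized samples (class F rows already negated): Jane, Steve, Mary, Peter
Y = np.array([[ 1,  1,  1, -1, -1],    # Jane, grade A
              [-1, -1, -1, -1, -1],    # Steve, grade F (negated)
              [-1,  1,  1,  1, -1],    # Mary, grade F (negated)
              [ 1,  1, -1, -1,  1]],   # Peter, grade A
             dtype=float)

a = np.full(5, 0.25)                   # a^(1) = [0.25, 0.25, 0.25, 0.25, 0.25]
while True:
    updated = False
    for y in Y:                        # visit samples sequentially
        if a @ y <= 0:                 # misclassified sample
            a = a + y                  # single sample rule with eta = 1
            updated = True
    if not updated:                    # no mistakes in a full pass: done
        break
print(a)                               # [-0.75  1.25 -0.75 -0.75 -0.75]
```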

LDF: Nonseparable Example. Suppose we have 2 features and the samples are: Class 1: [2,1], [4,3], [3,5]; Class 2: [1,3] and [5,6]. These samples are not separable by a line. We would still like to get an approximate separation by a line; a good choice is shown in green in the figure. Some samples may be noisy, and it is OK if they are on the wrong side of the line. Get y_1, ..., y_5 by adding the extra feature and normalizing:
y_1 = [1, 2, 1], y_2 = [1, 4, 3], y_3 = [1, 3, 5], y_4 = [-1, -1, -3], y_5 = [-1, -5, -6].

LDF: Nonseparable Example. Let us apply the perceptron single sample algorithm: initial equal weights a^(1) = [1, 1, 1] (this is the line x^(1) + x^(2) + 1 = 0), fixed learning rate η = 1, update rule a^(k+1) = a^(k) + y_M.
(a^(1))^t y_1 = [1 1 1]·[1 2 1] > 0
(a^(1))^t y_2 = [1 1 1]·[1 4 3] > 0
(a^(1))^t y_3 = [1 1 1]·[1 3 5] > 0

LDF: Nonseparable Example. a^(1) = [1, 1, 1]:
(a^(1))^t y_4 = [1 1 1]·[-1 -1 -3] = -5 < 0, misclassified ⇒ a^(2) = a^(1) + y_4 = [1 1 1] + [-1 -1 -3] = [0 0 -2]
(a^(2))^t y_5 = [0 0 -2]·[-1 -5 -6] = 12 > 0
(a^(2))^t y_1 = [0 0 -2]·[1 2 1] = -2 < 0, misclassified ⇒ a^(3) = a^(2) + y_1 = [0 0 -2] + [1 2 1] = [1 2 -1]

LDF: Nonseparable Example. a^(3) = [1, 2, -1]:
(a^(3))^t y_2 = [1 2 -1]·[1 4 3] = 6 > 0
(a^(3))^t y_3 = [1 2 -1]·[1 3 5] = 2 > 0
(a^(3))^t y_4 = [1 2 -1]·[-1 -1 -3] = 0, misclassified ⇒ a^(4) = a^(3) + y_4 = [1 2 -1] + [-1 -1 -3] = [0 1 -4]

LDF: Nonseparable Example. a^(4) = [0, 1, -4], and the iteration continues in the same way. [Figure: the decision lines corresponding to the successive weight vectors.]

LDF: Nonseparable Example. We can continue this forever: there is no solution vector a satisfying a^t y_i = Σ_{k=0}^{2} a_k y_i^(k) > 0 for all 5 samples. We need to stop, but at a good point. [Figure: the solutions at iterations 900 through 915; some are good and some are not.] How do we stop at a good solution?

LDF: Convergence of Perceptron Rules. If the classes are linearly separable and we use a fixed learning rate, that is η^(k) = c for some constant c, both the single sample and batch perceptron rules converge to a correct solution (which could be any a in the solution space). If the classes are not linearly separable, the algorithm does not stop; it keeps looking for a solution which does not exist. By choosing an appropriate learning rate we can always ensure convergence: η^(k) → 0 as k → ∞, for example the inverse linear learning rate η^(k) = η^(1)/k. For the inverse linear learning rate, convergence in the linearly separable case can also be proven. There is no guarantee that we stopped at a good point, but there are good reasons to choose the inverse linear learning rate.
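A sketch of the single sample rule with the inverse linear rate η^(k) = η^(1)/k; here the first misclassified sample found is used at each update, and the cap of 1000 updates is an arbitrary stopping choice (neither detail comes from the lecture):

```python
import numpy as np

def perceptron_inverse_linear(Y, a0, eta1=1.0, max_updates=1000):
    """Single sample rule with inverse linear learning rate eta^(k) = eta^(1) / k,
    so the step size shrinks toward 0 as the iterations go on."""
    a = np.asarray(a0, dtype=float)
    for k in range(1, max_updates + 1):
        misclassified = [y for y in Y if a @ y <= 0]
        if not misclassified:          # separable case: a solution vector was found
            break
        a = a + (eta1 / k) * misclassified[0]
    return a

# The nonseparable samples from the example above (augmented and normalized)
Y = np.array([[ 1,  2,  1],
              [ 1,  4,  3],
              [ 1,  3,  5],
              [-1, -1, -3],
              [-1, -5, -6]], dtype=float)
print(perceptron_inverse_linear(Y, a0=[1.0, 1.0, 1.0]))
```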

LDF: Perceptron Rule and Gradient Descent. Linearly separable data: the perceptron rule with gradient descent works well. Linearly non-separable data: we need to stop the perceptron rule algorithm at a good point, and this may be tricky. Batch rule: smoother gradient, because all samples are used. Single sample rule: easier to analyze, but it concentrates more than necessary on any isolated noisy training examples.