Learning Linear Detectors

1 Learning Linear Detectors Instructor - Simon Lucey Designing Computer Vision Apps

2 Today: Detection versus Classification; Bayes Classifiers; Linear Classifiers

3 Examples of Detection

5 Learning: Detection = Binary Classification. The binary classifier answers yes/no, encoded as 1 or 0.

8 Vision: Detection = Localization. Images of the object at various warps are converted into vectors of pixel values at each warp position: [255,134,45,...,34,12,124,67], [123,244,12,...,134,122,24,02], [67,13,245,...,112,51,92,181], ..., [65,09,67,...,78,66,76,215]. Each vector, e.g. [65,09,67,...,78,66,76,215], is passed to the binary classifier, which answers yes/no.
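A minimal sketch of the idea above, assuming the only warps are translations of a fixed-size window over a grayscale image. The function names, the window size, and the mean-brightness classifier are hypothetical and are not taken from the slides.

```python
import numpy as np

def sliding_window_vectors(image, window=(4, 4), stride=2):
    """Yield (row, col, vector) for each window position in a 2-D image."""
    h, w = window
    rows, cols = image.shape
    for r in range(0, rows - h + 1, stride):
        for c in range(0, cols - w + 1, stride):
            patch = image[r:r + h, c:c + w]
            # Flatten the patch into a vector of pixel values.
            yield r, c, patch.reshape(-1).astype(np.float64)

def detect(image, classifier, window=(4, 4), stride=2):
    """Return the (row, col) positions where the binary classifier says yes."""
    return [(r, c)
            for r, c, x in sliding_window_vectors(image, window, stride)
            if classifier(x)]

# Toy image: the "object" is a bright 4x4 square on a dark background.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 255

# Hypothetical stand-in classifier: yes if the mean pixel value is high.
def bright_patch(x):
    return x.mean() > 200.0

hits = detect(image, bright_patch)
print(hits)
```

Any binary classifier that maps a pixel vector to yes/no can be dropped in place of `bright_patch`; localization then amounts to reporting the positions that were accepted.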

10 Examples of Classification

11 Today: Detection versus Classification; Bayes Classifiers; Linear Classifiers

15 Binary Classification. An input vector $\mathbf{x} = [65,09,67,\ldots,78,66,76,215]^T \in \mathbb{R}^D$ is passed to a binary classifier defined by a discriminant $y : \mathbb{R}^D \to \mathbb{R}$. Decide $\mathbf{x} \in C_1$ (object) if $y(\mathbf{x}) > 0$ and $\mathbf{x} \in C_2$ (background) if $y(\mathbf{x}) < 0$.

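16 Binary Bayesian Classification (Thomas Bayes). For $\mathbf{x} = [65,09,67,\ldots,78,66,76,215]^T \in \mathbb{R}^D$, decide $\mathbf{x} \in C_1$ if $\frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} > \frac{P(C_2)}{P(C_1)}$ and $\mathbf{x} \in C_2$ if the ratio is smaller.

17 Binary Bayesian Classification (Thomas Bayes). In log form, decide $\mathbf{x} \in C_1$ if $\log \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} > 0$ and $\mathbf{x} \in C_2$ if it is $< 0$. A threshold of 0 corresponds to equal priors, $P(C_1) = P(C_2)$.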
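The likelihood-ratio rule can be sketched as follows, assuming one-dimensional Gaussian class-conditional densities. The Gaussian choice, the means and standard deviations, and the function names are illustrative assumptions, not part of the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_decide(x, cond1, cond2, prior1=0.5, prior2=0.5):
    """Return 1 if x is assigned to C1, else 2.

    Compares log p(x|C1) - log p(x|C2) against log P(C2) - log P(C1),
    which is the ratio test in log form.
    """
    log_ratio = math.log(cond1(x)) - math.log(cond2(x))
    threshold = math.log(prior2) - math.log(prior1)
    return 1 if log_ratio > threshold else 2

# Assumed class-conditional densities p(x|C1) and p(x|C2).
def p_x_given_c1(x):
    return gaussian_pdf(x, mean=2.0, std=1.0)

def p_x_given_c2(x):
    return gaussian_pdf(x, mean=-2.0, std=1.0)

# Equal priors: the threshold on the log ratio is 0.
print(bayes_decide(0.5, p_x_given_c1, p_x_given_c2))
# A rare C1 (prior 0.01) moves the threshold and can flip the decision.
print(bayes_decide(0.5, p_x_given_c1, p_x_given_c2, prior1=0.01, prior2=0.99))
```

The second call shows why the priors matter: the same observation is assigned to a different class once one class is assumed to be much rarer.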

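18 Binary Bayesian Classification. [Figure: class-conditional densities $p(x \mid C_1)$, $p(x \mid C_2)$ and joint densities $p(x, C_1)$, $p(x, C_2)$ plotted against $x$, with decision regions $R_1$ and $R_2$ and the point $x_0$ marked.]

19 Binary Bayesian Classification. [Figure: two-dimensional example with axes $x_1$ and $x_2$, showing $p(\mathbf{x} \mid C_1)$, $p(\mathbf{x} \mid C_2)$ and the decision boundary $y([x_1, x_2]^T) = 0$.]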

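20 Binary Bayesian Classification. Although ideal, there are numerous issues: What type of distribution is $p(\mathbf{x} \mid C)$? Do we have enough samples to approximate $p(\mathbf{x} \mid C)$? The curse of dimensionality! [Figure: sample spaces for D = 1 (axis $x_1$), D = 2 (axes $x_1$, $x_2$) and D = 3 (axes $x_1$, $x_2$, $x_3$).]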

23 Curse of Dimensionality. Why not just sample the function? One-dimensional function: sample 10 points, pick the lowest value. Probably works. (Adapted from: Optimization Methods in Computer Vision, Anders Eriksson.)

24 Curse of Dimensionality. Why not just sample the function? Two-dimensional function: $10^2$ samples. (Adapted from: Optimization Methods in Computer Vision, Anders Eriksson.)

25 Curse of Dimensionality. Why not just sample the function? Three-dimensional function: $10^3$ samples. (Adapted from: Optimization Methods in Computer Vision, Anders Eriksson.)

26 Curse of Dimensionality. As the dimensionality D increases, and assuming the same number of discrete samples n per dimension, the number of samples becomes $n^D$. [Figure: number of samples plotted against dimensionality D.]
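A short worked example of the growth, using 10 samples per dimension as on the preceding slides. The function name `grid_samples` is hypothetical.

```python
def grid_samples(n, D):
    """Number of grid points with n discrete samples along each of D dimensions."""
    return n ** D

# Print the sample count for a few dimensionalities.
for D in (1, 2, 3, 10):
    print(f"D = {D:2d}: {grid_samples(10, D)} samples")
```

Even a tiny 20x20 image patch gives D = 400, so dense sampling of the space is out of the question.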

29 Linear Binary Classification (Perceptron, Linear Discriminant). For $\mathbf{x} = [65,09,67,\ldots,78,66,76,215]^T \in \mathbb{R}^D$, decide $\mathbf{x} \in C_1$ if $\mathbf{w}^T\mathbf{x} + w_0 > 0$ and $\mathbf{x} \in C_2$ if $\mathbf{w}^T\mathbf{x} + w_0 < 0$.
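The decision rule is small enough to write out directly. This is a sketch with an arbitrary, hand-picked weight vector and bias; the values and function names are assumptions for illustration only.

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Evaluate y(x) = w^T x + w0."""
    return float(np.dot(w, x) + w0)

def classify(x, w, w0):
    """Return 1 for class C1 (y > 0), otherwise 2 for class C2."""
    return 1 if linear_discriminant(x, w, w0) > 0 else 2

# Hand-picked parameters for a two-dimensional example.
w = np.array([1.0, -1.0])
w0 = -0.5

print(classify(np.array([2.0, 0.0]), w, w0))
print(classify(np.array([0.0, 2.0]), w, w0))
```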

30 Today: Detection versus Classification; Bayes Classifiers; Linear Classifiers

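31 Linear Discriminant Functions. [Figure: geometry of a linear discriminant in two dimensions (axes $x_1$, $x_2$). The decision surface $y = 0$ separates region $R_1$ ($y > 0$, class $C_1$) from region $R_2$ ($y < 0$, class $C_2$) and is orthogonal to $\mathbf{w}$. Labels mark the signed distance $y(\mathbf{x}) / \|\mathbf{w}\|$ of a point $\mathbf{x}$ from the surface and the offset $-w_0 / \|\mathbf{w}\|$ of the surface from the origin.]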

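32 Multi-Class Linear Discriminants. [Figure: decision regions $R_i$, $R_j$, $R_k$ for classes $C_i$, $C_j$, $C_k$, with points $\mathbf{x}_A$, $\mathbf{x}_B$ and $\hat{\mathbf{x}}$ marked.]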
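The slide itself shows only the figure. One common construction, assumed here rather than taken from the slide, uses one linear function $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$ per class and assigns $\mathbf{x}$ to the class with the largest value. A minimal sketch with made-up weights:

```python
import numpy as np

def multiclass_decide(x, W, w0):
    """Return the index k that maximizes y_k(x) = w_k^T x + w_k0.

    W has shape (K, D) with one weight vector per row; w0 has shape (K,).
    """
    scores = W @ x + w0
    return int(np.argmax(scores))

# Three classes in two dimensions with illustrative weights.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
w0 = np.zeros(3)

print(multiclass_decide(np.array([2.0, 0.5]), W, w0))
print(multiclass_decide(np.array([0.1, 3.0]), W, w0))
print(multiclass_decide(np.array([-1.0, -2.0]), W, w0))
```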

34 Why Linear? Linear discriminant functions are useful in this regard, as the number of required samples n is linear with respect to the dimensionality D. [Figure: number of samples plotted against dimensionality D.]

35 Perceptron. Rosenblatt simulated the perceptron on an IBM 704 computer at Cornell in 1957. The input scene (i.e. a printed character) was illuminated by powerful lights and captured on a 20x20 array of cadmium sulphide photocells. The weights of the perceptron were applied using variable rotary resistors. It is often referred to as the very first neural network. [Photo: Frank Rosenblatt.]

36 Perceptron

38 Perceptron (Linear Discriminant). Binary labels: $t_i = +1$ or $t_i = -1$. $\mathbf{x}_i$ = i-th training example, $\mathbf{w}$ = weight vector, $w_0$ = bias.

$$\arg\min_{\mathbf{w}, w_0} \sum_{i=1}^{N} \max\left[0, \; -t_i(\mathbf{w}^T\mathbf{x}_i + w_0)\right]$$
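A sketch of the classical perceptron update, which steps along a subgradient of the objective above one misclassified example at a time. The toy data, learning rate, and function names are assumptions. Points exactly on the boundary are treated as mistakes so that training can leave the all-zero starting point, which trivially gives zero loss.

```python
import numpy as np

def perceptron_loss(w, w0, X, t):
    """Sum over examples of max[0, -t_i (w^T x_i + w0)]."""
    return float(np.sum(np.maximum(0.0, -t * (X @ w + w0))))

def train_perceptron(X, t, lr=1.0, epochs=100):
    """Cycle through the data, correcting w and w0 on each mistake."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, t_i in zip(X, t):
            if t_i * (w @ x_i + w0) <= 0:  # misclassified or on the boundary
                w += lr * t_i * x_i
                w0 += lr * t_i
                mistakes += 1
        if mistakes == 0:  # every example is on the correct side
            break
    return w, w0

# Linearly separable toy data with labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w, w0 = train_perceptron(X, t)
print(w, w0, perceptron_loss(w, w0, X, t))
```

Rescaling the returned `w` and `w0` by any positive constant also gives zero loss, which is one concrete way to see that the minimum is not unique.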

40 Linear Least Squares Discriminant. The classical perceptron is problematic because it does not have a unique minimum. One can instead employ a least-squares objective, which is convex and has a unique minimum whenever the data matrix (with a constant column appended for the bias) has full column rank.

$$\arg\min_{\mathbf{w}, w_0} \sum_{i=1}^{N} \left\| t_i - \mathbf{w}^T\mathbf{x}_i - w_0 \right\|_2^2$$
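The least-squares objective can be solved in closed form. The sketch below appends a column of ones for the bias and calls NumPy's `numpy.linalg.lstsq`; the toy data and the function name `fit_least_squares` are assumptions.

```python
import numpy as np

def fit_least_squares(X, t):
    """Minimize sum_i (t_i - w^T x_i - w0)^2 and return (w, w0)."""
    # Augment with a column of ones so the bias is estimated jointly.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    params, *_ = np.linalg.lstsq(A, t, rcond=None)
    return params[:-1], float(params[-1])

# Same style of toy data as a perceptron example: labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w, w0 = fit_least_squares(X, t)
predictions = np.sign(X @ w + w0)
print(w, w0, predictions)
```

Classification still thresholds $\mathbf{w}^T\mathbf{x} + w_0$ at zero; only the way the parameters are chosen has changed.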

41 Linear Least Squares Discriminant

42 Battle of the Objectives. Problem: we want the best of both worlds. 1. Unique minimum. 2. Robustness to outliers. [Figure: the $L_2$ and perceptron objectives compared.]

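43 Linear Support Vector Machines. The margin is $1 / \sqrt{\mathbf{w}^T\mathbf{w}}$, so it grows as $\mathbf{w}^T\mathbf{w}$ shrinks. C = cost of error.

$$\arg\min_{\mathbf{w}, w_0} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_{n=1}^{N} \max\left[0, \; 1 - t_n(\mathbf{w}^T\mathbf{x}_n + w_0)\right]$$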

45 Linear Support Vector Machine. The objective

$$\arg\min_{\mathbf{w}, w_0} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_{n=1}^{N} \max\left[0, \; 1 - t_n(\mathbf{w}^T\mathbf{x}_n + w_0)\right]$$

can be re-written as a quadratic programming problem,

$$\arg\min_{\mathbf{w}, w_0, \boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_{n=1}^{N} \xi_n \quad \text{s.t.} \quad t_n(\mathbf{w}^T\mathbf{x}_n + w_0) \geq 1 - \xi_n, \quad \xi_n \geq 0, \quad n = 1 \ldots N$$
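The slide solves the problem as a quadratic program. As a dependency-free illustration, the sketch below instead minimizes the equivalent hinge-loss form by plain subgradient descent, which is a swapped-in technique and not the QP route described above. The step size, iteration count, toy data, and function names are assumptions.

```python
import numpy as np

def svm_objective(w, w0, X, t, C):
    """0.5 * w^T w + C * sum_n max[0, 1 - t_n (w^T x_n + w0)]."""
    hinge = np.maximum(0.0, 1.0 - t * (X @ w + w0))
    return 0.5 * float(w @ w) + C * float(hinge.sum())

def train_linear_svm(X, t, C=1.0, lr=0.01, iters=2000):
    """Subgradient descent on the hinge-loss objective."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(iters):
        margins = t * (X @ w + w0)
        active = margins < 1.0  # examples with non-zero hinge loss
        # Subgradient: regularizer term plus contributions of active examples.
        grad_w = w - C * (t[active][:, None] * X[active]).sum(axis=0)
        grad_w0 = -C * t[active].sum()
        w = w - lr * grad_w
        w0 = w0 - lr * grad_w0
    return w, w0

# Linearly separable toy data with labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w, w0 = train_linear_svm(X, t)
print(w, w0, svm_objective(w, w0, X, t, C=1.0))
```

With a constant step size the iterate hovers near the minimizer rather than converging exactly; a QP solver applied to the constrained form would return the exact solution.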

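46 Reminder: Quadratic Programming. Most widely used in vision and learning.

$$\arg\min_{\mathbf{x}} \; \mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} + r \quad \text{s.t.} \quad \mathbf{G}\mathbf{x} \preceq \mathbf{h}, \quad \mathbf{A}\mathbf{x} = \mathbf{b}, \quad \mathbf{x} \in \mathcal{S}^D$$

Examples: SfM, Support Vector Machines, Alignment, etc.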

47 Non-Linear Extension. Most decision boundaries in reality will NOT be linear. It is popular to employ non-linear mappings $\phi(\mathbf{x})$. Feature extraction (e.g. SIFT, HOG) is very useful here. [Figure: input space with axes $x_1$, $x_2$ and feature space with axes $\phi_1$, $\phi_2$.]
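A toy illustration of the idea, assuming a hand-chosen mapping $\phi(\mathbf{x}) = [x_1^2, x_2^2]^T$ (not SIFT or HOG, and not from the slides). Points inside and outside the unit circle cannot be separated by a line in the input space, but a linear discriminant in the mapped space separates them. All names and values are hypothetical.

```python
import numpy as np

def phi(x):
    """Hypothetical non-linear mapping: square each coordinate."""
    return np.array([x[0] ** 2, x[1] ** 2])

# Linear discriminant in feature space: phi_1 + phi_2 - 1 = 0,
# which is the unit circle when viewed in the original input space.
w = np.array([1.0, 1.0])
w0 = -1.0

def classify(x):
    """Return 1 (C1) if w^T phi(x) + w0 > 0, otherwise 2 (C2)."""
    return 1 if w @ phi(x) + w0 > 0 else 2

inner = np.array([[0.2, 0.1], [-0.3, 0.2], [0.1, -0.4]])   # near the origin
outer = np.array([[2.0, 0.0], [0.0, -2.0], [-1.5, 1.5]])   # far from the origin

print([classify(x) for x in inner])
print([classify(x) for x in outer])
```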

48 Generating Positive Examples. Coarsely normalize for scale, orientation and translation. It is a bad idea to normalize too much, due to the nature of the discrete ES.

50 Generating Negative Examples. Obtain a large number of non-object images (e.g. from the web). Randomly sample various positions within the images. The training set is inherently imbalanced, as we produce far more negative than positive examples.
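Random sampling of negatives might look like the sketch below. A random array stands in for a real non-object image, and the 20x20 patch size, sample count, and function name are assumptions.

```python
import numpy as np

def sample_negative_patches(image, patch_size, count, rng):
    """Draw `count` patches at uniformly random positions and flatten each one."""
    h, w = patch_size
    rows, cols = image.shape
    patches = []
    for _ in range(count):
        r = rng.integers(0, rows - h + 1)  # upper bound is exclusive
        c = rng.integers(0, cols - w + 1)
        patches.append(image[r:r + h, c:c + w].reshape(-1))
    return np.stack(patches)

rng = np.random.default_rng(0)
# Stand-in for a non-object image; in practice this would be loaded from disk.
background = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

negatives = sample_negative_patches(background, (20, 20), count=50, rng=rng)
print(negatives.shape)
```

Each row of `negatives` is one background example with label "no". A single image already yields many such rows, which is where the imbalance against the positive set comes from.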

51 More to read: Bishop, Pattern Recognition and Machine Learning, Chapters 3 & 4. Fukunaga, Introduction to Statistical Pattern Recognition, Chapter 1.
