Regression and Classification with Linear Models. CMPSCI 383, Nov 15, 2011

2 Today's topics: Learning from Examples (brief review); Univariate Linear Regression; Batch gradient descent; Stochastic gradient descent; Multivariate Linear Regression; Regularization; Linear Classifiers; Perceptron learning rule; Logistic Regression

3-8 Learning from Examples (supervised learning) [six figure-only slides; figures not recovered]

9 Important issues: Generalization; Overfitting; Cross-validation (holdout cross-validation, k-fold cross-validation, leave-one-out cross-validation); Model selection
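
The slide only names the cross-validation variants, so the following is a minimal sketch of how k-fold splitting could be done, not the course's own code. The function name `k_fold_splits` and the choice to split by contiguous index ranges (without shuffling) are assumptions made for illustration. Leave-one-out cross-validation corresponds to setting `k` equal to the number of examples.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, validation_indices) pairs, one per fold.

    Hypothetical helper: folds are contiguous index ranges, and any
    remainder is spread over the first folds.
    """
    indices = list(range(n_examples))
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 3-fold cross-validation over 6 examples.
splits = list(k_fold_splits(6, 3))

# Leave-one-out cross-validation: k equals the number of examples.
loo_splits = list(k_fold_splits(4, 4))
```

Each example is used for validation exactly once, and holdout cross-validation can be seen as using only a single one of these splits.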

10 Recall Notation
Training set: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where each $y_j$ was generated by an unknown function $y = f(x)$.
Discover a function $h$ (the hypothesis) that best approximates the true function $f$: $h \approx f$.

11 Loss Functions
Suppose the true prediction for input $x$ is $f(x) = y$ but the hypothesis gives $h(x) = \hat{y}$.
$$L(x, y, \hat{y}) = \mathrm{Utility}(\text{result of using } y \text{ given input } x) - \mathrm{Utility}(\text{result of using } \hat{y} \text{ given input } x)$$
Simplified version: $L(y, \hat{y})$
Absolute value loss: $L_1(y, \hat{y}) = |y - \hat{y}|$
Squared error loss: $L_2(y, \hat{y}) = (y - \hat{y})^2$
0/1 loss: $L_{0/1}(y, \hat{y}) = 0$ if $y = \hat{y}$, else $1$
Generalization loss: expected loss over all possible examples.
Empirical loss: average loss over available examples.
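
The three simplified loss functions and the empirical loss can be sketched in a few lines of Python. The function names and the toy data set below are illustrative assumptions, not part of the slides.

```python
def absolute_loss(y, y_hat):
    """L1 loss: absolute difference between target and prediction."""
    return abs(y - y_hat)


def squared_loss(y, y_hat):
    """L2 loss: squared difference between target and prediction."""
    return (y - y_hat) ** 2


def zero_one_loss(y, y_hat):
    """0/1 loss: 0 for an exact match, 1 otherwise."""
    return 0 if y == y_hat else 1


def empirical_loss(loss, examples, h):
    """Average loss of hypothesis h over the available (x, y) examples."""
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)


# Toy data (assumed for illustration): h is exact on two of three examples.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]
h = lambda x: 2.0 * x

l1 = empirical_loss(absolute_loss, examples, h)
l2 = empirical_loss(squared_loss, examples, h)
l01 = empirical_loss(zero_one_loss, examples, h)
```

Generalization loss cannot be computed this way, because it is an expectation over all possible examples rather than an average over the ones at hand.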

12 Univariate Linear Regression

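13 Univariate Linear Regression contd.
Weight vector: $\mathbf{w} = [w_0, w_1]$
Hypothesis: $h_{\mathbf{w}}(x) = w_1 x + w_0$
Find the weight vector that minimizes empirical loss, e.g., $L_2$:
$$\mathrm{Loss}(h_{\mathbf{w}}) = \sum_{j=1}^{N} L_2(y_j, h_{\mathbf{w}}(x_j)) = \sum_{j=1}^{N} (y_j - h_{\mathbf{w}}(x_j))^2 = \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2$$
i.e., find $\mathbf{w}^*$ such that
$$\mathbf{w}^* = \operatorname{argmin}_{\mathbf{w}} \mathrm{Loss}(h_{\mathbf{w}})$$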

14 Weight Space

15 Finding w*
Find weights such that:
$$\frac{\partial}{\partial w_0} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0 \quad \text{and} \quad \frac{\partial}{\partial w_1} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0$$
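
Setting both partial derivatives to zero yields a standard closed-form solution for the univariate case: $w_1 = \frac{N \sum x_j y_j - \sum x_j \sum y_j}{N \sum x_j^2 - (\sum x_j)^2}$ and $w_0 = \frac{\sum y_j - w_1 \sum x_j}{N}$. The slide does not show this result, so the sketch below is an illustration under that assumption. The name `fit_univariate` and the data are hypothetical, and the code assumes the $x_j$ are not all identical (otherwise the denominator is zero).

```python
def fit_univariate(xs, ys):
    """Closed-form least-squares fit of h(x) = w1 * x + w0.

    Assumes at least two distinct x values, so the denominator is nonzero.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    w1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    w0 = (sum_y - w1 * sum_x) / n
    return w0, w1


# Noise-free data generated by y = 2x + 1 (assumed for illustration).
w0, w1 = fit_univariate([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```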

16 Gradient Descent
$$w_i \leftarrow w_i - \alpha \frac{\partial}{\partial w_i} \mathrm{Loss}(\mathbf{w})$$
where $\alpha$ is the step size, or learning rate.

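17 Gradient Descent contd.
For one training example $(x, y)$:
$$w_0 \leftarrow w_0 + \alpha (y - h_{\mathbf{w}}(x)) \quad \text{and} \quad w_1 \leftarrow w_1 + \alpha (y - h_{\mathbf{w}}(x))\, x$$
For $N$ training examples (batch gradient descent):
$$w_0 \leftarrow w_0 + \alpha \sum_j (y_j - h_{\mathbf{w}}(x_j)) \quad \text{and} \quad w_1 \leftarrow w_1 + \alpha \sum_j (y_j - h_{\mathbf{w}}(x_j))\, x_j$$
Stochastic gradient descent: take a step for one training example at a time.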
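
The two update schemes can be compared side by side in a short sketch. The function names, the learning rate of 0.05, the epoch count, and the toy data are all assumptions chosen so that both variants converge on this example; they are not taken from the slides.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, epochs=2000):
    """Update w0 and w1 once per pass, using the sum over all examples."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        # Errors are computed with the current weights before either update.
        errors = [y - (w1 * x + w0) for x, y in zip(xs, ys)]
        w0 += alpha * sum(errors)
        w1 += alpha * sum(e * x for e, x in zip(errors, xs))
    return w0, w1


def stochastic_gradient_descent(xs, ys, alpha=0.05, epochs=2000):
    """Update w0 and w1 after every single training example."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = y - (w1 * x + w0)
            w0 += alpha * error
            w1 += alpha * error * x
    return w0, w1


# Noise-free data generated by y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

batch_w0, batch_w1 = batch_gradient_descent(xs, ys)
sgd_w0, sgd_w1 = stochastic_gradient_descent(xs, ys)
```

On this noise-free data both variants approach $w_0 = 1$, $w_1 = 2$. With noisy data and a fixed learning rate, the stochastic variant would generally keep fluctuating around the minimum rather than settle exactly.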

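18 The Multivariate case
$$h_{\mathbf{w}}(\mathbf{x}_j) = w_0 + w_1 x_{j,1} + \cdots + w_n x_{j,n} = w_0 + \sum_i w_i x_{j,i}$$
Augmented vectors: add a feature to each $\mathbf{x}$ by tacking on a 1: $x_{j,0} = 1$. Then:
$$h_{\mathbf{w}}(\mathbf{x}_j) = \mathbf{w} \cdot \mathbf{x}_j = \mathbf{w}^T \mathbf{x}_j = \sum_i w_i x_{j,i}$$
And the batch gradient descent update becomes:
$$w_i \leftarrow w_i + \alpha \sum_j (y_j - h_{\mathbf{w}}(\mathbf{x}_j))\, x_{j,i}$$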
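
A minimal sketch of the augmented-vector trick and the multivariate batch update follows. The helper names `augment`, `predict`, and `batch_step`, the learning rate, the iteration count, and the four-point data set are assumptions made for illustration.

```python
def augment(x):
    """Prepend the constant feature x_0 = 1 so w_0 acts as the intercept."""
    return [1.0] + list(x)


def predict(w, x):
    """Linear hypothesis h_w(x) = w . x for an augmented input x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_step(w, X, ys, alpha):
    """One batch update: w_i <- w_i + alpha * sum_j (y_j - h_w(x_j)) * x_{j,i}."""
    errors = [y - predict(w, x) for x, y in zip(X, ys)]
    return [wi + alpha * sum(e * x[i] for e, x in zip(errors, X))
            for i, wi in enumerate(w)]


# Noise-free data generated by y = 1 + 2*x1 - x2 (assumed for illustration).
raw_inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 3.0, 0.0, 2.0]
X = [augment(x) for x in raw_inputs]

first_step = batch_step([0.0, 0.0, 0.0], X, ys, alpha=0.1)

w = [0.0, 0.0, 0.0]
for _ in range(5000):
    w = batch_step(w, X, ys, alpha=0.1)
```

Because every input carries the extra constant feature, the intercept needs no special case in either the prediction or the update.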

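19 The Multivariate case contd.
Or, solving analytically:
Let $\mathbf{y}$ be the vector of outputs for the training examples, and let $X$ be the data matrix, in which each row is an input vector.
Solving $\mathbf{y} = X\mathbf{w}$ for $\mathbf{w}^*$:
$$\mathbf{w}^* = (X^T X)^{-1} X^T \mathbf{y}$$
where $(X^T X)^{-1} X^T$ is the pseudo-inverse of $X$.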
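
The slide states the result without the intermediate steps. One way to reach it, assuming the $L_2$ loss from the univariate slides and assuming $X^T X$ is invertible, is to set the gradient of the loss to zero:

```latex
\begin{aligned}
\mathrm{Loss}(\mathbf{w}) &= \lVert \mathbf{y} - X\mathbf{w} \rVert^{2} \\
\nabla_{\mathbf{w}} \mathrm{Loss}(\mathbf{w}) &= -2\, X^{T}(\mathbf{y} - X\mathbf{w}) = 0 \\
X^{T}X\,\mathbf{w} &= X^{T}\mathbf{y} \\
\mathbf{w}^{*} &= (X^{T}X)^{-1}X^{T}\mathbf{y}
\end{aligned}
```

If $X^T X$ is singular (for example, when two features are perfectly correlated), this inverse does not exist and the formula as written does not apply.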

20 Regularization
$$\mathrm{Cost}(h) = \mathrm{EmpLoss}(h) + \lambda\, \mathrm{Complexity}(h)$$
$$\mathrm{Complexity}(h_{\mathbf{w}}) = L_q(\mathbf{w}) = \sum_i |w_i|^q$$
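
The regularized cost can be sketched directly from the two formulas. The function names and the example weights are illustrative assumptions. The slide does not say whether the intercept $w_0$ is included in the complexity term, so this sketch simply sums over whatever weights it is given.

```python
def lq_complexity(w, q):
    """Complexity(h_w) = sum_i |w_i|^q  (q=1 gives L1, q=2 gives L2)."""
    return sum(abs(wi) ** q for wi in w)


def regularized_cost(emp_loss, w, lam, q):
    """Cost(h) = EmpLoss(h) + lambda * Complexity(h)."""
    return emp_loss + lam * lq_complexity(w, q)


# Example weights and empirical loss (assumed for illustration).
w = [3.0, -4.0]
cost_l1 = regularized_cost(1.0, w, lam=0.1, q=1)  # 1.0 + 0.1 * 7
cost_l2 = regularized_cost(1.0, w, lam=0.1, q=2)  # 1.0 + 0.1 * 25
```

Larger values of `lam` penalize large weights more heavily, trading a higher empirical loss for a simpler hypothesis.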

21 L1 vs. L2 Regularization

22 Linear Classification: hard thresholds

23 Linear Classification: hard thresholds contd.
Decision boundary: in the linear case, a linear separator (a hyperplane).
Linearly separable: data is linearly separable if the classes can be separated by a linear separator.
Classification hypothesis:
$$h_{\mathbf{w}}(\mathbf{x}) = \mathrm{Threshold}(\mathbf{w} \cdot \mathbf{x}), \quad \text{where } \mathrm{Threshold}(z) = 1 \text{ if } z \geq 0 \text{ and } 0 \text{ otherwise}$$

24 Perceptron Learning Rule
For a single sample $(\mathbf{x}, y)$:
$$w_i \leftarrow w_i + \alpha (y - h_{\mathbf{w}}(\mathbf{x}))\, x_i$$
If the output is correct, i.e., $y = h_{\mathbf{w}}(\mathbf{x})$, then the weights don't change.
If $y = 1$ but $h_{\mathbf{w}}(\mathbf{x}) = 0$, then $w_i$ is increased when $x_i$ is positive and decreased when $x_i$ is negative.
If $y = 0$ but $h_{\mathbf{w}}(\mathbf{x}) = 1$, then $w_i$ is decreased when $x_i$ is positive and increased when $x_i$ is negative.
Perceptron Convergence Theorem: For any data set that's linearly separable and any training procedure that continues to present each training example, the learning rule is guaranteed to find a solution in a finite number of steps.
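
The rule can be tried on a small linearly separable problem. The sketch below assumes the logical AND function as training data, augmented inputs with $x_0 = 1$, a learning rate of 1, and the hypothetical names `threshold`, `predict`, and `train_perceptron`. None of these choices come from the slides.

```python
def threshold(z):
    """Hard threshold: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(w, x):
    """h_w(x) = Threshold(w . x) for an augmented input x."""
    return threshold(sum(wi * xi for wi, xi in zip(w, x)))


def train_perceptron(examples, alpha=1.0, max_epochs=100):
    """Apply the perceptron rule until a full pass makes no mistakes.

    max_epochs is a safety cap (an assumption); on linearly separable
    data the loop exits early once every example is classified correctly.
    """
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            error = y - predict(w, x)
            if error != 0:
                mistakes += 1
                w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break
    return w


# Logical AND with a leading constant feature x_0 = 1 (assumed data).
examples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(examples)
predictions = [predict(w, x) for x, _ in examples]
```

On data that is not linearly separable (XOR, for instance) the early exit would never trigger, and the loop would stop only at the epoch cap.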

25 Perceptron Performance

26 Linear Classification with Logistic Regression
An important function.

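27 Logistic Regression
$$h_{\mathbf{w}}(\mathbf{x}) = \mathrm{Logistic}(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}}$$
For a single sample $(\mathbf{x}, y)$ and the $L_2$ loss function:
$$w_i \leftarrow w_i + \alpha (y - h_{\mathbf{w}}(\mathbf{x}))\, h_{\mathbf{w}}(\mathbf{x}) (1 - h_{\mathbf{w}}(\mathbf{x}))\, x_i$$
The factor $h_{\mathbf{w}}(\mathbf{x})(1 - h_{\mathbf{w}}(\mathbf{x}))$ is the derivative of the logistic function.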
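
A single step of this update can be sketched as follows. The names `logistic`, `predict`, and `logistic_update`, the starting weights, and the sample are assumptions made for illustration.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """h_w(x) = Logistic(w . x) for an augmented input x."""
    return logistic(sum(wi * xi for wi, xi in zip(w, x)))


def logistic_update(w, x, y, alpha):
    """One step of w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i."""
    h = predict(w, x)
    scale = alpha * (y - h) * h * (1.0 - h)
    return [wi + scale * xi for wi, xi in zip(w, x)]


# One update from zero weights on a single positive example (assumed data).
x, y = [1.0, 2.0], 1
w_before = [0.0, 0.0]
w_after = logistic_update(w_before, x, y, alpha=1.0)

loss_before = (y - predict(w_before, x)) ** 2
loss_after = (y - predict(w_after, x)) ** 2
```

Starting from zero weights the hypothesis outputs 0.5, so the step scales the input by 0.5 * 0.5 * 0.5 = 0.125 and moves the prediction toward the target, which lowers the squared error on this sample.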

28 Logistic Regression Performance (separable case)

29 Summary: Learning from Examples (brief review); Loss functions; Generalization; Overfitting; Cross-validation; Regularization; Univariate Linear Regression; Batch gradient descent; Stochastic gradient descent; Multivariate Linear Regression; Regularization; Linear Classifiers; Perceptron learning rule; Logistic Regression

30 Next Class: Artificial Neural Networks, Nonparametric Models, & Support Vector Machines. Secs
