Loss Functions and Optimization. Lecture 3-1


1 Lecture 3: Loss Functions and Optimization Lecture 3-1

2 Administrative: Live Questions We'll use Zoom to take questions from remote students live-streaming the lecture. Check Piazza for instructions and the meeting ID. Lecture 3-2

3 Administrative: Office Hours Office hours started this week; the schedule is on the course website. Areas of expertise for all TAs are posted on Piazza. Lecture 3-3

4 Administrative: Assignment 1 Assignment 1 is released: Due Wednesday April 18, 11:59pm Lecture 3-4

5 Administrative: Google Cloud You should have received an email yesterday about claiming a coupon for Google Cloud; make a private post on Piazza if you didn't get it. There was a problem with some emails; this is resolved. If you have problems with coupons: post on Piazza. DO NOT email me, DO NOT email Prof. Phil Levis. Lecture 3-5

6 Administrative: SCPD Tutors This year the SCPD office has hired tutors specifically for SCPD students taking CS231N; you should have received an email about this yesterday (4/9/2018). Lecture 3-6

7 Administrative: Poster Session Poster session will be Tuesday June 12 (our final exam slot). Attendance is mandatory for non-SCPD students; if you don't have a legitimate reason for skipping it then you forfeit the points for the poster presentation. Lecture 3-7

8 Recall from last time: Challenges of recognition: Viewpoint, Illumination, Deformation, Occlusion, Clutter, Intraclass Variation. (Images: CC0 1.0 public domain; photos by Umberto Salvagnin and jonsson licensed under CC-BY 2.0.) Lecture 3-8

9 Recall from last time: data-driven approach, kNN: 1-NN classifier vs. 5-NN classifier; split the data into train / validation / test. Lecture 3-9

10 Recall from last time: Linear Classifier f(x,w) = Wx + b Lecture 3-10

11 Recall from last time: Linear Classifier TODO: 1. Define a loss function that quantifies our unhappiness with the scores across the training data. 2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization) Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain Lecture 3-11

12 Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) = Wx are: cat image: cat 3.2, car 5.1, frog -1.7; car image: cat 1.3, car 4.9, frog 2.0; frog image: cat 2.2, car 2.5, frog -3.1. Lecture 3-12

13 Suppose: 3 training examples, 3 classes, with the scores above. A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}, i = 1..N, where x_i is an image and y_i is an (integer) label, the loss over the dataset is a sum of the loss over examples: L = (1/N) Σ_i L_i(f(x_i, W), y_i). Lecture 3-13

14 Multiclass SVM loss: Given an example (x_i, y_i) where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1). Lecture 3-14

15 (Same setup.) The term max(0, s_j - s_{y_i} + 1) is the "hinge loss": it is zero once the correct-class score exceeds s_j by the margin of 1, and grows linearly otherwise. Lecture 3-15

16 (Same setup.) Equivalently, each per-class term of L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1) is 0 when s_{y_i} ≥ s_j + 1, and s_j - s_{y_i} + 1 otherwise. Lecture 3-16

17 Losses: for the cat image (correct class cat, score 3.2): L = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = max(0, 2.9) + max(0, -3.9) = 2.9 + 0 = 2.9. Lecture 3-17

18 Losses: for the car image (correct class car, score 4.9): L = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = max(0, -2.6) + max(0, -1.9) = 0 + 0 = 0. Lecture 3-18

19 Losses: for the frog image (correct class frog, score -3.1): L = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1) = max(0, 6.3) + max(0, 6.6) = 6.3 + 6.6 = 12.9. Lecture 3-19

20 Losses: 2.9, 0, and 12.9. Loss over the full dataset is the average: L = (2.9 + 0 + 12.9)/3 = 5.27. Lecture 3-20

21 (Same setup; per-example losses 2.9, 0, 12.9.) Q: What happens to the loss if the car scores change a bit? Lecture 3-21

22 Q2: What is the min/max possible loss? Lecture 3-22

23 Q3: At initialization W is small so all s ≈ 0. What is the loss? Lecture 3-23

24 Q4: What if the sum was over all classes (including j = y_i)? Lecture 3-24

25 Q5: What if we used mean instead of sum? Lecture 3-25

26 Q6: What if we used the squared hinge, L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)^2? Lecture 3-26

27 Multiclass SVM Loss: Example code Lecture 3-27
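The example code can be sketched in NumPy; this is a minimal half-vectorized version in the style of the course notes (x is a column of pixel values, y the integer label, W the weight matrix):

```python
import numpy as np

def L_i_vectorized(x, y, W):
    # class scores: s = Wx
    scores = W.dot(x)
    # margins max(0, s_j - s_{y_i} + 1) for every class j
    margins = np.maximum(0, scores - scores[y] + 1)
    # the j = y_i term would contribute max(0, 1) = 1, so zero it out
    margins[y] = 0
    return np.sum(margins)
```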

28 E.g. Suppose that we found a W such that L = 0. Is this W unique? Lecture 3-28

29 E.g. Suppose that we found a W such that L = 0. Is this W unique? No! 2W also has L = 0! Lecture 3-29

30 Consider the car example. Before: L = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = max(0, -2.6) + max(0, -1.9) = 0 + 0 = 0. With W twice as large: L = max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1) = max(0, -6.2) + max(0, -4.8) = 0 + 0 = 0. Lecture 3-30

31 E.g. Suppose that we found a W such that L = 0. Is this W unique? No! 2W also has L = 0! How do we choose between W and 2W? Lecture 3-31

32 Regularization Data loss: Model predictions should match training data Lecture 3-32

33 Regularization Data loss: Model predictions should match training data Regularization: Prevent the model from doing too well on training data Lecture 3-33

34 Regularization: L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W). Data loss: model predictions should match training data. λ = regularization strength (hyperparameter). Regularization: prevent the model from doing too well on training data. Lecture 3-34

35 Regularization (λ = regularization strength, a hyperparameter). Simple examples: L2 regularization: R(W) = Σ_k Σ_l W_{k,l}^2; L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|; Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}^2 + |W_{k,l}|). Lecture 3-35

36 Regularization (λ = regularization strength, a hyperparameter). Simple examples: L2 regularization, L1 regularization, Elastic net (L1 + L2). More complex: Dropout, Batch normalization, Stochastic depth, fractional pooling, etc. Lecture 3-36
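As a rough sketch of how the regularized objective is assembled (the per-example loss function, the data arrays, and the regularization strength `lam` are assumed to be supplied by the caller):

```python
import numpy as np

def l2_reg(W):
    return np.sum(W * W)          # R(W) = sum of squared weights

def l1_reg(W):
    return np.sum(np.abs(W))      # R(W) = sum of absolute weights

def full_loss(W, X, y, per_example_loss, lam=1e-3):
    # L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W)
    data_loss = np.mean([per_example_loss(x_i, y_i, W) for x_i, y_i in zip(X, y)])
    return data_loss + lam * l2_reg(W)
```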

37 Regularization (λ = regularization strength, a hyperparameter). Why regularize? - Express preferences over weights - Make the model simple so it works on test data - Improve optimization by adding curvature. Lecture 3-37

38 Regularization: Expressing Preferences. L2 Regularization: R(W) = Σ_k Σ_l W_{k,l}^2. Example: x = [1, 1, 1, 1], w1 = [1, 0, 0, 0], w2 = [0.25, 0.25, 0.25, 0.25]; both give the same score, w1·x = w2·x = 1. Lecture 3-38

39 Regularization: Expressing Preferences. With the example above, L2 regularization prefers w2: it likes to spread out the weights. Lecture 3-39
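A quick numerical check of this preference, using the example vectors from the slide above (x = [1,1,1,1], w1 = [1,0,0,0], w2 = [0.25,0.25,0.25,0.25]):

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                  # same score: 1.0 1.0
print(np.sum(w1**2), np.sum(w2**2))    # L2 penalties: 1.0 vs 0.25, so w2 is preferred
```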

40 Regularization: Prefer Simpler Models. [Figure: training data points plotted on x vs. y axes.] Lecture 3-40

41 Regularization: Prefer Simpler Models. [Figure: two fits to the training points: f1, a wiggly curve passing through every point, and f2, a much simpler fit.] Lecture 3-41

42 Regularization: Prefer Simpler Models. [Figure: f1 fits every training point exactly; f2 is the simpler fit.] Regularization pushes against fitting the data too well so we don't fit noise in the data. Lecture 3-42

43 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities cat car frog Lecture 3-43

44 Softmax Classifier (Multinomial Logistic Regression). Want to interpret raw classifier scores as probabilities. Softmax function: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}, where s = f(x_i; W) are the scores. Lecture 3-44

45 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Probabilities must be >= 0 cat car frog exp unnormalized probabilities Lecture 3-45

46 Softmax Classifier (Multinomial Logistic Regression). Probabilities must be >= 0 and must sum to 1. For the cat example: scores [3.2, 5.1, -1.7] -> exp -> unnormalized probabilities [24.5, 164.0, 0.18] -> normalize -> probabilities [0.13, 0.87, 0.00]. Lecture 3-46

47 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Probabilities must be >= 0 cat car frog Unnormalized log-probabilities / logits exp Probabilities must sum to 1 normalize unnormalized probabilities probabilities Lecture 3-47

48 Softmax Classifier (Multinomial Logistic Regression). The unnormalized log-probabilities (logits) are exponentiated and normalized into probabilities; the loss is the negative log probability of the correct class. For the cat example: L_i = -log(0.13) = 2.04 (natural log). Lecture 3-48
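A short NumPy sketch of this pipeline for the cat example, assuming the scores [3.2, 5.1, -1.7] shown earlier and natural log throughout:

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])           # cat, car, frog (cat is the correct class)
unnormalized = np.exp(scores)                 # ~ [24.5, 164.0, 0.18]
probs = unnormalized / unnormalized.sum()     # ~ [0.13, 0.87, 0.00]
loss = -np.log(probs[0])                      # -log(0.13) ~ 2.04
print(probs, loss)
```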

49 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Probabilities must be >= 0 cat car frog Unnormalized log-probabilities / logits exp Probabilities must sum to 1 normalize unnormalized probabilities probabilities Li = -log(0.13) = 2.04 Maximum Likelihood Estimation Choose probabilities to maximize the likelihood of the observed data (See CS 229 for details) Lecture 3-49

50 Softmax Classifier (Multinomial Logistic Regression). Compare the computed probabilities against the correct probabilities (1 for the true class, 0 for the others). Lecture 3-50

51 Softmax Classifier (Multinomial Logistic Regression). One way to compare the two distributions: the Kullback-Leibler divergence between the correct probabilities and the computed probabilities. Lecture 3-51

52 Softmax Classifier (Multinomial Logistic Regression). Another view: the cross-entropy between the correct probabilities and the computed probabilities. Lecture 3-52

53 Softmax Classifier (Multinomial Logistic Regression). Maximize the probability of the correct class; putting it all together: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} ). Lecture 3-53

54 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Maximize probability of correct class cat car frog Putting it all together: Q: What is the min/max possible loss L_i? Lecture 3-54

55 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Maximize probability of correct class cat car frog Putting it all together: Q: What is the min/max possible loss L_i? A: min 0, max infinity Lecture 3-55

56 Softmax Classifier (Multinomial Logistic Regression) Want to interpret raw classifier scores as probabilities Softmax Function Maximize probability of correct class cat car frog Putting it all together: Q2: At initialization all s will be approximately equal; what is the loss? Lecture 3-56

57 Softmax Classifier (Multinomial Logistic Regression). Q2: At initialization all s will be approximately equal; what is the loss? A: log(C), e.g. log(10) ≈ 2.3. Lecture 3-57

58 Softmax vs. SVM: SVM (hinge) loss L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1) vs. Softmax (cross-entropy) loss L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} ). Lecture 3-58

59 Softmax vs. SVM Lecture 3-59

60 Softmax vs. SVM: assume score vectors [10, -2, 3], [10, 9, 9], [10, -100, -100], with the first class correct in each case. Q: Suppose I take a datapoint and I jiggle it a bit (changing its scores slightly). What happens to the loss in both cases? Lecture 3-60
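A small sketch that contrasts the two losses on these three score vectors (assuming the first class is correct in each case): the SVM loss is already zero and stays zero under small changes, while the softmax loss is never exactly zero and keeps moving.

```python
import numpy as np

def svm_loss(s, y):
    margins = np.maximum(0, s - s[y] + 1)
    margins[y] = 0
    return np.sum(margins)

def softmax_loss(s, y):
    p = np.exp(s) / np.sum(np.exp(s))
    return -np.log(p[y])

for s in ([10, -2, 3], [10, 9, 9], [10, -100, -100]):
    s = np.array(s, dtype=float)
    print(svm_loss(s, 0), softmax_loss(s, 0))
# SVM loss: 0.0 for all three; softmax loss: ~0.0009, ~0.55, ~0.0,
# and it keeps shrinking as the correct class pulls further ahead.
```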

61 Recap - We have some dataset of (x, y) - We have a score function: s = f(x; W) = Wx - We have a loss function, e.g. Softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} ); SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1); Full loss: L = (1/N) Σ_i L_i + λ R(W). Lecture 3-61

62 Recap: a dataset of (x, y), a score function s = f(x; W) = Wx, and a loss function (Softmax or SVM) plus regularization giving the full loss L = (1/N) Σ_i L_i + λ R(W). How do we find the best W? Lecture 3-62

63 Optimization Lecture 3-63

64 This image is CC0 1.0 public domain Lecture 3-64

65 Walking man image is CC0 1.0 public domain Lecture 3-65

66 Strategy #1: A first very bad idea solution: Random search Lecture 3-66
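A runnable sketch of this strategy on toy stand-in data (the real slides use CIFAR-10, with X_train of shape 3073 x 50,000; the loss L here is the multiclass SVM loss defined earlier):

```python
import numpy as np

X_train = np.random.randn(3073, 500)              # toy stand-in for CIFAR-10 columns
Y_train = np.random.randint(0, 10, 500)

def L(X, y, W):
    scores = W.dot(X)                              # 10 x N class scores
    correct = scores[y, np.arange(len(y))]         # correct-class score per example
    margins = np.maximum(0, scores - correct + 1)
    margins[y, np.arange(len(y))] = 0
    return np.mean(np.sum(margins, axis=0))        # average SVM loss over the batch

bestloss, bestW = float("inf"), None
for num in range(100):
    W = np.random.randn(10, 3073) * 0.0001         # try a random set of parameters
    loss = L(X_train, Y_train, W)
    if loss < bestloss:
        bestloss, bestW = loss, W
    print('in attempt %d the loss was %f, best %f' % (num, loss, bestloss))
```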

67 Let's see how well this works on the test set... about 15.5% accuracy! not bad! (SOTA is ~95%) Lecture 3-67

68 Strategy #2: Follow the slope Lecture 3-68

69 Strategy #2: Follow the slope. In 1 dimension, the derivative of a function: df(x)/dx = lim_{h→0} [f(x + h) - f(x)] / h. In multiple dimensions, the gradient is the vector of partial derivatives along each dimension. The slope in any direction is the dot product of the direction with the gradient. The direction of steepest descent is the negative gradient. Lecture 3-69

70 current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...], loss 1.25347; gradient dW: [?, ?, ?, ?, ?, ?, ?, ?, ?, ...] Lecture 3-70

71 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (first dim): [0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...], loss 1.25322. gradient dW: [?, ?, ?, ?, ?, ?, ?, ?, ?, ...] Lecture 3-71

72 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (first dim): [0.34 + 0.0001, ...], loss 1.25322. gradient dW: [-2.5, ?, ?, ?, ...], since (1.25322 - 1.25347)/0.0001 = -2.5. Lecture 3-72

73 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, ...], loss 1.25353. gradient dW: [-2.5, ?, ?, ?, ...] Lecture 3-73

74 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, ...], loss 1.25353. gradient dW: [-2.5, 0.6, ?, ?, ...], since (1.25353 - 1.25347)/0.0001 = 0.6. Lecture 3-74

75 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (third dim): [0.34, -1.11, 0.78 + 0.0001, 0.12, ...], loss 1.25347. gradient dW: [-2.5, 0.6, ?, ?, ...] Lecture 3-75

76 current W: [0.34, -1.11, 0.78, ...], loss 1.25347. W + h (third dim): [0.34, -1.11, 0.78 + 0.0001, ...], loss 1.25347. gradient dW: [-2.5, 0.6, 0, ?, ...], since (1.25347 - 1.25347)/0.0001 = 0. Lecture 3-76

77 current W: [0.34, -1.11, 0.78, ...], loss 1.25347; gradient dW so far: [-2.5, 0.6, 0, ?, ...]. Numeric gradient: need to loop over all dimensions - Slow! - Approximate. Lecture 3-77
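A sketch of the dimension-by-dimension loop being described, using the same forward difference with h = 0.0001 (`f` is any scalar loss as a function of W):

```python
import numpy as np

def eval_numerical_gradient(f, W, h=1e-4):
    fx = f(W)                          # loss at the current W
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h               # bump this one dimension by h
        grad[idx] = (f(W) - fx) / h    # forward-difference slope along that dimension
        W[idx] = old                   # restore
        it.iternext()
    return grad
```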

78 This is silly. The loss is just a function of W: L = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W), with s = f(x; W) = Wx. We want ∇_W L. Lecture 3-78

79 This is silly. The loss is just a function of W; we want ∇_W L. Use calculus to compute an analytic gradient. (Hammer and other images are in the public domain.) Lecture 3-79

80 current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...], loss 1.25347; gradient dW: [-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, ...], computed analytically: dW = ... (some function of the data and W) Lecture 3-80

81 In summary: - Numerical gradient: approximate, slow, easy to write - Analytic gradient: exact, fast, error-prone => In practice: Always use analytic gradient, but check implementation with numerical gradient. This is called a gradient check. Lecture 3-81

82 Gradient Descent Lecture 3-82
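In code, vanilla gradient descent is just this loop (a sketch; `evaluate_gradient`, `loss_fun`, `data`, `weights`, and `step_size` are assumed to be defined elsewhere):

```python
# Vanilla Gradient Descent (sketch)
while True:
    weights_grad = evaluate_gradient(loss_fun, data, weights)  # analytic gradient of the loss
    weights += -step_size * weights_grad                       # step in the negative gradient direction
```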

83 [Figure: loss contours over weights W_1 and W_2; starting from the original W, repeatedly step in the negative gradient direction.] Lecture 3-83

84 Lecture 3-84

85 Stochastic Gradient Descent (SGD). The full loss L(W) = (1/N) Σ_{i=1}^N L_i(x_i, y_i, W) + λ R(W) and its gradient are sums over all N examples; the full sum is expensive when N is large! Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common minibatch sizes. Lecture 3-85
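The corresponding minibatch loop is a one-line change to the sketch above (again with assumed helpers; `sample_training_data` draws a random minibatch):

```python
# Vanilla Minibatch (Stochastic) Gradient Descent (sketch)
while True:
    data_batch = sample_training_data(data, 128)                     # e.g. 32 / 64 / 128 examples
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)  # gradient on the minibatch only
    weights += -step_size * weights_grad                             # parameter update
```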

86 Interactive Web Demo time... Lecture 3-86

87 Interactive Web Demo time... Lecture 3-87

88 Aside: Image Features. [Pipeline: image -> f(x) = Wx -> class scores.] Lecture 3-88

89 Aside: Image Features. [Pipeline: image -> feature representation -> f(x) = Wx -> class scores.] Lecture 3-89

90 Image Features: Motivation. [Figure: red and blue points in the (x, y) plane.] Cannot separate red and blue points with a linear classifier. Lecture 3-90

91 Image Features: Motivation. Apply the feature transform f(x, y) = (r(x, y), θ(x, y)) (polar coordinates). Cannot separate red and blue points with a linear classifier in (x, y); after applying the feature transform, the points can be separated by a linear classifier. Lecture 3-91

92 Example: Color Histogram. Map each pixel's hue onto a set of bins and count the pixels in each bin (+1 per pixel); the vector of bin counts is the feature. Lecture 3-92
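A sketch of such a feature under simple assumptions (hue quantized into a fixed number of bins; `img` is an H x W x 3 RGB array with values in [0, 1]):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def color_histogram(img, nbins=8):
    hue = rgb_to_hsv(img)[..., 0]                              # hue in [0, 1] for every pixel
    hist, _ = np.histogram(hue, bins=nbins, range=(0.0, 1.0))
    return hist / hist.sum()                                   # normalized bin counts as the feature vector

# feature = color_histogram(np.random.rand(240, 320, 3))
```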

93 Example: Histogram of Oriented Gradients (HoG) Divide image into 8x8 pixel regions Within each region quantize edge direction into 9 bins Example: 320x240 image gets divided into 40x30 bins; in each bin there are 9 numbers so feature vector has 30*40*9 = 10,800 numbers Lowe, Object recognition from local scale-invariant features, ICCV 1999 Dalal and Triggs, "Histograms of oriented gradients for human detection," CVPR 2005 Lecture 3-93

94 Example: Bag of Words Step 1: Build codebook Extract random patches Cluster patches to form codebook of visual words Step 2: Encode images Fei-Fei and Perona, A bayesian hierarchical model for learning natural scene categories, CVPR 2005 Lecture 3-94

95 Aside: Image Features Lecture 3-95

96 Image features vs ConvNets: with hand-designed features, the image goes through a fixed feature-extraction step and only the classifier f on top is trained to produce the 10 numbers giving scores for classes; with a ConvNet, the whole network is trained end-to-end to produce the 10 numbers giving scores for classes. Lecture 3-96

97 Next time: Introduction to neural networks Backpropagation Lecture 3-97
