Lecture 35: Optimization and Neural Nets
|
|
- Jacob Stanley
- 5 years ago
- Views:
Transcription
1 Lecture 35: Optimization and Neural Nets CS 4670/5670 Sean Bell DeepDream [Google, Inceptionism: Going Deeper into Neural Networks, blog 2015]
2 Aside: CNN vs ConvNet Note: There are many papers that use either phrase, but ConvNet is the preferred term, since CNN clashes with that other thing called CNN
3 (Review) Softmax Classifier p i, j = k es i, j e s i,k L i = log p i,yi Q2: At initialization, W is small and thus s ~= 0. What is the loss L? Slide from Karpathy 2016
4 Softmax vs SVM Loss Slide from Karpathy 2016
5 Softmax vs SVM Loss Softmax: Slide from Karpathy 2016
6 Softmax vs SVM Loss Softmax: SVM: Slide from Karpathy 2016
7 Softmax vs SVM Loss Softmax: SVM: Slide from Karpathy 2016
8 Softmax vs SVM Loss Softmax: SVM: Slide from Karpathy 2016
9 Softmax Classifier Let s code this up in NumPy: p i, j = es i, j k e s i,k
10 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem?
11 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem? - What if there is the value 1000 appears in s?
12 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem? - What if there is the value 1000 appears in s? Overflow > we get inf/inf = NaN
13 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem? - What if there is the value 1000 appears in s? Overflow > we get inf/inf = NaN - What if the largest value is -1000?
14 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem? - What if there is the value 1000 appears in s? Overflow > we get inf/inf = NaN - What if the largest value is -1000? Underflow > we get 0/0 = NaN
15 Softmax Classifier Let s code this up in NumPy: p i, j = k es i, j e s i,k Doesn t work what s the problem? - What if there is the value 1000 appears in s? Overflow > we get inf/inf = NaN - What if the largest value is -1000? Underflow > we get 0/0 = NaN This expression is numerically unstable
16 Softmax Classifier
17 Softmax Classifier Observation: subtracting a constant does not change p :
18 Softmax Classifier Observation: subtracting a constant does not change p : p i, j = k es i, j C e s j,k C = e C e s i, j k e C e s i,k = es i, j k e s i,k
19 Softmax Classifier Observation: subtracting a constant does not change p : p i, j = k es i, j C e s j,k C = e C e s i, j k e C e s i,k = es i, j k e s i,k If we choose C to be the max, then it works:
20 Softmax Classifier Observation: subtracting a constant does not change p : p i, j = k es i, j C e s j,k C = e C e s i, j k e C e s i,k = es i, j k e s i,k If we choose C to be the max, then it works: - If a large value appears in s, then that value will become 1 and all others will be 0 (avoiding overflow)
21 Softmax Classifier Observation: subtracting a constant does not change p : p i, j = k es i, j C e s j,k C = e C e s i, j k e C e s i,k = es i, j k e s i,k If we choose C to be the max, then it works: - If a large value appears in s, then that value will become 1 and all others will be 0 (avoiding overflow) - If all values in s are large negative, then they will be shifted up towards 0 (avoiding underflow)
22 Optimization How do we minimize the loss L?
23 Idea #1: Random Search
24 Idea #1: Random Search
25 Idea #1: Random Search
26 Idea #1: Random Search
27 Idea #2: Follow the Slope
28 Idea #2: Follow the Slope
29 Idea #2: Follow the Slope
30 Idea #2: Follow the Slope
31 Idea #2: Follow the Slope In 1D, the derivative of a function is: In multiple dimensions, the gradient is the vector of (partial derivatives). Slide from Karpathy 2016
32 Idea #2: Follow the Slope In 1D, the derivative of a function is: In multiple dimensions, the gradient is the vector of (partial derivatives). We could directly estimate it: Slide from Karpathy 2016
33 Idea #2: Follow the Slope In 1D, the derivative of a function is: In multiple dimensions, the gradient is the vector of (partial derivatives). We could directly estimate it: But this is super slow. Slide from Karpathy 2016
34 Idea #2: Follow the Slope We can just directly compute the gradient: Slide from Karpathy 2016
35 Idea #2: Follow the Slope We can just directly compute the gradient: Calculus Slide from Karpathy 2016
36 Idea #2: Follow the Slope Slide from Karpathy 2016
37 Idea #2: Follow the Slope Slide from Karpathy 2016
38 Gradient Descent Slide from Karpathy 2016
39 Gradient Descent Slide from Karpathy 2016
40 Gradient Descent Slide from Karpathy 2016
41 Gradient Descent Step size called the learning rate Slide from Karpathy 2016
42 Mini-batch Gradient Descent Slide from Karpathy 2016
43 Mini-batch Gradient Descent Note: In practice we use fancier update rules (momentum, RMSprop, ADAM, etc) Slide from Karpathy 2016
44 Mini-batch Gradient Descent Typical loss Slide from Karpathy 2016
45 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
46 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
47 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
48 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
49 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
50 Setting the learning rate Learning rate: 1e6 what could go wrong? L Loss A weight in W
51 Setting the learning rate Typical loss (more on this later) Slide from Karpathy 2016
52 Classification: Overview Training Images Test Image
53 Classification: Overview Training Images Gradient Descent Test Image
54 Classification: Overview Training Images Training Labels Gradient Descent Test Image
55 Classification: Overview Training Images Training Labels Trained Classifier Gradient Descent Test Image
56 Classification: Overview Training Images Training Labels Trained Classifier Gradient Descent Test Image
57 Classification: Overview Training Images Training Labels Trained Classifier Gradient Descent Prediction: outdoor Test Image
58 Neural Networks (First we ll cover Neural Nets, then build up to Convolutional Neural Nets)
59 Feature hierarchy with ConvNets End-to-end models [Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, 2013]
60 Slide: R. Fergus
61 Inspiration from Biology Figure: Andrej Karpathy
62 Inspiration from Biology Figure: Andrej Karpathy
63 Inspiration from Biology Neural nets are loosely inspired by biology Figure: Andrej Karpathy
64 Inspiration from Biology Neural nets are loosely inspired by biology But they certainly are not a model of how the brain works, or even how neurons work Figure: Andrej Karpathy
65 Simple Neural Net: 1 Layer Let s consider a simple 1-layer network: x Wx + b f
66 Simple Neural Net: 1 Layer Let s consider a simple 1-layer network: x Wx + b f This is the same as what you saw last class:
67 1 Layer Neural Net Block Diagram: x Wx + b f (Input) (class scores)
68 1 Layer Neural Net Block Diagram: x Wx + b f (Input) (class scores) Expanded Block Diagram: M W + x b D D 1 M = f 1 M 1 M classes D features 1 example
69 1 Layer Neural Net Block Diagram: x Wx + b f (Input) (class scores) Expanded Block Diagram: M W + x b D D 1 M = f 1 M NumPy: 1 M classes D features 1 example
70 1 Layer Neural Net
71 1 Layer Neural Net How do we process N inputs at once?
72 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything
73 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything N X D
74 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything N X D W M D
75 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything N X W D b! b N D M M
76 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything N X W D b! b N = F N D M M M M classes D features N examples
77 1 Layer Neural Net How do we process N inputs at once? It s most convenient to have the first dimension (row) represent which example we are looking at, so we need to transpose everything N X W D b! b N = F N D M M M Note: Often, if the weights are transposed, they are still called W M classes D features N examples
78 1 Layer Neural Net
79 1 Layer Neural Net D X N Each row is one input example
80 1 Layer Neural Net D X N Each row is one input example W D Each column is the weights for one class M
81 1 Layer Neural Net D X N Each row is one input example W D Each column is the weights for one class M F M N Each row is the predicted scores for one example
82 1 Layer Neural Net Implementing this with NumPy: First attempt let s try this:
83 1 Layer Neural Net Implementing this with NumPy: First attempt let s try this: Doesn t work why?
84 1 Layer Neural Net Implementing this with NumPy: First attempt let s try this: N x D M Doesn t work why? D x M
85 1 Layer Neural Net Implementing this with NumPy: First attempt let s try this: N x D M Doesn t work why? D x M - NumPy needs to know how to expand b from 1D to 2D
86 1 Layer Neural Net Implementing this with NumPy: First attempt let s try this: N x D M Doesn t work why? D x M - NumPy needs to know how to expand b from 1D to 2D - This is called broadcasting
87 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do?
88 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do? b = [0, 1, 2]
89 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do? b = [0, 1, 2] Make b a row vector
90 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do? b = [0, 1, 2] Make b a row vector Make b a column vector
91 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do?
92 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do? Row vector (repeat along rows)
93 1 Layer Neural Net Implementing this with NumPy: What does np.newaxis do? Row vector (repeat along rows) Column vector (repeat along columns)
94 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h
95 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f
96 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f Let s expand out the equation:
97 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f Let s expand out the equation: f = W (2) (W (1) x + b (1) ) + b (2) = (W (2) W (1) )x + (W (2) b (1) + b (2) )
98 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f Let s expand out the equation: f = W (2) (W (1) x + b (1) ) + b (2) = (W (2) W (1) )x + (W (2) b (1) + b (2) ) But this is just the same as a 1 layer net with:
99 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f Let s expand out the equation: f = W (2) (W (1) x + b (1) ) + b (2) = (W (2) W (1) )x + (W (2) b (1) + b (2) ) But this is just the same as a 1 layer net with: W = W (2) W (1) b = W (2) b (1) + b (2)
100 2 Layer Neural Net What if we just added another layer? x W (1) x + b (1) h W (2) h + b (2) f Let s expand out the equation: f = W (2) (W (1) x + b (1) ) + b (2) = (W (2) W (1) )x + (W (2) b (1) + b (2) ) But this is just the same as a 1 layer net with: W = W (2) W (1) b = W (2) b (1) + b (2) We need a non-linear operation between the layers
101 Nonlinearities
102 Sigmoid Nonlinearities
103 Nonlinearities Sigmoid Tanh
104 Nonlinearities Sigmoid Tanh ReLU
105 Nonlinearities Sigmoid Tanh ReLU Historically popular 2 Big problems: - Not zero centered - They saturate
106 Nonlinearities Sigmoid Tanh ReLU Historically popular 2 Big problems: - Not zero centered - They saturate - Zero-centered, - But also saturates
107 Nonlinearities Sigmoid Tanh ReLU Historically popular 2 Big problems: - Not zero centered - They saturate - Zero-centered, - But also saturates - No saturation - Very efficient
108 Nonlinearities Sigmoid Tanh ReLU Historically popular 2 Big problems: - Not zero centered - They saturate - Zero-centered, - But also saturates Best in practice for classification - No saturation - Very efficient
109 Nonlinearities Saturation What happens if we reach this part? Sigmoid
110 Nonlinearities Saturation What happens if we reach this part? Sigmoid
111 Nonlinearities Saturation What happens if we reach this part? Sigmoid
112 Nonlinearities Saturation What happens if we reach this part? Sigmoid
113 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x
114 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x σ x = σ (x)( 1 σ (x))
115 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x σ x = σ (x)( 1 σ (x))
116 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x σ x = σ (x)( 1 σ (x))
117 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x σ x = σ (x)( 1 σ (x))
118 Nonlinearities Saturation What happens if we reach this part? σ (x) = 1 1+ e x σ x = σ (x)( 1 σ (x)) Saturation: the gradient is zero!
119 Nonlinearities [Krizhevsky 2012] (AlexNet) tanh ReLU
120 Nonlinearities [Krizhevsky 2012] (AlexNet) tanh ReLU In practice, ReLU converges ~6x faster than Tanh for classification problems
121 ReLU in NumPy Many ways to write ReLU these are all equivalent:
122 ReLU in NumPy Many ways to write ReLU these are all equivalent: (a) Elementwise max, 0 gets broadcasted to match the size of h1:
123 ReLU in NumPy Many ways to write ReLU these are all equivalent: (a) Elementwise max, 0 gets broadcasted to match the size of h1: (b) Make a boolean mask where negative values are True, and then set those entries in h1 to 0:
124 ReLU in NumPy Many ways to write ReLU these are all equivalent: (a) Elementwise max, 0 gets broadcasted to match the size of h1: (b) Make a boolean mask where negative values are True, and then set those entries in h1 to 0: (c) Make a boolean mask where positive values are True, and then do an elementwise multiplication (since int(true) = 1):
125 2 Layer Neural Net
126 2 Layer Neural Net (Layer 1) x W (1) x + b (1)
127 2 Layer Neural Net (Layer 1) x W (1) x + b (1) (Nonlinearity) max(0, ) h 1 h (1) ( hidden activations )
128 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations )
129 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations ) Let s expand out the equation:
130 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations ) Let s expand out the equation: f = W (2) max(0,w (1) x + b (1) ) + b (2)
131 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations ) Let s expand out the equation: f = W (2) max(0,w (1) x + b (1) ) + b (2) Now it no longer simplifies yay
132 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations ) Let s expand out the equation: f = W (2) max(0,w (1) x + b (1) ) + b (2) Now it no longer simplifies yay Note: any nonlinear function will prevent this collapse, but not all nonlinear functions actually work in practice
133 2 Layer Neural Net (Layer 1) (Nonlinearity) (Layer 2) x W (1) x + b (1) max(0, ) h 1 h (1) W (2) x + b (2) f ( hidden activations ) Note: Traditionally, the nonlinearity was considered part of the layer and is called an activation function In this class, we will consider them separate layers, but be aware that many others consider them part of the layer
134 Neural Net: Graphical Representation
135 Neural Net: Graphical Representation 2 layers
136 Neural Net: Graphical Representation 2 layers - Called fully connected because every output depends on every input. - Also called affine or inner product
137 Neural Net: Graphical Representation 2 layers - Called fully connected because every output depends on every input. - Also called affine or inner product 3 layers
138 Questions?
139 Neural Networks, More generally x Function h (1) Function h (2)... f
140 Neural Networks, More generally x Function h (1) Function h (2)... f This can be: - Fully connected layer - Nonlinearity (ReLU, Tanh, Sigmoid) - Convolution - Pooling (Max, Avg) - Vector normalization (L1, L2) - Invent new ones
141 Neural Networks, More generally x h (1) Function Function h (2)... f
142 Neural Networks, More generally (Input) (Hidden Activation) (Scores) x Function h (1) Function h (2)... f
143 Neural Networks, More generally (Input) (Parameters) θ (1) (Parameters) θ (2) (Hidden Activation) x Function h (1) Function h (2)... (Scores) f
144 Neural Networks, More generally (Input) (Parameters) θ (1) (Parameters) θ (2) (Hidden Activation) x Function h (1) Function h (2)... (Scores) f Here, θ represents whatever parameters that layer is using (e.g. for a fully connected layer ). θ (1) = { W (1), b (1) }
145 Neural Networks, More generally (Input) (Parameters) θ (1) (Parameters) θ (2) (Hidden Activation) x Function h (1) Function h (2)... (Scores) f (Ground Truth Labels) y Here, θ represents whatever parameters that layer is using (e.g. for a fully connected layer ). θ (1) = { W (1), b (1) }
146 Neural Networks, More generally (Input) (Parameters) θ (1) (Parameters) θ (2) (Hidden Activation) x Function h (1) Function h (2)... (Scores) f (Ground Truth Labels) y L Here, θ represents whatever parameters that layer is using (e.g. for a fully connected layer ). θ (1) = { W (1), b (1) } (Loss)
147 Neural Networks, More generally (Input) (Parameters) θ (1) (Parameters) θ (2) (Hidden Activation) x Function h (1) Function h (2)... (Scores) f (Ground Truth Labels) y L Here, θ represents whatever parameters that layer is using (e.g. for a fully connected layer ). θ (1) = { W (1), b (1) } Recall: the loss L measures how far the predictions f (Loss) are from the labels y. The most common loss is Softmax.
Understanding How ConvNets See
Understanding How ConvNets See Slides from Andrej Karpathy Springerberg et al, Striving for Simplicity: The All Convolutional Net (ICLR 2015 workshops) CSC321: Intro to Machine Learning and Neural Networks,
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Deep Learning Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein, Pieter Abbeel, Anca Dragan for CS188 Intro to AI at UC Berkeley.
More informationNon-Linearity. CS 188: Artificial Intelligence. Non-Linear Separators. Non-Linear Separators. Deep Learning I
Non-Linearity CS 188: Artificial Intelligence Deep Learning I Instructors: Pieter Abbeel & Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein, Pieter Abbeel, Anca
More informationCS60010: Deep Learning
CS60010: Deep Learning Sudeshna Sarkar Spring 2018 16 Jan 2018 FFN Goal: Approximate some unknown ideal function f : X! Y Ideal classifier: y = f*(x) with x and category y Feedforward Network: Define parametric
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationCS 1674: Intro to Computer Vision. Final Review. Prof. Adriana Kovashka University of Pittsburgh December 7, 2016
CS 1674: Intro to Computer Vision Final Review Prof. Adriana Kovashka University of Pittsburgh December 7, 2016 Final info Format: multiple-choice, true/false, fill in the blank, short answers, apply an
More informationConvolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationBayesian Networks (Part I)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 3: Introduction to Deep Learning (continued)
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 3: Introduction to Deep Learning (continued) Course Logistics - Update on course registrations - 6 seats left now -
More informationCSC321 Lecture 4: Learning a Classifier
CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 28 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron
More informationNeural Architectures for Image, Language, and Speech Processing
Neural Architectures for Image, Language, and Speech Processing Karl Stratos June 26, 2018 1 / 31 Overview Feedforward Networks Need for Specialized Architectures Convolutional Neural Networks (CNNs) Recurrent
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Neural Networks: A brief touch Yuejie Chi Department of Electrical and Computer Engineering Spring 2018 1/41 Outline
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationIntroduction to Machine Learning Spring 2018 Note Neural Networks
CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationTraining Neural Networks Practical Issues
Training Neural Networks Practical Issues M. Soleymani Sharif University of Technology Fall 2017 Most slides have been adapted from Fei Fei Li and colleagues lectures, cs231n, Stanford 2017, and some from
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationCSC321 Lecture 4: Learning a Classifier
CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 31 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron
More informationLecture 7 Convolutional Neural Networks
Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 We saw before: ŷ x 1 x 2 x 3 x 4 A series of matrix multiplications:
More informationBackpropagation and Neural Networks part 1. Lecture 4-1
Lecture 4: Backpropagation and Neural Networks part 1 Lecture 4-1 Administrative A1 is due Jan 20 (Wednesday). ~150 hours left Warning: Jan 18 (Monday) is Holiday (no class/office hours) Also note: Lectures
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationDeep Learning II: Momentum & Adaptive Step Size
Deep Learning II: Momentum & Adaptive Step Size CS 760: Machine Learning Spring 2018 Mark Craven and David Page www.biostat.wisc.edu/~craven/cs760 1 Goals for the Lecture You should understand the following
More informationEVERYTHING YOU NEED TO KNOW TO BUILD YOUR FIRST CONVOLUTIONAL NEURAL NETWORK (CNN)
EVERYTHING YOU NEED TO KNOW TO BUILD YOUR FIRST CONVOLUTIONAL NEURAL NETWORK (CNN) TARGETED PIECES OF KNOWLEDGE Linear regression Activation function Multi-Layers Perceptron (MLP) Stochastic Gradient Descent
More informationMore on Neural Networks
More on Neural Networks Yujia Yan Fall 2018 Outline Linear Regression y = Wx + b (1) Linear Regression y = Wx + b (1) Polynomial Regression y = Wφ(x) + b (2) where φ(x) gives the polynomial basis, e.g.,
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationCSC321 Lecture 7: Optimization
CSC321 Lecture 7: Optimization Roger Grosse Roger Grosse CSC321 Lecture 7: Optimization 1 / 25 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationCSC321 Lecture 8: Optimization
CSC321 Lecture 8: Optimization Roger Grosse Roger Grosse CSC321 Lecture 8: Optimization 1 / 26 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationCSC 411 Lecture 10: Neural Networks
CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending
More informationConvolutional neural networks
11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:
More informationLearning with Momentum, Conjugate Gradient Learning
Learning with Momentum, Conjugate Gradient Learning Introduction to Neural Networks : Lecture 8 John A. Bullinaria, 2004 1. Visualising Learning 2. Learning with Momentum 3. Learning with Line Searches
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationComputing Neural Network Gradients
Computing Neural Network Gradients Kevin Clark 1 Introduction The purpose of these notes is to demonstrate how to quickly compute neural network gradients in a completely vectorized way. It is complementary
More informationData Mining & Machine Learning
Data Mining & Machine Learning CS57300 Purdue University March 1, 2018 1 Recap of Last Class (Model Search) Forward and Backward passes 2 Feedforward Neural Networks Neural Networks: Architectures 2-layer
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationCSCI 315: Artificial Intelligence through Deep Learning
CSCI 35: Artificial Intelligence through Deep Learning W&L Fall Term 27 Prof. Levy Convolutional Networks http://wernerstudio.typepad.com/.a/6ad83549adb53ef53629ccf97c-5wi Convolution: Convolution is
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationComputational Photography
Computational Photography Si Lu Spring 2018 http://web.cecs.pdx.edu/~lusi/cs510/cs510_computati onal_photography.htm 05/29/2018 Last Time o 3D Video Stabilization 2 Introduction of Neural Networks 3 Content
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationLECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form
LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationNotes on Back Propagation in 4 Lines
Notes on Back Propagation in 4 Lines Lili Mou moull12@sei.pku.edu.cn March, 2015 Congratulations! You are reading the clearest explanation of forward and backward propagation I have ever seen. In this
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative: Live Questions We ll use Zoom to take questions from remote students live-streaming the lecture Check Piazza for instructions and
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More informationIntroduction to Convolutional Neural Networks 2018 / 02 / 23
Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationCSC321 Lecture 5 Learning in a Single Neuron
CSC321 Lecture 5 Learning in a Single Neuron Roger Grosse and Nitish Srivastava January 21, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 5 Learning in a Single Neuron January 21, 2015 1 / 14
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationDeep Learning Recurrent Networks 2/28/2018
Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good
More informationNeural Networks 2. 2 Receptive fields and dealing with image inputs
CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There
More informationAn overview of deep learning methods for genomics
An overview of deep learning methods for genomics Matthew Ploenzke STAT115/215/BIO/BIST282 Harvard University April 19, 218 1 Snapshot 1. Brief introduction to convolutional neural networks What is deep
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationDay 3 Lecture 3. Optimizing deep networks
Day 3 Lecture 3 Optimizing deep networks Convex optimization A function is convex if for all α [0,1]: f(x) Tangent line Examples Quadratics 2-norms Properties Local minimum is global minimum x Gradient
More informationNeural Networks. Learning and Computer Vision Prof. Olga Veksler CS9840. Lecture 10
CS9840 Learning and Computer Vision Prof. Olga Veksler Lecture 0 Neural Networks Many slides are from Andrew NG, Yann LeCun, Geoffry Hinton, Abin - Roozgard Outline Short Intro Perceptron ( layer NN) Multilayer
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationNumerical Computation for Deep Learning
Numerical Computation for Deep Learning Lecture slides for Chapter 4 of Deep Learning www.deeplearningbook.org Ian Goodfellow Last modified 2017-10-14 Thanks to Justin Gilmer and Jacob Buckman for helpful
More informationCSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!
CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!! November 18, 2015 THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can t even say see ya
More informationLearning: Binary Perceptron. Examples: Perceptron. Separable Case. In the space of feature vectors
Linear Classifiers CS 88 Artificial Intelligence Perceptrons and Logistic Regression Pieter Abbeel & Dan Klein University of California, Berkeley Feature Vectors Some (Simplified) Biology Very loose inspiration
More informationLeast Mean Squares Regression
Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationIntroduction to Machine Learning (67577)
Introduction to Machine Learning (67577) Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Deep Learning Shai Shalev-Shwartz (Hebrew U) IML Deep Learning Neural Networks
More informationtext classification 3: neural networks
text classification 3: neural networks CS 585, Fall 2018 Introduction to Natural Language Processing http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer College of Information and Computer Sciences University
More informationLecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationGenerative adversarial networks
14-1: Generative adversarial networks Prof. J.C. Kao, UCLA Generative adversarial networks Why GANs? GAN intuition GAN equilibrium GAN implementation Practical considerations Much of these notes are based
More informationLocal Affine Approximators for Improving Knowledge Transfer
Local Affine Approximators for Improving Knowledge Transfer Suraj Srinivas & François Fleuret Idiap Research Institute and EPFL {suraj.srinivas, francois.fleuret}@idiap.ch Abstract The Jacobian of a neural
More informationDeep Learning & Artificial Intelligence WS 2018/2019
Deep Learning & Artificial Intelligence WS 2018/2019 Linear Regression Model Model Error Function: Squared Error Has no special meaning except it makes gradients look nicer Prediction Ground truth / target
More informationDeep Feedforward Networks. Seung-Hoon Na Chonbuk National University
Deep Feedforward Networks Seung-Hoon Na Chonbuk National University Neural Network: Types Feedforward neural networks (FNN) = Deep feedforward networks = multilayer perceptrons (MLP) No feedback connections
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationMachine Learning
Machine Learning 10-315 Maria Florina Balcan Machine Learning Department Carnegie Mellon University 03/29/2019 Today: Artificial neural networks Backpropagation Reading: Mitchell: Chapter 4 Bishop: Chapter
More informationVectorization. Yu Wu, Ishan Patil. October 13, 2017
Vectorization Yu Wu, Ishan Patil October 13, 2017 Exercises to be covered We will implement some examples of image classification algorithms using a subset of the MNIST dataset logistic regression for
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Richard Socher Organization Main midterm: Feb 13 Alternative midterm: Friday Feb
More information