Machine Learning Basics III
- Marilyn Baker
- 5 years ago
1 Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62

Outline
1. Classification
   - Logistic Regression
2. Gradient Based Optimization
   - Gradient Descent for Logistic Regression
   - Stochastic Gradient Descent
3. Deep Feedforward Networks
   - Design Choices for Output Units
   - Design Choices for Hidden Units
   - Backpropagation
4. Regularization
   - L2-Regularization

From Regression to Classification
- So far, linear regression:
  - A simple linear model.
  - Probabilistic interpretation.
  - Find optimal parameters using Maximum Likelihood Estimation.
- Can we do something similar for classification?
- Logistic Regression (... it's not actually used for regression ...)

Logistic Regression
- Binary logistic model: estimate the probability of a binary response $y \in \{0, 1\}$ based on features $x$.
- Logistic regression is a Generalized Linear Model: the linear model is related to the response variable via a link function.
  $$p(Y = 1 \mid x; \theta) = f(\theta^T x)$$
- Note: $Y$ denotes a random variable, whereas $y$, $y^{(i)}$, 0, 1 denote values that the random variable can take on. If the random variable is obvious from the context, it may be omitted.

Logistic Regression
- Recall linear regression: $p(y \mid x; \theta) = \mathcal{N}(y; \theta^T x, I)$, which predicts $y \in \mathbb{R}$.
- Classification: the outcome (per example) is 0 or 1.
- Logistic sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- Logistic regression: linear function + logistic sigmoid,
  $$p(Y = 1 \mid x; \theta) = \sigma(\theta^T x)$$
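
As a minimal sketch of the model on this slide, the following computes $\sigma(\theta^T x)$ in plain Python. The names `sigmoid` and `predict_proba` and the numbers in `theta` and `x` are illustrative assumptions, not part of the lecture.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """p(Y = 1 | x; theta) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

theta = [0.5, -1.0]  # hypothetical parameter vector
x = [2.0, 1.0]       # hypothetical feature vector, chosen so that theta^T x = 0
p1 = predict_proba(theta, x)  # probability of the positive outcome
p0 = 1.0 - p1                 # probability of the negative outcome
```

Because $\theta^T x = 0$ here, both outcomes receive probability 0.5, which is exactly the decision boundary of the model.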

Binary Logistic Regression
Probability of the different outcomes (for one example):
- Probability of positive outcome: $p(Y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^T x}}$
- Probability of negative outcome: $p(Y = 0 \mid x; \theta) = 1 - p(Y = 1 \mid x; \theta)$

Probability of a Training Example
- Probability of the actual label $y^{(i)}$ given features $x^{(i)}$.
- Can be written for both labels (0 and 1) without case distinction.
- Label exponentiation trick: use $a^0 = 1$.
$$
\begin{aligned}
p(Y = y^{(i)} \mid x^{(i)}; \theta)
&= \begin{cases} p(Y = 1 \mid x^{(i)}; \theta) & \text{if } y^{(i)} = 1 \\ p(Y = 0 \mid x^{(i)}; \theta) & \text{if } y^{(i)} = 0 \end{cases} \\
&= \begin{cases} p(Y = 1 \mid x^{(i)}; \theta)^1 \, p(Y = 0 \mid x^{(i)}; \theta)^0 & \text{if } y^{(i)} = 1 \\ p(Y = 1 \mid x^{(i)}; \theta)^0 \, p(Y = 0 \mid x^{(i)}; \theta)^1 & \text{if } y^{(i)} = 0 \end{cases} \\
&= p(Y = 1 \mid x^{(i)}; \theta)^{y^{(i)}} \, p(Y = 0 \mid x^{(i)}; \theta)^{1 - y^{(i)}}
\end{aligned}
$$

Binary Logistic Regression
Conditional Negative Log-Likelihood (NLL):
$$
\begin{aligned}
\mathrm{NLL}(\theta) &= -\log p(y \mid X; \theta) \\
&= -\log \prod_{i=1}^m p(Y = y^{(i)} \mid x^{(i)}; \theta) \\
&= -\log \prod_{i=1}^m p(Y = 1 \mid x^{(i)}; \theta)^{y^{(i)}} \left(1 - p(Y = 1 \mid x^{(i)}; \theta)\right)^{1 - y^{(i)}} \\
&= -\sum_{i=1}^m \left[ y^{(i)} \log \sigma(\theta^T x^{(i)}) + (1 - y^{(i)}) \log\left(1 - \sigma(\theta^T x^{(i)})\right) \right]
\end{aligned}
$$
- No closed-form solution for the minimum.
- Use numerical / iterative methods: L-BFGS, gradient descent, ...
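
The NLL above can be evaluated directly. The sketch below is one possible plain-Python version; the function name `nll` and the toy data `X`, `y` are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(theta, X, y):
    """Negative log-likelihood: -sum_i [ y_i log s_i + (1 - y_i) log(1 - s_i) ],
    where s_i = sigmoid(theta^T x_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        s_i = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total -= y_i * math.log(s_i) + (1 - y_i) * math.log(1.0 - s_i)
    return total

# Hypothetical toy data; the first feature is a constant 1 acting as a bias.
X = [[1.0, 0.0], [1.0, 2.0], [1.0, -1.0]]
y = [0, 1, 0]

nll_zero = nll([0.0, 0.0], X, y)    # every example gets probability 0.5
nll_better = nll([0.0, 2.0], X, y)  # a positive weight on the second feature fits better
```

With $\theta = 0$ each example contributes $\log 2$, so the NLL is $3 \log 2$; moving $\theta$ towards a better fit lowers the value, which is what the iterative methods exploit.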

Logistic Regression
- Logistic regression: logistic sigmoid function applied to a weighted linear combination of feature values.
- The output is interpreted as the probability that the label for a specific example equals 1.
- Applying the model to test data: predict $y^{(i)} = 1$ if $p(Y = 1 \mid x^{(i)}; \theta) > 0.5$.
- No closed-form solution for minimizing the NLL; iterative methods are necessary.
- Next up: gradient descent.

Optimization
- Optimization: minimize some function $J(\theta)$ by altering $\theta$.
- Maximize $f(\theta)$ by minimizing $J(\theta) = -f(\theta)$.
- $J(\theta)$: criterion, objective function, cost function, loss function, error function.
- In a probabilistic machine learning setting, $J(\theta)$ is often the (conditional) negative log-likelihood, $-\log p(x; \theta)$ or $-\log p(y \mid X; \theta)$, as a function of $\theta$:
  $$\theta^* = \arg\min_\theta J(\theta)$$

Optimization
- If $J(\theta)$ is convex, it is minimized where $\nabla_\theta J(\theta) = 0$.
- If $J(\theta)$ is not convex, the gradient can nevertheless help us to improve our objective (and find a local optimum).
- Many optimization techniques were originally developed for convex objective functions, but are found to work well for non-convex functions too.
- Use the fact that the gradient indicates the slope of the function in the direction of steepest increase.

Gradient-Based Optimization
- Derivative: given a small change in input, what is the corresponding change in output?
  $$f(x + \epsilon) \approx f(x) + \epsilon f'(x)$$
- $f(x - \epsilon \, \mathrm{sign}(f'(x))) < f(x)$ for small enough $\epsilon$.

Gradient Descent
- For $J(\theta): \mathbb{R}^n \to \mathbb{R}$.
- If the partial derivative $\frac{\partial J(\theta)}{\partial \theta_j} > 0$, $J(\theta)$ will increase for small increases of $\theta_j$, so go in the opposite direction of the gradient (since we want to minimize).
- Steepest descent: iterate $\theta_{t+1} \leftarrow \theta_t - \eta \nabla_\theta J(\theta_t)$
- $\eta$ is the learning rate (set to a small positive constant).
- The iteration has converged when $\nabla_\theta J(\theta)$ is (close to) 0.
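
A minimal sketch of the steepest-descent iteration, applied to a simple convex quadratic. The function names, the stopping tolerance `tol` and the example objective are assumptions chosen for illustration.

```python
def gradient_descent(grad, theta0, eta=0.1, tol=1e-8, max_steps=10_000):
    """Iterate theta <- theta - eta * grad(theta) until the gradient norm is below tol."""
    theta = list(theta0)
    for _ in range(max_steps):
        g = grad(theta)
        if sum(gj * gj for gj in g) ** 0.5 < tol:
            break  # gradient is (close to) 0
        theta = [tj - eta * gj for tj, gj in zip(theta, g)]
    return theta

# Hypothetical objective J(theta) = (theta_0 - 3)^2 + 2 * (theta_1 + 1)^2,
# which is convex with its minimum at (3, -1).
def grad_J(theta):
    return [2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)]

theta_star = gradient_descent(grad_J, [0.0, 0.0])
```

With $\eta = 0.1$ each coordinate error shrinks by a constant factor per step (0.8 and 0.6 here), so the iterate approaches $(3, -1)$; a much larger $\eta$ would overshoot instead.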

Local Minima
- If the function is non-convex, different results can be obtained at convergence, depending on the initialization of $\theta$.

Local Minima
- Minima can be global or local.
- Critical (stationary) points: $f'(x) = 0$
- For neural networks, only good (not perfect) parameter values can be found.

Gradient Descent for Logistic Regression
$$
\begin{aligned}
\nabla_\theta \mathrm{NLL}(\theta) &= -\nabla_\theta \sum_{i=1}^m \left[ y^{(i)} \log \sigma(\theta^T x^{(i)}) + (1 - y^{(i)}) \log\left(1 - \sigma(\theta^T x^{(i)})\right) \right] \\
&= -\sum_{i=1}^m \left( y^{(i)} - \sigma(\theta^T x^{(i)}) \right) x^{(i)}
\end{aligned}
$$
The gradient descent update becomes:
$$\theta_{t+1} := \theta_t + \eta \sum_{i=1}^m \left( y^{(i)} - \sigma(\theta_t^T x^{(i)}) \right) x^{(i)}$$
Note: Which feature weights are increased, which are decreased?
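
The update rule can be sketched in a few lines of plain Python. The toy data set, the learning rate and the number of steps are assumptions for illustration; the data is deliberately not linearly separable so that the optimal weights stay finite.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll_gradient(theta, X, y):
    """Gradient of the NLL: -sum_i (y_i - sigmoid(theta^T x_i)) * x_i."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = y_i - sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        for j, x_ij in enumerate(x_i):
            grad[j] -= err * x_ij
    return grad

def fit(X, y, eta=0.1, steps=500):
    """Batch gradient descent: theta <- theta - eta * grad NLL(theta)."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        g = nll_gradient(theta, X, y)
        theta = [t - eta * gj for t, gj in zip(theta, g)]
    return theta

# Hypothetical toy data: constant bias feature plus one real-valued feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, -0.5]]
y = [0, 0, 0, 1, 1, 1]
theta = fit(X, y)
final_gradient = nll_gradient(theta, X, y)
```

Reading the update term by term: an example with $y^{(i)} > \sigma(\theta^T x^{(i)})$ pushes the weight of every feature with a positive value up and of every feature with a negative value down, and an over-predicted example does the opposite.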

Derivation of Gradient for Logistic Regression
This is a great exercise! Use the following facts:
- Gradient: $(\nabla_\theta f(\theta))_j = \frac{\partial f(\theta)}{\partial \theta_j}$
- Derivative of a sum: $\frac{d}{dz} \sum_i f_i(z) = \sum_i \frac{d f_i(z)}{dz}$
- Chain rule: $F(z) = f(g(z)) \Rightarrow F'(z) = f'(g(z)) \, g'(z)$
- Derivative of logarithm: $\frac{d \log z}{dz} = 1/z$
- Derivative of logistic sigmoid: $\frac{d\sigma(z)}{dz} = \sigma(z)(1 - \sigma(z))$
- Partial derivative of dot product: $\frac{\partial \theta^T x}{\partial \theta_j} = x_j$
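
One way to carry out the exercise, sketched for a single example with the shorthand $z = \theta^T x$ (the shorthand is an assumption introduced here for readability):

```latex
\begin{aligned}
\frac{\partial}{\partial \theta_j}
  \left[ y \log \sigma(z) + (1 - y) \log\bigl(1 - \sigma(z)\bigr) \right]
&= \left[ \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right]
   \sigma(z)\bigl(1 - \sigma(z)\bigr)\, x_j \\
&= \left[ y \bigl(1 - \sigma(z)\bigr) - (1 - y)\, \sigma(z) \right] x_j \\
&= \bigl( y - \sigma(z) \bigr)\, x_j
\end{aligned}
```

The first line uses the chain rule together with the derivatives of the logarithm, the sigmoid and the dot product. Summing over all examples and negating gives $\nabla_\theta \mathrm{NLL}(\theta) = -\sum_{i=1}^m (y^{(i)} - \sigma(\theta^T x^{(i)})) x^{(i)}$.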

Gradient Descent: Summary
- Iterative method for function minimization.
- The gradient indicates the rate of change in the objective function, given a local change to the feature weights.
- Subtract the gradient:
  - decrease parameters that (locally) have positive correlation with the objective
  - increase parameters that (locally) have negative correlation with the objective
- Gradient updates only have the desired properties in a small region around the previous parameters $\theta_t$. Control locality by the step size $\eta$.
- Gradient descent is slow: for a relatively small step in the right direction, all of the training data has to be processed.
- This version of gradient descent is often also called batch gradient descent.

Stochastic Gradient Descent (SGD)
- Batch gradient descent is slow: for a relatively small step in the right direction, all of the training data has to be processed.
  $$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \sum_{i=1}^m \log p(y_i \mid x_i; \theta)$$
- Stochastic gradient descent in a nutshell: for each update, only use a random sample $B_t$ of the training data (mini-batch).
  $$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \sum_{i \in B_t} \log p(y_i \mid x_i; \theta)$$
- The mini-batch size can also just be 1, which gives more frequent updates.
  $$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \log p(y_t \mid x_t; \theta)$$
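
A sketch of mini-batch SGD for logistic regression. Drawing the mini-batches by shuffling the data once per epoch is a common variant and an assumption here, as are the toy data, the seed and the hyper-parameter values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd(X, y, eta=0.05, batch_size=2, epochs=200, seed=0):
    """Mini-batch SGD on the log-likelihood of logistic regression."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random mini-batches in every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of sum_{i in batch} log p(y_i | x_i; theta)
            step = [0.0] * len(theta)
            for i in batch:
                err = y[i] - sigmoid(sum(t * v for t, v in zip(theta, X[i])))
                for j, x_ij in enumerate(X[i]):
                    step[j] += err * x_ij
            theta = [t + eta * s for t, s in zip(theta, step)]
    return theta

# Hypothetical toy data: constant bias feature plus one real-valued feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, -0.5]]
y = [0, 0, 0, 1, 1, 1]
theta_sgd = sgd(X, y)
theta_single = sgd(X, y, batch_size=1)  # one example per update
```

Setting `batch_size=len(X)` recovers batch gradient descent, since every update then sums over the whole training set.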

Stochastic Gradient Descent (SGD)
- The actual gradient is approximated using only a sub-sample of the data.
- For objective functions that are highly non-convex, the random deviations of these approximations may even help to escape local minima.
- Treat batch size and learning rate as hyper-parameters.

Deep Feedforward Networks
- Function approximation: find a good mapping $\hat{y} = f(x; \theta)$.
- Network: composition of functions $f^{(1)}, f^{(2)}, f^{(3)}$ with multi-dimensional input and output.
- Each $f^{(i)}$ represents one layer: $f(x) = f^{(1)}(f^{(2)}(f^{(3)}(x)))$
- Feedforward:
  - Input → intermediate representation → output
  - No feedback connections
  - Cf. recurrent networks

Deep Feedforward Networks: Training
- The loss function is defined on the output layer, e.g. a norm of the difference between the target $y$ and the network output $f(x; \theta)$.
- A quality criterion on the other layers is not directly defined.
- The training algorithm must decide how to use those layers most effectively (w.r.t. the loss on the output layer).
- Non-output layers can be viewed as providing a feature function $\phi(x)$ of the input that is to be learned.

Neural Networks
- Inspired by biological neurons (nerve cells).
- Neurons are connected to each other, and receive and send electrical pulses.
- "If the [input] voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated, which travels rapidly along the cell's axon, and activates synaptic connections with other cells when it arrives." (Wikipedia)

Activation Functions with Non-Linearities
- Linear functions are limited in what they can express.
- Famous example: XOR
- Simple layered non-linear functions can represent XOR.

Design Choices for Output Units
- Typically can be interpreted as probabilities:
  - Logistic sigmoid
  - Softmax
  - Mean and variance of a Gaussian, ...
- Trained with negative log-likelihood.

Softmax
- Logistic sigmoid: vector $y$ of binary outcomes, with no constraints on how many can be 1. Bernoulli distribution.
- Softmax: exactly one element of $y$ is 1. Multinoulli (categorical) distribution.
  $$\sum_i p(y = i \mid \phi(x)) = 1$$
  $$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$$
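
A small sketch of the softmax function in plain Python. Subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged; it is an addition here, not something stated on the slide.

```python
import math

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), computed with a max-shift for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])          # hypothetical scores for three classes
shifted = softmax([101.0, 102.0, 103.0])  # same scores shifted by a constant
large = softmax([1000.0, 1000.0])         # would overflow without the max-shift
```

The outputs are positive and sum to 1, and adding the same constant to every score does not change them, so only score differences matter.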

Parametrizing a Gaussian Distribution
- Use the final layer to predict the parameters of a Gaussian mixture model.
- Weight of mixture component: softmax.
- Means: no non-linearity.
- Precisions ($1/\sigma^2$) need to be positive: softplus,
  $$\mathrm{softplus}(z) = \ln(1 + \exp(z))$$
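
A sketch of softplus in plain Python. The rewritten form $\max(z, 0) + \ln(1 + e^{-|z|})$ is algebraically equal to $\ln(1 + e^z)$ and is used here only to avoid overflow; that choice is an assumption of this example.

```python
import math

def softplus(z):
    """ln(1 + exp(z)), computed as max(z, 0) + log1p(exp(-|z|)) to avoid overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

at_zero = softplus(0.0)        # ln 2
very_negative = softplus(-50.0)  # tiny but still positive
very_positive = softplus(1000.0)  # approximately z itself
```

For large positive inputs softplus behaves like the identity, and for negative inputs it decays towards 0 while staying positive, which is why it suits quantities such as precisions.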

Rectified Linear Units
- Rectified Linear Unit: $\mathrm{relu}(z) = \max(0, z)$, with $z = x^T w + b$.
- Consistent gradient of 1 when the unit is active (i.e. if there is an error to propagate).
- Default choice for hidden units.

A Simple ReLU Network to Solve XOR
$$f(x; W, c, w) = w^T \max(0, W^T x + c)$$
$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$
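
The network can be checked on all four XOR inputs. The sketch assumes the weights $W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $c = (0, -1)^T$, $w = (1, -2)^T$, i.e. with negative signs on the second entries of $c$ and $w$ as in the standard XOR example; that reading of the slide is an assumption.

```python
def relu(v):
    return [max(0.0, a) for a in v]

# Assumed parameters of the XOR network (see lead-in).
W = [[1.0, 1.0], [1.0, 1.0]]
c = [0.0, -1.0]
w = [1.0, -2.0]

def f(x):
    """f(x; W, c, w) = w^T max(0, W^T x + c)."""
    pre = [sum(W[i][j] * x[i] for i in range(2)) + c[j] for j in range(2)]
    h = relu(pre)
    return sum(wj * hj for wj, hj in zip(w, h))

outputs = {x: f(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

The hidden layer maps the inputs $(0,1)$ and $(1,0)$ to the same point $(1, 0)$ and $(1,1)$ to $(2, 1)$, after which a linear output layer is enough to separate the classes.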

Other Choices for Hidden Units
- A good activation function aids learning and provides large gradients.
- Sigmoidal functions (logistic sigmoid) have only a small region before they flatten out in either direction.
  - Practice shows that this seems to be OK in conjunction with a log-loss objective.
  - But they don't work as well as hidden units.
  - ReLUs are a better alternative since the gradient stays constant.
- Other hidden unit functions:
  - maxout: take the maximum of several values in the previous layer.
  - purely linear: can serve as a low-rank approximation.

- Forward propagation: input information propagates through the network to produce the output $\hat{y}$ (and the cost $J(\theta)$ in training).
- Back-propagation: compute the gradient w.r.t. the model parameters. The cost gradient propagates backwards through the network.
- Back-propagation is part of a learning procedure (e.g. stochastic gradient descent), not a learning procedure in itself.

Chain Rule of Calculus: Real Functions
Let $x, y, z \in \mathbb{R}$, $f, g: \mathbb{R} \to \mathbb{R}$, $y = g(x)$, $z = f(g(x)) = f(y)$.
Then
$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$$

Chain Rule of Calculus: Multivariate Functions
Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $z \in \mathbb{R}$, $f: \mathbb{R}^n \to \mathbb{R}$, $g: \mathbb{R}^m \to \mathbb{R}^n$, $y = g(x)$, $z = f(g(x)) = f(y)$.
Then
$$\frac{\partial z}{\partial x_i} = \sum_{j=1}^n \frac{\partial z}{\partial y_j} \frac{\partial y_j}{\partial x_i}$$
In order to write this in vector notation, we need to define the Jacobian matrix.
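
Jacobian
- The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function:
  $$J = \frac{\partial g(x)}{\partial x} = \begin{bmatrix} \frac{\partial g(x)_1}{\partial x_1} & \cdots & \frac{\partial g(x)_1}{\partial x_m} \\ \frac{\partial g(x)_2}{\partial x_1} & \cdots & \frac{\partial g(x)_2}{\partial x_m} \\ \vdots & & \vdots \\ \frac{\partial g(x)_n}{\partial x_1} & \cdots & \frac{\partial g(x)_n}{\partial x_m} \end{bmatrix}$$
- How to write the chain rule in terms of gradients? We can write it as:
  $$\nabla_x z = \left( \frac{\partial y}{\partial x} \right)^T \nabla_y z$$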
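
The vector form of the chain rule can be checked numerically. In the sketch below the functions `g` and `f` are hypothetical examples with $m = n = 2$, and the analytic gradient $J^T \nabla_y z$ is compared against central finite differences.

```python
def g(x):
    """Hypothetical g: R^2 -> R^2."""
    return [x[0] * x[1], x[0] + x[1] ** 2]

def f(y):
    """Hypothetical f: R^2 -> R."""
    return y[0] ** 2 + 3.0 * y[1]

def jacobian_g(x):
    """J[j][i] = d g(x)_j / d x_i."""
    return [[x[1], x[0]], [1.0, 2.0 * x[1]]]

def grad_f(y):
    return [2.0 * y[0], 3.0]

def grad_z(x):
    """Chain rule in vector form: grad_x z = J^T grad_y z."""
    J = jacobian_g(x)
    gy = grad_f(g(x))
    return [sum(J[j][i] * gy[j] for j in range(2)) for i in range(2)]

def numeric_grad(x, eps=1e-6):
    """Central finite differences of z = f(g(x))."""
    out = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        out.append((f(g(hi)) - f(g(lo))) / (2 * eps))
    return out

x0 = [1.5, -0.5]
analytic = grad_z(x0)
numeric = numeric_grad(x0)
```

Both vectors agree up to the finite-difference error, which is the kind of gradient check that is also useful for debugging a backpropagation implementation.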

Viewing the Network as a Graph
- Nodes are function outputs (can be scalar or vector valued).
- Arrows are inputs.
- Example: scalar multiplication $z = xy$.
[Figure: computation graph with node $z$ and input nodes $x$ and $y$]

Which Function?
[Figure: computation graph with $\hat{y} = \sigma(u)$, $u = x^T w$, and input nodes $x$ and $w$]

Graph with Cost
[Figure: computation graph with cost node $L = -\log P(y \mid \hat{y})$, where $\hat{y} = \sigma(u)$, $u = x^T w$, and input nodes $x$ and $w$]

Which Function?
- Parameter vectors can be converted to a matrix as needed.
[Figure: computation graph with $\hat{y} = \sigma(u)$, $u = h^T w$, $h = \max(0, u')$, and $u'$ computed from the input $x$ and the weight matrix $V$]

Forward Pass
- Green: known or computed.
[Figure, built up over several steps: computation graph with $L = -\log P(y \mid \hat{y})$, $\hat{y} = \sigma(u)$, $u = h^T w$, $h = \max(0, u')$, and $u'$ computed from $x$ and $V$; nodes are marked as they are computed]
- End of forward pass (some steps skipped).

Backward Pass
- Red: gradient of cost computed for node.
[Figure, built up over several steps: the computation graph annotated with the gradient of the cost at each node]
- At $\hat{y}$: $\frac{dL}{d\hat{y}}$
- At $u$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du}$
- At $h$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial h}$
- At $w$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial w}$
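
End of Backward Pass
- We have the gradients for all parameters, let's use them for SGD.
[Figure: the computation graph annotated with the gradient of the cost at every node]
- At $\hat{y}$: $\frac{dL}{d\hat{y}}$
- At $u$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du}$
- At $h$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial h}$
- At $w$: $\frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial w}$
- At $u'$: $\left( \frac{\partial h}{\partial u'} \right)^T \frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial h}$
- At $V$: $\left( \frac{\partial u'}{\partial V} \right)^T \left( \frac{\partial h}{\partial u'} \right)^T \frac{dL}{d\hat{y}} \frac{d\hat{y}}{du} \frac{\partial u}{\partial h}$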
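
The forward and backward pass through this graph can be sketched in plain Python. The sketch assumes $u' = Vx$ for the first layer and the Bernoulli negative log-likelihood $L = -[y \log \hat{y} + (1 - y) \log(1 - \hat{y})]$ as cost; both are assumptions, and all numbers are hypothetical.

```python
import math

def forward(x, V, w):
    """Forward pass: u' = V x (assumed), h = max(0, u'), u = h^T w, y_hat = sigmoid(u)."""
    u_prime = [sum(V[k][i] * x[i] for i in range(len(x))) for k in range(len(V))]
    h = [max(0.0, a) for a in u_prime]
    u = sum(hk * wk for hk, wk in zip(h, w))
    y_hat = 1.0 / (1.0 + math.exp(-u))
    return u_prime, h, u, y_hat

def loss(x, y, V, w):
    y_hat = forward(x, V, w)[3]
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def backward(x, y, V, w):
    """Backward pass: apply the chain rule node by node, from L back to w and V."""
    u_prime, h, u, y_hat = forward(x, V, w)
    dL_dyhat = -y / y_hat + (1 - y) / (1.0 - y_hat)
    d_u = dL_dyhat * y_hat * (1.0 - y_hat)           # dL/dy_hat * dy_hat/du
    d_w = [d_u * hk for hk in h]                     # du/dw = h
    d_h = [d_u * wk for wk in w]                     # du/dh = w
    d_uprime = [dh if a > 0 else 0.0 for dh, a in zip(d_h, u_prime)]  # ReLU gate
    d_V = [[dk * xi for xi in x] for dk in d_uprime]  # du'_k/dV_ki = x_i
    return d_w, d_V

x, y = [1.0, -2.0], 1
V = [[0.5, -0.3], [-0.4, 0.2]]
w = [0.7, -1.1]
d_w, d_V = backward(x, y, V, w)
```

The second hidden unit is inactive for this input ($u'_2 < 0$), so no gradient reaches the second row of $V$ or the second entry of $w$. The resulting `d_w` and `d_V` are what an SGD step would subtract (scaled by $\eta$) from `w` and `V`.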

Summary
- Gradient descent: minimize the loss by iteratively subtracting the gradient from the parameter vector.
- Stochastic gradient descent: approximate the gradient by considering small subsets of examples.
- Regularization: penalize large parameter values, e.g. by adding the l2-norm of the parameter vector.
- Feedforward networks: layers of (non-linear) function compositions.
- Output non-linearities: interpreted as probability densities (logistic sigmoid, softmax, Gaussian).
- Hidden layers: rectified linear units ($\max(0, z)$).
- Backpropagation: compute the gradient of the cost w.r.t. the parameters using the chain rule.

Regularization
[Figure: three plots of prediction against feature, illustrating overfitting vs. underfitting]
- Regularization: any modification to a learning algorithm for reducing its generalization error but not its training error.
- Build a preference into the ML algorithm for one solution in hypothesis space over another.
- The solution space is still the same.
- An unpreferred solution is penalized: it is only chosen if it fits the training data much better.

L2-Regularization
- Large parameters → overfitting.
[Figure: plots of $\sigma(x)$, $\sigma(2x)$, $\sigma(100x)$]
- Prefer models with smaller feature weights.
- Popular regularizers:
  - Penalize large L2 norm.
  - Penalize large L1 norm (aka LASSO, induces sparsity).

Regularization
- Add a term that penalizes a large l2 norm.
- The amount of penalty is controlled by a parameter $\lambda$.
- Linear regression: $J(\theta) = \mathrm{MSE}(\theta) + \frac{\lambda}{2} \theta^T \theta$
- Logistic regression: $J(\theta) = \mathrm{NLL}(\theta) + \frac{\lambda}{2} \theta^T \theta$
- From a Bayesian perspective, l2-regularization corresponds to a Gaussian prior on the parameters.

L2-Regularization
- The surface of the objective function is now a combination of the original cost and the regularization penalty.
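
L2-Regularization
- Gradient of the regularization term: $\nabla_\theta \frac{\lambda}{2} \theta^T \theta = \lambda \theta$
- Gradient descent for the regularized cost function:
  $$\theta_{t+1} := \theta_t - \eta \nabla_\theta \left( \mathrm{NLL}(\theta_t) + \frac{\lambda}{2} \theta_t^T \theta_t \right)$$
  $$\theta_{t+1} := (1 - \eta\lambda)\,\theta_t - \eta \nabla_\theta \mathrm{NLL}(\theta_t)$$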
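
The two forms of the regularized update are algebraically identical, which a short sketch can confirm. It assumes the penalty $\frac{\lambda}{2} \theta^T \theta$ with gradient $\lambda\theta$; the function names and all numbers are hypothetical.

```python
def step_explicit(theta, grad_nll, eta, lam):
    """theta - eta * (grad NLL + lam * theta): gradient of the regularized cost."""
    return [t - eta * (g + lam * t) for t, g in zip(theta, grad_nll)]

def step_weight_decay(theta, grad_nll, eta, lam):
    """(1 - eta * lam) * theta - eta * grad NLL: shrink first, then follow the data gradient."""
    return [(1.0 - eta * lam) * t - eta * g for t, g in zip(theta, grad_nll)]

theta = [2.0, -1.0, 0.5]
grad_nll = [0.3, -0.2, 0.0]  # hypothetical gradient of the unregularized NLL at theta
a = step_explicit(theta, grad_nll, eta=0.1, lam=0.5)
b = step_weight_decay(theta, grad_nll, eta=0.1, lam=0.5)
```

The second form shows why L2-regularization is also known as weight decay: a parameter whose data gradient is 0 (the third one here) is simply multiplied by $1 - \eta\lambda$ in every step.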
More informationMachine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang
Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear