Machine Learning Basics III


Benjamin Roth, CIS LMU München

Outline
1. Classification: Logistic Regression
2. Gradient Based Optimization: Gradient Descent for Logistic Regression; Stochastic Gradient Descent
3. Deep Feedforward Networks: Design Choices for Output Units; Design Choices for Hidden Units; Backpropagation
4. Regularization: L2-Regularization

From Regression to Classification. So far, linear regression: a simple linear model with a probabilistic interpretation, whose optimal parameters are found using Maximum Likelihood Estimation. Can we do something similar for classification? Yes: Logistic Regression (despite its name, it is not actually used for regression).

Logistic Regression. Binary logistic model: estimate the probability of a binary response $y \in \{0, 1\}$ based on features $x$. Logistic regression is a Generalized Linear Model: the linear model is related to the response variable via a link function, $p(Y = 1 \mid x; \theta) = f(\theta^T x)$. (Note: $Y$ denotes a random variable, whereas $y, y^{(i)}, 0, 1$ denote values that the random variable can take on. If the random variable is obvious from the context, it may be omitted.)

Logistic Regression. Recall linear regression: $p(y \mid x; \theta) = \mathcal{N}(y; \theta^T x, I)$, which predicts $y \in \mathbb{R}$. Classification: the outcome (per example) is 0 or 1. Logistic sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$. Logistic regression: linear function + logistic sigmoid, $p(y = 1 \mid x; \theta) = \sigma(\theta^T x)$.

Binary Logistic Regression. Probability of the different outcomes (for one example). Probability of the positive outcome: $p(y = 1 \mid x; \theta) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$. Probability of the negative outcome: $p(y = 0 \mid x; \theta) = 1 - p(y = 1 \mid x; \theta)$.
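
As a concrete illustration, here is a minimal NumPy sketch of these two probabilities. The variable names (x, theta) and the assumption that any bias term is folded into theta as an extra constant feature are choices made for this example, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, theta):
    # p(y = 1 | x; theta) = sigma(theta^T x); p(y = 0 | x; theta) is its complement
    p_pos = sigmoid(theta @ x)
    return p_pos, 1.0 - p_pos

# example with 3 features (a bias could be handled by appending a constant 1 to x)
x = np.array([1.0, -2.0, 0.5])
theta = np.array([0.3, 0.1, -0.4])
p1, p0 = predict_proba(x, theta)
```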

Probability of a Training Example. The probability of the actual label $y^{(i)}$ given features $x^{(i)}$ can be written for both labels (0 and 1) without a case distinction. Label exponentiation trick (using $a^0 = 1$):
$$p(y = y^{(i)} \mid x^{(i)}; \theta) = \begin{cases} p(y = 1 \mid x^{(i)}; \theta) & \text{if } y^{(i)} = 1 \\ p(y = 0 \mid x^{(i)}; \theta) & \text{if } y^{(i)} = 0 \end{cases} = p(y = 1 \mid x^{(i)}; \theta)^{\,y^{(i)}} \; p(y = 0 \mid x^{(i)}; \theta)^{\,1 - y^{(i)}}$$

Binary Logistic Regression. Conditional Negative Log-Likelihood (NLL):
$$\mathrm{NLL}(\theta) = -\log p(\mathbf{y} \mid X; \theta) = -\log \prod_{i=1}^{m} p(y = y^{(i)} \mid x^{(i)}; \theta) = -\log \prod_{i=1}^{m} p(y = 1 \mid x^{(i)}; \theta)^{\,y^{(i)}} \, \big(1 - p(y = 1 \mid x^{(i)}; \theta)\big)^{\,1 - y^{(i)}}$$
$$= -\sum_{i=1}^{m} \left[ y^{(i)} \log \sigma(\theta^T x^{(i)}) + (1 - y^{(i)}) \log\big(1 - \sigma(\theta^T x^{(i)})\big) \right]$$
There is no closed-form solution for the minimum; use numerical / iterative methods (L-BFGS, gradient descent, ...).
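
The NLL above translates almost literally into NumPy. This is only a sketch: the design matrix X (one example per row), the 0/1 label vector y, and the clipping used to avoid log(0) are assumptions of the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y, eps=1e-12):
    # X: (m, n) design matrix, y: (m,) vector of 0/1 labels
    p = sigmoid(X @ theta)              # p(y = 1 | x^(i); theta) for every example
    p = np.clip(p, eps, 1.0 - eps)      # numerical safety for the logarithms
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```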

Logistic Regression. Logistic regression: the logistic sigmoid function applied to a weighted linear combination of feature values, interpreted as the probability that the label for a specific example equals 1. Applying the model to test data: predict $y^{(i)} = 1$ if $p(y = 1 \mid x^{(i)}; \theta) > 0.5$. There is no closed-form solution for minimizing the NLL, so iterative methods are necessary. Next up: gradient descent.

Optimization. Optimization: minimize some function $J(\theta)$ by altering $\theta$, i.e. find $\theta^* = \arg\min_\theta J(\theta)$. Maximize $f(\theta)$ by minimizing $J(\theta) = -f(\theta)$. $J(\theta)$ is called the criterion, objective function, cost function, loss function, or error function. In a probabilistic machine learning setting it is often the (conditional) negative log-likelihood, $-\log p(X; \theta)$ or $-\log p(\mathbf{y} \mid X; \theta)$, viewed as a function of $\theta$.

Optimization. If $J(\theta)$ is convex, it is minimized where $\nabla_\theta J(\theta) = 0$. If $J(\theta)$ is not convex, the gradient can still help us to improve our objective (and find a local optimum). Many optimization techniques were originally developed for convex objective functions, but are found to work well for non-convex functions too. They use the fact that the gradient gives the slope of the function and points in the direction of steepest increase.

Gradient-Based Optimization. Derivative: given a small change in input, what is the corresponding change in output? $f(x + \epsilon) \approx f(x) + \epsilon f'(x)$, so $f(x - \epsilon\, \mathrm{sign}(f'(x))) < f(x)$ for small enough $\epsilon$.

Gradient Descent. For $J(\theta): \mathbb{R}^n \to \mathbb{R}$: if the partial derivative $\frac{\partial J(\theta)}{\partial \theta_j} > 0$, then $J(\theta)$ will increase for small increases of $\theta_j$, so go in the opposite direction of the gradient (since we want to minimize). Steepest descent: iterate $\theta_{t+1} \leftarrow \theta_t - \eta \nabla_\theta J(\theta_t)$, where $\eta$ is the learning rate (set to a small positive constant). The procedure converges when $\nabla_\theta J(\theta)$ is (close to) 0.
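
A generic steepest-descent loop might look like the following sketch; the stopping tolerance, the maximum number of iterations, and the example objective are illustrative choices, not part of the slides.

```python
import numpy as np

def gradient_descent(grad_J, theta0, eta=0.1, tol=1e-6, max_iters=10_000):
    # iterate theta_{t+1} <- theta_t - eta * grad J(theta_t)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        g = grad_J(theta)
        theta = theta - eta * g
        if np.linalg.norm(g) < tol:     # gradient (close to) zero: converged
            break
    return theta

# example: J(theta) = ||theta - 3||^2 has gradient 2 * (theta - 3)
theta_star = gradient_descent(lambda th: 2.0 * (th - 3.0), theta0=[0.0, 0.0])
```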

Local Minima. If the function is non-convex, different results can be obtained at convergence, depending on the initialization of $\theta$.

Local Minima. Minima can be global or local. Critical (stationary) points are points with $f'(x) = 0$. For neural networks, typically only good (not perfect) parameter values can be found.

Gradient Descent for Logistic Regression.
$$\nabla_\theta \mathrm{NLL}(\theta) = -\nabla_\theta \sum_{i=1}^{m} \left[ y^{(i)} \log \sigma(\theta^T x^{(i)}) + (1 - y^{(i)}) \log\big(1 - \sigma(\theta^T x^{(i)})\big) \right] = -\sum_{i=1}^{m} \big(y^{(i)} - \sigma(\theta^T x^{(i)})\big)\, x^{(i)}$$
The gradient descent update becomes:
$$\theta_{t+1} := \theta_t + \eta \sum_{i=1}^{m} \big(y^{(i)} - \sigma(\theta_t^T x^{(i)})\big)\, x^{(i)}$$
Note: which feature weights are increased, and which are decreased?
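
Putting the gradient and the update rule together, a batch gradient-descent trainer for logistic regression could be sketched as follows (X, y, eta and the fixed number of steps are assumptions of this example).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gd_logreg(X, y, eta=0.1, n_steps=1000):
    # X: (m, n) design matrix, y: (m,) labels in {0, 1}
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        residual = y - sigmoid(X @ theta)       # y^(i) - sigma(theta^T x^(i))
        theta = theta + eta * (X.T @ residual)  # sum_i residual_i * x^(i)
    return theta
```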

Derivation of Gradient for Logistic Regression. This is a great exercise! Use the following facts:
- Gradient: $(\nabla_\theta f(\theta))_j = \frac{\partial f(\theta)}{\partial \theta_j}$
- Derivative of a sum: $\frac{d}{dz} \sum_i f_i(z) = \sum_i \frac{d f_i(z)}{dz}$
- Chain rule: $F(z) = f(g(z)) \Rightarrow F'(z) = f'(g(z))\, g'(z)$
- Derivative of the logarithm: $\frac{d \log z}{dz} = 1/z$
- Derivative of the logistic sigmoid: $\frac{d\sigma(z)}{dz} = \sigma(z)(1 - \sigma(z))$
- Partial derivative of a dot product: $\frac{\partial\, \theta^T x}{\partial \theta_j} = x_j$
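
For reference, one way of combining these facts for a single example (writing $z = \theta^T x^{(i)}$):
$$\frac{\partial}{\partial \theta_j}\Big[ y^{(i)} \log \sigma(z) + (1 - y^{(i)})\log(1 - \sigma(z)) \Big] = y^{(i)}\,\frac{\sigma(z)(1-\sigma(z))}{\sigma(z)}\,x^{(i)}_j - (1 - y^{(i)})\,\frac{\sigma(z)(1-\sigma(z))}{1-\sigma(z)}\,x^{(i)}_j = \big(y^{(i)} - \sigma(z)\big)\,x^{(i)}_j$$
Summing over all examples and negating recovers the gradient of the NLL from the previous slide.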

Gradient Descent: Summary. An iterative method for function minimization. The gradient indicates the rate of change in the objective function given a local change to the feature weights. Subtract the gradient: decrease parameters that (locally) have a positive correlation with the objective, increase parameters that (locally) have a negative correlation with it. Gradient updates only have the desired properties in a small region around the previous parameters $\theta_t$; control this locality by the step size $\eta$. Gradient descent is slow: for a relatively small step in the right direction, all of the training data has to be processed. This version of gradient descent is often also called batch gradient descent.

Stochastic Gradient Descent (SGD). Batch gradient descent is slow: for a relatively small step in the right direction, all of the training data has to be processed:
$$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \sum_{i=1}^{m} \log p(y_i \mid x_i; \theta)$$
Stochastic gradient descent in a nutshell: for each update, only use a random sample $B_t$ of the training data (a mini-batch):
$$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \sum_{i \in B_t} \log p(y_i \mid x_i; \theta)$$
The mini-batch size can also just be 1, giving more frequent updates:
$$\theta_{t+1} \leftarrow \theta_t + \eta \nabla_\theta \log p(y_t \mid x_t; \theta)$$
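
A mini-batch SGD sketch for the logistic-regression NLL; the shuffling scheme, batch size, and learning rate are illustrative hyper-parameter choices (cf. the next slide).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logreg(X, y, eta=0.1, batch_size=32, n_epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        order = rng.permutation(m)                     # reshuffle once per epoch
        for start in range(0, m, batch_size):
            B = order[start:start + batch_size]        # indices of mini-batch B_t
            residual = y[B] - sigmoid(X[B] @ theta)
            theta = theta + eta * (X[B].T @ residual)  # update from the mini-batch only
    return theta
```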

Stochastic Gradient Descent (SGD). The actual gradient is approximated using only a sub-sample of the data. For objective functions that are highly non-convex, the random deviations of these approximations may even help to escape local minima. Treat the batch size and the learning rate as hyper-parameters.

Deep Feedforward Networks. Function approximation: find a good mapping $\hat{y} = f(x; \theta)$. Network: a composition of functions $f^{(1)}, f^{(2)}, f^{(3)}$ with multi-dimensional inputs and outputs, where each $f^{(i)}$ represents one layer: $f(x) = f^{(1)}(f^{(2)}(f^{(3)}(x)))$. Feedforward: input → intermediate representation → output, with no feedback connections (cf. recurrent networks).

Deep Feedforward Networks: Training. The loss function is defined on the output layer, e.g. a squared error such as $\frac{1}{2}\lVert y - \hat{y}\rVert^2$ between the target $y$ and the prediction $\hat{y} = f(x; \theta)$. A quality criterion on the other layers is not directly defined; the training algorithm must decide how to use those layers most effectively (w.r.t. the loss on the output layer). Non-output layers can be viewed as providing a feature function $\phi(x)$ of the input that is to be learned.

Neural Networks. Inspired by biological neurons (nerve cells): neurons are connected to each other, and receive and send electrical pulses. "If the [input] voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated, which travels rapidly along the cell's axon, and activates synaptic connections with other cells when it arrives." (Wikipedia)

Activation Functions with Non-Linearities. Linear functions are limited in what they can express. Famous example: XOR. Simple layered non-linear functions can represent XOR.

Design Choices for Output Units. The outputs can typically be interpreted as probabilities (or distribution parameters): logistic sigmoid, softmax, mean and variance of a Gaussian, ... Trained with negative log-likelihood.

Softmax. Logistic sigmoid: a vector $y$ of binary outcomes, with no constraints on how many can be 1 (Bernoulli distribution per element). Softmax: exactly one element of $y$ is 1 (Multinoulli / categorical distribution), with $\sum_i p(y = i \mid \phi(x)) = 1$ and
$$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$$
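
A direct NumPy version of the softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick added here, and it does not change the result.

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j)
    z = z - np.max(z)        # stability shift; softmax is invariant to it
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))   # non-negative, sums to 1
```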

Parametrizing a Gaussian Distribution. Use the final layer to predict the parameters of a Gaussian mixture model. Weights of the mixture components: softmax. Means: no non-linearity. Precisions ($\frac{1}{\sigma^2}$) need to be positive: softplus, $\mathrm{softplus}(z) = \ln(1 + \exp(z))$.
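
A softplus sketch in the same style; using np.logaddexp(0, z) to evaluate ln(1 + exp(z)) without overflow for large z is an implementation detail added here.

```python
import numpy as np

def softplus(z):
    # softplus(z) = ln(1 + exp(z)); always positive, so usable for precisions
    return np.logaddexp(0.0, z)
```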

Rectified Linear Units. Rectified Linear Unit: $\mathrm{relu}(z) = \max(0, z)$ with $z = x^T w + b$. It has a consistent gradient of 1 when the unit is active (i.e. if there is an error to propagate). Default choice for hidden units.

A Simple ReLU Network to Solve XOR.
$$f(x; W, c, w) = w^T \max(0,\, W^T x + c)$$
$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$
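
A quick numerical check of this network on all four XOR inputs (a sketch; the parameter values are the ones reconstructed above):

```python
import numpy as np

W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])

def f(x):
    # f(x; W, c, w) = w^T max(0, W^T x + c)
    return w @ np.maximum(0.0, W.T @ x + c)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, f(np.array(x, dtype=float)))   # prints 0.0, 1.0, 1.0, 0.0: XOR
```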

Other Choices for Hidden Units. A good activation function aids learning by providing large gradients. Sigmoidal functions (such as the logistic sigmoid) have only a small region before they flatten out in either direction. In practice this seems to be acceptable for output units in conjunction with a log-loss objective, but they do not work as well as hidden units. ReLUs are a better alternative since their gradient stays constant. Other hidden unit functions: maxout (take the maximum of several values in the previous layer); purely linear units (can serve as a low-rank approximation).

Forward propagation: input information propagates through the network to produce the output $\hat{y}$ (and the cost $J(\theta)$ during training). Back-propagation: compute the gradient w.r.t. the model parameters; the cost gradient propagates backwards through the network. Back-propagation is part of a learning procedure (e.g. stochastic gradient descent), not a learning procedure in itself.

Chain Rule of Calculus: Real Functions. Let $x, y, z \in \mathbb{R}$ and $f, g: \mathbb{R} \to \mathbb{R}$ with $y = g(x)$ and $z = f(g(x)) = f(y)$. Then
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$$

Chain Rule of Calculus: Multivariate Functions. Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $z \in \mathbb{R}$, with $g: \mathbb{R}^m \to \mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$, $y = g(x)$, and $z = f(g(x)) = f(y)$. Then
$$\frac{\partial z}{\partial x_i} = \sum_{j=1}^{n} \frac{\partial z}{\partial y_j}\,\frac{\partial y_j}{\partial x_i}$$
In order to write this in vector notation, we need to define the Jacobian matrix.

Jacobian. The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function:
$$J = \frac{\partial g(x)}{\partial x} = \begin{bmatrix} \frac{\partial g(x)_1}{\partial x_1} & \cdots & \frac{\partial g(x)_1}{\partial x_m} \\ \frac{\partial g(x)_2}{\partial x_1} & \cdots & \frac{\partial g(x)_2}{\partial x_m} \\ \vdots & & \vdots \\ \frac{\partial g(x)_n}{\partial x_1} & \cdots & \frac{\partial g(x)_n}{\partial x_m} \end{bmatrix}$$
How can the chain rule be written in terms of gradients? Using the Jacobian, it becomes
$$\nabla_x z = \left(\frac{\partial y}{\partial x}\right)^T \nabla_y z$$

Viewing the Network as a Graph. Nodes are function outputs (they can be scalar or vector valued); arrows are inputs. Example: scalar multiplication $z = x \cdot y$, drawn as a node $z$ with incoming arrows from $x$ and $y$.

Which Function? [Computation graph: inputs $x$ and $w$; $u = x^T w$; $\hat{y} = \sigma(u)$.]

Graph with Cost. [Computation graph: inputs $x$ and $w$; $u = x^T w$; $\hat{y} = \sigma(u)$; cost $L = -\log P(y \mid \hat{y})$.]

Which Function? Parameter vectors can be converted to matrices as needed. [Computation graph: inputs $x$, $V$ and $w$; $u' = V^T x$; $h = \max(0, u')$; $u = h^T w$; $\hat{y} = \sigma(u)$.]

Forward Pass. Green: known or computed. Starting from the known quantities (the input $x$, the parameters $V$ and $w$, and the label $y$), each node is computed as soon as all of its inputs are available: $u' = V^T x$, then $h = \max(0, u')$, then $u = h^T w$, then $\hat{y} = \sigma(u)$, and finally the cost $L = -\log P(y \mid \hat{y})$. This is the end of the forward pass (some steps skipped in the slides).

Backward Pass. Red: gradient of the cost computed for a node. The gradients are computed backwards through the graph using the chain rule: first $\frac{dL}{d\hat{y}}$, then $\frac{dL}{d\hat{y}}\frac{d\hat{y}}{du}$ for the node $u$, then $\frac{dL}{d\hat{y}}\frac{d\hat{y}}{du}\frac{\partial u}{\partial w}$ for the parameter $w$ and $\frac{dL}{d\hat{y}}\frac{d\hat{y}}{du}\frac{\partial u}{\partial h}$ for the node $h$, and so on through $u'$ down to the parameter $V$.

End of Backward Pass. We have the gradients for all parameters, let's use them for SGD.
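
The whole forward and backward pass for this small network can be written out in a few lines of NumPy. This is a sketch, assuming a binary cross-entropy cost $L = -[y \log \hat{y} + (1-y)\log(1-\hat{y})]$ (for which $\frac{dL}{d\hat{y}}\frac{d\hat{y}}{du}$ simplifies to $\hat{y} - y$); all names and the example data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, V, w):
    # forward pass
    u_prime = V.T @ x                      # u' = V^T x
    h = np.maximum(0.0, u_prime)           # h = max(0, u')
    u = h @ w                              # u = h^T w (scalar)
    y_hat = sigmoid(u)                     # y^ = sigma(u)
    L = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # backward pass (chain rule, from the cost back to the parameters)
    dL_du = y_hat - y                      # dL/dy^ * dy^/du
    dL_dw = dL_du * h                      # du/dw = h
    dL_dh = dL_du * w                      # du/dh = w
    dL_duprime = dL_dh * (u_prime > 0)     # ReLU gate: gradient passes where u' > 0
    dL_dV = np.outer(x, dL_duprime)        # du'/dV: outer product with the input
    return L, dL_dw, dL_dV

# illustrative data: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
V, w = rng.normal(size=(3, 4)), rng.normal(size=4)
L, dL_dw, dL_dV = forward_backward(x, y, V, w)
```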

Summary. Gradient descent: minimize a loss by iteratively subtracting the gradient from the parameter vector. Stochastic gradient descent: approximate the gradient by considering small subsets of examples. Regularization: penalize large parameter values, e.g. by adding the L2 norm of the parameter vector. Feedforward networks: layers of (non-linear) function compositions. Output non-linearities: interpreted as probability distributions (logistic sigmoid, softmax, Gaussian). Hidden layers: rectified linear units ($\max(0, z)$). Backpropagation: compute the gradient of the cost w.r.t. the parameters using the chain rule.

Regularization. [Three plots of prediction vs. feature: overfitting vs. underfitting.] Regularization: any modification to a learning algorithm that aims to reduce its generalization error but not its training error. It builds a preference into the ML algorithm for one solution in the hypothesis space over another; the solution space is still the same. The unpreferred solution is penalized: it is only chosen if it fits the training data much better.

L2-Regularization. Large parameters lead to overfitting: compare $\sigma(x)$, $\sigma(2x)$, and $\sigma(100x)$. Prefer models with smaller feature weights. Popular regularizers: penalize a large L2 norm; penalize a large L1 norm (aka LASSO, induces sparsity).

Regularization. Add a term that penalizes a large L2 norm; the amount of penalty is controlled by a parameter $\lambda$. Linear regression: $J(\theta) = \mathrm{MSE}(\theta) + \frac{\lambda}{2}\theta^T\theta$. Logistic regression: $J(\theta) = \mathrm{NLL}(\theta) + \frac{\lambda}{2}\theta^T\theta$. From a Bayesian perspective, L2-regularization corresponds to a Gaussian prior on the parameters.

L2-Regularization. The surface of the objective function is now a combination of the original cost and the regularization penalty.

L2-Regularization. Gradient of the regularization term: $\nabla_\theta\, \frac{\lambda}{2}\theta^T\theta = \lambda\theta$. Gradient descent for the regularized cost function:
$$\theta_{t+1} := \theta_t - \eta \nabla_\theta \left(\mathrm{NLL}(\theta_t) + \frac{\lambda}{2}\theta_t^T\theta_t\right)$$
$$\theta_{t+1} := (1 - \eta\lambda)\,\theta_t - \eta \nabla_\theta \mathrm{NLL}(\theta_t)$$
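
In code, the regularized update amounts to shrinking the weights by a factor $(1 - \eta\lambda)$ before the usual gradient step (often called weight decay). A minimal sketch, with grad_nll and the hyper-parameters as placeholders:

```python
def regularized_gd_step(theta, grad_nll, eta=0.1, lam=0.01):
    # theta_{t+1} = (1 - eta * lam) * theta_t - eta * grad NLL(theta_t)
    return (1.0 - eta * lam) * theta - eta * grad_nll(theta)
```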
