Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
|
|
- Noreen Mosley
- 5 years ago
- Views:
Transcription
1 C.M. Bishop s PRML: Chapter 5; Neural Networks
2 Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations, n = 1,..., N. ɛ(x) is the residual error.
3 Linear Models For example, recall the (Generalized) Linear Model: M y(x, w) = f w j φ j (x) (5.1) j=0 φ = (φ 0,..., φ M ) is the fixed model basis. w = (w 0,..., w M ) are the model coefficients. For regression: f( ) is the identity. For classification: f( ) maps to a posterior probability.
4 Feed-Forward Networks Feed-forward Neural Networks generalize the linear model M y(x, w) = f w j φ j (x) (5.1 again) j=0 The basis itself, as well as the coefficients w j, will be adapted. Roughly: the principle of (5.1) will be used twice; once to define the basis, and once to obtain the output.
5 Activations Construct M linear combinations of the inputs x 1,..., x D : a j = D i=0 w (1) ji x i (5.2) a j are the activations, j = 1,..., M. w (1) ji are the layer one weights, i = 1... D. w (1) j0 are the layer one biases. Each linear combination a j is transformed by a (nonlinear, differentiable) activation function: z j = h(a j ) (5.3)
6 Output Activations The hidden outputs z j = h(a j ) are linearly combined in layer two: a k = M j=0 a k are the output activations, k = 1,..., K. w (2) kj are the layer two weights, j = 1... D. w (2) k0 are the layer two biases. w (2) kj z j (5.4) The output activations a k are transformed by the output activation function: y k = σ(a k ) (5.5) y k are the final outputs. σ( ) is, like h( ), a sigmoidal function.
7 The Complete Two-Layer Model The model y k = σ(a k ) is, after substituting the definitions of a j and a k : M y k (x, w) = σ w (2) kj h D w (1) ji x i (5.9) j=0 i=0 a j a k h( ) and σ( ) are a sigmoidal functions, e.g. the logistic function. s(a) = exp( a) s(a) [0, 1] If σ( ) is the identity, then a regression model is obtained. Evaluation of (5.9) is called forward propagation.
8 Network Diagram The approximation process can be represented by a network: hidden units xd w (1) MD zm w (2) KM yk inputs outputs y1 x1 x0 z1 w (2) 10 z0 Figure: 5.1 Nodes are input, hidden and output units. Links are corresponding weights. Information propagates forwards from the explanatory variable x to the estimated response y k (x, w).
9 Properties & Generalizations Typically K D M, which means that the network is redundant if all h( ) are linear. There may be more than one layer of hidden units. Individual units need not be fully connected to the next layer. Individual links may skip over one or more subsequent layers. Networks with two (cf. 5.9) or more layers are universal approximators. Any continuous function can be uniformly approximated to arbitrary accuracy, given enough hidden units. This is true for many definitions of h( ), but excluding polynomials. There may be symmetries in the weight space, meaning that different choices of w may define the same mapping from input to output.
10 Maximum Likelihood Parameters The aim is to minimize the residual error between y(x n, w) and t n. Suppose that the target is a scalar-valued function, which is Normally distributed around the estimate: p(t x, w) = N ( t y(x, w), β 1 ) (5.12) Then it will be appropriate to consider the sum of squared-errors E(w) = 1 2 N ( ) 2 y(x n, w) t n (5.14) n=1 The maximum-likelihood estimate of w can be obtained by (numerical) minimization: w ML = min w E(w)
11 Maximum Likelihood Precision Having obtained the ML parameter estimate w ML, the precision, β can also be estimated. E.g. if the N observations are IID, then their joint probability is ( ) N p {t 1,..., t N } {x 1,..., x N }, w, β = p(t n x n, w, β) The negative log-likelihood, in this case, is n=1 log p = βe(w ML ) N 2 log β + N 2 log 2π (5.13) The derivative d/dβ is E(w ML ) N 2β and so 1 β ML = 1 N 2E(w ML) (5.15) And 1/β ML = 1 NK 2E(w ML) for K target variables.
12 Error Surface The residual error E(w) can be visualized as a surface in the weight-space: E(w) w A w B wc w 1 w 2 E Figure: 5.5 The error will, in practice, be highly nonlinear, with many minima, maxima and saddle-points. There will be inequivalent minima, determined by the particular data and model, as well as equivalent minima, corresponding to weight-space symmetries.
13 Parameter Optimization Iterative search for a local minimum of the error: w (τ+1) = w (τ) + w (τ) (5.27) E will be zero at a minimum of the error. τ is the time-step. w (τ) is the weight-vector update. The definition of the update depends on the choice of algorithm.
14 Local Quadratic Approximation The truncated Taylor expansion of E(w) around a weight-point ŵ is E(w) E(ŵ) + (w ŵ) b (w ŵ) H(w ŵ) (5.28) b = E w=ŵ is the gradient at ŵ. (H) ij = E w i w j w=ŵ is the Hessian E at ŵ. The gradient of E can be approximated by the gradient of the quadratic model (5.28); if w ŵ then where 1 2 E b + H(w ŵ) (5.31) ( (H + H )w Hŵ H ŵ ) = H(w ŵ), as H = H.
15 Approximation at a Minimum Suppose that w is at a minimum of E, so E w=w is zero, and H = E w=w E(w) = E(w ) (w w ) H(w w ) (5.32) is the Hessian. The eigenvectors Hu i = λu i are orthonormal. (w w ) can be represented in H-coordinates as i α iu i. Hence the second term of (5.32) can be written So that 1 2 (w w ) H(w w ) = 1 ( ) ( ) λ i α i u i α j u j 2 i E(w) = E(w ) + 1 λ i αi 2 (5.36) 2 i j
16 Characterization of a Minimum The eigenvalues λ i of H characterize the stationary point w. If all λ i > 0, then H is positive definite (v Hv > 0). This is analogous to the scalar condition 2 E w 2 w > 0. Zero gradient and positive principle curvatures mean that E(w ) is a minimum. w 2 u 2 u 1 λ 1/2 2 w λ 1/2 1 Figure: 5.6 w 1
17 Gradient Descent The simplest approach is to update w by a displacement in the negative gradient direction. w (τ+1) = w (τ) η E ( w (τ)) (5.41) This is a steepest descent algorithm. η is the learning rate. This is a batch method, as evaluation of E involves the entire data set. Conjugate gradient or quasi-newton methods may, in practice, be preferred. A range of starting points { w (0)} may be needed, in order to find a satisfactory minimum.
18 Optimization Scheme An efficient method for the evaluation of E(w) is needed. Each iteration of the descent algorithm has two stages: I. Evaluate derivatives of error with respect to weights (involving backpropagation of error though the network). II. Use derivatives to compute adjustments of the weights (e.g. steepest descent). Backpropagation is a general principle, which can be applied to many types of network and error function.
19 Simple Backpropagation The error function is, typically, a sum over the data points E(w) = N n=1 E n(w). For example, consider a linear model y k = i w ki x i (5.45) The error function, for an individual input x n, is E n = 1 (y nk t nk ) 2, where y nk = y k (x n, w). (5.46) 2 k The gradient with respect to a weight w ji is w ji is a particular link (x i to y j ). E n w ji = (y nj t nj ) x ni (5.47) x ni is the input to the link (i-th component of x n ). (y nj t nj ) is the error output by the link.
20 General Backpropagation Recall that, in general, each unit computes a weighted sum: a j = i w ji z i with activation z j = h(a j ). (5.48,5.49) For each error-term: E n = E n a j (5.50) w ji a j w ji }{{} δ j So, from 5.48: E n w ji = δ j z i (5.53) In the network: δ j E n a j = k E n a k a k a j where j {k} (5.55) Algorithm: δ j = h (a j ) k w kj δ k as a k a j = a k z j z j a j (5.56)
21 Backpropagation Algorithm The formula for the update of a given unit depends only on the later (i.e. closer to the output) layers: z i w ji δ j wkj δ k Figure: 5.7 Hence the backpropagation algorithm is: z j δ 1 Apply input x, and forward propagate to find the hidden and output activations. Evaluate δ k directly for the output units. Back propagate the δ s to obtain a δ j for each hidden unit. Evaluate the derivatives En w ji = δ j z i.
22 Computational Efficiency The back-propagation algorithm is computationally more efficient than standard numerical minimization of E n. Suppose that W is the total number of weights and biases in the network. Backpropagation: The evaluation is O(W ) for large W, as there are many more weights than units. Standard approach: Perturb each weight, and forward propagate to compute the change in E n. This requires W O(W ) computations, so the total complexity is O(W 2 ).
23 Jacobian Matrix The properties of the network can be investigated via the Jacobian J ki = y k x i (5.70) For example, (small) errors can be propagated through the trained network: y k y k x i x i (5.72) This is useful, but costly, as J ki itself depends on x. However, note that y k = y k a j = y k w ji (5.74) x i a j j x i a j j The required derivatives y k / a j can be efficiently computed by backpropagation.
Reading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationNEURAL NETWORKS
5 Neural Networks In Chapters 3 and 4 we considered models for regression and classification that comprised linear combinations of fixed basis functions. We saw that such models have useful analytical
More informationy(x n, w) t n 2. (1)
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,
More informationFeed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.
Neural Networks Oliver Schulte - CMPT 726 Bishop PRML Ch. 5 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationIntroduction to Neural Networks
Introduction to Neural Networks Steve Renals Automatic Speech Recognition ASR Lecture 10 24 February 2014 ASR Lecture 10 Introduction to Neural Networks 1 Neural networks for speech recognition Introduction
More informationError Backpropagation
Error Backpropagation Sargur Srihari 1 Topics in Error Backpropagation Terminology of backpropagation 1. Evaluation of Error function derivatives 2. Error Backpropagation algorithm 3. A simple example
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationIntroduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen
Neural Networks - I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /
More informationRegularization in Neural Networks
Regularization in Neural Networks Sargur Srihari 1 Topics in Neural Network Regularization What is regularization? Methods 1. Determining optimal number of hidden units 2. Use of regularizer in error function
More informationNeural Networks (and Gradient Ascent Again)
Neural Networks (and Gradient Ascent Again) Frank Wood April 27, 2010 Generalized Regression Until now we have focused on linear regression techniques. We generalized linear regression to include nonlinear
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More informationMachine Learning. 7. Logistic and Linear Regression
Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,
More informationIntroduction to gradient descent
6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMulti-layer Neural Networks
Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationPerformance Surfaces and Optimum Points
CSC 302 1.5 Neural Networks Performance Surfaces and Optimum Points 1 Entrance Performance learning is another important class of learning law. Network parameters are adjusted to optimize the performance
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationLecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016
Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic
More informationNumerical Optimization
Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationLECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form
LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Seung-Hoon Na 1 1 Department of Computer Science Chonbuk National University 2018.10.25 eung-hoon Na (Chonbuk National University) Neural Networks: Backpropagation 2018.10.25
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationLogistic Regression. Sargur N. Srihari. University at Buffalo, State University of New York USA
Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative 1. Fixed basis
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationIntelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera
Intelligent Control Module I- Neural Networks Lecture 7 Adaptive Learning Rate Laxmidhar Behera Department of Electrical Engineering Indian Institute of Technology, Kanpur Recurrent Networks p.1/40 Subjects
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationNeural Networks Lecture 4: Radial Bases Function Networks
Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi
More informationNeural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationNumerical Optimization Techniques
Numerical Optimization Techniques Léon Bottou NEC Labs America COS 424 3/2/2010 Today s Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification,
More informationMathematical optimization
Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the
More informationLearning Neural Networks
Learning Neural Networks Neural Networks can represent complex decision boundaries Variable size. Any boolean function can be represented. Hidden units can be interpreted as new features Deterministic
More informationMultilayer Perceptrons and Backpropagation
Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:
More informationNeural networks III: The delta learning rule with semilinear activation function
Neural networks III: The delta learning rule with semilinear activation function The standard delta rule essentially implements gradient descent in sum-squared error for linear activation functions. We
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More informationApril 9, Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá. Linear Classification Models. Fabio A. González Ph.D.
Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 9, 2018 Content 1 2 3 4 Outline 1 2 3 4 problems { C 1, y(x) threshold predict(x) = C 2, y(x) < threshold, with threshold
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationLogistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48
Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression
More informationUnit III. A Survey of Neural Network Model
Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of
More informationArtificial Neural Networks
Artificial Neural Networks Oliver Schulte - CMPT 310 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will focus on
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationEigen Vector Descent and Line Search for Multilayer Perceptron
igen Vector Descent and Line Search for Multilayer Perceptron Seiya Satoh and Ryohei Nakano Abstract As learning methods of a multilayer perceptron (MLP), we have the BP algorithm, Newton s method, quasi-
More informationNon-Convex Optimization. CS6787 Lecture 7 Fall 2017
Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationC4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang
C4 Phenomenological Modeling - Regression & Neural Networks 4040-849-03: Computational Modeling and Simulation Instructor: Linwei Wang Recall.. The simple, multiple linear regression function ŷ(x) = a
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationArtificial Neural Networks 2
CSC2515 Machine Learning Sam Roweis Artificial Neural s 2 We saw neural nets for classification. Same idea for regression. ANNs are just adaptive basis regression machines of the form: y k = j w kj σ(b
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationLecture 4: Perceptrons and Multilayer Perceptrons
Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons
More informationCSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!
CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!! November 18, 2015 THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can t even say see ya
More informationThe Error Surface of the XOR Network: The finite stationary Points
The Error Surface of the 2-2- XOR Network: The finite stationary Points Ida G. Sprinkhuizen-Kuyper Egbert J.W. Boers Abstract We investigated the error surface of the XOR problem with a 2-2- network with
More informationLearning from Data Logistic Regression
Learning from Data Logistic Regression Copyright David Barber 2-24. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 2.9.8.7.6.5.4.3.2...2.3.4.5.6.7.8.9
More informationFunctions of Several Variables
Functions of Several Variables The Unconstrained Minimization Problem where In n dimensions the unconstrained problem is stated as f() x variables. minimize f()x x, is a scalar objective function of vector
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More information