Chapter 6: Backpropagation Nets


1 Chapter 6: Backpropagation Nets Architecture: at least one layer of non-linear hidden units Learning: supervised, error driven, generalized delta rule Derivation of the weight update formula (with the gradient descent approach) Practical considerations Variations of BP nets Applications

2 Architecture of BP Nets Multi-layer, feed-forward network Must have at least one hidden layer Hidden units must be non-linear units (usually with sigmoid activation functions) Fully connected between units in two consecutive layers, but no connection between units within one layer. For a net with only one hidden layer, each hidden unit Z_j receives input from all input units X_i (via weights v_ij) and sends output to all output units Y_k (via weights w_jk) (Figure: input units x_1 ... x_n connect to hidden units z_1 ... z_p via weights v_ij; hidden units connect to output units Y_k via weights w_jk; hidden and output units are non-linear.)

3 Additional notations (nets with one hidden layer): x = (x_1, ..., x_n): input vector z = (z_1, ..., z_p): hidden vector (after x is applied to the input layer) y = (y_1, ..., y_m): output vector (the computation result) delta_k: error term on Y_k, used to update weights w_jk and backpropagated to z_j delta_j: error term on Z_j, used to update weights v_ij Weighted inputs: z_in_j = v_0j + Sum_i(x_i * v_ij): input to hidden unit Z_j y_in_k = w_0k + Sum_j(z_j * w_jk): input to output unit Y_k v_0j, w_0k: biases

4 Forward computing: Apply an input vector x to the input units Compute the activation/output vector z on the hidden layer: z_j = f(v_0j + Sum_i(v_ij * x_i)) Compute the output vector y on the output layer: y_k = f(w_0k + Sum_j(w_jk * z_j)) y is the result of the computation; the net is said to be a map from input x to output y Theoretically, nets of this architecture are able to approximate any L2 function (all square-integrable functions, including almost all commonly used math functions) to any given degree of accuracy, provided there are sufficiently many hidden units Question: how do we get the weights so that the mapping is what you want?
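
A minimal NumPy sketch of the forward computation just described; the names sigmoid and forward are my own, and the arrays v, v0, w, w0 stand for the weights V, W and biases v_0j, w_0k defined above.

```python
import numpy as np

def sigmoid(x):
    """Binary sigmoid f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, v, v0, w, w0):
    """Forward pass of a one-hidden-layer BP net.
    x: input vector, shape (n,)
    v: input-to-hidden weights, shape (n, p); v0: hidden biases, shape (p,)
    w: hidden-to-output weights, shape (p, m); w0: output biases, shape (m,)
    Returns hidden activations z and output activations y."""
    z_in = v0 + x @ v          # z_in_j = v_0j + Sum_i(x_i * v_ij)
    z = sigmoid(z_in)
    y_in = w0 + z @ w          # y_in_k = w_0k + Sum_j(z_j * w_jk)
    y = sigmoid(y_in)
    return z, y
```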

5 Learning for BP Nets Update of weights in W (between output and hidden layers): delta rule, as in a single-layer net The delta rule is not applicable to updating weights in V (between input and hidden layers) because we don't know the target values for hidden units z_1, ..., z_p Solution: propagate the errors at the output units back to the hidden units; these computed errors on the hidden units drive the update of weights in V (again by the delta rule), hence the name error BACKPROPAGATION learning How to compute the errors on the hidden units is the key Error backpropagation can be continued downward if the net has more than one hidden layer.

6 BP Learning Algorithm step 0: initialize the weights (W and V, including biases) to small random numbers step 1: while the stop condition is false, do steps 2-9 step 2: for each training sample x:t, do steps 3-8 /* Feed-forward phase (computing the output vector y) */ step 3: apply vector x to the input layer step 4: compute input and output for each hidden unit Z_j z_in_j := v_0j + Sum_i(x_i * v_ij); z_j := f(z_in_j); step 5: compute input and output for each output unit Y_k y_in_k := w_0k + Sum_j(z_j * w_jk); y_k := f(y_in_k);

7 /* Error backpropagation phase */ step 6: for each output unit Y_k delta_k := (t_k - y_k) * f'(y_in_k) /* error term */ delta_w_jk := alpha * delta_k * z_j /* weight change */ step 7: for each hidden unit Z_j delta_in_j := Sum_k(delta_k * w_jk) /* error BP */ delta_j := delta_in_j * f'(z_in_j) /* error term */ delta_v_ij := alpha * delta_j * x_i /* weight change */ step 8: update weights (incl. biases) w_jk := w_jk + delta_w_jk for all j, k; v_ij := v_ij + delta_v_ij for all i, j; step 9: test the stop condition
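
A sketch of steps 3-8 for a single training sample with the binary sigmoid (so f'(x) = f(x)(1 - f(x)) can be computed from the activations); the function name and the learning-rate default are my own choices, not from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_one_sample(x, t, v, v0, w, w0, alpha=0.2):
    """One pass of steps 3-8 for sample (x, t); updates the weight arrays in place."""
    # steps 3-5: feed-forward
    z = sigmoid(v0 + x @ v)
    y = sigmoid(w0 + z @ w)
    # step 6: error terms and weight changes for the output layer
    delta_k = (t - y) * y * (1.0 - y)           # (t_k - y_k) * f'(y_in_k)
    delta_w = alpha * np.outer(z, delta_k)
    delta_w0 = alpha * delta_k
    # step 7: backpropagate the errors to the hidden layer
    delta_in_j = delta_k @ w.T                  # Sum_k(delta_k * w_jk)
    delta_j = delta_in_j * z * (1.0 - z)        # delta_in_j * f'(z_in_j)
    delta_v = alpha * np.outer(x, delta_j)
    delta_v0 = alpha * delta_j
    # step 8: update weights and biases
    w += delta_w;  w0 += delta_w0
    v += delta_v;  v0 += delta_v0
    return 0.5 * np.sum((t - y) ** 2)           # squared error, useful for the stop condition
```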

8 Notes on BP learning: The error term for a hidden unit Z_j is the weighted sum of the error terms delta_k of all output units Y_k, delta_in_j := Sum_k(delta_k * w_jk), times the derivative of its own output, f'(z_in_j) In other words, delta_in_j plays the same role for hidden units Z_j as (t_k - y_k) does for output units Y_k The sigmoid function can be either binary or bipolar For multiple hidden layers: repeat step 7 (downward) Stop conditions: the total output error E = Sum_k(t_k - y_k)^2 falls into the given acceptable error range; E changes very little for quite a while; the maximum time (or number of epochs) is reached.

9 Derivation of BP Learning Rule Objective of BP learning: minimize the mean squared output error over all training samples, E = 0.5 * Sum_p Sum_k (t_k^(p) - y_k^(p))^2 For clarity, the derivation below is for the error of one sample x:t, E = 0.5 * Sum_k (t_k - y_k)^2 Approach: gradient descent. The gradient of f gives the direction and magnitude of change of f w.r.t. its arguments For a function of a single argument, the gradient is f'(x) = df/dx Gradient descent requires that x change in the opposite direction of the gradient, i.e., delta_x = -alpha * f'(x). Then, since delta_f / delta_x is approximately df/dx for small delta_x, we have delta_f = f'(x) * delta_x = -alpha * (f'(x))^2 <= 0, so f monotonically decreases

10 For a multi-variable function (e.g., our error function E(w_1, ..., w_n)), the gradient is the vector of partial derivatives grad E = (dE/dw_1, ..., dE/dw_n)^T Gradient descent requires each argument to change in the opposite direction of the corresponding partial derivative: delta_w_i = -alpha * dE/dw_i Then, because delta_E = Sum_i (dE/dw_i) * delta_w_i = -alpha * Sum_i (dE/dw_i)^2 <= 0, gradient descent guarantees that E monotonically decreases, and delta_E = 0 iff all partial derivatives dE/dw_i = 0 The chain rule of derivatives is used for deriving the partial derivatives: if y = f(x) and x = g(z), then dy/dz = (dy/dx) * (dx/dz)

11 Update W, the weights of the output layer For a particular weight w_JK (from hidden unit Z_J to output unit Y_K): dE/dw_JK = d(0.5 * Sum_k (t_k - y_k)^2)/dw_JK = -(t_K - y_K) * dy_K/dw_JK (only the k = K term involves w_JK) = -(t_K - y_K) * f'(y_in_K) * d(y_in_K)/dw_JK (by the chain rule) = -(t_K - y_K) * f'(y_in_K) * z_J The last equality comes from the fact that only one of the terms in y_in_K = w_0K + Sum_j(z_j * w_jK), namely z_J * w_JK, involves w_JK Let delta_K = (t_K - y_K) * f'(y_in_K). Then delta_w_JK = -alpha * dE/dw_JK = alpha * delta_K * z_J This is the update rule in step 6 of the algorithm

12 Update V, the weights of the hidden layer For a particular weight v_IJ (from input unit X_I to hidden unit Z_J): dE/dv_IJ = d(0.5 * Sum_k (t_k - y_k)^2)/dv_IJ = -Sum_k (t_k - y_k) * dy_k/dv_IJ (every y_k involves v_IJ, since every output unit is connected to Z_J) = -Sum_k (t_k - y_k) * f'(y_in_k) * d(y_in_k)/dv_IJ = -Sum_k delta_k * d(y_in_k)/dv_IJ (because delta_k = (t_k - y_k) * f'(y_in_k)) = -Sum_k delta_k * w_Jk * dz_J/dv_IJ The last equality comes from the fact that only one of the p terms in y_in_k = w_0k + Sum_j(w_jk * z_j), namely w_Jk * z_J, involves v_IJ (via z_J)

13 Continuing: dE/dv_IJ = -Sum_k delta_k * w_Jk * dz_J/dv_IJ = -Sum_k delta_k * w_Jk * f'(z_in_J) * d(z_in_J)/dv_IJ = -Sum_k delta_k * w_Jk * f'(z_in_J) * x_I (only the term x_I * v_IJ in z_in_J involves v_IJ, and x_I is independent of v_IJ) = -delta_in_J * f'(z_in_J) * x_I, where delta_in_J = Sum_k(delta_k * w_Jk) Let delta_J = delta_in_J * f'(z_in_J). Then delta_v_IJ = -alpha * dE/dv_IJ = alpha * delta_J * x_I This is the update rule in step 7 of the algorithm
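
As a sanity check on the derivation, a small numerical-gradient comparison: the analytic partial derivatives -delta_K * z_J and -delta_J * x_I from slides 11-13 should agree with finite differences of E. This is an illustrative sketch of my own (biases omitted, all names hypothetical), not part of the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def error(x, t, v, w):
    """E = 0.5 * Sum_k (t_k - y_k)^2 for one sample (biases omitted for brevity)."""
    z = sigmoid(x @ v)
    y = sigmoid(z @ w)
    return 0.5 * np.sum((t - y) ** 2)

rng = np.random.default_rng(0)
n, p, m = 3, 4, 2
x, t = rng.random(n), rng.random(m)
v, w = rng.normal(0, 0.5, (n, p)), rng.normal(0, 0.5, (p, m))

# analytic gradients from the derivation: dE/dw_JK = -delta_K * z_J, dE/dv_IJ = -delta_J * x_I
z = sigmoid(x @ v)
y = sigmoid(z @ w)
delta_K = (t - y) * y * (1 - y)
delta_J = (delta_K @ w.T) * z * (1 - z)
dE_dw = -np.outer(z, delta_K)
dE_dv = -np.outer(x, delta_J)

# finite-difference check on one weight of each layer
eps = 1e-6
w_pert = w.copy(); w_pert[1, 0] += eps
v_pert = v.copy(); v_pert[2, 1] += eps
print(dE_dw[1, 0], (error(x, t, v, w_pert) - error(x, t, v, w)) / eps)
print(dE_dv[2, 1], (error(x, t, v_pert, w) - error(x, t, v, w)) / eps)
```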

14 Strengths of BP Nets Great representation power Any L2 function can be represented by a BP net (multi-layer feed-forward net with non-linear hidden units) Many such functions can be learned by BP learning (gradient descent approach) Wide applicability of BP learning Only requires that a good set of training samples is available Does not require substantial prior knowledge or deep understanding of the domain itself (ill-structured problems) Tolerates noise and missing data in training samples (graceful degradation) Easy to implement the core of the learning algorithm Good generalization power Accurate results for inputs outside the training set

15 Deficiencies of BP Nets Learning often takes a long time to converge Complex functions often need hundreds or thousands of epochs The net is essentially a black box It may provide a desired mapping between input and output vectors (x, y), but it does not carry the information of why a particular x is mapped to a particular y. It thus cannot provide an intuitive (e.g., causal) explanation for the computed result. This is because the hidden units and the learned weights do not have a semantics. What is learned are operational parameters, not general, abstract knowledge of a domain The gradient descent approach only guarantees reducing the total error to a local minimum (E may not be reduced to zero) Cannot escape from the local-minimum error state Not every function that is representable can be learned

16 How bad this is depends on the shape of the error surface: too many valleys/wells make it easy to be trapped in local minima Possible remedies: Try nets with different numbers of hidden layers and hidden units (they may lead to different error surfaces, some might be better than others) Try different initial weights (different starting points on the surface) Force escape from local minima by random perturbation (e.g., simulated annealing) Generalization is not guaranteed even if the error is reduced to zero Over-fitting/over-training problem: the trained net fits the training samples perfectly (E reduced to 0) but does not give accurate outputs for inputs not in the training set Unlike many statistical methods, there is no theoretically well-founded way to assess the quality of BP learning: what confidence level can one have in a trained BP net with final error E (which may or may not be close to zero)?

17 Network paralysis with sigmoid activation function Saturation regions: for f(x) = 1/(1 + e^-x), the derivative f'(x) = f(x)(1 - f(x)) -> 0 when |x| -> infinity. When x falls in a saturation region, f(x) hardly changes its value regardless of how fast the magnitude of x increases The input to a unit may fall into a saturation region when some of its incoming weights become very large during learning. Consequently, the weights stop changing no matter how hard you try, since delta_w_jk = -alpha * dE/dw_jk = alpha * (t_k - y_k) * f'(y_in_k) * z_j is near zero whenever f'(y_in_k) is near zero Possible remedies: Use non-saturating activation functions Periodically normalize all weights: w := w / ||w||
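
A small sketch of my own illustrating how quickly the logistic derivative vanishes in the saturation regions, plus one way the weight-normalization remedy mentioned above might look (the function names are hypothetical; the slide does not prescribe a particular layout for the weight matrix).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# the logistic derivative peaks at x = 0 and collapses in the saturation regions
for x in (0.0, 2.0, 5.0, 10.0, 20.0):
    f = sigmoid(x)
    print(f"x = {x:5.1f}   f'(x) = f(x)(1 - f(x)) = {f * (1 - f):.2e}")

def normalize_columns(w):
    """Assuming w[:, k] holds the incoming weights of unit k, rescale each column to
    unit Euclidean norm so that no unit's net input can grow without bound."""
    norms = np.linalg.norm(w, axis=0, keepdims=True)
    return w / np.where(norms > 0, norms, 1.0)
```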

18 The learning (accuracy, speed, and generalization) is highly dependent on a set of learning parameters Initial weights, learning rate, # of hidden layers, # of units per layer, ... Most of them can only be determined empirically (via experiments)

19 Practical Considerations A good BP net requires more than the core of the learning algorithm. Many parameters must be carefully selected to ensure good performance. Although the deficiencies of BP nets cannot be completely cured, some of them can be eased by practical means. Initial weights (and biases): Random, e.g., in [-0.05, 0.05], [-0.1, 0.1], or [-1, 1] Avoid bias in weight initialization Normalize the weights for the hidden layer (v_ij) (Nguyen-Widrow): Randomly assign v_ij for all hidden units Z_j For each Z_j, normalize its weight vector by v_ij := beta * v_ij(old) / ||v_j(old)||, so that ||v_j|| = beta after normalization beta = 0.7 * p^(1/n) is the normalization factor, where p = # of hidden nodes and n = # of input nodes
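
A minimal sketch of the Nguyen-Widrow initialization just described; drawing the biases from [-beta, beta] is my assumption (the standard formulation), not stated on the slide.

```python
import numpy as np

def nguyen_widrow_init(n, p, rng=np.random.default_rng()):
    """Nguyen-Widrow initialization of the input-to-hidden weights of an n-p-m net.
    n: number of input units, p: number of hidden units."""
    beta = 0.7 * p ** (1.0 / n)                  # normalization factor
    v = rng.uniform(-0.5, 0.5, size=(n, p))      # random initial weights
    v *= beta / np.linalg.norm(v, axis=0)        # scale each hidden unit's weight vector to norm beta
    v0 = rng.uniform(-beta, beta, size=p)        # biases in [-beta, beta] (assumption)
    return v, v0
```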

20 Training samples: The quality and quantity of the training samples determine the quality of the learning results Samples must be good representatives of the problem space Random sampling Proportional sampling (with prior knowledge of the problem space) # of training patterns needed: there is no theoretically ideal number. The following is a rule of thumb W: total # of weights to be trained (depends on the net structure) e: desired classification error rate If we have P = W/e training patterns, and we can train a net to correctly classify a fraction (1 - e/2) of them, then this net would (in a statistical sense) be able to correctly classify a fraction (1 - e) of input patterns drawn from the same sample space Example: W = 80, e = 0.1, P = 800. If we can successfully train the network to correctly classify (1 - 0.1/2) * 800 = 760 of the samples, we would expect the net to work correctly 90% of the time on other inputs.

21 Data representation: Binary vs. bipolar Bipolar representation uses the training samples more efficiently: since delta_w_jk = alpha * delta_k * z_j and delta_v_ij = alpha * delta_j * x_i, no learning occurs for a weight when x_i = 0 or z_j = 0, which happens often with the binary representation # of patterns that can be represented with n input units: binary: 2^n bipolar: 2^(n-1) if no biases are used; this is due to (anti)symmetry (if the net outputs y for input x, it will output -y for input -x) Real-valued data Input units: real-valued units (may be subject to normalization) Hidden units are sigmoid Activation function for the output units: often linear (even the identity), e.g., y_k = y_in_k = Sum_j(w_jk * z_j) Training may be much slower than with binary/bipolar data (some use binary encodings of real values)

22 How many hidden layers and how many hidden units per layer? Theoretically, one hidden layer (possibly with many hidden units) is sufficient for any L2 function There are no theoretical results on the minimum necessary # of hidden units (either problem-dependent or problem-independent) Practical rule of thumb (n = # of input units, p = # of hidden units): For binary/bipolar data: p = 2n For real data: p >> 2n Multiple hidden layers with fewer units may be trained faster for similar quality in some applications

23 Over-training/over-fitting The trained net fits the training samples very well (total error E close to 0), but not new input patterns Over-training may become serious if Training samples were not obtained properly Training samples contain noise Controlling over-training for better generalization Cross-validation: divide the samples into two sets - 90% into the training set: used to train the network - 10% into the test set: used to validate training results Periodically test the trained net with the test samples, and stop training when the test results start to deteriorate Stop training early (before E reaches 0) Add noise to the training samples: x:t becomes x+noise:t (for binary/bipolar data: flip randomly selected input units)
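
One way the 90/10 cross-validation with early stopping described above could be organized, as a sketch; train_epoch and test_error are hypothetical callbacks standing in for one epoch of BP training and for computing the error on a sample set, and the patience threshold is my own addition.

```python
import numpy as np

def train_with_early_stopping(samples, targets, train_epoch, test_error,
                              patience=5, max_epochs=10000, split=0.9,
                              rng=np.random.default_rng()):
    """Hold out 10% of the samples and stop when the held-out error stops improving.
    samples, targets: NumPy arrays of matching length.
    train_epoch(x, t): assumed to run one epoch of BP training on (x, t).
    test_error(x, t): assumed to return the current total error on (x, t)."""
    idx = rng.permutation(len(samples))
    cut = int(split * len(samples))
    tr, te = idx[:cut], idx[cut:]
    best, bad_epochs = np.inf, 0
    for epoch in range(max_epochs):
        train_epoch(samples[tr], targets[tr])
        err = test_error(samples[te], targets[te])
        if err < best:
            best, bad_epochs = err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:      # test error stopped improving: stop early
                break
    return epoch, best
```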

24 Variations of BP nets Adding a momentum term (to speed up learning) The weight update at time t+1 contains the momentum of the previous update: delta_w_jk(t+1) = alpha * delta_k * z_j + mu * delta_w_jk(t), where 0 < mu < 1 Unrolling this recurrence gives delta_w_jk(t+1) = Sum_{s <= t} mu^(t-s) * alpha * delta_k(s) * z_j(s), an exponentially weighted sum of all previous updates Avoids sudden changes of direction of the weight update (smoothing the learning process) The error is no longer monotonically decreasing Batch mode of weight updates Weights are updated once per epoch Smooths the effect of outliers among the training samples Learning is independent of the order of sample presentation Usually slower than the sequential (per-sample) mode
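
A one-line sketch of the momentum update above, written in terms of the gradient so it applies to either weight matrix; the function name and default values are my own.

```python
import numpy as np

def momentum_update(w, grad, prev_delta, alpha=0.2, mu=0.9):
    """One weight update with a momentum term:
    delta_w(t+1) = alpha * (-grad) + mu * delta_w(t),
    i.e. an exponentially weighted sum of all previous updates.
    grad is dE/dw, so -alpha * grad equals alpha * delta_k * z_j in the slide's notation."""
    delta = -alpha * grad + mu * prev_delta
    return w + delta, delta
```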

25 Variations on the learning rate alpha Give known under-represented samples higher rates Find the maximum safe step size at each stage of learning (to avoid overshooting the minimum of E when increasing alpha) Adaptive learning rate (delta-bar-delta method) Each weight w_jk has its own rate alpha_jk If delta_w_jk remains in the same direction, increase alpha_jk (E has a smooth curve in the vicinity of the current W) If delta_w_jk changes direction, decrease alpha_jk (E has a rough curve in the vicinity of the current W) Let Delta_jk(t) = dE/dw_jk at time t and bar_Delta_jk(t) = (1 - beta) * Delta_jk(t) + beta * bar_Delta_jk(t-1). Then alpha_jk(t+1) = alpha_jk(t) + kappa if bar_Delta_jk(t-1) * Delta_jk(t) > 0 alpha_jk(t+1) = (1 - gamma) * alpha_jk(t) if bar_Delta_jk(t-1) * Delta_jk(t) < 0 alpha_jk(t+1) = alpha_jk(t) if bar_Delta_jk(t-1) * Delta_jk(t) = 0
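
A sketch of the delta-bar-delta rule as stated above, maintaining one learning rate and one averaged gradient per weight; the class name and the hyper-parameter values (alpha0, kappa, gamma, beta) are hypothetical, not taken from the slides.

```python
import numpy as np

class DeltaBarDelta:
    """Per-weight adaptive learning rates (delta-bar-delta), illustrative sketch."""
    def __init__(self, shape, alpha0=0.1, kappa=0.01, gamma=0.1, beta=0.7):
        self.alpha = np.full(shape, alpha0)   # one learning rate per weight
        self.bar = np.zeros(shape)            # exponentially averaged past gradients
        self.kappa, self.gamma, self.beta = kappa, gamma, beta

    def step(self, w, grad):
        """grad is dE/dw at the current step; returns the updated weights."""
        sign = self.bar * grad                # compare current gradient with the running average
        self.alpha = np.where(sign > 0, self.alpha + self.kappa,        # same direction: grow additively
                     np.where(sign < 0, self.alpha * (1 - self.gamma),  # direction flipped: shrink multiplicatively
                              self.alpha))
        self.bar = (1 - self.beta) * grad + self.beta * self.bar
        return w - self.alpha * grad
```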

26 The delta-bar-delta method also involves a momentum term (on alpha) Experimental comparison Training on the XOR problem (batch mode), 25 simulations per method; a run counts as a success if E, averaged over 50 consecutive epochs, is less than 0.04 Results (method / successes / mean epochs): BP: ... / ...,859.8 BP with momentum: ... / ...,056.3 BP with delta-bar-delta: ... / ...

27 Other activation functions Change the range of the logistic function from (0, 1) to (a, b) Let f(x) = 1/(1 + e^-x), r = b - a, eta = a. Then g(x) = r * f(x) + eta is a sigmoid function with range (a, b) g'(x) = r * f(x) * (1 - f(x)) = (g(x) - eta) * (1 - (g(x) - eta)/r) = (1/r) * (g(x) - eta) * (r + eta - g(x)) In particular, for the bipolar sigmoid function we have a = -1, b = 1, so r = 2, eta = -1, g(x) = 2*f(x) - 1, and g'(x) = 0.5 * (1 + g(x)) * (1 - g(x))

28 Change the slope of the logistic function f(x) = 1/(1 + e^(-sigma*x)), f'(x) = sigma * f(x) * (1 - f(x)) Larger slope: quicker to move into the saturation regions; faster convergence Smaller slope: slower to move into the saturation regions; allows refined weight adjustment sigma thus has an effect similar to the learning rate alpha (but more drastic) Adaptive slope (each node has a learned slope): y_k = f(sigma_k * y_in_k), z_j = f(sigma_j * z_in_j). Then delta_w_jk = -alpha * dE/dw_jk = alpha * delta_k * sigma_k * z_j, where delta_k = (t_k - y_k) * f'(sigma_k * y_in_k) delta_v_ij = -alpha * dE/dv_ij = alpha * delta_j * sigma_j * x_i, where delta_j = Sum_k(delta_k * sigma_k * w_jk) * f'(sigma_j * z_in_j) delta_sigma_k = -alpha * dE/dsigma_k = alpha * delta_k * y_in_k delta_sigma_j = -alpha * dE/dsigma_j = alpha * delta_j * z_in_j

29 Another sigmoid function, with slower saturation: f(x) = (2/pi) * arctan(x), f'(x) = (2/pi) * 1/(1 + x^2) For large |x|, the derivative of the logistic function, e^(-x) / (1 + e^(-x))^2, is much smaller than 1/(1 + x^2), so this function saturates more slowly A non-saturating (and still differentiable) function: f(x) = log(1 + x) if x >= 0, f(x) = -log(1 - x) if x < 0 Then f'(x) = 1/(1 + x) if x >= 0 and f'(x) = 1/(1 - x) if x < 0, so f'(x) -> 0 only when x -> +/- infinity
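
The alternative activation functions from the last three points, with their derivatives, as a small sketch; the function names are my own.

```python
import numpy as np

# bipolar sigmoid g(x) = 2*f(x) - 1, range (-1, 1); g'(x) = 0.5 * (1 + g(x)) * (1 - g(x))
def bipolar_sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_prime(x):
    g = bipolar_sigmoid(x)
    return 0.5 * (1.0 + g) * (1.0 - g)

# arctan-based sigmoid: saturates more slowly than the logistic function
def arctan_sigmoid(x):
    return (2.0 / np.pi) * np.arctan(x)

def arctan_sigmoid_prime(x):
    return (2.0 / np.pi) / (1.0 + x ** 2)

# non-saturating, differentiable activation: log(1 + x) for x >= 0, -log(1 - x) for x < 0
def log_activation(x):
    return np.sign(x) * np.log1p(np.abs(x))

def log_activation_prime(x):
    return 1.0 / (1.0 + np.abs(x))
```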

30 Non-sigmoid activation functions Radial basis function: it has a center c f(x) > 0 for all x f(x) becomes smaller as |x - c| becomes larger, and f(x) -> 0 when |x - c| -> infinity e.g., the Gaussian function (with center c = 0): f(x) = e^(-x^2), f'(x) = -2x * f(x)

31 Applications of BP Nets A simple example: learning XOR Initial weights and other parameters weights: random numbers in [-0.5, 0.5] hidden units: a single layer of 4 units (a 2-4-1 net) biases used; learning rate: 0.02 Variations tested binary vs. bipolar representation different stop criteria (targets of +/-1.0 and of +/-0.8) normalizing initial weights (Nguyen-Widrow) Bipolar is faster than binary convergence: ~3000 epochs for binary, ~400 for bipolar Why?
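
A self-contained sketch of the bipolar version of this XOR experiment (2-4-1 net, weights in [-0.5, 0.5], sequential updates). The learning rate and the total-error stop threshold here are my own choices for a quick demo; the slide's experiment uses a rate of 0.02 and target-based stop criteria.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                          # bipolar sigmoid, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_prime(g):                    # derivative expressed via the output: 0.5 * (1 + g) * (1 - g)
    return 0.5 * (1.0 + g) * (1.0 - g)

# bipolar XOR samples and a 2-4-1 net with initial weights in [-0.5, 0.5]
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
T = np.array([[-1.], [1.], [1.], [-1.]])
v, v0 = rng.uniform(-0.5, 0.5, (2, 4)), rng.uniform(-0.5, 0.5, 4)
w, w0 = rng.uniform(-0.5, 0.5, (4, 1)), rng.uniform(-0.5, 0.5, 1)
alpha = 0.2                        # hypothetical rate for this sketch

for epoch in range(10000):
    E = 0.0
    for x, t in zip(X, T):         # sequential (per-sample) updates
        z = f(v0 + x @ v)
        y = f(w0 + z @ w)
        delta_k = (t - y) * f_prime(y)
        delta_j = (delta_k @ w.T) * f_prime(z)
        w += alpha * np.outer(z, delta_k); w0 += alpha * delta_k
        v += alpha * np.outer(x, delta_j); v0 += alpha * delta_j
        E += 0.5 * np.sum((t - y) ** 2)
    if E < 0.04:                   # simple total-error stop condition
        break

print("stopped after", epoch + 1, "epochs, total error", E)
```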

32 (Figure only; not reproduced in this transcription)

33 Relaxing the acceptable error range may speed up convergence +/-1.0 are the asymptotic limits of the sigmoid function; when an output approaches +/-1.0 it falls in a saturation region Use +/-a where 0 < a < 1.0 (e.g., +/-0.8) instead Normalizing the initial weights may also help (Table: mean epochs to convergence for binary, bipolar, and bipolar with targets +/-0.8, each under random and Nguyen-Widrow initialization; the numeric entries are not preserved in this transcription)

34 Data compression Autoassociation of patterns (vectors) with themselves using a small number of hidden units: training samples: x:x (x has dimension n) hidden units: m < n (an n-m-n net) If training is successful, applying any vector x to the input units will generate the same x on the output units The pattern z on the hidden layer becomes a compressed representation of x (with smaller dimension m < n) Application: reducing transmission cost. The sender applies x (dimension n) to the input layer and transmits the hidden pattern z (dimension m) over the communication channel; the receiver applies z to the output-layer weights W to reconstruct x
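
Once such an n-m-n autoassociative net has been trained, the two halves can be split between sender and receiver; a minimal sketch with hypothetical function names, reusing the sigmoid units assumed elsewhere in the chapter.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compress(x, v, v0):
    """Sender side: map an n-dimensional pattern x to its m-dimensional hidden code z (m < n)."""
    return sigmoid(v0 + x @ v)

def decompress(z, w, w0):
    """Receiver side: reconstruct the n-dimensional pattern from the transmitted code z."""
    return sigmoid(w0 + z @ w)
```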

35 Example: compressing character bitmaps Each character is represented by a 7-by-9 pixel bitmap, i.e., a binary vector of dimension 63 10 characters (A-J) are used in the experiment Error ranges: tight: 0.1 (off: 0-0.1; on: 0.9-1.0) loose: 0.2 (off: 0-0.2; on: 0.8-1.0) Relationship between # of hidden units, error range, and convergence rate (Fig. 6.7, p. 304): relaxing the error range may speed up convergence increasing the # of hidden units (up to a point) may speed up convergence error range 0.1, 10 hidden units: 400+ epochs error range 0.2, 10 hidden units: 200+ epochs error range 0.1, 20 hidden units: 80+ epochs error range 0.2, 20 hidden units: 90+ epochs no noticeable speedup when the # of hidden units increases beyond 22

36 Other applications Medical diagnosis Input: manifestations (symptoms, lab tests, etc.) Output: possible disease(s) Problems: no causal relations can be established hard to determine what should be included as inputs Current work focuses on more restricted diagnostic tasks, e.g., predicting prostate cancer or hepatitis B from standard blood tests Process control Input: environmental parameters Output: control parameters Learns ill-structured control functions

37 Stock market forecasting Input: financial factors (CPI, interest rate, etc.) and stock quotes of previous days (weeks) Output: forecast of stock prices or stock indices (e.g., S&P 500) Training samples: stock market data of the past few years Consumer credit evaluation Input: personal financial information (income, debt, payment history, etc.) Output: credit rating And many more Keys to successful application: Careful design of the input vector (including all important features): requires some domain knowledge Obtaining good training samples: time and other costs

38 Summary of BP Nets Architecture Multi-layer, feed-forward (full connection between nodes in adjacent layers, no connection within a layer) One or more hidden layers with non-linear activation functions (most commonly sigmoid functions) BP learning algorithm Supervised learning (samples s:t) Approach: gradient descent to reduce the total error (which is why it is also called the generalized delta rule) Error terms at output units are propagated back to give error terms at hidden units (which is why it is called error BP) Ways to speed up the learning process Adding momentum terms Adaptive learning rate (delta-bar-delta) Generalization (cross-validation test)

39 Strengths of BP learning Great representation power Wide practical applicability Easy to implement Good generalization power Problems of BP learning Learning often takes a long time to converge The net is essentially a black box The gradient descent approach only guarantees a local minimum error Not every function that is representable can be learned Generalization is not guaranteed even if the error is reduced to zero No well-founded way to assess the quality of BP learning Network paralysis may occur (learning stops) Selection of learning parameters can only be done by trial and error BP learning is non-incremental (to include new training samples, the network must be re-trained with all old and new samples)
