University of Zurich                          Prof. Dr. Rolf Pfeifer, pfeifer@ifi.unizh.ch
Department of Informatics, AI Lab             Matej Hoffmann, hoffmann@ifi.unizh.ch
Andreasstrasse 15                             Marc Ziegler, mziegler@ifi.unizh.ch
8050 Zürich                                   Jonas Ruesch, ruesch@ifi.unizh.ch

Neural Networks 2007 Task Sheet 2
Due date: May 4, 2007

Student ID:
First name:
Family name:

Note: The purpose of this task sheet is to help you familiarize yourself with the basic concepts that will be used throughout the class. The questions are very similar to the ones you will find in the final exam. Please write your name and student ID number in the respective fields on the title page, and, most important, please try to write legibly: we will not give you points for an answer that we cannot decipher! If you need additional sheets of paper, please staple them to the task sheets.

Points: _____ of 30

Question 1 (3 points)

Consider the following simple multilayer perceptron (MLP).

[Figure 1: Multi-Layer Perceptron for the OR function, with weights w0-w4, inputs x1 and x2, hidden output O_h and network output O_o]

Compute the individual steps of the backpropagation algorithm. Use a learning rate of $\eta = 1$ and sigmoid activation functions $g(h) = \frac{1}{1 + e^{-2\beta h}}$ in the nodes, with gain $\beta = 0.5$. Calculate the complete cycle for the input pattern (0,1) and then for the input pattern (1,0), filling in the table below. (An illustrative code sketch of one such cycle is given below, after the description of Question 2.)

| w4  | w3   | w2 | w1  | w0  | x1 | x2 | O_h | O_o |
|-----|------|----|-----|-----|----|----|-----|-----|
| 0.2 | -0.1 | 0  | 0.1 | 0.2 | 0  | 1  |  -  |  -  |
|  -  |  -   | -  |  -  |  -  | 1  | 0  |  -  |  -  |

Question 2 (3 points)

Consider an MLP with two inputs, one output and one hidden layer. Use backpropagation as the learning algorithm and sigmoid activation functions in the nodes. Test the learning performance of networks with 1, 3 and 10 nodes in the hidden layer on the AND and the XOR functions (create the pattern sets as well), for the learning rates and momentum terms listed in the table below. Solve this question with the Java NN-Simulator (see our class website). Note that the weights have to be reset before each new test run. Stop training when the total error is smaller than 0.1 or after 1000 epochs. Write down the number of steps and the total error for each combination. The initial weights are randomly distributed, so we recommend running each simulation more than once to avoid misleading results.
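For Question 1, the following is a minimal NumPy sketch of one backpropagation cycle, not the official solution. Since Figure 1 is not reproduced here, the assignment of w0-w4 to connections (w0 = hidden bias, w1 and w2 = input-to-hidden weights, w3 = output bias, w4 = hidden-to-output weight) and the OR targets are assumptions; adapt the forward pass to the wiring actually shown in the figure.

```python
import numpy as np

# Minimal sketch of one backprop cycle for Question 1 (assumed wiring, see note above):
# w0 = hidden bias, w1, w2 = input->hidden, w3 = output bias, w4 = hidden->output.
beta, eta = 0.5, 1.0                                    # gain and learning rate from the task

def g(h):                                               # sigmoid with gain beta
    return 1.0 / (1.0 + np.exp(-2.0 * beta * h))

def dg(h):                                              # its derivative, 2*beta*g(h)*(1 - g(h))
    return 2.0 * beta * g(h) * (1.0 - g(h))

w = {'w0': 0.2, 'w1': 0.1, 'w2': 0.0, 'w3': -0.1, 'w4': 0.2}   # assumed reading of the initial values

def backprop_cycle(x1, x2, target):
    # forward pass
    h_hid = w['w0'] + w['w1'] * x1 + w['w2'] * x2
    o_hid = g(h_hid)
    h_out = w['w3'] + w['w4'] * o_hid
    o_out = g(h_out)
    # backward pass (deltas)
    delta_out = (target - o_out) * dg(h_out)
    delta_hid = delta_out * w['w4'] * dg(h_hid)
    # weight updates with eta = 1
    w['w4'] += eta * delta_out * o_hid
    w['w3'] += eta * delta_out
    w['w1'] += eta * delta_hid * x1
    w['w2'] += eta * delta_hid * x2
    w['w0'] += eta * delta_hid
    return o_hid, o_out

print(backprop_cycle(0, 1, 1))   # input pattern (0,1), OR target = 1
print(backprop_cycle(1, 0, 1))   # input pattern (1,0), OR target = 1
print(w)                         # weights after the two cycles
```

The two calls correspond to the two rows of the table; printing the intermediate activations and deltas inside the function gives the individual steps the question asks for.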

| Hidden units | Learning rate | Momentum term | Steps (AND) | Total error (AND) | Steps (XOR) | Total error (XOR) |
|--------------|---------------|---------------|-------------|-------------------|-------------|-------------------|
| 1            | 0.1           | 0.0           | 1000        | 0.25              |             |                   |
| 1            | 0.1           | 0.9           |             |                   |             |                   |
| 1            | 0.8           | 0.0           |             |                   |             |                   |
| 1            | 0.8           | 0.9           |             |                   |             |                   |
| 3            | 0.1           | 0.0           |             |                   |             |                   |
| 3            | 0.1           | 0.9           |             |                   | 380         | 0.1               |
| 3            | 1.0           | 0.0           |             |                   |             |                   |
| 3            | 1.0           | 0.9           |             |                   |             |                   |
| 10           | 0.1           | 0.0           |             |                   |             |                   |
| 10           | 0.1           | 0.9           |             |                   |             |                   |
| 10           | 0.8           | 0.0           |             |                   |             |                   |
| 10           | 0.8           | 0.9           |             |                   |             |                   |

Question 3 (a: 1 point, b: 1 point, c: 1 point, d: 1 point)

Now think of an MLP with 8 input neurons, n hidden neurons and 8 output neurons. The network should map each input to an identical output (i.e. input patterns and desired output patterns are the same; this is called self-supervised learning). The input patterns are defined to contain exactly one "1"; all other entries are "0". What is the minimum number of hidden neurons required to solve and to learn the task?

a) Answer by reasoning alone: what is the minimal number of hidden neurons needed to solve the task? Why?

b) Use the Java Simulator to verify a) by trying to learn the task and describe the results. How many hidden neurons do you need? (A small stand-alone sketch you can use as a cross-check is given after this question.)

c) How does the learning speed change with a larger number of hidden neurons?
   - It increases.
   - It decreases.
   - The number of hidden neurons has no influence on the learning speed.

d) What could be an application for this kind of network?
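If the Java Simulator is not at hand, the following NumPy sketch can serve as a rough cross-check for Question 3 b). It is not the official simulator: the learning rate, weight initialisation, epoch count and the presence of bias units are all assumptions, so step counts will differ, but the qualitative behaviour for different n should be comparable.

```python
import numpy as np

# Rough stand-in for the Java NN-Simulator check in Question 3 b):
# an 8-n-8 MLP trained with plain backpropagation on the eight one-hot patterns
# (self-supervised: targets equal inputs). All hyperparameters are assumptions.
rng = np.random.default_rng(0)

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def train_encoder(n_hidden, eta=0.5, epochs=5000):
    X = np.eye(8)                                            # one-hot inputs == desired outputs
    W1 = rng.uniform(-0.5, 0.5, (8, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, 8)); b2 = np.zeros(8)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                             # hidden activations
        O = sigmoid(H @ W2 + b2)                             # output activations
        d_out = (X - O) * O * (1.0 - O)                      # output deltas
        d_hid = (d_out @ W2.T) * H * (1.0 - H)               # hidden deltas
        W2 += eta * H.T @ d_out;  b2 += eta * d_out.sum(axis=0)
        W1 += eta * X.T @ d_hid;  b1 += eta * d_hid.sum(axis=0)
    O = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.sum((X - O) ** 2)                              # total squared error after training

for n in (1, 2, 3, 4, 8):
    print(f"{n} hidden neuron(s): total error {train_encoder(n):.3f}")
```

Comparing the final error (and how quickly it drops if you also print it during training) across the values of n gives the kind of data needed for parts b) and c).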

Question 4 (2 points)

What is the effect of a momentum term on the backpropagation algorithm? What are its advantages and disadvantages? Use the 1-d error landscape shown below in your explanation (i.e. indicate the gradient descent with and without momentum).

[Figure: 1-d error landscape]

Question 5 (a: 1 point, b: 1 point, c: 1 point, d: 1 point, e: 1 point, f: 1 point)

The backpropagation algorithm is a so-called gradient descent method. To compute the gradient you need the derivative $g'(h)$ of the activation function $g(h)$.

a) Compute the derivative of the following activation function (sigmoid):
   $g(h) = \frac{1}{1 + e^{-2\beta h}}$

b) Prove that the derivative can be expressed as:
   $g'(h) = 2\beta\, g(h)\,(1 - g(h))$

c) Compute the derivative of the following activation function (hyperbolic tangent):
   $g(h) = \tanh(\beta h)$

d) Show that the derivative can be expressed as:
   $g'(h) = \beta\,(1 - g^2(h))$

e) What is the relation between these two activation functions? (Look at the graphs of the two functions.)

f) Why are these two functions especially well suited for the backpropagation algorithm?

Hints:
   $\tanh(\beta h) = \frac{e^{2\beta h} - 1}{e^{2\beta h} + 1}$
   $\frac{d}{dh}\, e^{ah} = a\, e^{ah}$
   $\frac{d}{dh}\, \frac{1}{f(h)} = -\frac{f'(h)}{f^2(h)}$
   $\frac{d}{dh}\, f(g(h)) = f'(g(h))\, g'(h)$
   $\frac{d}{dh}\, \frac{f(h)}{g(h)} = \frac{f'(h)\,g(h) - f(h)\,g'(h)}{g^2(h)}$
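A quick numerical sanity check for your results in Question 5 b) and d): the sketch below compares the closed-form derivatives against central finite differences. The gain $\beta = 0.5$ is borrowed from Question 1 purely for the numeric check; any positive gain behaves the same way. The sample points for h are arbitrary.

```python
import numpy as np

# Finite-difference check of g'(h) = 2*beta*g*(1-g) (sigmoid) and g'(h) = beta*(1-g^2) (tanh).
beta, eps = 0.5, 1e-6
g_sig  = lambda h: 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # sigmoid with gain beta
g_tanh = lambda h: np.tanh(beta * h)                        # hyperbolic tangent with gain beta

for h in (-2.0, -0.3, 0.0, 0.7, 3.0):
    num_sig  = (g_sig(h + eps)  - g_sig(h - eps))  / (2.0 * eps)   # numerical derivative
    num_tanh = (g_tanh(h + eps) - g_tanh(h - eps)) / (2.0 * eps)
    print(f"h={h:+.1f}  sigmoid: {num_sig:.6f} vs {2*beta*g_sig(h)*(1-g_sig(h)):.6f}"
          f"   tanh: {num_tanh:.6f} vs {beta*(1-g_tanh(h)**2):.6f}")
```

If your analytic derivation is correct, the two numbers in each pair agree to several decimal places.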

Question 6 (3 points)

Look at the backpropagation algorithm with the sigmoid activation function. What happens to the derivative when the activation of a node approaches 0 or 1 (check your result from Question 5 a)? What happens to the weight updates Δw_ij? Where do you see problems, and how would you solve them? (A short numerical illustration of this effect is given after Question 8.)

Question 7 (a: 1 point, b: 2 points)

Run the Cascade Correlation applet, which is on the website. Be sure to uncheck the option for running the standard backpropagation algorithm on the same problem. Clicking "next" will display a pull-down menu for selecting one of the pre-defined problems. The initial weights are randomly distributed, so we recommend running each simulation more than once to avoid misleading results.

Load the parity (4 bits) problem in the cascade correlation simulator with the default parameter settings. Set the score threshold parameter to 0.2.

a) Try the patience parameter set to 2, 5, and then 10. What is the effect of the patience parameter on the error curve?

b) Restart the simulator and compare cascade correlation with backpropagation using different parameters (i.e. different problem sets, different learning rates, etc.). What can you observe?

Question 8 (2 points)

The recruiting and training of hidden units occurs over several phases, known as output phases. Learning continues in this fashion until the network error is reduced to the extent that all output units have activations within a certain range of their targets on all training patterns. This range is a parameter called:
   - Score threshold
   - Activation threshold
   - Learning rate
   - Patience
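The numerical illustration referenced in Question 6, a sketch only: the values of h are arbitrary and the error and input are fixed at 1 just to show the trend of the weight update as the sigmoid output moves towards its extremes.

```python
import numpy as np

# Illustration for Question 6: as the sigmoid output g(h) approaches 0 or 1,
# its derivative 2*beta*g*(1-g) shrinks, and with it the backpropagated delta
# and the resulting weight update delta_w = eta * delta * input.
beta, eta, error, inp = 0.5, 1.0, 1.0, 1.0
for h in (0.0, 2.0, 5.0, 10.0, 20.0):
    g  = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    dg = 2.0 * beta * g * (1.0 - g)
    print(f"g(h)={g:.6f}  g'(h)={dg:.6f}  delta_w={eta * error * dg * inp:.6f}")
```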

Question 9 (a: 2 points, b: 2 points)

Load the continuous XOR problem, using the same parameter settings as in Question 7.

a) How many hidden (recruited) neurons are required to reduce the error to less than 0.08?
   - 4
   - 6
   - 25
   - 5

b) For what value of the score threshold are 5 or more hidden nodes recruited?
   - 0.1
   - 0.2
   - 0.3
   - 0.4