CHALMERS, GÖTEBORGS UNIVERSITET. EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD


Time: October 23, 2017, at 8.30–12.30
Place: Lindholmen-salar
Teachers: Bernhard Mehlig, 73-42 988 (mobile); Marina Rafajlović, 76-58 4288 (mobile), visits once at 9.30
Allowed material: Mathematics Handbook for Science and Engineering
Not allowed: any other written material, calculator

Maximum score on this exam: 12 points. Maximum score for homework problems: 12 points. To pass the course it is necessary to score at least 5 points on this written exam. Grade limits: CTH 14 passed, 17.5 grade 4, 22 grade 5; GU 14 grade G, 20 grade VG.

1. Recognition of one pattern. In the deterministic Hopfield model, the state S_i of the i-th neuron is updated according to the McCulloch-Pitts rule

    S_i ← sgn(b_i),  where  b_i = Σ_{j=1}^{N} w_{ij} S_j.    (1)

Here N is the number of neurons and w_{ij} are the weights. The weights depend on p patterns ζ^{(µ)} = (ζ_1^{(µ)}, ..., ζ_N^{(µ)})^T stored in the network according to Hebb's rule:

    w_{ij} = (1/N) Σ_{µ=1}^{p} ζ_i^{(µ)} ζ_j^{(µ)}  for i, j = 1, ..., N.    (2)

Here ζ_i^{(µ)} takes the values +1 or −1.

Figure 1: Question 1. Panel A shows pattern ζ^{(1)}. The pattern has N = 6 bits; black corresponds to ζ_i^{(1)} = +1, and white to ζ_i^{(1)} = −1. Panel B shows pattern ζ^{(2)}, and panel C shows a perturbed version of one of the input patterns.

a) Store the two patterns shown in Figs. 1A and B. Compute the weights w_{ij}. (0.5p)

b) Apply the pattern shown in Fig. 1A to the network. Compute the network output after a single step of synchronous updating according to Eq. (1). Discuss your result. (0.5p)

c) Apply the pattern shown in Fig. 1B instead, and compute the network output after a single step of synchronous updating according to Eq. (1). Discuss your result. (0.5p)

d) Now apply the pattern shown in Fig. 1C instead. Compute the network output after a single step of synchronous updating according to Eq. (1). Discuss your result. (0.5p)
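
A minimal numpy sketch of Eqs. (1) and (2). The two 6-bit patterns below are placeholders (the actual bit values are defined by Figs. 1A and B), and sgn(0) = +1 is an arbitrary convention.

```python
import numpy as np

# Placeholder 6-bit patterns; the actual values (+1 black, -1 white) are given by Figs. 1A and B.
zeta = np.array([[ 1, -1,  1, -1,  1, -1],   # assumed bits of pattern zeta^(1)
                 [ 1,  1, -1, -1,  1,  1]])  # assumed bits of pattern zeta^(2)
N = zeta.shape[1]

# Hebb's rule, Eq. (2): w_ij = (1/N) * sum_mu zeta_i^(mu) * zeta_j^(mu)
W = zeta.T @ zeta / N

def synchronous_step(W, S):
    """One synchronous McCulloch-Pitts update, Eq. (1)."""
    b = W @ S                          # local fields b_i
    return np.where(b >= 0, 1, -1)     # sgn(b_i), with the convention sgn(0) = +1

print(synchronous_step(W, zeta[0]))    # apply pattern zeta^(1) and update once
```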

2. Gradient-descent algorithm in supervised learning. To train a simple perceptron using a gradient-descent algorithm one needs update formulae for the weights and thresholds in the network.

Figure 2: Question 2. Simple perceptron with three input units ξ_i^{(µ)} and one output unit O^{(µ)}; the weights are denoted by w_i.

a) Derive these update formulae for the network shown in Fig. 2 using a stochastic path through the weight space, and assuming a constant and small learning rate η, no momentum, and no weight decay. The weights are denoted by w_i, where i = 1, 2, 3. The threshold corresponding to the output unit is denoted by θ, and the activation function by g(·). The target value for input pattern ξ^{(µ)} = (ξ_1^{(µ)}, ξ_2^{(µ)}, ξ_3^{(µ)})^T is ζ^{(µ)}, and the network output is O^{(µ)} = g(Σ_{i=1}^{3} w_i ξ_i^{(µ)} − θ). The energy function is

    H = (1/2) Σ_{µ=1}^{p} (ζ^{(µ)} − O^{(µ)})²,

where p denotes the total number of input patterns. (1p)

b) Now repeat task 2a), but implementing weight decay according to gradient descent on the following energy function:

    H = (1/2) Σ_{µ=1}^{p} (ζ^{(µ)} − O^{(µ)})² + (γ/2) Σ_{i=1}^{3} w_i².    (3)

Here γ is a positive constant. Discuss the differences to the update rule you derived in 2a). What is an advantage of this update rule over the update rule in 2a)? (0.5p)

c) Now repeat task 2b), but using the following energy function instead:

    H = (1/2) Σ_{µ=1}^{p} (ζ^{(µ)} − O^{(µ)})² + (γ/2) Σ_{i=1}^{3} w_i² / (1 + w_i²).    (4)

Here γ is a positive constant. Discuss the differences to the update rules you derived in 2a) and 2b). What is an advantage of this update rule over the update rule in 2b)? (0.5p)
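
The updates asked for in 2a) and 2b) can be checked numerically with the sketch below. The logistic sigmoid used for g(·), the learning rate, and the single data pattern are assumptions for illustration; the exam keeps g(·) general.

```python
import numpy as np

def g(b):                      # assumed activation function: logistic sigmoid
    return 1.0 / (1.0 + np.exp(-b))

def g_prime(b):
    s = g(b)
    return s * (1.0 - s)

def sgd_step(w, theta, xi, zeta, eta=0.1, gamma=0.0):
    """One stochastic (pattern-by-pattern) gradient-descent step.

    gamma = 0 gives the plain rule of task 2a); gamma > 0 adds the weight-decay
    term of Eq. (3), which contributes -eta * gamma * w_i to the weight update.
    """
    b = w @ xi - theta
    O = g(b)
    delta = (zeta - O) * g_prime(b)              # error signal for this pattern
    w_new = w + eta * (delta * xi - gamma * w)   # since dH/dw_i = -delta*xi_i + gamma*w_i
    theta_new = theta - eta * delta              # since dH/dtheta = +delta (minus sign in b)
    return w_new, theta_new

# Placeholder data: three inputs, one pattern
w, theta = np.zeros(3), 0.0
w, theta = sgd_step(w, theta, xi=np.array([1.0, 0.0, -1.0]), zeta=1.0, gamma=0.01)
```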

3. Multilayer perceptron. Consider the problem in Table 1. Here, µ = 1, 2, ..., 11 is the index of input pattern ξ^{(µ)} = (ξ_1^{(µ)}, ξ_2^{(µ)})^T, and ζ^{(µ)} is the corresponding target output for this input pattern. The problem is illustrated geometrically in Fig. 3, where patterns with ζ^{(µ)} = 0 are denoted by white circles, and patterns with ζ^{(µ)} = 1 by black circles.

Table 1: Question 3. Inputs and target values for the problem in question 3.

    µ    ξ_1^{(µ)}   ξ_2^{(µ)}   ζ^{(µ)}
    1    0.1         0.95        0
    2    0.2         0.85        0
    3    0.2         0.9         0
    4    0.3         0.75        1
    5    0.4         0.65        1
    6    0.4         0.75        1
    7    0.6         0.45        0
    8    0.8         0.25        0
    9    0.1         0.65        1
    10   0.2         0.75        1
    11   0.7         0.2         1

Figure 3: Question 3. Input patterns for the problem in question 3, plotted in the (ξ_1, ξ_2) plane. The target output for each pattern µ is either ζ^{(µ)} = 0 (white circles) or ζ^{(µ)} = 1 (black circles). The three lines (solid, dashed, and dash-dotted) illustrate a solution to the problem by a multilayer perceptron (details in task 3b).

a) Can this problem be solved by a simple perceptron with 2 input units and a single output unit? Why? (0.5p)

b) This problem can be solved by a multilayer perceptron with 2 input units ξ_i^{(µ)} (i = 1, 2), 3 hidden units V_k^{(µ)}, and one output unit O^{(µ)}. Here

    V_k^{(µ)} = θ( Σ_{i=1}^{2} w_{ki} ξ_i^{(µ)} − θ_k ),  where k = 1, 2, 3,    (5)

    O^{(µ)} = θ( Σ_{k=1}^{3} W_k V_k^{(µ)} − Θ ).    (6)

Here

    θ(x) = { 0, if x < 0;  1, if x > 0 }    (7)

is the Heaviside step function, and w_{ki} and W_k are the weights for the hidden and the output layer, respectively. Finally, θ_k and Θ are the thresholds assigned to the hidden units and to the output unit, respectively. One possibility to solve the problem is illustrated in Fig. 3, where the three lines (solid, dashed, and dash-dotted) are determined by the weights w_{ki} and thresholds θ_k assigned to the three hidden units. Compute w_{ki} and θ_k corresponding to the lines shown in Fig. 3. Note that the point where the dashed and dash-dotted lines intersect has the coordinates (0.5, 0.8)^T. (0.5p)

c) For each pattern ξ^{(µ)}, write its coordinates V_k^{(µ)} in the transformed (hidden) space. (0.5p)

d) Now illustrate the problem graphically in this transformed space. Is the problem, represented in this transformed space, linearly separable? If yes, illustrate a possible solution to the problem in this space. (0.5p)

e) Compute the weights W_k and the threshold Θ corresponding to the solution you illustrated graphically in 3d). (0.5p)
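
A short numpy sketch of Eqs. (5)–(7), convenient for checking a candidate solution against Table 1. The numerical values of w_{ki}, θ_k, W_k, and Θ below are placeholders only; the exam asks for the values defined by the three lines in Fig. 3.

```python
import numpy as np

def heaviside(x):
    return (x > 0).astype(int)    # Eq. (7); the value at x = 0 is left unspecified by the exam

# Table 1: input patterns and target outputs
xi = np.array([[0.1, 0.95], [0.2, 0.85], [0.2, 0.9], [0.3, 0.75], [0.4, 0.65],
               [0.4, 0.75], [0.6, 0.45], [0.8, 0.25], [0.1, 0.65], [0.2, 0.75],
               [0.7, 0.2]])
zeta = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1])

# Placeholder parameters (NOT the solution read off Fig. 3)
w = np.array([[ 1.0,  1.0],          # hidden weights w_ki, k = 1, 2, 3
              [ 1.0, -1.0],
              [-1.0,  1.0]])
theta = np.array([1.0, 0.0, 0.5])    # hidden thresholds theta_k
W = np.array([1.0, 1.0, 1.0])        # output weights W_k
Theta = 1.5                          # output threshold

V = heaviside(xi @ w.T - theta)      # Eq. (5): hidden coordinates V_k (task 3c)
O = heaviside(V @ W - Theta)         # Eq. (6): network output
print(np.column_stack((V, O, zeta))) # compare the outputs O with the targets zeta
```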

4. Oja's learning rule. The aim of unsupervised learning is to construct a network that learns the properties of a distribution of input patterns ξ = (ξ_1, ..., ξ_N)^T. Consider a network with one linear output unit ζ:

    ζ = Σ_{i=1}^{N} w_i ξ_i.    (8)

Under Oja's learning rule, w_i → w_i + δw_i with

    δw_i = η ζ (ξ_i − ζ w_i)  and  η > 0,    (9)

the weight vector w converges to a steady state w* = (w_1*, ..., w_N*)^T.

Figure 4: Question 4a). Input patterns ξ^{(1)}, ..., ξ^{(6)} in the input space (ξ_1, ξ_2)^T. The patterns lie on a line through the origin: ξ^{(1)} = (0.5, −0.5√3)^T, ξ^{(2)} = (0.3, −0.3√3)^T, ξ^{(3)} = (0.1, −0.1√3)^T, ξ^{(4)} = (−0.1, 0.1√3)^T, ξ^{(5)} = (−0.3, 0.3√3)^T, ξ^{(6)} = (−0.5, 0.5√3)^T.

Figure 5: Question 4b). Input patterns ξ^{(1)}, ..., ξ^{(6)} in the input space (ξ_1, ξ_2)^T: ξ^{(1)} = (1.5, −0.5√3)^T, ξ^{(2)} = (1.3, −0.3√3)^T, ξ^{(3)} = (1.1, −0.1√3)^T, ξ^{(4)} = (0.9, 0.1√3)^T, ξ^{(5)} = (0.7, 0.3√3)^T, ξ^{(6)} = (0.5, 0.5√3)^T.

a) Compute the steady-state weight vector w* for the input patterns shown in Fig. 4. Determine the maximal principal-component direction of the input data. Discuss: how is the steady-state weight vector w* in this case related to the maximal principal-component direction of the input data? (0.5p)

b) Now compute the steady-state weight vector w* for the input patterns shown in Fig. 5 instead. Determine the maximal principal-component direction of the input data. Discuss: how is the steady-state weight vector w* in this case related to the maximal principal-component direction of the input data? Compare your findings to those obtained in task 4a). Discuss. (1p)
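
A sketch of Oja's rule, Eq. (9), iterated over the Fig. 4 patterns (using the coordinates listed in the caption above, which are reconstructed from the figure) and compared with the leading principal-component direction. The learning rate, the initialisation, and the number of iterations are arbitrary choices.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Input patterns of Fig. 4 (reconstructed coordinates; see the figure caption)
xi = np.array([[ 0.5, -0.5*s3], [ 0.3, -0.3*s3], [ 0.1, -0.1*s3],
               [-0.1,  0.1*s3], [-0.3,  0.3*s3], [-0.5,  0.5*s3]])

rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=2)     # small random initial weight vector
eta = 0.02

# Oja's rule, Eq. (9): dw_i = eta * zeta * (xi_i - zeta * w_i), with zeta = w . xi
for _ in range(20000):
    x = xi[rng.integers(len(xi))]        # pattern chosen at random
    z = w @ x
    w += eta * z * (x - z * w)

# Leading principal-component direction (eigenvector of the covariance matrix), for comparison
X = xi - xi.mean(axis=0)
eigval, eigvec = np.linalg.eigh(X.T @ X / len(xi))
print("steady-state w*      :", w)
print("maximal PC direction :", eigvec[:, -1])
```

At the steady state, w* is a normalised eigenvector of the correlation matrix ⟨ξξ^T⟩ with the largest eigenvalue. For data centred on the origin (Fig. 4) this coincides with the maximal principal-component direction; the data in Fig. 5 are not centred on the origin, which is the contrast that task 4b) asks you to discuss.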

5. Kohonen algorithm. The update rule for a Kohonen network is w_{ij} → w_{ij} + δw_{ij}, where

    δw_{ij} = η Λ(i, i_0) (ξ_j − w_{ij}).    (10)

Here w_{ij} (i = 1, ..., M and j = 1, ..., N) denotes the weight connecting the i-th output neuron to the j-th input neuron, η > 0 is the learning rate, ξ = (ξ_1, ..., ξ_N)^T is an N-dimensional input pattern chosen uniformly at random out of p input patterns, and

    Λ(i, i_0) = exp( − |r_i − r_{i_0}|² / (2σ²) )    (11)

is a neighbourhood function with width σ > 0. In Eq. (11), r_i denotes the position of the i-th output neuron in the output array. Finally, i_0 denotes the index of the winning output neuron for the input pattern ξ, i.e. the neuron satisfying

    Σ_{j=1}^{N} (w_{i_0 j} − ξ_j)²  ≤  Σ_{j=1}^{N} (w_{ij} − ξ_j)²  for all i = 1, ..., M.    (12)

a) Explain and discuss the implementation of Kohonen's algorithm in a computer program. In the discussion, refer to and explain the terms listed next.

- (Un)supervised learning. Explain the difference between supervised and unsupervised learning, and state whether a Kohonen network is used for supervised or unsupervised learning.
- Initialisation of the algorithm.
- Output array.
- Neighbourhood function. Discuss the limit σ → 0.
- Ordering phase and convergence phase. Explain the differences between these two phases: what is the aim of the former, and what is the aim of the latter phase? How are these aims achieved in practice, in terms of the parameters for the learning rate and the width of the neighbourhood function? (You are not asked to suggest specific values for these parameters, but rather to explain whether or not these parameters are kept constant during learning in the different phases and, if not, how they are usually set to change during learning.)

Your answer must not be longer than one A4 page. (1p)

Figure 6: Question 5b). The pattern coloured black depicts the area inside which points are uniformly distributed. The properties of this distribution are to be learned by the Kohonen network, as explained in question 5b).

b) Assume that the input data ξ = (ξ_1, ξ_2)^T to a Kohonen network are uniformly distributed in two dimensions within the area shown by the black region in Fig. 6, and zero outside of this region. Furthermore, assume that the output array in this Kohonen network is one-dimensional, and that the number of output units (M) is large, yet much smaller than the total number of input patterns. Illustrate the algorithm described in 5a) by schematically drawing the weight vectors w_i = (w_{i1}, w_{i2})^T for i = 1, ..., M in the input space at the start of learning, at the end of the ordering phase, and at the end of the convergence phase. (1p)
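
A minimal sketch of one Kohonen training run using Eqs. (10)–(12) with a one-dimensional output array, as in 5b). The input distribution (uniform in the unit square), the array size M, and the decay schedules for η and σ are placeholder choices; the exam's inputs are uniform in the black region of Fig. 6.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 50, 2                        # M output neurons on a 1-d array, N-dimensional inputs
r = np.arange(M, dtype=float)       # positions r_i of the output neurons in the array
w = rng.uniform(size=(M, N))        # random initialisation of the weights w_ij

def kohonen_step(w, xi, eta, sigma):
    i0 = np.argmin(np.sum((w - xi) ** 2, axis=1))        # winning neuron, Eq. (12)
    Lam = np.exp(-(r - r[i0]) ** 2 / (2.0 * sigma ** 2)) # neighbourhood function, Eq. (11)
    return w + eta * Lam[:, None] * (xi - w)             # update rule, Eq. (10)

# Ordering phase: wide neighbourhood and larger learning rate, both slowly decreased;
# convergence phase: small eta and sigma. The schedules below are illustrative only.
eta, sigma = 0.1, M / 2.0
for t in range(20000):
    xi = rng.uniform(size=N)        # placeholder input, uniform in the unit square
    w = kohonen_step(w, xi, eta, sigma)
    eta = max(0.01, eta * 0.9995)
    sigma = max(0.5, sigma * 0.9995)
```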

6. Deep learning. The parity function outputs 1 if and only if the input sequence of n binary numbers has an odd number of ones, and zero otherwise. The parity function for n = 2 is also known as the Boolean XOR function.

a) The XOR function can be represented using a multilayer perceptron with an input layer of size 2, a fully connected hidden layer of size 2, an output layer of size 1, and the Heaviside activation function

    θ(x) = { 1, for x > 0;  0, for x < 0 }    (13)

in all layers. Determine suitable weight vectors W_i and thresholds Θ_i for the two hidden nodes (i = 1, 2), as well as the weight vector w_1 and the threshold θ_1 for the output node, that represent the XOR problem. (0.5p)

b) Illustrate the problem graphically in the input space, and indicate the planes determined by the weight vectors W_i and thresholds Θ_i that you determined in 6a). In a separate graph, illustrate the transformed input data in the hidden space, and indicate the plane determined by the weight vector w_1 and the threshold θ_1 that you determined in 6a). (0.5p)

c) Describe how you can combine several of the small XOR multilayer perceptrons analysed in 6a)–6b) to create a deep network that computes the parity function for n > 2. Explain how the total number of nodes in the network grows with the size n of the input. (1p)
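
One standard way to realise the constructions in 6a) and 6c) is sketched below: a 2-2-1 Heaviside perceptron for XOR (one of several valid choices of weights and thresholds), combined pairwise in a binary tree to compute the parity of n bits.

```python
def theta(x):
    """Heaviside activation, Eq. (13)."""
    return 1 if x > 0 else 0

def xor_block(x1, x2):
    """A 2-2-1 Heaviside perceptron computing XOR; one possible choice for task 6a)."""
    V1 = theta(x1 + x2 - 0.5)       # hidden unit 1 acts as logical OR
    V2 = theta(x1 + x2 - 1.5)       # hidden unit 2 acts as logical AND
    return theta(V1 - V2 - 0.5)     # output fires for OR but not AND, i.e. XOR

def parity(bits):
    """Task 6c): feed the bits through a binary tree of XOR blocks."""
    layer = list(bits)
    while len(layer) > 1:
        nxt = [xor_block(a, b) for a, b in zip(layer[0::2], layer[1::2])]
        if len(layer) % 2:          # an unpaired bit is passed on unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

print(parity([1, 0, 1, 1]))   # 1: three ones (odd)
print(parity([1, 1, 0, 0]))   # 0: two ones (even)
```

In this tree construction, n input bits are combined by n − 1 XOR blocks, i.e. 3(n − 1) computational nodes in addition to the n inputs, so the node count grows linearly in n, while the depth of the network grows like 2⌈log₂ n⌉ layers.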