CHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD


CHALMERS, GÖTEBORGS UNIVERSITET
SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS
COURSE CODES: FFR 135, FIM 720 GU, PhD

Time: January 2, 2018, at 8.30–12.30
Place: SB Multisal
Teachers: Bernhard Mehlig (mobile); Johan Fries (mobile, visits once at 9)
Allowed material: Mathematics Handbook for Science and Engineering
Not allowed: any other written material, calculator

Maximum score on this exam: 12 points. Maximum score for homework problems: 12 points. To pass the course it is necessary to score at least 5 points on this written exam. CTH: 14 passed, 17.5 grade 4, 22 grade 5. GU: 14 grade G, 20 grade VG.

1. Recognition of one pattern.

(a) Define

    Q^{(µ,ν)} = Σ_{i=1}^{42} ζ_i^{(µ)} ζ_i^{(ν)}.    (1)

Bit i contributes with +1 to Q^{(µ,ν)} if ζ_i^{(µ)} = ζ_i^{(ν)}, and with −1 if ζ_i^{(µ)} ≠ ζ_i^{(ν)}. Since the number of bits is 42, we have Q^{(µ,ν)} = 42 − 2H^{(µ,ν)}, where H^{(µ,ν)} is the number of bits that differ between pattern µ and pattern ν (the Hamming distance). We find:

    H^{(1,1)} = 0     Q^{(1,1)} = 42
    H^{(1,2)} = 10    Q^{(1,2)} = 22
    H^{(1,3)} = 2     Q^{(1,3)} = 38
    H^{(1,4)} = 42    Q^{(1,4)} = −42
    H^{(1,5)} = 21    Q^{(1,5)} = 0
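A quick numerical illustration of the identity Q^{(µ,ν)} = 42 − 2H^{(µ,ν)} used above (not part of the original solution; it uses two randomly generated ±1 patterns in place of the exam's patterns):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 42
    zeta_mu = rng.choice([-1, 1], size=N)        # two arbitrary +-1 patterns of 42 bits
    zeta_nu = rng.choice([-1, 1], size=N)

    Q = int(zeta_mu @ zeta_nu)                   # overlap Q = sum_i zeta_i^mu * zeta_i^nu
    H = int(np.sum(zeta_mu != zeta_nu))          # Hamming distance (number of differing bits)
    assert Q == N - 2 * H                        # the identity used in part (a)
    print(Q, H)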

    H^{(2,1)} = H^{(1,2)} = 10            Q^{(2,1)} = 22
    H^{(2,2)} = 0                         Q^{(2,2)} = 42
    H^{(2,3)} = 10                        Q^{(2,3)} = 22
    H^{(2,4)} = 42 − H^{(2,1)} = 32       Q^{(2,4)} = −22
    H^{(2,5)} = 20                        Q^{(2,5)} = 2

(b) We have that

    b_i^{(ν)} = Σ_j w_{ij} ζ_j^{(ν)} = (1/42) Σ_j ( ζ_i^{(1)} ζ_j^{(1)} + ζ_i^{(2)} ζ_j^{(2)} ) ζ_j^{(ν)}
              = (ζ_i^{(1)}/42) Q^{(1,ν)} + (ζ_i^{(2)}/42) Q^{(2,ν)}.    (2)

From (a), we have that:

    b^{(1)} = (Q^{(1,1)}/42) ζ^{(1)} + (Q^{(2,1)}/42) ζ^{(2)} = ζ^{(1)} + (22/42) ζ^{(2)},
    b^{(2)} = (Q^{(1,2)}/42) ζ^{(1)} + (Q^{(2,2)}/42) ζ^{(2)} = (22/42) ζ^{(1)} + ζ^{(2)},
    b^{(3)} = (Q^{(1,3)}/42) ζ^{(1)} + (Q^{(2,3)}/42) ζ^{(2)} = (38/42) ζ^{(1)} + (22/42) ζ^{(2)},
    b^{(4)} = (Q^{(1,4)}/42) ζ^{(1)} + (Q^{(2,4)}/42) ζ^{(2)} = −ζ^{(1)} − (22/42) ζ^{(2)},
    b^{(5)} = (Q^{(1,5)}/42) ζ^{(1)} + (Q^{(2,5)}/42) ζ^{(2)} = (2/42) ζ^{(2)}.    (3)

(c) A pattern ζ^{(ν)} is stable if sgn(b^{(ν)}) = ζ^{(ν)} in every component. From (b), we find that:

    sgn(b^{(1)}) = ζ^{(1)},
    sgn(b^{(2)}) = ζ^{(2)},
    sgn(b^{(3)}) = ζ^{(1)} ≠ ζ^{(3)}   (the first term dominates since 38 > 22),
    sgn(b^{(4)}) = −ζ^{(1)} = ζ^{(4)},
    sgn(b^{(5)}) = ζ^{(2)} ≠ ζ^{(5)}.    (4)

Thus patterns ζ^{(1)}, ζ^{(2)} and ζ^{(4)} are stable.
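The stability check in parts (b) and (c) is easy to automate. The following sketch (not from the original solutions) stores two arbitrary ±1 patterns with Hebb's rule, w_{ij} = (1/N)(ζ_i^{(1)} ζ_j^{(1)} + ζ_i^{(2)} ζ_j^{(2)}), and tests whether a probe pattern reproduces itself under sgn(Σ_j w_{ij} ζ_j); the specific exam patterns are given on the exam sheet and are not reproduced here:

    import numpy as np

    def hebb_weights(patterns):
        # Hebb's rule: w = (1/N) * sum_mu zeta^(mu) zeta^(mu)T
        patterns = np.asarray(patterns, dtype=float)
        return patterns.T @ patterns / patterns.shape[1]

    def is_stable(w, zeta):
        # a pattern is stable if sgn(w @ zeta) reproduces zeta in every bit
        return np.all(np.sign(w @ zeta) == zeta)

    rng = np.random.default_rng(1)
    stored = rng.choice([-1, 1], size=(2, 42))       # two stored patterns (placeholders)
    w = hebb_weights(stored)
    print([is_stable(w, zeta) for zeta in stored])   # stored patterns are stable
    print(is_stable(w, -stored[0]))                  # the inverted pattern is stable too (cf. part c)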

2. Linearly inseparable problem.

(a) In the figure below, ξ^{(A)} and ξ^{(B)} are to have output 1 and ξ^{(C)} and ξ^{(D)} are to have output 0. There is no straight line that can separate patterns ξ^{(A)} and ξ^{(B)} from patterns ξ^{(C)} and ξ^{(D)}, so the problem is not linearly separable.

(b) The triangle corners are:

    ξ^{(1)} = [−4, 0]^T,    ξ^{(2)} = [4, −1]^T,    ξ^{(3)} = [0, 3]^T.    (5)

Let v_1 = 0 at ξ^{(1)} and ξ^{(2)}. This implies

    0 = w_{11} ξ_1^{(1)} + w_{12} ξ_2^{(1)} − θ_1 = −4w_{11} − θ_1,
    0 = w_{11} ξ_1^{(2)} + w_{12} ξ_2^{(2)} − θ_1 = 4w_{11} − w_{12} − θ_1
    ⟹ θ_1 = −4w_{11}  and  w_{12} = 4w_{11} − θ_1 = 8w_{11}.    (6)

We choose w_{11} = 1, w_{12} = 8 and θ_1 = −4.

Let v_2 = 0 at ξ^{(2)} and ξ^{(3)}. This implies

    0 = w_{21} ξ_1^{(2)} + w_{22} ξ_2^{(2)} − θ_2 = 4w_{21} − w_{22} − θ_2,
    0 = w_{21} ξ_1^{(3)} + w_{22} ξ_2^{(3)} − θ_2 = 3w_{22} − θ_2
    ⟹ w_{22} = 4w_{21} − θ_2 = 4w_{21} − 3w_{22} ⟹ w_{22} = w_{21}  and  θ_2 = 3w_{22}.    (7)

We choose w_{21} = w_{22} = 1 and θ_2 = 3.

Let v_3 = 0 at ξ^{(3)} and ξ^{(1)}. This implies

    0 = w_{31} ξ_1^{(3)} + w_{32} ξ_2^{(3)} − θ_3 = 3w_{32} − θ_3,
    0 = w_{31} ξ_1^{(1)} + w_{32} ξ_2^{(1)} − θ_3 = −4w_{31} − θ_3
    ⟹ 3w_{32} = −4w_{31}  and  θ_3 = 3w_{32}.    (8)

We choose w_{32} = 4, w_{31} = −3 and θ_3 = 12.

In summary:

    w = [ 1 8 ; 1 1 ; −3 4 ]    and    θ = [−4, 3, 12]^T.    (9)

The origin maps to

    v = H(w·0 − θ) = H([4, −3, −12]^T) = [1, 0, 0]^T.    (10)

We know that the origin maps to v = [1, 0, 0]^T and that the hidden neurons change values at the dashed lines shown in the figure (the decision boundaries of the three hidden neurons in the input plane).
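A small check of the hidden-layer construction (a sketch, not part of the original solution; it uses the corners, weights and thresholds of eqs. (5) and (9)): each triangle corner should lie on the two decision boundaries fitted through it, and the origin should map to v = [1, 0, 0]^T as in eq. (10).

    import numpy as np

    H = lambda b: (b > 0).astype(int)              # Heaviside step function

    w = np.array([[ 1.0, 8.0],                     # hidden weights, eq. (9)
                  [ 1.0, 1.0],
                  [-3.0, 4.0]])
    theta = np.array([-4.0, 3.0, 12.0])            # hidden thresholds, eq. (9)
    corners = np.array([[-4.0, 0.0],               # triangle corners, eq. (5)
                        [ 4.0, -1.0],
                        [ 0.0, 3.0]])

    print(w @ corners.T - theta[:, None])          # row i is zero at the two corners defining line i
    print(H(w @ np.zeros(2) - theta))              # origin -> [1 0 0], eq. (10)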

Thus we can conclude how the regions of input space map to regions in the hidden space: the interior of the triangle maps to v = [1, 0, 0]^T, and the remaining regions map to the other attainable corners of the unit cube in hidden space (the three lines divide the plane into seven regions, so every hidden state except v = [0, 1, 1]^T occurs).

We want v = [1, 0, 0]^T to map to output 1 and all other possible values of v to map to output 0. The hidden space can be illustrated as the unit cube of possible v-states (figure in the original solution); W must be normal to the plane passing through the states marked with crosses in that picture. Also, W points to v = [1, 0, 0]^T from v = [0, 1, 1]^T. We may choose

    W = [1, 0, 0]^T − [0, 1, 1]^T = [1, −1, −1]^T.    (11)

We know that the point v = [1/2, 0, 0]^T lies on the decision boundary we are looking for. So

    W^T [1/2, 0, 0]^T − T = 0 ⟹ T = 1/2.    (12)

3. Backpropagation.

(a) Let N_m denote the number of weights w_{ij}^{(m)} in layer m. Let n_m denote the number of hidden units v_i^{(m,µ)} for m = 1, ..., L−1, let n_0 denote the number of input units and let n_L denote the number of output units. We find that the number of weights is

    Σ_{m=1}^{L} N_m = Σ_{m=1}^{L} n_m n_{m−1},    (13)

and that the number of thresholds is

    Σ_{m=1}^{L} n_m.    (14)
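A short illustration of the counting formulas (13) and (14), for a hypothetical layout with n_0 = 3 inputs, hidden layers of 4 and 5 units, and n_L = 2 outputs (these sizes are an arbitrary example, not taken from the exam):

    n = [3, 4, 5, 2]                                               # n_0, n_1, ..., n_L (example only)
    num_weights = sum(n[m] * n[m - 1] for m in range(1, len(n)))   # eq. (13): 3*4 + 4*5 + 5*2 = 42
    num_thresholds = sum(n[m] for m in range(1, len(n)))           # eq. (14): 4 + 5 + 2 = 11
    print(num_weights, num_thresholds)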

(b) We have

    ∂v_i^{(m,µ)}/∂w_{qr}^{(p)} = ∂/∂w_{qr}^{(p)} g( −θ_i^{(m)} + Σ_j w_{ij}^{(m)} v_j^{(m−1,µ)} )
                               = g'(b_i^{(m,µ)}) Σ_j ∂/∂w_{qr}^{(p)} ( w_{ij}^{(m)} v_j^{(m−1,µ)} ),    (15)

where b_i^{(m,µ)} = −θ_i^{(m)} + Σ_j w_{ij}^{(m)} v_j^{(m−1,µ)} is the local field. Using that p < m (so that w_{ij}^{(m)} does not depend on w_{qr}^{(p)}, but v_j^{(m−1,µ)} does), we find:

    ∂v_i^{(m,µ)}/∂w_{qr}^{(p)} = g'(b_i^{(m,µ)}) Σ_j w_{ij}^{(m)} ∂v_j^{(m−1,µ)}/∂w_{qr}^{(p)}.    (16)

(c) From (b), we have:

    ∂v_i^{(m,µ)}/∂w_{qr}^{(m)} = g'(b_i^{(m,µ)}) Σ_j ∂/∂w_{qr}^{(m)} ( w_{ij}^{(m)} v_j^{(m−1,µ)} ).    (17)

But since p = m (so that v_j^{(m−1,µ)} does not depend on w_{qr}^{(m)}), we find:

    ∂v_i^{(m,µ)}/∂w_{qr}^{(m)} = g'(b_i^{(m,µ)}) Σ_j δ_{iq} δ_{jr} v_j^{(m−1,µ)} = g'(b_i^{(m,µ)}) δ_{iq} v_r^{(m−1,µ)}.    (18)

(d) We have

    w_{qr}^{(L−2)} → w_{qr}^{(L−2)} + δw_{qr}^{(L−2)},   where   δw_{qr}^{(L−2)} = −η ∂H/∂w_{qr}^{(L−2)}.    (19)

We differentiate the energy function H = (1/2) Σ_µ Σ_i ( O_i^{(µ)} − ζ_i^{(µ)} )²:

    ∂H/∂w_{qr}^{(L−2)} = Σ_µ Σ_i ( O_i^{(µ)} − ζ_i^{(µ)} ) ∂O_i^{(µ)}/∂w_{qr}^{(L−2)}.    (20)

From 3(b) and 3(c), we have:

    ∂v_i^{(m,µ)}/∂w_{qr}^{(p)} = { g'(b_i^{(m,µ)}) Σ_j w_{ij}^{(m)} ∂v_j^{(m−1,µ)}/∂w_{qr}^{(p)}   if p < m,
                                 { g'(b_i^{(m,µ)}) δ_{iq} v_r^{(m−1,µ)}                             if p = m.    (21)
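The recursion (16) and the boundary case (18), summarised in eq. (21), can be checked numerically against a finite-difference derivative. The sketch below (not part of the original solution) does this for a small random network, assuming g = tanh; the activation function and layer sizes are arbitrary choices for the test.

    import numpy as np

    g = np.tanh
    rng = np.random.default_rng(2)
    sizes = [3, 4, 4, 2]                                    # n_0, ..., n_L (example only)
    W = [rng.normal(size=(sizes[m], sizes[m - 1])) for m in range(1, len(sizes))]
    th = [rng.normal(size=sizes[m]) for m in range(1, len(sizes))]
    x = rng.normal(size=sizes[0])

    def forward(W, th, x):
        v, vs, bs = x, [x], []
        for Wm, thm in zip(W, th):
            b = Wm @ v - thm                                # b^(m) = -theta^(m) + w^(m) v^(m-1)
            v = g(b)
            vs.append(v); bs.append(b)
        return vs, bs

    vs, bs = forward(W, th, x)

    # derivative of the output v^(L) w.r.t. w_qr^(1), built from eqs. (18) and (16):
    q, r = 1, 2
    dv = np.zeros(sizes[1]); dv[q] = (1 - g(bs[0][q])**2) * x[r]    # eq. (18), with g' = 1 - tanh^2
    for m in range(1, len(W)):                                      # propagate upwards with eq. (16)
        dv = (1 - g(bs[m])**2) * (W[m] @ dv)

    eps = 1e-6                                              # finite-difference comparison
    W[0][q, r] += eps
    dv_num = (forward(W, th, x)[0][-1] - vs[-1]) / eps
    print(np.allclose(dv, dv_num, atol=1e-5))               # True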

Define v_i^{(L,µ)} = O_i^{(µ)}. We have:

    ∂O_i^{(µ)}/∂w_{qr}^{(L−2)} = ∂v_i^{(L,µ)}/∂w_{qr}^{(L−2)}

        [insert from eq. (21), using that L−2 < L]

    = g'(b_i^{(L,µ)}) Σ_j w_{ij}^{(L)} ∂v_j^{(L−1,µ)}/∂w_{qr}^{(L−2)}

        [insert from eq. (21), using that L−2 < L−1]

    = g'(b_i^{(L,µ)}) Σ_j w_{ij}^{(L)} g'(b_j^{(L−1,µ)}) Σ_k w_{jk}^{(L−1)} ∂v_k^{(L−2,µ)}/∂w_{qr}^{(L−2)}

        [insert from eq. (21), using that L−2 = L−2]

    = g'(b_i^{(L,µ)}) Σ_j w_{ij}^{(L)} g'(b_j^{(L−1,µ)}) Σ_k w_{jk}^{(L−1)} g'(b_k^{(L−2,µ)}) δ_{qk} v_r^{(L−3,µ)}

    = g'(b_i^{(L,µ)}) Σ_j w_{ij}^{(L)} g'(b_j^{(L−1,µ)}) w_{jq}^{(L−1)} g'(b_q^{(L−2,µ)}) v_r^{(L−3,µ)}.    (22)

The update rule is eq. (19) with the derivative of the energy function given by eqs. (20) and (22).

4. True/False questions. Indicate whether the following statements are true or false. 13–14 correct answers give 2 points, 11–12 correct answers give 1.5 points, 9–10 correct answers give 1 point, 8 correct answers give 0.5 points, and 0–7 correct answers give zero points. (2 p)

1. You need access to the state of all neurons in a multilayer perceptron when updating all weights through backpropagation. TRUE (the update of a weight in a given layer depends on the values of the neurons in the layer before).

2. Consider the Hopfield network. If a pattern is stable it must be an eigenvector of the weight matrix. FALSE (due to the step function).

3. If you store two orthogonal patterns in a Hopfield network, they will always turn out unstable. FALSE (the cross-talk term is zero).

4. Kohonen's algorithm learns convex distributions better than concave ones. TRUE (concave corners can cause problems).

5. The number of N-dimensional Boolean functions is 2^N. FALSE (it is 2^(2^N)).

6. The weight matrices in a perceptron are symmetric. FALSE (they may not even be square matrices).

7. Using g(b) = b as activation function and putting all thresholds to zero in a multilayer perceptron allows you to solve some linearly inseparable problems. FALSE (you have effectively one weight matrix, which is the product of all your original ones).

8. You need at least four radial basis functions for the XOR problem to be linearly separable in the space of the radial basis functions. FALSE (two are enough).

9. Consider p > 2 patterns uniformly distributed on a circle. None of the eigenvalues of the covariance matrix of the patterns is zero. TRUE (a zero eigenvalue would indicate patterns on a line).

10. Even if the weight vector in Oja's rule equals its stable steady state at one iteration, it may change in the following iterations. TRUE (it is only a statistically steady state).

11. If your Kohonen network is supposed to learn the distribution P(ξ), it is important to generate the patterns ξ^{(µ)} before you start training the network. FALSE (training your network does not affect which pattern you draw from your distribution).

12. All one-dimensional Boolean problems are linearly separable. TRUE (two different points can always be separated by a line).

13. In Kohonen's algorithm, the neurons have fixed positions in the output space. TRUE (it is the weights, in the input space, that are updated).

14. Some elements of the covariance matrix are variances. TRUE (the diagonal elements).
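To illustrate statement 7: with g(b) = b and all thresholds zero, a two-layer perceptron computes w^{(2)}(w^{(1)} ξ) = (w^{(2)} w^{(1)}) ξ, a single effective weight matrix, so it can do no more than a simple perceptron. A short check (my illustration, not from the solutions):

    import numpy as np

    rng = np.random.default_rng(3)
    w1, w2, xi = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
    print(np.allclose(w2 @ (w1 @ xi), (w2 @ w1) @ xi))    # True: the layers collapse into one matrix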

5. Oja's rule.

(a) The steady state requires that the average weight increment vanishes:

    0 = ⟨δw⟩ = η ⟨ζ (ξ − ζ w)⟩ ⟺ ⟨ξ ζ⟩ = ⟨ζ²⟩ w.    (23)

Insert

    ζ = ξ^T w = w^T ξ :    (24)

    0 = ⟨δw⟩ ⟺ ⟨ξ ξ^T⟩ w = ⟨w^T ξ ξ^T w⟩ w = w^T ⟨ξ ξ^T⟩ w w.    (25)

⟨ξ ξ^T⟩ = C is a matrix, so:

    0 = ⟨δw⟩ ⟺ C w = (w^T C w) w.    (26)

We see that ⟨δw⟩ = 0 implies that w is an eigenvector of C with eigenvalue λ = w^T C w (note that w^T w = w·w). Using C w = λ w,

    λ = w^T C w = w^T λ w = λ w^T w ⟹ w^T w = 1,    (27)

so the steady-state weight vector is normalised.

(b) Are the patterns centered?

    (1/5) Σ_{µ=1}^{5} ξ_1^{(µ)} = (1/5)(−6 − 2 + 2 + 1 + 5) = 0,    (28)
    (1/5) Σ_{µ=1}^{5} ξ_2^{(µ)} = (1/5)(−5 − 4 + 2 + 3 + 4) = 0.    (29)

So ⟨ξ⟩ = 0, and the patterns are centered. This means that the covariance matrix is:

    C = (1/5) Σ_{µ=1}^{5} ξ^{(µ)} ξ^{(µ)T} = ⟨ξ ξ^T⟩.    (30)

We have

    ξ^{(1)} ξ^{(1)T} = [ 36 30 ; 30 25 ],
    ξ^{(2)} ξ^{(2)T} = [ 4 8 ; 8 16 ],
    ξ^{(3)} ξ^{(3)T} = [ 4 4 ; 4 4 ],
    ξ^{(4)} ξ^{(4)T} = [ 1 3 ; 3 9 ],
    ξ^{(5)} ξ^{(5)T} = [ 25 20 ; 20 16 ].    (31)

We compute the elements of C:

    5 C_{11} = 36 + 4 + 4 + 1 + 25 = 70,
    5 C_{12} = 5 C_{21} = 30 + 8 + 4 + 3 + 20 = 65,
    5 C_{22} = 25 + 16 + 4 + 9 + 16 = 70.    (32)

We find that

    C = [ 14 13 ; 13 14 ].    (33)

Maximal eigenvalue:

    0 = det[ 14−λ 13 ; 13 14−λ ] = (14 − λ)² − 13² = λ² − 28λ + 14² − 13² = λ² − 28λ + 27
    ⟹ λ = 14 ± sqrt(14² − 27) = 14 ± sqrt(169) = 14 ± 13 ⟹ λ_max = 27.    (34)

Eigenvector u:

    [ 14 13 ; 13 14 ] [u_1, u_2]^T = 27 [u_1, u_2]^T ⟹ u_1 = u_2.    (35)

So an eigenvector corresponding to the largest eigenvalue of C is given by

    u = t [1, 1]^T    (36)

for an arbitrary t. This is the principal component.
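A numerical cross-check of part (b), and of the fixed point derived in part (a). The five patterns below are read off from the outer products in eq. (31) together with the signs implied by eqs. (28)–(29); the Oja iteration at the end uses an arbitrary learning rate and starting vector:

    import numpy as np

    xi = np.array([[-6., -5.], [-2., -4.], [2., 2.], [1., 3.], [5., 4.]])
    print(xi.mean(axis=0))                     # [0, 0]: the patterns are centred, eqs. (28)-(29)
    C = xi.T @ xi / len(xi)                    # C = (1/5) sum_mu xi^(mu) xi^(mu)T
    print(C)                                   # [[14, 13], [13, 14]], eq. (33)
    vals, vecs = np.linalg.eigh(C)
    print(vals[-1], vecs[:, -1])               # 27 and a vector along (1, 1)/sqrt(2), eqs. (34)-(36)

    # Oja's rule, delta w = eta * zeta * (xi - zeta * w), iterated over random pattern draws:
    rng = np.random.default_rng(4)
    w, eta = np.array([1.0, 0.0]), 0.001       # arbitrary starting vector and learning rate
    for _ in range(20000):
        x = xi[rng.integers(len(xi))]
        zeta = x @ w
        w += eta * zeta * (x - zeta * w)
    print(w, np.linalg.norm(w))                # close to (1, 1)/sqrt(2), norm close to 1 (part a)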

6. General Boolean problems. There was a typo in Eqn. (8) of the exam. The correct equation is:

    v_i^{(µ)} = { 1   if −θ + Σ_j w_{ij} ξ_j^{(µ)} > 0,
                { 0   if −θ + Σ_j w_{ij} ξ_j^{(µ)} ≤ 0.

(a) The solution uses w_{ij} = ξ_j^{(i)}. This means that the i-th row of the weight matrix w is a vector, w^{(i)} = ξ^{(i)}. From the figure above, we see that (for the first row, w^{(1)} = ξ^{(1)}):

    w^{(1)T} ξ^{(1)} = 1 + 1 + 1 = 3,
    w^{(1)T} ξ^{(µ)} = 1 + 1 − 1 = 1      for µ = 8, 4 and 3,
    w^{(1)T} ξ^{(µ)} = 1 − 1 − 1 = −1     for µ = 2, 5 and 7,
    w^{(1)T} ξ^{(6)} = −1 − 1 − 1 = −3.

Using that θ = 2, we note that

    w^{(1)T} ξ^{(µ)} − θ  is  { > 0  if µ = 1,
                              { < 0  if µ ≠ 1.    (37)

So we have:

    v_1^{(µ)} = { 1  if µ = 1,
                { 0  if µ ≠ 1.    (38)

We can understand that the corner µ of the cube of possible inputs is separated from the other corners in that it assigns 1 to the µ-th hidden neuron and 0 to the others.
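A sketch of this construction for the 3-dimensional cube (my own check, not text from the solutions): take the 8 corners ξ^{(µ)} ∈ {−1, +1}³ in some fixed order (the particular numbering used in Figure 4 of the exam is not reproduced here), set w_{ij} = ξ_j^{(i)} and θ = 2, and verify that corner µ activates hidden neuron µ and no other:

    import numpy as np
    from itertools import product

    corners = np.array(list(product([-1, 1], repeat=3)))   # 8 cube corners (arbitrary order)
    w, theta = corners, 2.0                                 # w_ij = xi_j^(i), threshold theta = 2

    v = (corners @ w.T - theta > 0).astype(int)            # v_i^(mu) = H(-theta + sum_j w_ij xi_j^(mu))
    print(np.array_equal(v, np.eye(8, dtype=int)))         # True: the hidden code is one-hot, eq. (38)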

From Figure 4 in the exam, we see that there are exactly 4 of the 8 possible inputs ξ^{(µ)} that are to be mapped to O^{(µ)} = 1. These are ξ^{(µ)} for µ = 2, 4, 5 and 7. These inputs will assign, respectively:

    v^{(2)} = [0, 1, 0, 0, 0, 0, 0, 0]^T,
    v^{(4)} = [0, 0, 0, 1, 0, 0, 0, 0]^T,
    v^{(5)} = [0, 0, 0, 0, 1, 0, 0, 0]^T,
    v^{(7)} = [0, 0, 0, 0, 0, 0, 1, 0]^T.    (39)

The weights W are now to detect these and only these patterns, so that

    O^{(µ)} = W^T v^{(µ)} = { 1  for µ ∈ {2, 4, 5, 7},
                            { 0  for µ ∈ {1, 3, 6, 8}.    (40)

This is achieved by letting:

    W = [0, 1, 0, 1, 1, 0, 1, 0]^T.    (41)
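Continuing the sketch: with the one-hot hidden code, the output weights of eq. (41) simply pick out the four target corners (the numbering 1–8 follows whatever corner order is fixed above; which corners are targets is taken from eq. (40)):

    import numpy as np

    targets = [2, 4, 5, 7]                       # corners with O = 1, eq. (40) (1-indexed)
    W = np.zeros(8)
    W[[t - 1 for t in targets]] = 1.0            # eq. (41): W_i = 1 iff corner i is a target

    v = np.eye(8)                                # the one-hot hidden states v^(mu) from part (a)
    print([int(o) for o in v @ W])               # O^(mu) = W^T v^(mu): 1 at mu = 2, 4, 5, 7, else 0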

(b) The solution in 6(a) implies separating each corner ξ^{(µ)} of the cube of input patterns by letting

    v_i^{(µ)} = { 1  if i = µ,
                { 0  if i ≠ µ.    (42)

Thus the solution requires 2³ = 8 hidden neurons. The analogous solution in 2D is to separate each corner of a square, and it requires 2^N = 2² = 4 hidden neurons. The decision boundaries of the hidden neurons are shown in the figure of the original solution: one line cutting off each corner of the square.
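The same construction realises any Boolean function of N inputs taking values ±1, at the price of 2^N hidden neurons: one hidden unit per corner of the N-cube with w^{(i)} = ξ^{(i)} and θ = N − 1, plus an output unit whose weights form the indicator of the corners that should map to 1. A sketch of this generalisation (my illustration of the counting argument in part (b), with XOR as a test case):

    import numpy as np
    from itertools import product

    def boolean_net(N, truth_table):
        # realise an arbitrary Boolean function of N +-1-valued inputs with 2**N hidden units
        corners = np.array(list(product([-1, 1], repeat=N)))   # the 2**N corners, in a fixed order
        w, theta = corners, N - 1                              # w_ij = xi_j^(i), theta = N - 1
        W = np.array(truth_table, dtype=float)                 # output weights: indicator of targets
        def net(x):
            v = (w @ x - theta > 0).astype(float)              # one-hot hidden layer
            return float(W @ v)
        return net, corners

    net, corners = boolean_net(2, [0, 1, 1, 0])                # XOR, in the corner order above
    print([net(x) for x in corners])                           # reproduces the truth table [0, 1, 1, 0]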