Matrix representation of a Neural Network

Similar documents
Modelling Salmonella transfer during grinding of meat

Lidar calibration What s the problem?

A Note on Powers in Finite Fields

The dipole moment of a wall-charged void in a bulk dielectric

An Equivalent Source Method for Modelling the Lithospheric Magnetic Field Using Satellite and Airborne Magnetic Data

Reducing Uncertainty of Near-shore wind resource Estimates (RUNE) using wind lidars and mesoscale models

Optimizing Reliability using BECAS - an Open-Source Cross Section Analysis Tool

Kommerciel arbejde med WAsP

Accelerating interior point methods with GPUs for smart grid systems

Numerical modeling of the conduction and radiation heating in precision glass moulding

Global Wind Atlas validation and uncertainty

Reflection of sound from finite-size plane and curved surfaces

Steady State Comparisons HAWC2 v12.2 vs HAWCStab2 v2.12

On numerical evaluation of two-dimensional phase integrals

Optimal acid digestion for multi-element analysis of different waste matrices

The Wien Bridge Oscillator Family

Dynamic Effects of Diabatization in Distillation Columns

Validation of Boundary Layer Winds from WRF Mesoscale Forecasts over Denmark

Radioactivity in the Risø District January-June 2014

Radioactivity in the Risø District July-December 2013

Realtime 3D stress measurement in curing epoxy packaging

Single Shooting and ESDIRK Methods for adjoint-based optimization of an oil reservoir

Sound absorption properties of a perforated plate and membrane ceiling element Nilsson, Anders C.; Rasmussen, Birgit

Aalborg Universitet. Lecture 14 - Introduction to experimental work Kramer, Morten Mejlhede. Publication date: 2015

Determination of packaging induced 3D stress utilizing a piezocoefficient mapping device

Design of Robust AMB Controllers for Rotors Subjected to Varying and Uncertain Seal Forces

Generating the optimal magnetic field for magnetic refrigeration

Impact of the interfaces for wind and wave modeling - interpretation using COAWST, SAR and point measurements

Room Allocation Optimisation at the Technical University of Denmark

Simulating wind energy resources with mesoscale models: Intercomparison of stateof-the-art

Prediction of airfoil performance at high Reynolds numbers.

A note on non-periodic tilings of the plane

Validity of PEC Approximation for On-Body Propagation

The colpitts oscillator family

Dynamic Asset Allocation - Identifying Regime Shifts in Financial Time Series to Build Robust Portfolios

Crack Tip Flipping: A New Phenomenon yet to be Resolved in Ductile Plate Tearing

The M/G/1 FIFO queue with several customer classes

Design possibilities for impact noise insulation in lightweight floors A parameter study

Comparisons of winds from satellite SAR, WRF and SCADA to characterize coastal gradients and wind farm wake effects at Anholt wind farm

Potential solution for rain erosion of wind turbine blades

Benchmarking of optimization methods for topology optimization problems

Application of mesoscale models with wind farm parametrisations in EERA-DTOC

Reflection and absorption coefficients for use in room acoustic simulations

Multiple Scenario Inversion of Reflection Seismic Prestack Data

Diffusion of sulfuric acid in protective organic coatings

Graphs with given diameter maximizing the spectral radius van Dam, Edwin

Plane Wave Medical Ultrasound Imaging Using Adaptive Beamforming

Wake decay constant for the infinite wind turbine array Application of asymptotic speed deficit concept to existing engineering wake model

Validation and comparison of numerical wind atlas methods: the South African example

Space charge fields in DC cables

Exposure Buildup Factors for Heavy Metal Oxide Glass: A Radiation Shield

Berøringsløs optisk måling af gassammensætning ved UV og IR spektroskopi

Vindkraftmeteorologi - metoder, målinger og data, kort, WAsP, WaSPEngineering,

On the Einstein-Stern model of rotational heat capacities

Measurements of the angular distribution of diffuse irradiance

Proving in the Isabelle Proof Assistant that the Set of Real Numbers is not Countable

A beam theory fracture mechanics approach for strength analysis of beams with a hole

Tilburg University. Strongly Regular Graphs with Maximal Energy Haemers, W. H. Publication date: Link to publication

Battling Bluetongue and Schmallenberg virus Local scale behavior of transmitting vectors

Higher-order spectral modelling of the diffraction force around a vertical circular cylinder

Polydiagnostic study on a surfatron plasma at atmospheric pressure

Aalborg Universitet. Land-use modelling by cellular automata Hansen, Henning Sten. Published in: NORDGI : Nordic Geographic Information

A systematic methodology for controller tuning in wastewater treatment plants

A Blue Lagoon Function

COMPARISON OF THERMAL HAND MODELS AND MEASUREMENT SYSTEMS

Advantages and Challenges of Superconducting Wind Turbine Generators

Multi (building)physics modeling

Application of the wind atlas for Egypt

Solar radiation and thermal performance of solar collectors for Denmark

Strongly 2-connected orientations of graphs

Alkali Release from Typical Danish Aggregates to Potential ASR Reactive Concrete

A comparison study of the two-bladed partial pitch turbine during normal operation and an extreme gust conditions

A Trust-region-based Sequential Quadratic Programming Algorithm

Determination of the minimum size of a statistical representative volume element from a fibre-reinforced composite based on point pattern statistics

Citation (APA): Holm, E., Frøsig, L., & Sidhu, R. (Eds.) (2006). Theory of sampling. (NKS-122).

Phase Equilibrium in Amino Acid Salt Systems for CO2 Capture

On a set of diophantine equations

Feasibility of non-linear simulation for Field II using an angular spectrum approach

Resistivity survey at Stora Uppåkra, Sweden

Unknown input observer based scheme for detecting faults in a wind turbine converter Odgaard, Peter Fogh; Stoustrup, Jakob

Geometry of power flows and convex-relaxed power flows in distribution networks with high penetration of renewables

Neutronics calculations for the ITER Collective Thomson Scattering Diagnostics

GIS Readiness Survey 2014 Schrøder, Anne Lise; Hvingel, Line Træholt; Hansen, Henning Sten; Skovdal, Jesper Høi

Aalborg Universitet. Published in: IEEE International Conference on Acoustics, Speech, and Signal Processing. Creative Commons License Unspecified

Conceptual design of a thorium supplied thermal molten salt wasteburner

Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Upper and Lower Bounds of Frequency Interval Gramians for a Class of Perturbed Linear Systems Shaker, Hamid Reza

Irregular Wave Forces on Monopile Foundations. Effect af Full Nonlinearity and Bed Slope

Tectonic Thinking Beim, Anne; Bech-Danielsen, Claus; Bundgaard, Charlotte; Madsen, Ulrik Stylsvig

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation

Comparison between Multitemporal and Polarimetric SAR Data for Land Cover Classification

Surface composition of ordered alloys: An off-stoichiometric effect

The structure of a nucleolytic ribozyme that employs a catalytic metal ion Liu, Yijin; Wilson, Timothy; Lilley, David

Analytical Studies of the Influence of Regional Groundwater Flow by on the Performance of Borehole Heat Exchangers

Extreme sea levels and the assessment of future coastal flood risk

Robustness of a cross contamination model describing transfer of pathogens during grinding of meat

Development of a Dainish infrastructure for spatial information (DAISI) Brande-Lavridsen, Hanne; Jensen, Bent Hulegaard

Online algorithms for parallel job scheduling and strip packing Hurink, J.L.; Paulus, J.J.

Molecular orientation via a dynamically induced pulse-train: Wave packet dynamics of NaI in a static electric field

A zero-free interval for chromatic polynomials of graphs with 3-leaf spanning trees

Transcription:

Downloaded from orbit.dtu.dk on: May 01, 2018 Matrix representation of a Neural Network Christensen, Bjørn Klint Publication date: 2003 Document Version Publisher's PDF, also known as Version of record Link back to DTU Orbit Citation (APA): Christensen, B. K. (2003). Matrix representation of a Neural Network. Technical University of Denmark (DTU). General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Matrix representation of a Neural Network by Bjørn Christensen June 2003 Contents 1 INTRODUCTION 2 2 THE WEIGHT MATRIX 3 2.1 Feedforward 4 2.2 Backpropagation 5 2.3 Squared Error 7 3 REFERENCES 7 1

1 Introduction This paper describes the implementation of a three-layer feedforward backpropagation neural network. The paper does not explain feedforward, backpropagation or what a neural network is. It is assumed, that the reader knows all this. If not please read chapters 2, 8 and 9 in Parallel Distributed Processing, by David Rummelhart (Rummelhart 1986) for an easy-to-read introduction. What the paper does explain is how a matrix representation of a neural net allows for a very simple implementation. The matrix representation is introduced in (Rummelhart 1986, chapter 9), but only for a two-layer linear network and the feedforward algorithm. This paper develops the idea further to three-layer non-linear networks and the backpropagation algorithm. i0 h0 o0 i1 h1 o1 ii-1 hj-1 ok-1 ii=1 hj=1 input nodes hidden nodes output nodes Figure 1 Figure 1 shows the layout of a three-layer network. There are I input nodes, J hidden nodes and K output nodes all indexed from 0. Bias-node for the hidden nodes is called ii, and bias-node for the output nodes is called hj. 2

2 The Weight Matrix In the matrix representation the input, hidden and output nodes are represented by three vectors i, h and o respectively. The weights connecting each layer are represented by a matrix. V connects the input layer with the hidden layer, and W connects the hidden layer with the output layer. See Figure 2. w0,0 w0,1 w0,j-1 w0,j o0 w1,0 w1,1 w1,j-1 w1,j o1 input nodes i0 i1 ii-1 ii=1 wk-1,0 wk-1,1 wk-1,j-1 wk-1,j ok-1 output nodes v0,0 v0,1 v0,i-1 v0,i h0 v1,0 v1,1 v1,i-1 v1,i h1 vj-1,0 vj-1,1 vj-1,i-1 vj-1,i hj-1 hj=1 hidden nodes Figure 2 Definitions: i - input nodes vector, dimension = I, indexed by i[0,i-1] h - hidden nodes vector, dimension = J, indexed by j[0,j-1] o - output nodes vector, dimension = K, indexed by k[0,k-1] ib - input nodes vector incl bias, dimension = I+1, indexed by i[0,i] hb - hidden nodes vector incl bias, dimension = J+1, indexed by j[0,j] V weight matrix from Ib to h, dimension = J(I+1), V j,i weight ib i h j W weight matrix from hb to o, dimension = K(J+1),W k,j weight hb j o k Bold denotes a vector or a matrix in the text. 3

2.1 Feedforward The feedforward algorithm produces an output vector given an input vector. As described in (Callan 1999, chapter 1), the input to hidden node h j is I 1 (1) net j Vj, I iivji i0 The naming convention of Callan is changed to match the definitions in this paper which follows (Rummelhart 1986). Note that Callan has interchanged the indices in the weight matrix. The value of hidden node h j is h j f(netj) (2) Where f() is the activation function defined as 1 (3) f ( x) -x 1 e Using matrix-vector multiplication, the value of all hidden nodes h can be calculated in a single operation h F( V ib) (4) Where F() is the vector function that takes f() on all elements of it s argument. Similarly the output vector is calculated from o F( Whb) (5) ib and hb are produced from ib i ii for i0.. I1, ibi 1 (6) hb j hj for j 0.. J1, hbj (7) 1 4

2.2 Backpropagation The backpropagation algorithm calculates the changes of the weights based on the error between the output o as calculated the by the feedforward algorithm, and the desired target output value t. Definition: t - target vector, dimension = K, indexed by k[0,k-1] From equation 11 in (Rummelhart 1986, chapter 8), we have the change of each weight: ΔWkj ηδkhj (8) Where is the learning rate, k is the error signal form output node o k, and hj is the value of node h j. This formula applys to all weights in the network. The error signal from the output layer and the hidden layer are calculated differently. Rummelhart defines the error signal as follows Error signal from output node k δok ok(1 ok)(tk ok) Error signal from hidden node j K-1 (9) (10) δh j h j(1 h j) δokwkj Definition: o- error signal from output nodes dimension = K, indexed by k[0,k-1] h- error signal from hidden nodes dimension = J, indexed by j[0,j-1] k0 Now we are ready to calculate the weight change W for all weights in a singe operation: ΔW W0,0 W0,1 W0, J - 1 W0, J (11) η W1,0 W1,1 W1, J - 1 W1, J WK - 1,0 WK - 1,1 WK - 1, J - 1 WK - 1, J δo0h0 δo0h1 δo0hj - 1 δo0hj η δo1h 0 δo1h 1 δo1h J - 1 δo1h J - 1h 0-1h1-1hJ - 1-1hJ δo0 η δo1 h 0-1 h1 hj - 1 hj η δo hb So W is times the outer product of o and hb. is the operator for the outer product. Similarly for V we get Δ V ηδh ib (12) 5

We need to express o and h in vector form. Let us define a vector multiplication operator % that multiplies the components of vector x and y as follows: 1* 1 % 2* 2 x x x y y (13) x y n* yn Calculating o is easy: δo o% ( 1 o) %( t o) (14) Where 1 is a vector of 1 s with same dimension as o. Calculating h requires a little more work. Let us define a vector S as: T s W δo (15) W T is the transpose of W. Expanding W and o gives us: T W W W W δo (16) 0,0 0,1 0, J - 1 0, J 0 s W1,0 W1,1 W1, J - 1 W1, J δo1 WK - 1,0 WK - 1,1 WK - 1, J - 1 WK - 1, J - 1 W0,0 W1,0 WK - 1,0 δo0 W0,1 W1,1 WK - 1,1 δo1 W0, J - 1 W1, J - 1 WK - 1, J - 1-1 W0, J W1, J WK - 1, J δo0w0,0 δo1w1,0... - 1WK - 1,0... δo0w0,1 δo1w1,1-1wk - 1,1 δo0w0, J - 1 δo1w1, J - 1... - 1WK - 1, J - 1 δo0w0, J δo1w1, J... - 1WK - 1, J s0 K-1 s1 where sj δokwkj sj - 1 k0 sj The component s j is operand in the error signal h j. Note that we do not need an error signal from h J since it is a bias node. Therefore we skip the last component of s, and create a vector s defined as: s j s j for j0.. J1 (17) Dimension of s is J. Finally we can write h as: δh h% (1 h) % s (18) Here 1 has same dimension as h. Let us sum up the important equations of the backpropagation algorithm: Error signal from output nodes δ o o% ( 1 o) %( t o) (19) Changes of weights connected to output nodes W η δo hb (20) Propagated errors from output nodes T s W δo (21) Error signal from hidden nodes δh h% (1 h) % s (22) Changes of weights connected to hidden nodes Δ V ηδh ib (23) 6

2.3 Squared Error The squared error is a measure of the correctness of the network. It is calculated for each input pattern p. Rummelhart defines the squared error as: K 1 1 2 E p (tk ok) 2 k0 In vector notation using the inner product this is 1 Ep (t o) (t o) 2 (24) (25) Summing the squared errors for all patterns, we get the squared error E for an epoch n (26) E Ep p1 Where n is the number of patterns in the training set. 3 References Rummelhart 1986 Callan 1999 Parallel Distributed Programming: Explorations in the Microstructure of Cognition. Vol 1 David E Rummelhart, James L McClelland MIT-press 1986 The Essence of Nearal Networks Robert Callan Prentice Hall 1999 7