
Homework #, EE5353

1. An XOR network has two inputs, one hidden unit, and one output. It is fully connected. Give the network's weights if the output unit has a step activation and the hidden unit activation is
(a) also a step function,
(b) the square of its net function.

2. Give an algorithm for converting decimal format class numbers, i_c, to
(a) coded format desired outputs, t_p(i),
(b) uncoded format desired outputs, t_p(i).

3. In this problem, we are investigating the vector X which is used in functional link neural networks. X must be efficiently generated from the feature vector x.
(a) Give an efficient algorithm for generating X for the second degree case. The elements do not have to be in any particular order.
(b) Give an efficient algorithm for generating X for the third degree case. (Hint: use three nested loops.)
(c) Find an expression for K(n,).

4. For the linear set of equations Ax = b, the residual error is b − Ax, and the norm-squared residual error is E(x) = ‖b − Ax‖², which is

E(x) = Σ_{n=1}^{M} [ b(n) − Σ_{m=1}^{N} a(n,m) x(m) ]²

Here, x and b have respectively N and M elements, and A is M by N.
(a) Give g(k) = ∂E/∂x(k) in terms of the cross-correlation c(k) and the autocorrelation r(k,m). Give expressions for these correlations.
(b) Give an expression for B₁ in terms of p(), c(), r().
(c) Give pseudocode for the conjugate gradient algorithm for minimizing E, in terms of the symbols N_it, i_it, n, X_n, X_d, x(n), g(n), p(n), B₁, and B₂.
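Problem 4's double-sum form of E(x) is easy to misread, so a small numerical check may help. The sketch below (Python/NumPy, not part of the assignment) evaluates E(x) both as ‖b − Ax‖² and as the double sum and confirms they agree; the sizes M, N and the random A, b, x are illustrative assumptions.

```python
# A minimal numerical check of the error function in problem 4:
# E(x) = ||b - Ax||^2 written both in matrix form and as the double sum.
# The sizes M, N and the random data are illustrative, not from the assignment.
import numpy as np

M, N = 5, 3                      # A is M by N; b has M elements, x has N
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
x = rng.standard_normal(N)

# Matrix form: E(x) = ||b - Ax||^2
E_matrix = float(np.sum((b - A @ x) ** 2))

# Double-sum form: E(x) = sum_n [ b(n) - sum_m a(n,m) x(m) ]^2
E_sum = 0.0
for n in range(M):
    residual_n = b[n] - sum(A[n, m] * x[m] for m in range(N))
    E_sum += residual_n ** 2

assert abs(E_matrix - E_sum) < 1e-9   # the two forms agree
```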

5. A functional link net has N inputs, M outputs, and is degree D. The weights w_ik, which feed into output number i, are found by minimizing the error function

E(i) = Σ_{p=1}^{N_v} [ t_p(i) − y_p(i) ]²,    y_p(i) = Σ_{m=1}^{L} w_im X_p(m)

using the conjugate gradient approach.
(a) Give an expression for the gradient of E(i) with respect to w_ij in terms of the autocorrelation r(m,n) and the cross-correlation c(n,i).
(b) How many conjugate gradient iterations are required to minimize E(i)?
(c) Given the direction vector elements p(k), for 1 ≤ k ≤ L, find an expression for B such that the weight vector elements w_ik + B·p(k) minimize E(i).
(d) If some of the X_p(m) are linearly dependent (so that X_p(m) = a·X_p(n) + b·X_p(j), for example), what advantages does the conjugate gradient solution have over a Gauss-Jordan or other linear equation solver?

6. An MLP's kth hidden unit has threshold w(k, N+1) and weights w(k,n). Let x_n be zero-mean, and assume that net control is to be performed, where m_d and σ_d respectively denote the desired hidden unit net function mean and standard deviation. Let r(k,m) be defined as the autocorrelation E[x_k x_m].
(a) Find the mean m_k of the kth hidden unit's net function.
(b) Find the standard deviation σ_k of the kth hidden unit's net function.
(c) Given w(k, N+1) and w(k,n), how should they be changed so that the net function has the desired mean and standard deviation?

7. MLP number 1 has weight matrices W, W_oh, and W_oi, and has inputs modeled as x_p(n) = v_p(n) + m(n), where m(n) is the mean of the nth input, taken over all training patterns. v(n) (of which v_p(n) is the pth example) is zero-mean. MLP number 2 has the same structure and most of the same weights, but its inputs are v_p(n). Remember, x_p(N+1) = x(N+1) = 1.
(a) Find hidden unit thresholds w_2(k, N+1) for MLP no. 2, in terms of w(k, N+1), m(n), and w(k,n), so that both networks have identical net functions.
(b) Given the networks of part (a), find output thresholds w_oi2(i, N+1) for MLP no. 2 so that the two networks have identical outputs for all patterns.
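To make the quantities in problem 6 concrete, the following sketch estimates one hidden unit's net function mean m_k and standard deviation σ_k empirically from zero-mean samples. It is only an illustration: the sizes, the Gaussian input distribution, and the weight values are assumptions, and it does not derive the analytic expressions the problem asks for.

```python
# Empirical illustration of problem 6's quantities: the kth hidden unit's net
# function n_p(k) = w(k,N+1) + sum_n w(k,n) x_p(n), its mean m_k and std sigma_k.
# Sizes, the input distribution, and the weights are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
N, Nv = 4, 100000                     # inputs per pattern, number of sample patterns
w_k = rng.standard_normal(N)          # weights w(k,n), n = 1..N
w_k_thresh = 0.7                      # threshold w(k, N+1)

X = rng.standard_normal((Nv, N))      # zero-mean inputs x_p(n)
net = w_k_thresh + X @ w_k            # net function for every pattern

m_k = net.mean()                      # estimate of the net function mean
sigma_k = net.std()                   # estimate of the net function standard deviation
print(f"m_k ≈ {m_k:.3f}, sigma_k ≈ {sigma_k:.3f}")
```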

8. MLP number 1 has input vectors x_p of dimension (N+1), where x_p(N+1) = 1. Additional parameters are w(k,n), w_oh(i,k), and w_oi(i,n). Let x_p be transformed as z_p = A x_p, where A is (N+1) by (N+1). For MLP no. 2, z_p is the input vector. MLP no. 2 has parameters w_2(k,n), w_oh2(i,k), and w_oi2(i,n).
(a) Given W, find the N_h by 1 net function vector n_p in terms of W and x_p.
(b) Assume that MLP no. 2 is equivalent to MLP no. 1 and has the same net function vectors (so n_p2 = n_p). Find its input weight matrix W_2 in terms of W and A.
(c) Let G be a negative gradient matrix whose elements are

g(k,n) = −∂E/∂w(k,n) = Σ_{p=1}^{N_v} δ_p(k) x_p(n)

so the N_h by (N+1) negative gradient matrix for W is

G = Σ_{p=1}^{N_v} δ_p (x_p)^T

where δ_p is the N_h by 1 vector of hidden unit delta functions. G_2 is the negative gradient matrix for MLP no. 2, with elements g_2(k,n) = −∂E/∂w_2(k,n). Find G_2 in terms of G and A. Remember that n_p2 = n_p, so δ_p2 = δ_p.
(d) Using your results from part (b), G_2 can be mapped back to MLP no. 1 and the resulting negative gradient matrix is G_1, which can be used to train MLP no. 1. Express G_1 in terms of G_2 and A.
(e) If G_1 = G, what condition should A satisfy?
(f) If E is minimized with respect to W_2 in MLP no. 2, is E also minimized with respect to W in MLP no. 1?
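The gradient accumulation quoted in problem 8(c) can be sanity-checked numerically. The sketch below forms G = Σ_p δ_p (x_p)^T from random placeholder deltas and inputs (assumptions, since real deltas would need a full MLP forward and backward pass) and confirms the sum equals a single matrix product.

```python
# Sketch of the gradient accumulation stated in problem 8(c):
# G = sum over patterns of delta_p (x_p)^T, an N_h by (N+1) matrix.
# The delta vectors and inputs are random placeholders (assumptions).
import numpy as np

rng = np.random.default_rng(2)
N, Nh, Nv = 3, 4, 10                  # inputs, hidden units, training patterns
X = rng.standard_normal((Nv, N + 1))
X[:, -1] = 1.0                        # x_p(N+1) = 1 for every pattern
Delta = rng.standard_normal((Nv, Nh)) # placeholder hidden-unit deltas delta_p(k)

# Accumulate G = sum_p delta_p (x_p)^T
G = np.zeros((Nh, N + 1))
for p in range(Nv):
    G += np.outer(Delta[p], X[p])

# The same sum written as one matrix product: Delta^T X
assert np.allclose(G, Delta.T @ X)
```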

9. Output weights for a one-output (M = 1) neural net (linear, FL, or MLP) are to be designed by minimizing the error function

E = Σ_{p=1}^{N_v} [ t_p − Σ_{n=1}^{N_u} w(n) X_p(n) ]²

(a) Find the gradient vector element g(m) = ∂E/∂w(m) in terms of the autocorrelation r() and the cross-correlation c(), and define r() and c().
(b) Give the output weight vector w in terms of the matrix R and the cross-correlation vector c.
(c) Give the Hessian matrix element ∂²E/∂w(u)∂w(m) in terms of relevant quantities from parts (a) or (b).
(d) Using Newton's method, express the output weight vector w in terms of relevant quantities from parts (a), (b), or (c). Is w from Newton's algorithm the same as w from part (b)?

10. In the BP part of OWO-BP, we find the direction matrix D, which equals the negative gradient matrix G, of dimensions N_h by (N+1). We then minimize the error function

E(z) = Σ_{p=1}^{N_v} Σ_{i=1}^{M} [ t_p(i) − y_p(i) ]²

with respect to z, where

y_p(i) = Σ_{n=1}^{N+1} w_oi(i,n) x_p(n) + Σ_{k=1}^{N_h} w_oh(i,k) f( Σ_{n=1}^{N+1} ( w(k,n) + z d(k,n) ) x_p(n) )

(a) Give ∂E(z)/∂z in terms of the symbol ∂y_p(i)/∂z.
(b) Give the Gauss-Newton approximation of ∂²E(z)/∂z² in terms of the symbol ∂y_p(i)/∂z.
(c) Give the optimal learning factor in terms of the symbols ∂E(z)/∂z and ∂²E(z)/∂z².
(d) Give ∂y_p(i)/∂z in terms of appropriate weights and symbols, including n_p(k), N, N_h, d(k,n), etc.
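As a concrete reading of problem 10's error function, the sketch below evaluates E(z) for a scalar learning factor z by perturbing the input weight matrix along a direction matrix D. All shapes and data are random illustrative assumptions, and a sigmoid is assumed for the hidden unit activation f, which the problem statement leaves unspecified.

```python
# Sketch of the problem 10 error function E(z): the input weights are perturbed
# along a direction matrix D (here random) by a scalar z, and the squared error
# is summed over patterns and outputs. Shapes, data, and the sigmoid f are assumptions.
import numpy as np

rng = np.random.default_rng(3)
N, Nh, M, Nv = 3, 4, 2, 20
X = rng.standard_normal((Nv, N + 1)); X[:, -1] = 1.0   # x_p, with x_p(N+1) = 1
T = rng.standard_normal((Nv, M))                        # desired outputs t_p(i)
W   = rng.standard_normal((Nh, N + 1))                  # input weights w(k,n)
Woh = rng.standard_normal((M, Nh))                      # output-hidden weights w_oh(i,k)
Woi = rng.standard_normal((M, N + 1))                   # output-input weights w_oi(i,n)
D   = rng.standard_normal((Nh, N + 1))                  # direction matrix d(k,n)

f = lambda net: 1.0 / (1.0 + np.exp(-net))              # assumed sigmoid activation

def E(z: float) -> float:
    O = f(X @ (W + z * D).T)          # hidden unit outputs for all patterns
    Y = X @ Woi.T + O @ Woh.T         # y_p(i) for all patterns and outputs
    return float(np.sum((T - Y) ** 2))

print(E(0.0), E(0.1))                 # error before and after a small step along D
```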

11. In two-stage OWO-BP training, we use the negative gradient matrix G and the optimal learning factor (OLF) z to modify W. In the multiple optimal learning factors (MOLF) algorithm, we use a different OLF z_k for each hidden unit. After we've found G, our error function in terms of z and G is

E(z) = Σ_{p=1}^{N_v} Σ_{i=1}^{M} [ t_p(i) − y_p(i) ]²,    z = [ z_1, z_2, ..., z_{N_h} ]^T

where the output, in terms of the OLFs, is

y_p(i) = Σ_{n=1}^{N+1} w_oi(i,n) x_p(n) + Σ_{k=1}^{N_h} w_oh(i,k) f( Σ_{n=1}^{N+1} [ w(k,n) + z_k g(k,n) ] x_p(n) )

(a) Give ∂y_p(i)/∂z_m, where the partial is evaluated for z_k's equal to 0. Remember, f(n_p(k)) = O_p(k).
(b) Give an expression for g(m) = −∂E(z)/∂z_m in terms of the symbols ∂y_p(i)/∂z_m, t_p(i), y_p(i), etc. Note that g(m) is an element of g.
(c) If Newton's algorithm is used to find z in a given iteration, the Hessian matrix elements are h(u,v) = ∂²E/(∂z_u ∂z_v). Give the Gauss-Newton expression for h(u,v). What are the dimensions of H?
(d) Give the equations to be solved for z, in matrix-vector form. What method can be used to solve these linear equations?
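Problem 11 changes the previous setup only by giving each hidden unit its own learning factor, so the sketch from problem 10 carries over with z promoted to a vector. Again, the shapes, the random data, and the sigmoid activation are assumptions for illustration.

```python
# Sketch of the MOLF output in problem 11: as in the problem 10 sketch above,
# except each hidden unit k gets its own learning factor z_k applied to its row
# of the negative gradient matrix G. Shapes, data, and f are assumptions.
import numpy as np

rng = np.random.default_rng(4)
N, Nh, M, Nv = 3, 4, 2, 20
X = rng.standard_normal((Nv, N + 1)); X[:, -1] = 1.0
T = rng.standard_normal((Nv, M))
W, G = rng.standard_normal((Nh, N + 1)), rng.standard_normal((Nh, N + 1))
Woh, Woi = rng.standard_normal((M, Nh)), rng.standard_normal((M, N + 1))
f = lambda net: 1.0 / (1.0 + np.exp(-net))

def E(z: np.ndarray) -> float:
    # z has one entry per hidden unit; z[:, None] scales row k of G by z_k
    O = f(X @ (W + z[:, None] * G).T)
    Y = X @ Woi.T + O @ Woh.T
    return float(np.sum((T - Y) ** 2))

print(E(np.zeros(Nh)), E(0.05 * np.ones(Nh)))
```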

12. Some compilers produce faster executables when matrix operations are used in the source code. In a FLN, let the rows of the data matrices D_X, D_t, and D_y store (X_p)^T, (t_p)^T, and (y_p)^T respectively, so that the data matrices' dimensions are N_v by L, N_v by M, and N_v by M. Assume that the MSE E to be minimized is defined, as usual, as

E = Σ_{i=1}^{M} E(i),    E(i) = Σ_{p=1}^{N_v} [ t_p(i) − y_p(i) ]²

(a) If y_p = W X_p, write D_y in terms of D_X and W. (Hint: use the transpose operation.)
(b) We want to convert the D_y equation of part (a) into our familiar equations C = R W^T. In order to do this, what do we pre-multiply the D_y equation by?
(c) Let D_y(i) and D_t(i) denote the ith columns, respectively, of D_y and D_t. Write E(i) in terms of D_y(i), D_t(i), and any other necessary symbols.
(d) Replacing D_y(i) and D_t(i) in your E(i) expression by D_y and D_t, and using the trace operator (tr(A) = a(1,1) + a(2,2) + ...), generate the expression for E. Is this calculation efficient?
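The data-matrix layout in problem 12 is easy to get backwards, so the short sketch below builds D_X and D_t with patterns as rows, checks the stated dimensions, and verifies the quoted trace definition. L, M, N_v and the random entries are illustrative assumptions.

```python
# Sketch of the data-matrix layout described in problem 12: rows of D_X and D_t
# hold (X_p)^T and (t_p)^T, so the matrices are N_v by L and N_v by M.
# The trace-operator definition quoted in part (d) is also checked.
# L, M, N_v and the random data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
L, M, Nv = 6, 2, 8
D_X = rng.standard_normal((Nv, L))    # row p is (X_p)^T
D_t = rng.standard_normal((Nv, M))    # row p is (t_p)^T

assert D_X.shape == (Nv, L) and D_t.shape == (Nv, M)

# tr(A) = a(1,1) + a(2,2) + ... , as quoted in part (d)
A = rng.standard_normal((M, M))
assert abs(np.trace(A) - sum(A[i, i] for i in range(M))) < 1e-12
```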