$E(\mathbf{x}) = \sum_{n=1}^{M} \big[ b(n) - \sum_{m=1}^{N} a(n,m)\,x(m) \big]^2$


Adele Thornton
5 years ago

Exam #, EE5353, Fall 0

1. Here we consider MLPs with binary-valued inputs (0 or 1).
(a) If the MLP has N inputs, what is the maximum degree D of its PBF model?
(b) If the MLP has N inputs, what is the maximum value of L' in its PBF model?
(c) If the MLP has N inputs, what is the maximum number of hidden units the network requires (to be complete, as in the reference material)?
(d) For a 4-input parity check network, how many hidden units are required, at most?

2. For the linear set of equations A x = b, the residual error is b - A x, and the norm squared residual error is $E(\mathbf{x}) = \|\mathbf{b} - A\mathbf{x}\|^2$, which is

$$E(\mathbf{x}) = \sum_{n=1}^{M} \Big[ b(n) - \sum_{m=1}^{N} a(n,m)\,x(m) \Big]^2$$

Here, x and b have respectively N and M elements, and A is M by N.
(a) Express E(x) in terms of appropriate auto- and cross-correlations, and define these correlation functions.
(b) Give g(i), which equals $\partial E(\mathbf{x})/\partial x(i)$, in terms of the correlation functions.
(c) Write out the linear equations that result when g(i) is equated to 0.
(d) If both sides of A x = b are pre-multiplied by $A^T$, we get a new set of equations, G x = d. Give expressions for g(i,j) and d(i). Is this set of equations the same as the set in part (c)?

3. We want to express net control in an MLP using the new simple notation, which uses w(k,N+1) in place of the threshold θ(k). The input weights are w(k,n) as usual, and x(N+1) = 1. Let m(n) be the mean value of input x(n) and let r(k,m) denote the input autocorrelation E[x(k) x(m)]. Assume that $m_d$ and $\sigma_d$ respectively denote the desired hidden unit net function mean and standard deviation.
(a) Give m(N+1), the mean of input number (N+1).
(b) Give expressions for the kth hidden unit's net function n(k) and its mean $m_k$.
(c) Find the variance $\sigma_k^2$ of the kth hidden unit's net function in terms of the symbols r(i,j) and m(n).
(d) What quantity should be multiplied by the weights w(k,n), so that the net function's standard deviation becomes $\sigma_d$?
(e) In terms of $m_d$, $m_k$, $\sigma_d$, and $\sigma_k$, what quantity should be added to w(k,N+1), so that the net function's mean becomes $m_d$?
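As an illustration of the least-squares setup in the second question, the following is a minimal sketch, not a model answer. It assumes unnormalized correlations, r(i,j) = Σ_n a(n,i) a(n,j) and c(i) = Σ_n b(n) a(n,i), and the function names (`correlations`, `gradient`, `solve`) and the small 3-by-2 data set are hypothetical. It checks numerically that the gradient of E(x) vanishes at the solution of the correlation equations.

```python
def correlations(A, b):
    """Return (R, c) with r(i,j) = sum_n a(n,i) a(n,j) and c(i) = sum_n b(n) a(n,i).

    The unnormalized definitions are an assumption made for this sketch.
    """
    M, N = len(A), len(A[0])
    R = [[sum(A[n][i] * A[n][j] for n in range(M)) for j in range(N)]
         for i in range(N)]
    c = [sum(b[n] * A[n][i] for n in range(M)) for i in range(N)]
    return R, c


def gradient(R, c, x):
    """dE/dx(i) = -2 [c(i) - sum_j r(i,j) x(j)] for E(x) = ||b - A x||^2."""
    N = len(c)
    return [-2.0 * (c[i] - sum(R[i][j] * x[j] for j in range(N)))
            for i in range(N)]


def solve(R, c):
    """Solve R x = c by Gaussian elimination with partial pivoting."""
    N = len(c)
    aug = [[float(v) for v in R[i]] + [float(c[i])] for i in range(N)]
    for col in range(N):
        pivot = max(range(col, N), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, N):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, N + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * N
    for i in reversed(range(N)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, N))
        x[i] = (aug[i][N] - s) / aug[i][i]
    return x


# Hypothetical data: M = 3 equations, N = 2 unknowns.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]
R, c = correlations(A, b)
x = solve(R, c)
g = gradient(R, c, x)
```

Under these assumptions R is the N-by-N matrix of column inner products of A, so the system solved here has N equations regardless of how large M is.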

4. A functional link net has N inputs, M outputs, and is degree D. The weights $w_{ik}$, which feed into output number i, are found by minimizing the error function

$$E(i) = \frac{1}{N_v} \sum_{p=1}^{N_v} \big[ t_p(i) - y_p(i) \big]^2, \qquad y_p(i) = \sum_{m=1}^{L} w_{im}\, X_p(m)$$

using the conjugate gradient approach.
(a) Give an expression for the gradient of E(i) with respect to $w_{ij}$ in terms of the autocorrelation r(m,n) and the cross-correlation c(n,i).
(b) Give an expression for L, and give pseudocode which generates the basis vector elements X(m) from the input vector elements x(n).
(c) In matrix-vector form (using R, C, and W) give the M sets of linear equations that must be solved for the weight matrix W. What are the dimensions of R, C, and W?
(d) How many conjugate gradient iterations are required to minimize E(i)?
(e) Given the direction vector elements p(k), for 1 ≤ k ≤ L, find an expression for B, such that the weight vector elements $w_{ik} + B\,p(k)$ minimize E(i).

5. Consider the Schmidt procedure, as applied to a neural network. Assume that the A (see appendix), R, and C matrices are available for the training data. R and C are the usual correlation matrices.
(a) Give the orthonormal system's weights $w_o'(i,k)$ in terms of elements of A, R, and C.
(b) Now give the original system's output weights $w_o(i,k)$ in terms of $w_o'(i,k)$ and elements of A.
(c) In the orthonormal system, suppose that $X_1$ through $X_{N+1}$ came from the inputs and the constant, 1, and that the remaining basis functions came from hidden units. Let $y_i(k)$ denote $y_i$ calculated from all inputs, the constant, and k of the hidden units. Give an expression for $y_i(k)$ in terms of $w_o'(i,m)$ and $X_m$. Give an efficient expression for $y_i(k+1)$ in terms of $y_i(k)$.
(d) Given an orthonormal basis vector X, how many multiplies are needed to construct outputs $y_i(k)$ for $0 \le k \le N_h$?
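One possible way to generate polynomial basis elements for a functional link net can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the course's reference pseudocode: it assumes the basis consists of the constant 1 plus every monomial of the inputs up to degree D, in which case the number of elements is the binomial coefficient C(N+D, D). The names `basis_index_sets` and `basis_vector` are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def basis_index_sets(N, D):
    """List every multiset of input indices with 0..D members.

    Each multiset stands for one monomial; the empty multiset is the constant.
    """
    sets = []
    for degree in range(D + 1):
        sets.extend(combinations_with_replacement(range(N), degree))
    return sets


def basis_vector(x, D):
    """Return the basis elements X(m): products of at most D inputs."""
    return [prod(x[n] for n in idx) for idx in basis_index_sets(len(x), D)]


# Hypothetical example: N = 2 inputs, degree D = 2.
x = [2.0, 3.0]
X = basis_vector(x, D=2)
L = len(X)
```

For this example the elements come out in the order 1, x(1), x(2), x(1)², x(1)x(2), x(2)², and `L` agrees with `comb(N + D, D)`.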

6. In one-stage BP training (no OWO is used), we find negative gradient matrices G, $G_{oi}$ and $G_{oh}$. Assume that these three gradient matrices have already been found. Instead of using a single optimal learning factor (OLF) z, we can use three, one for each weight matrix. Our error function is

$$E(\mathbf{z}) = \frac{1}{N_v} \sum_{p=1}^{N_v} \sum_{i=1}^{M} \big[ t_p(i) - y_p(i) \big]^2, \qquad \mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$$

where the output, in terms of the OLFs, is

$$y_p(i) = \sum_{n=1}^{N+1} \big[ w_{oi}(i,n) + z_1\, g_{oi}(i,n) \big]\, x_p(n) + \sum_{k=1}^{N_h} \big[ w_{oh}(i,k) + z_2\, g_{oh}(i,k) \big]\, f\Big( \sum_{n=1}^{N+1} \big[ w(k,n) + z_3\, g(k,n) \big]\, x_p(n) \Big)$$

(a) Give $\partial y_p(i)/\partial z_1$, where the partial is evaluated for $z_1$, $z_2$, and $z_3$ equal to 0. Remember, $f(n_p(k)) = O_p(k)$.
(b) Give $\partial y_p(i)/\partial z_2$, where the partial is evaluated for $z_1$, $z_2$, and $z_3$ equal to 0.
(c) Give $\partial y_p(i)/\partial z_3$, where the partial is evaluated for $z_1$, $z_2$, and $z_3$ equal to 0. Use the symbol f'() if necessary.
(d) Give expressions for $\partial E(\mathbf{z})/\partial z_1$, $\partial E(\mathbf{z})/\partial z_2$, and $\partial E(\mathbf{z})/\partial z_3$ in terms of the symbols $\partial y_p(i)/\partial z_1$, $\partial y_p(i)/\partial z_2$, and $\partial y_p(i)/\partial z_3$ respectively.
(e) What additional partial derivatives are required, if we are to find the three OLFs?
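A derivative with respect to a learning factor can be checked numerically with a finite difference. The sketch below does this for the bypass (input-to-output) factor only, and it rests on several assumptions: a sigmoid hidden activation, the factor ordering (first factor on the input-to-output weights, second on the hidden-to-output weights, third on the input-to-hidden weights), and made-up weights and gradients for one pattern. The function names are hypothetical.

```python
import math


def sigmoid(v):
    """Hidden-unit activation; the choice of sigmoid is an assumption."""
    return 1.0 / (1.0 + math.exp(-v))


def output(i, x, W, Woi, Woh, G, Goi, Goh, z):
    """Output y(i) for one pattern as a function of the three learning factors.

    x already includes the constant input x(N+1) = 1 as its last element.
    """
    z1, z2, z3 = z
    bypass = sum((Woi[i][n] + z1 * Goi[i][n]) * x[n] for n in range(len(x)))
    hidden = 0.0
    for k in range(len(W)):
        net = sum((W[k][n] + z3 * G[k][n]) * x[n] for n in range(len(x)))
        hidden += (Woh[i][k] + z2 * Goh[i][k]) * sigmoid(net)
    return bypass + hidden


def dy_dz1(i, x, Goi):
    """Analytic partial of y(i) with respect to z1 (y is linear in z1)."""
    return sum(Goi[i][n] * x[n] for n in range(len(x)))


# Made-up values: N = 2 inputs plus the constant, 2 hidden units, 1 output.
x = [0.5, -1.0, 1.0]
W = [[0.1, -0.2, 0.05], [0.4, 0.3, -0.1]]
Woi = [[0.2, 0.1, -0.3]]
Woh = [[0.7, -0.5]]
G = [[0.05, 0.02, -0.01], [-0.03, 0.04, 0.06]]
Goi = [[0.2, -0.1, 0.3]]
Goh = [[0.1, -0.2]]

h = 1e-6
y_plus = output(0, x, W, Woi, Woh, G, Goi, Goh, (h, 0.0, 0.0))
y_minus = output(0, x, W, Woi, Woh, G, Goi, Goh, (-h, 0.0, 0.0))
numeric = (y_plus - y_minus) / (2.0 * h)   # central difference at z = 0
analytic = dy_dz1(0, x, Goi)
```

The same central-difference pattern can be reused to check candidate expressions for the other two factors by perturbing the second or third entry of `z` instead.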

Reference Material

The variance of X is

$$\sigma_X^2 = E[X^2] - E^2[X] = E\big[(X - E[X])^2\big]$$

An N-input complete network of degree D is one that has enough hidden units so that any D-degree polynomial function of N inputs can be formed by adding one output node and connecting weights to the existing hidden units. The weights for the new output can be found by solving linear equations. If a complete network's inputs and hidden units are linearly independent, the exhaustive PBF model of the network has a square C matrix.

In an MLP with one hidden layer and linear output activations, the output and hidden unit deltas for the pth pattern are respectively

$$\delta_{po}(i) = -\frac{\partial E_p}{\partial net_{po}(i)} = 2\,(t_{pi} - y_{pi}), \qquad \delta_p(k) = f'(net_{pk}) \sum_{i=1}^{M} \delta_{po}(i)\, w_{oh}(i,k)$$

Then, $-\partial E_p/\partial w_{oh}(i,k)$ and $-\partial E_p/\partial w(k,n)$ are found as

$$-\frac{\partial E_p}{\partial w_{oh}(i,k)} = \delta_{po}(i)\, O_{pk}, \qquad -\frac{\partial E_p}{\partial w(k,n)} = \delta_p(k)\, x_{pn}$$

where the kth unit is an input or hidden unit and the nth unit is an input unit.
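The delta expressions in the reference material can be exercised on a tiny network. The following is a minimal sketch, assuming sigmoid hidden units, no bypass weights, and a per-pattern error equal to the sum over outputs of the squared difference between target and output; all numbers and function names are hypothetical. It compares the product of hidden delta and input against a central-difference estimate of the negative error derivative for one input weight.

```python
import math


def f(v):
    """Sigmoid hidden activation (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W, Woh):
    """Return hidden outputs O and linear outputs y for one pattern."""
    net = [sum(W[k][n] * x[n] for n in range(len(x))) for k in range(len(W))]
    O = [f(v) for v in net]
    y = [sum(Woh[i][k] * O[k] for k in range(len(O))) for i in range(len(Woh))]
    return O, y


def pattern_error(x, t, W, Woh):
    """Per-pattern error: sum over outputs of (t(i) - y(i))^2."""
    _, y = forward(x, W, Woh)
    return sum((t[i] - y[i]) ** 2 for i in range(len(t)))


def deltas(x, t, W, Woh):
    """Output deltas 2 (t - y) and hidden deltas f'(net) * sum_i delta_o(i) w_oh(i,k)."""
    O, y = forward(x, W, Woh)
    delta_o = [2.0 * (t[i] - y[i]) for i in range(len(t))]
    delta_h = [O[k] * (1.0 - O[k])          # sigmoid derivative f'(net_k)
               * sum(delta_o[i] * Woh[i][k] for i in range(len(t)))
               for k in range(len(W))]
    return delta_o, delta_h


# Made-up pattern and weights: 2 inputs plus a constant, 2 hidden units, 1 output.
x = [0.5, -1.0, 1.0]
t = [0.3]
W = [[0.1, -0.2, 0.05], [0.4, 0.3, -0.1]]
Woh = [[0.7, -0.5]]

delta_o, delta_h = deltas(x, t, W, Woh)
analytic = delta_h[0] * x[1]                # claimed value of -dE/dw(0,1)

h = 1e-6
W_plus = [row[:] for row in W]
W_minus = [row[:] for row in W]
W_plus[0][1] += h
W_minus[0][1] -= h
numeric = -(pattern_error(x, t, W_plus, Woh)
            - pattern_error(x, t, W_minus, Woh)) / (2.0 * h)
```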

Some Equations from Schmidt Procedure

$$y_i = \sum_{k=1}^{N_u} w_o'(i,k)\, x_k', \qquad y_i = \sum_{m=1}^{N_u} w_o(i,m)\, x_m, \qquad X_k' = \sum_{m=1}^{k} a_{km}\, X_m$$

Newton's Method

Assume all the weights you're interested in training are stored in the vector w, of dimension $N_w$. Let g be the negative gradient vector (negative Jacobian) for the training error E. For the Hessian matrix H, the mth row, nth column element is

$$h(m,n) = \frac{\partial^2 E}{\partial w(m)\, \partial w(n)}$$

Let e = w' - w be the unknown weight change vector, where w' is the new version of w that we're trying to find. Then,

$$H\,\mathbf{e} = \mathbf{g} \qquad (3)$$

and we see that $\mathbf{e} = H^{-1}\mathbf{g}$. The weight vector is then updated as w = w + e.
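The Newton update can be tried on a small quadratic error, for which a single step reaches the minimum. This is a minimal sketch: the two-weight error function, its Hessian, and the helper names are made up for illustration, and the 2-by-2 system is solved with Cramer's rule rather than a general solver.

```python
def solve_2x2(H, g):
    """Solve H e = g for a 2-by-2 matrix H using Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(g[0] * H[1][1] - H[0][1] * g[1]) / det,
            (H[0][0] * g[1] - H[1][0] * g[0]) / det]


def neg_gradient(w):
    """Negative gradient of the made-up error
    E(w) = (w0 - 1)^2 + 2 (w1 + 3)^2 + w0 w1."""
    return [-(2.0 * (w[0] - 1.0) + w[1]),
            -(4.0 * (w[1] + 3.0) + w[0])]


# Hessian of the made-up error: constant because E is quadratic.
H = [[2.0, 1.0], [1.0, 4.0]]

w = [0.0, 0.0]
g = neg_gradient(w)                  # negative gradient at the starting point
e = solve_2x2(H, g)                  # weight change from H e = g
w = [w[0] + e[0], w[1] + e[1]]       # update w = w + e
g_after = neg_gradient(w)            # should vanish for a quadratic error
```

For a non-quadratic error the Hessian changes with w, so the step would be repeated, recomputing g and H each time.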


Homework #, EE5353. An XOR network has two inputs, one hidden unit, and one output. It is fully connected. Give the network's weights if the output unit has a step activation and the hidden unit activation is