Homework #, EE5353

1. An XOR network has two inputs, one hidden unit, and one output. It is fully connected. Give the network's weights if the output unit has a step activation and the hidden unit activation is
(a) also a step function;
(b) the square of its net function.

2. Give an algorithm for converting decimal format class numbers, i_c, to
(a) coded format desired outputs, t_p(i);
(b) uncoded format desired outputs, t_p(i).

3. In this problem, we are investigating the vector X_p, which is used in functional link neural networks. X_p must be efficiently generated from the feature vector x_p.
(a) Give an efficient algorithm for generating X_p for the second degree case. The elements do not have to be in any particular order.
(b) Give an efficient algorithm for generating X_p for the third degree case. (Hint: use three nested loops.)
(c) Find an expression for K(n, 3).

4. For the linear set of equations Ax = b, the residual error is b − Ax, and the norm-squared residual error is E(x) = ||b − Ax||², which is

E(x) = Σ_{n=1}^{M} [ b(n) − Σ_{m=1}^{N} a(n,m) x(m) ]²

Here, x and b have respectively N and M elements, and A is M by N.
(a) Give g(k) = ∂E/∂x(k) in terms of the cross-correlation c(k) and the autocorrelation r(k,m). Give expressions for these correlations.
(b) Give an expression for B in terms of p(n), c(n), and r(n,m).
(c) Give pseudocode for the conjugate gradient algorithm for minimizing E, in terms of the symbols N_it, i_it, N, XN, XD, x(n), g(n), p(n), B1, and B2.
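As a sanity check for the conjugate gradient pseudocode asked for in problem 4(c), here is a minimal Python sketch using NumPy. It assumes B1 and B2 denote the step length along the direction vector and the direction-update ratio; the function interface is illustrative, not the course's required notation.

```python
import numpy as np

def conjugate_gradient(A, b, n_it=None):
    """Minimize E(x) = ||b - A x||^2 by conjugate gradient on the
    normal equations R x = c, where R = A^T A and c = A^T b."""
    M, N = A.shape
    R = A.T @ A               # autocorrelation matrix, r(k, m)
    c = A.T @ b               # cross-correlation vector, c(k)
    x = np.zeros(N)
    g = c - R @ x             # negative gradient direction (up to a factor of 2)
    p = g.copy()              # initial direction vector p(n)
    n_it = N if n_it is None else n_it
    for _ in range(n_it):
        gg = g @ g
        if gg < 1e-12:        # already converged
            break
        Rp = R @ p
        B1 = gg / (p @ Rp)    # optimal step length along p
        x = x + B1 * p
        g = g - B1 * Rp       # update the residual/negative gradient
        B2 = (g @ g) / gg     # direction-update ratio
        p = g + B2 * p        # new conjugate direction
    return x
```

In exact arithmetic the loop terminates after at most N iterations, since the directions are R-conjugate.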
5. A functional link net has N inputs, M outputs, and is degree D. The weights w_ik, which feed into output number i, are found by minimizing the error function

E(i) = Σ_{p=1}^{N_v} [ t_p(i) − y_p(i) ]², where y_p(i) = Σ_{m=1}^{L} w_im X_p(m)

using the conjugate gradient approach.
(a) Give an expression for the gradient of E(i) with respect to w_ij in terms of the autocorrelation r(m,n) and the cross-correlation c(n,i).
(b) How many conjugate gradient iterations are required to minimize E(i)?
(c) Given the direction vector elements p(k), for 1 ≤ k ≤ L, find an expression for B such that the weight vector elements w_ik + B·p(k) minimize E(i).
(d) If some of the X_p(m) are linearly dependent (so that X_p(m) = a·X_p(n) + b·X_p(j), for example), what advantages does the conjugate gradient solution have over a Gauss-Jordan or other linear equation solver?

6. An MLP's kth hidden unit has threshold w(k, N+1) and weights w(k,n). Let x_n be zero-mean, and assume that net control is to be performed, where m_d and σ_d respectively denote the desired hidden unit net function mean and standard deviation. Let r(k,m) be defined as the autocorrelation E[x_k x_m].
(a) Find the mean m_k of the kth hidden unit's net function.
(b) Find the standard deviation σ_k of the kth hidden unit's net function.
(c) Given w(k, N+1) and w(k,n), how should they be changed so that the net function has the desired mean and standard deviation?

7. MLP number 1 has weight matrices W, W_oh, and W_oi, and has inputs modeled as x_p(n) = v_p(n) + m(n), where m(n) is the mean of the nth input, taken over all training patterns, and v(n) (of which v_p(n) is the pth example) is zero-mean. MLP number 2 has the same structure and most of the same weights, but its inputs are v_p(n). Remember, x_p(N+1) = x(N+1) = 1.
(a) Find hidden unit thresholds w′(k, N+1) for MLP no. 2, in terms of w(k, N+1), m(n), and w(k,n), so that both networks have identical net functions.
(b) Given the networks of part (a), find output thresholds w′_oi(i, N+1) for MLP no. 2 so that the two networks have identical outputs for all patterns.
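Problem 6 can be explored numerically. The sketch below assumes the usual net-control recipe of rescaling the input weights and replacing the threshold; the function name, interface, and the specific adjustment rule are illustrative assumptions, not taken from the problem statement.

```python
import numpy as np

def net_control(w, R, m_d, s_d):
    """Adjust one hidden unit's input weights w[:-1] and threshold w[-1]
    so its net function over zero-mean inputs has mean m_d and standard
    deviation s_d.  R is the input autocorrelation matrix
    r(k, m) = E[x_k x_m]."""
    v = w[:-1]                    # weights w(k, n)
    # With zero-mean inputs, the net function's mean is the threshold w[-1].
    s_k = np.sqrt(v @ R @ v)      # current std of the net function
    v_new = v * (s_d / s_k)       # scale weights to hit the desired std
    return np.append(v_new, m_d)  # set threshold to the desired mean
```

Scaling the weights changes only the spread of the net function (the inputs are zero-mean), while the threshold shifts only its mean, so the two adjustments do not interfere.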
8. MLP number 1 has input vectors x_p of dimension (N+1), where x_p(N+1) = 1. Additional parameters are w(k,n), w_oh(i,k), and w_oi(i,n). Let x_p be transformed as z_p = A x_p, where A is (N+1) by (N+1). For MLP no. 2, z_p is the input vector. MLP no. 2 has parameters w′(k,n), w′_oh(i,k), and w′_oi(i,n).
(a) Given W, find the N_h by 1 net function vector n_p in terms of W and x_p.
(b) Assume that MLP no. 2 is equivalent to MLP no. 1 and has the same net function vectors, so n′_p = n_p. Find its input weight matrix W′ in terms of W and A.
(c) Let G be a negative gradient matrix whose elements are

g(k,n) = −∂E/∂w(k,n) = Σ_{p=1}^{N_v} δ_p(k) x_p(n)

so the N_h by (N+1) negative gradient matrix for W is

G = Σ_{p=1}^{N_v} δ_p (x_p)^T

where δ_p is the N_h by 1 vector of hidden unit delta functions. G′ is the negative gradient matrix for MLP no. 2, with elements g′(k,n) = −∂E/∂w′(k,n). Find G′ in terms of G and A. Remember that n′_p = n_p, so δ′_p = δ_p.
(d) Using your results from part (b), G′ can be mapped back to MLP no. 1, and the resulting negative gradient matrix is G″, which can be used to train MLP no. 1. Express G″ in terms of G and A.
(e) If G″ = G, what condition should A satisfy?
(f) If E is minimized with respect to W′ in MLP no. 2, is E also minimized with respect to W in MLP no. 1?
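The two forms of the negative gradient matrix in problem 8(c), the outer-product sum over patterns and a single product of row-stacked data matrices, can be checked numerically. A NumPy sketch with illustrative array names and sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
N_v, N_h, N = 5, 3, 4

deltas = rng.normal(size=(N_v, N_h))   # row p holds (delta_p)^T
X = rng.normal(size=(N_v, N + 1))      # row p holds (x_p)^T
X[:, -1] = 1.0                         # x_p(N+1) = 1 for every pattern

# Outer-product form: G = sum over p of delta_p (x_p)^T
G_outer = sum(np.outer(deltas[p], X[p]) for p in range(N_v))

# Equivalent single matrix product over the row-stacked data
G_matmul = deltas.T @ X
```

Both computations produce the same N_h by (N+1) matrix; the matrix-product form is what a vectorized implementation would use.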
9. Output weights for a one-output (M = 1) neural net (linear, FLN, or MLP) are to be designed by minimizing the error function

E = Σ_{p=1}^{N_v} [ t_p − Σ_{n=1}^{N_u} w(n) X_p(n) ]²

(a) Find the gradient vector element g(m) = ∂E/∂w(m) in terms of the autocorrelation r(m,n) and the cross-correlation c(m), and define r(m,n) and c(m).
(b) Give the output weight vector w in terms of the matrix R and the cross-correlation vector c.
(c) Give the Hessian matrix element ∂²E/(∂w(u)∂w(m)) in terms of relevant quantities from parts (a) or (b).
(d) Using Newton's method, express the output weight vector w in terms of relevant quantities from parts (a), (b), or (c). Is w from Newton's algorithm the same as w from part (b)?

10. In the BP part of OWO-BP, we find the direction matrix D, which equals the negative gradient matrix G, of dimensions N_h by (N+1). We then minimize the error function

E(z) = Σ_{p=1}^{N_v} Σ_{i=1}^{M} [ t_p(i) − y_p(i) ]²

with respect to z, where

y_p(i) = Σ_{n=1}^{N+1} w_oi(i,n) x_p(n) + Σ_{k=1}^{N_h} w_oh(i,k) f( Σ_{n=1}^{N+1} [ w(k,n) + z·d(k,n) ] x_p(n) )

(a) Give ∂E(z)/∂z in terms of the symbol ∂y_p(i)/∂z.
(b) Give the Gauss-Newton approximation of ∂²E(z)/∂z² in terms of the symbol ∂y_p(i)/∂z.
(c) Give the optimal learning factor in terms of the symbols ∂E(z)/∂z and ∂²E(z)/∂z².
(d) Give ∂y_p(i)/∂z in terms of appropriate weights and symbols, including n_p(k), N, N_h, d(k,n), etc.
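For problem 9, a small NumPy sketch comparing the normal-equations solution of part (b) with a single Newton step (parts (c) and (d)). The factor-of-2 conventions and variable names are assumptions for illustration.

```python
import numpy as np

def output_weights(Xp, t):
    """Solve for the output weights of a one-output linear net by
    setting the gradient of E = sum_p (t_p - w . X_p)^2 to zero,
    i.e. solving R w = c."""
    R = Xp.T @ Xp          # r(m, n) = sum_p X_p(m) X_p(n)
    c = Xp.T @ t           # c(m)    = sum_p t_p X_p(m)
    return np.linalg.solve(R, c)

def newton_step(Xp, t, w0):
    """One Newton iteration.  Because E is quadratic in w, the Hessian
    is the constant 2R, and a single step from any w0 lands on the
    least-squares solution."""
    R = Xp.T @ Xp
    c = Xp.T @ t
    g = 2.0 * (R @ w0 - c)           # gradient of E at w0
    H = 2.0 * R                      # Hessian, independent of w
    return w0 - np.linalg.solve(H, g)
```

The agreement of the two routines illustrates why the answer to 9(d) is that Newton's method reproduces the part (b) solution in one step.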
11. In two-stage OWO-BP training, we use the negative gradient matrix G and the optimal learning factor (OLF) z to modify W. In the multiple optimal learning factors (MOLF) algorithm, we use a different OLF z_k for each hidden unit. After we've found G, our error function in terms of z and G is

E(z) = Σ_{p=1}^{N_v} Σ_{i=1}^{M} [ t_p(i) − y_p(i) ]², where z = [z_1, z_2, …, z_{N_h}]^T

and the output, in terms of the OLFs, is

y_p(i) = Σ_{n=1}^{N+1} w_oi(i,n) x_p(n) + Σ_{k=1}^{N_h} w_oh(i,k) f( Σ_{n=1}^{N+1} [ w(k,n) + z_k g(k,n) ] x_p(n) )

(a) Give ∂y_p(i)/∂z_m, where the partial is evaluated with all z_k equal to 0. Remember, f(n_p(k)) = O_p(k).
(b) Give an expression for g(m) = −∂E(z)/∂z_m in terms of the symbols ∂y_p(i)/∂z_m, t_p(i), y_p(i), etc. Note that g(m) is an element of g.
(c) If Newton's algorithm is used to find z in a given iteration, the Hessian matrix elements are h(u,v) = ∂²E/(∂z_u ∂z_v). Give the Gauss-Newton expression for h(u,v). What are the dimensions of H?
(d) Give the equations to be solved for z, in matrix-vector form. What method can be used to solve these linear equations?
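Parts (b) through (d) of problem 11 reduce to building g and the Gauss-Newton Hessian H from the partials ∂y_p(i)/∂z_m and then solving H z = g. A NumPy sketch under assumed array shapes (the einsum layout and the factors of 2 are illustrative conventions):

```python
import numpy as np

def molf_update(dy_dz, err):
    """Given dy_dz[p, i, m] = dy_p(i)/dz_m and err[p, i] = t_p(i) - y_p(i),
    build the negative gradient g(m) and the Gauss-Newton Hessian h(u, v),
    then solve H z = g for the vector of learning factors."""
    # g(m) = -dE/dz_m = 2 * sum over p, i of err * dy/dz_m
    g = 2.0 * np.einsum('pi,pim->m', err, dy_dz)
    # Gauss-Newton: h(u, v) ~= 2 * sum over p, i of (dy/dz_u)(dy/dz_v)
    H = 2.0 * np.einsum('pim,piv->mv', dy_dz, dy_dz)
    return np.linalg.solve(H, g)       # N_h linear equations in N_h unknowns
```

Any symmetric positive-definite solver (e.g. Cholesky, or OLS with regularization when H is ill-conditioned) fits part (d).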
12. Some compilers produce faster executables when matrix operations are used in the source code. In a FLN, let the rows of the data matrices D_X, D_t, and D_y store (X_p)^T, (t_p)^T, and (y_p)^T respectively, so that the data matrices' dimensions are N_v by L, N_v by M, and N_v by M. Assume that the MSE E to be minimized is defined, as usual, as

E = Σ_{i=1}^{M} E(i), where E(i) = Σ_{p=1}^{N_v} [ t_p(i) − y_p(i) ]²

(a) If y_p = W X_p, write D_y in terms of D_X and W. (Hint: use the transpose operation.)
(b) We want to convert the D_y equation of part (a) into our familiar equations C = R W^T. In order to do this, what do we pre-multiply the D_y equation by?
(c) Let D_y(i) and D_t(i) denote the ith columns, respectively, of D_y and D_t. Write E(i) in terms of D_y(i), D_t(i), and any other necessary symbols.
(d) Replacing D_y(i) and D_t(i) in your E(i) expression by D_y and D_t and using the trace operator (tr(A) = a(1,1) + a(2,2) + …), generate the expression for E. Is this calculation efficient?
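The data-matrix identities that problem 12 is driving at can be verified numerically. The NumPy sketch below uses illustrative sizes and compares the elementwise error sum with its trace form:

```python
import numpy as np

rng = np.random.default_rng(1)
N_v, L, M = 8, 5, 3

D_X = rng.normal(size=(N_v, L))    # rows are (X_p)^T
W = rng.normal(size=(M, L))        # output weight matrix
D_t = rng.normal(size=(N_v, M))    # rows are (t_p)^T

# With y_p = W X_p for each pattern, the row-stacked outputs are
# one matrix product (each row of D_y is (W X_p)^T = (X_p)^T W^T).
D_y = D_X @ W.T

# Elementwise double sum over patterns and outputs
E_loops = sum((D_t[p, i] - D_y[p, i]) ** 2
              for p in range(N_v) for i in range(M))

# Trace form of the same total error
E_trace = np.trace((D_t - D_y).T @ (D_t - D_y))
```

Note the trace form computes the full M by M product just to read off its diagonal, which bears on the efficiency question in part (d).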