Supp Materials: An Analytical Formula of Population Gradient for Two-Layered ReLU Network and its Applications in Convergence and Critical Point Analysis

Yuandong Tian, Facebook AI Research

March 8, 2017

1 Introduction

No theorem is provided.

2 Related Works

No theorem is provided.

3 Problem Definition

No theorem is provided.

4 The Analytical Formula

Here we list detailed proofs of all the theorems.

Figure 1: (a)-(b) The two cases (θ ≥ 0 and θ < 0) in Thm. 1.
4.1 One ReLU Case

Theorem 1 Suppose F(e, w) = Xᵀ D(e) D(w) X w, where e is a unit vector and X = [x_1, x_2, ..., x_N]ᵀ is the N-by-d sample matrix. If x_i ~ N(0, I), then:

    E[F(e, w)] = (N/2π) ((π − θ) w + ‖w‖ sin θ · e)    (1)

where θ ∈ [0, π] is the angle between e and w.

Proof Note that F can be written in the following form:

    F(e, w) = Σ_{i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} x_i x_iᵀ w    (2)

where the x_i are the rows of X. We set up the axes related to e and w as in Fig. 1, while the remaining axes are perpendicular to this plane. We use the orthonormal basis e, e_2 = (w/‖w‖ − e cos θ)/sin θ (plus any set of bases spanning the rest of the space). Under this basis, any vector is written x = [r sin φ, r cos φ, x_3, ..., x_d], and the representations of e and w are [1, 0_{d−1}] and [‖w‖ cos θ, ‖w‖ sin θ, 0_{d−2}]. Note that here θ ∈ (−π, π]: the angle θ is positive when e chases after w, and is otherwise negative.

Now we consider the quantity R(φ_0) = E[(1/N) Σ_{i : φ_i ∈ [0, φ_0]} x_i x_iᵀ]. Taking the expectation in polar coordinates over the first two dimensions, we have:

    R(φ_0) = E[x xᵀ | φ ∈ [0, φ_0]] P[φ ∈ [0, φ_0]]
           = ∫_0^{φ_0} ∫_0^{+∞} ∫ x xᵀ p(r) p(φ) Π_{k=3}^d p(x_k) · r dr dφ dx_3 ... dx_d

where x = [r sin φ, r cos φ, x_3, ..., x_d]ᵀ, p(r) = e^{−r²/2} and p(φ) = 1/2π. Note that R(φ_0) is a d-by-d matrix. The first 2-by-2 block can be computed in closed form (note that ∫_0^{+∞} r² p(r) r dr = 2). Every off-diagonal element outside the first 2-by-2 block is zero by the symmetry of the spherical Gaussian, and every diagonal element outside that block equals P[φ_i ∈ [0, φ_0]] = φ_0/2π. Finally, we have:

    R(φ_0) = (1/4π) [ 2φ_0 − sin 2φ_0    1 − cos 2φ_0      0
                      1 − cos 2φ_0       2φ_0 + sin 2φ_0   0
                      0                  0                 2φ_0 I_{d−2} ]    (3)

           = (φ_0/2π) I_d + (1/4π) [ −sin 2φ_0      1 − cos 2φ_0   0
                                      1 − cos 2φ_0    sin 2φ_0      0
                                      0               0             0 ]    (4)

With this equation we can compute E[F(e, w)]. When θ ≥ 0, the condition {i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} is equivalent to {i : φ_i ∈ [0, π − θ]} (Fig. 1(a)).
Using w = [‖w‖ cos θ, ‖w‖ sin θ, 0_{d−2}]ᵀ, we have:

    E[F(e, w)] = N R(π − θ) w    (5)
               = (N‖w‖/4π) [ 2(π−θ) + sin 2θ   1 − cos 2θ
                             1 − cos 2θ        2(π−θ) − sin 2θ ] [ cos θ
                                                                   sin θ ]    (6)
               = (N/2π) ( (π − θ) w + ‖w‖ [sin θ, 0, ..., 0]ᵀ )    (7)
               = (N/2π) ( (π − θ) w + ‖w‖ sin θ · e )    (8)

For θ < 0, the condition {i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} is equivalent to {i : φ_i ∈ [−θ, π]} (Fig. 1(b)), and similarly we get

    E[F(e, w)] = N (R(π) − R(−θ)) w = (N/2π) ((π + θ) w − ‖w‖ sin θ · e)    (9)

Notice that, by abuse of notation, the θ appearing in Eqn. 1 is the absolute value, and then Eqn. 1 covers both cases.
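As a quick sanity check of Eqn. 1 (our addition, not part of the original proof), the closed form can be compared against a Monte Carlo estimate: sample x_i ~ N(0, I), average x x ᵀ w over the samples where both gates are active, and normalize by the sample count so that N drops out.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, theta = 4, 400_000, 1.0           # theta: angle between e and w
e = np.zeros(d); e[0] = 1.0
w = 2.0 * np.array([np.cos(theta), np.sin(theta)] + [0.0] * (d - 2))

X = rng.standard_normal((n, d))
active = (X @ e >= 0) & (X @ w >= 0)    # both ReLU gates on
mc = X[active].T @ (X[active] @ w) / n  # Monte Carlo estimate of E[x x^T w; gates on]

closed = ((np.pi - theta) * w + np.linalg.norm(w) * np.sin(theta) * e) / (2 * np.pi)
print(np.abs(mc - closed).max())        # small (pure Monte Carlo noise)
```

The discrepancy shrinks like 1/sqrt(n), which is a useful way to spot sign or factor errors in reconstructions of this formula.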
5 Critical Point Analysis

Recall that from Thm. 1, for F(e, w) = Xᵀ D(e) D(w) X w with e a unit vector, X = [x_1, x_2, ..., x_N]ᵀ the N-by-d sample matrix and x_i ~ N(0, I):

    E[F(e, w)] = (N/2π) ((π − θ) w + ‖w‖ sin θ · e)    (10)

where θ ∈ [0, π] is the angle between e and w. The expected gradient of g(x) = Σ_{j=1}^K σ(w_jᵀ x) with respect to w_j is the following:

    E[∇_{w_j} J] = Σ_{j'=1}^K ( E[F(e_j, w_{j'})] − E[F(e_j, w*_{j'})] )    (11)

where e_j = w_j/‖w_j‖. By solving E[∇_{w_j} J] = 0 for j = 1, ..., K, it is possible to identify all critical points of g(x). In general this system is highly nonlinear and cannot be solved efficiently; however, we show that in our particular case it is possible, since the system has interesting properties. First of all, each equation of

    E[∇_{w_j} J] = 0,  j = 1, ..., K    (12)

i.e.

    Σ_{j'=1}^K E[F(e_j, w_{j'})] = Σ_{j'=1}^K E[F(e_j, w*_{j'})],  j = 1, ..., K    (13)

is a linear combination of the w_{j'} and w*_{j'}, but with varying coefficients. We can rewrite the system as:

    E diag(a) + W B = E diag(a*) + W* B*    (14)

where:

    a = sin Θ · w̄,  a* = sin Θ* · w̄*    (15)
    B_{j'j} = π − θ_j^{j'},  B*_{j'j} = π − θ*_j^{j'}    (16)
    E = [e_1, e_2, ..., e_K]    (17)
    W = [w_1, ..., w_K],  W* = [w*_1, ..., w*_K]    (18)

Here θ_j^{j'} = ∠(w_j, w_{j'}) and θ*_j^{j'} = ∠(w_j, w*_{j'}); Θ = [θ_j^{j'}] and Θ* = [θ*_j^{j'}] are K-by-K angle matrices (sin acts entrywise); w̄ = [‖w_1‖, ..., ‖w_K‖]ᵀ and w̄* = [‖w*_1‖, ..., ‖w*_K‖]ᵀ. The common factor N/2π is dropped.

Eqn. 14 already has interesting properties. The first thing we consider is whether a critical point can fall outside the Principal Hyperplane Π, which is the subspace spanned by the ground-truth weight vectors W*. The following shows that critical points outside Π must lie in a manifold:

Lemma 1 If {w_j} is a critical point satisfying Eqn. 14, then for any orthogonal mapping R with R|_Π = Id, {R w_j} is also a critical point.

Proof First of all, since R is an orthogonal transformation, it keeps all angles and magnitudes, so a, a*, B, B*, w̄ and w̄* are invariant.
For simplicity, write the residual of Eqn. 14 as G(W) = E diag(a) + W B − E diag(a*) − W* B*, which vanishes exactly at critical points; by the invariance above, the coefficients of G are unchanged under R. Since R|_Π = Id, we have R W* = W*, and for the transformed weights the unit vectors become R e_j, so:

    G(RW) = RE diag(a) + RW B − RE diag(a*) − W* B* = R G(W) = 0    (19)

Note that for d ≥ K + 2, there always exists R ≠ Id satisfying such a condition, which yields a continuum of critical points. Further, such transformations form a Lie group. Therefore we have:

Theorem 2 If d ≥ K + 2, then any critical point satisfying Eqn. 14 that lies outside Π must lie in a manifold.

The intuition is simple. For any out-of-plane critical point, pick a transformation that satisfies the condition of the lemma and transform it to a different yet infinitesimally close critical point. Such a transformation always exists: in the (d − K)-dimensional orthogonal complement, if the dimension is odd we can always pick a rotation whose fixed axis is not aligned with any of the K weights; if it is even, there is a rotation matrix without a fixed point.
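Lemma 1 can also be checked numerically (our addition): build the population gradient from Thm. 1's formula, pick an orthogonal R that is the identity on Π = span(W*), and verify that the gradients of {R w_j} are exactly the rotated gradients of {w_j}. The helper `pop_grad` below is our own name for Eqn. 11 with the common factor dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 5, 2
Ws = rng.standard_normal((d, K))        # teacher weights w*_j (columns)
W = rng.standard_normal((d, K))         # student weights w_j (columns)

def pop_grad(W, Ws):
    """(2pi/N) * E[grad_{w_j} J] for each j, via Eqn. 1 applied to Eqn. 11."""
    G = np.zeros_like(W)
    for j in range(W.shape[1]):
        e = W[:, j] / np.linalg.norm(W[:, j])
        for V, sgn in ((W, 1.0), (Ws, -1.0)):   # student terms minus teacher terms
            for jp in range(V.shape[1]):
                v = V[:, jp]; nv = np.linalg.norm(v)
                th = np.arccos(np.clip(e @ v / nv, -1.0, 1.0))
                G[:, j] += sgn * ((np.pi - th) * v + nv * np.sin(th) * e)
    return G

# Orthogonal R with R|_span(Ws) = Id: rotate two directions orthogonal to span(Ws).
Q, _ = np.linalg.qr(np.concatenate([Ws, rng.standard_normal((d, d - K))], axis=1))
u1, u2 = Q[:, K], Q[:, K + 1]
c, s = np.cos(0.7), np.sin(0.7)
R = np.eye(d) + (c - 1) * (np.outer(u1, u1) + np.outer(u2, u2)) \
    + s * (np.outer(u2, u1) - np.outer(u1, u2))

print(np.allclose(R @ Ws, Ws))                                 # R fixes the plane
print(np.allclose(pop_grad(R @ W, Ws), R @ pop_grad(W, Ws)))   # gradients rotate with R
```

Both checks print True: since the formula depends only on angles and norms, the residual at {R w_j} is R times the residual at {w_j}, exactly as in the proof.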
5.1 Characteristics within the Principal Plane

We can multiply Eqn. 14 by Eᵀ and turn it into a system that is linear in the magnitudes of the weights. Note that we have:

    EᵀE = cos Θ,  WᵀE = diag(w̄) cos Θ,  (W*)ᵀE = diag(w̄*) cos Θ*    (20)

(with cos acting entrywise). Therefore, Eqn. 14 becomes a homogeneous linear equation in the magnitudes (note that a and a* are linear in the magnitudes).    (21)

In particular, the (i, j) entries of the LHS and RHS of this equality are:

    LHS_ij = cos θ_j^i ( Σ_{k=1}^K sin θ_i^k ‖w_k‖ ) + Σ_{k=1}^K (π − θ_i^k) ‖w_k‖ cos θ_j^k    (22)
    RHS_ij = cos θ_j^i ( Σ_{k=1}^K sin θ*_i^k ‖w*_k‖ ) + Σ_{k=1}^K (π − θ*_i^k) ‖w*_k‖ cos θ*_j^k    (23)

Therefore, the following equation holds:

    M w̄ = M* w̄*    (24)

where M and M* are K²-by-K matrices. The entry m_{ij,k}, i.e. the coefficient of the k-th weight magnitude in the (i, j) entry of Eqn. 24, is defined as:

    m_{ij,k} = (π − θ_i^k) cos θ_j^k + sin θ_i^k cos θ_j^i    (25)
    m*_{ij,k} = (π − θ*_i^k) cos θ*_j^k + sin θ*_i^k cos θ_j^i    (26)

Special case on the diagonal. For a diagonal entry (i, i), cos θ_i^i = 1 and m_{ii,k} = h(θ_i^k), m*_{ii,k} = h(θ*_i^k), where

    h(θ) = (π − θ) cos θ + sin θ    (27)

Therefore, keeping only the diagonal entries, we arrive at the following subset of the constraints to be satisfied at any critical point:

    M_r w̄ = M*_r w̄*    (28)

where M_r = h(Θ) and M*_r = h(Θ*) (entrywise) are both K-by-K matrices. Note that if M_r is full rank, we can solve for w̄ from Eqn. 28 and plug it back into Eqn. 24 to check whether it is indeed a critical point.

Lemma 2 Suppose w̄* ≠ 0 (no trivial ground-truth solutions). If, for a given (Θ, Θ*), there exists a row (say the l-th) of M and M*, namely m_l and m*_l, satisfying

    m*_l − M*_rᵀ M_r^{−1} m_l > 0 (elementwise)  or  m*_l − M*_rᵀ M_r^{−1} m_l < 0 (elementwise)    (29)

then (Θ, Θ*) cannot be a critical point.

Proof Suppose that given (Θ, Θ*) we form M_r and M*_r and compute w̄ using Eqn.
28, then we have

    (w̄*)ᵀ M*_rᵀ M_r^{−1} m_l = (M*_r w̄*)ᵀ M_r^{−1} m_l = (M_r w̄)ᵀ M_r^{−1} m_l = w̄ᵀ m_l    (30)

using Eqn. 28 and the symmetry of M_r. Therefore, from the condition m*_l − M*_rᵀ M_r^{−1} m_l > 0 (elementwise) and w̄* ≥ 0 with w̄* ≠ 0, we have

    (w̄*)ᵀ (m*_l − M*_rᵀ M_r^{−1} m_l) = (w̄*)ᵀ m*_l − w̄ᵀ m_l > 0    (31)

but this contradicts the necessary condition for (Θ, Θ*) to be a critical point: the l-th row of Eqn. 24 requires w̄ᵀ m_l = (w̄*)ᵀ m*_l. Similarly we can prove the other side.

Separation property of Eqn. 29. Note that the k-th elements of both m*_l and M*_rᵀ M_r^{−1} m_l in Eqn. 29 depend only on the k-th true weight vector w*_k (and on all of {w_j}).
For m*_l, this can be seen from Eqn. 26, in which the k-th element only involves the angles θ*^k between w*_k and the {w_j}. For M*_rᵀ M_r^{−1} m_l, notice that the k-th column of M*_r (the k-th row of M*_rᵀ) is only related to w*_k and not to the other ground-truth weight vectors. This separation property makes the analysis much easier, as shown in the case K = 2. Therefore, we can consider the following function of one (rather than K) ground-truth unit weight vector e and all estimated unit vectors {e_l}:

    L_ij(e, {e_l}) = m*_ij(e) − vᵀ M_r^{−1} m_ij    (32)

where v = [h(θ*^1), h(θ*^2), ..., h(θ*^K)]ᵀ with θ*^j = ∠(e, w_j), m_ij = [m_{ij,k}]_k as in Eqn. 25, and m*_ij(e) = (π − θ*^i) cos θ*^j + sin θ*^i cos θ_j^i (like Eqn. 26 with e in place of w*_k). With this notation, the k-th element of m*_l − M*_rᵀ M_r^{−1} m_l equals L_ij(e*_k, {e_l}) for l = (i, j), where e*_k = w*_k/‖w*_k‖.

Proposition 1 L_ij(e, {e_l}) = 0 for any e = e_l, 1 ≤ l ≤ K. In addition, L_ii(e, {e_l}) ≡ 0.

Proof When e = e_l, we have θ*^k = θ_l^k, so v becomes the l-th column of M_r (which is symmetric), and vᵀ M_r^{−1} = ê_lᵀ, the unit vector whose l-th element is 1. Therefore, again with θ*^k = θ_l^k, we have:

    L_ij(e, {e_l}) = m*_ij(e_l) − m_{ij,l} = 0    (33)

For L_ii, by definition m_ii is the i-th column of M_r, so M_r^{−1} m_ii is the unit vector with only the i-th element being 1. Therefore

    vᵀ M_r^{−1} m_ii = h(θ*^i) = m*_ii(e)    (34)

so L_ii ≡ 0. 

Then the previous lemma can be written as the following:

Theorem 3 Suppose w̄* ≠ 0. If, for the given parameters W, there is a pair (i, j) such that L_ij(e*_k, {e_l}) > 0 for all 1 ≤ k ≤ K (or < 0 for all k), then W cannot be a critical point.

5.2 ReLU network with two hidden nodes (K = 2)

For K = 2 we have a 4-by-2 matrix M (the row order is (1,1), (1,2), (2,1), (2,2)). Substituting θ_1^2 = θ_2^1 = θ (the angle between w_1 and w_2) and θ_1^1 = θ_2^2 = 0 into Eqn. 25 gives:

    M = [ π                        (π − θ) cos θ + sin θ
          π cos θ                  (π − θ) + sin θ cos θ
          (π − θ) + sin θ cos θ    π cos θ
          (π − θ) cos θ + sin θ    π                      ]    (35)-(36)
Similarly, from Eqn. 26 we can write M*:

    M* = [ h(θ*_1^1)                                          h(θ*_1^2)
           (π − θ*_1^1) cos θ*_2^1 + sin θ*_1^1 cos θ        (π − θ*_1^2) cos θ*_2^2 + sin θ*_1^2 cos θ
           (π − θ*_2^1) cos θ*_1^1 + sin θ*_2^1 cos θ        (π − θ*_2^2) cos θ*_1^2 + sin θ*_2^2 cos θ
           h(θ*_2^1)                                          h(θ*_2^2) ]    (37)

In this case,

    M_r = [ π      h(θ)
            h(θ)   π    ],    M*_r = [ h(θ*_1^1)   h(θ*_1^2)
                                       h(θ*_2^1)   h(θ*_2^2) ]    (38)

Therefore, if we know θ and the four angles θ*_1^1, θ*_1^2, θ*_2^1 and θ*_2^2, we can compute M and M* and solve a linear equation to get the magnitudes of w_1 and w_2, which collectively identify the critical points. Note that M is a 4-by-2 matrix, so a critical point occurs only if the overdetermined system has a consistent (singular) structure.
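The function h drives both M_r and M*_r. As a quick numeric check (our addition): h(θ) = (π − θ) cos θ + sin θ is strictly positive on [0, π), vanishes at π, and is non-increasing (h'(θ) = −(π − θ) sin θ ≤ 0), so h(θ) < π = h(0) for θ > 0 and the K = 2 matrix M_r above is strictly diagonally dominant, hence invertible, for θ ∈ (0, π].

```python
import numpy as np

h = lambda t: (np.pi - t) * np.cos(t) + np.sin(t)

t = np.linspace(0.0, np.pi, 100_001)
pos = bool((h(t[:-1]) > 0).all())            # h > 0 on [0, pi)
dec = bool((np.diff(h(t)) <= 1e-12).all())   # non-increasing: h' = -(pi - t) sin t <= 0
print(pos, dec, abs(h(0.0) - np.pi) < 1e-12, abs(h(np.pi)) < 1e-12)  # all True
```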
Global Optimum. One special case is θ*_1^2 = θ*_2^1 = θ = π/2 and θ*_1^1 = θ*_2^2 = 0; in this case, we have:

    M = M* = [ π     1
               0     π/2
               π/2   0
               1     π   ]    (39)

and thus w̄ = w̄* is the unique solution.

When K = 2, the following conjecture is empirically correct.

Conjecture 1 If e is in the interior of Cone(e_1, e_2), then L_12(θ*^1, θ*^2, θ) > 0. If e is in the exterior, then L_12 < 0. If e is on the boundary, then L_12 = 0. The same holds for L_21.

Remark Note that L_1j can be written as follows:

    L_1j(e, {e_1, e_2}) = m*_1j(e) − [h(θ*^1), h(θ*^2)] M_r^{−1} m_1j    (40)
                       = [(π − θ*^1) e + sin θ*^1 · e_1]ᵀ e_j
                         − [α, β] [ π e_1, (π − θ) e_2 + sin θ · e_1 ]ᵀ e_j    (41)-(42)

where

    [α, β] = [α(θ*^1, θ*^2, θ), β(θ*^1, θ*^2, θ)] = [h(θ*^1), h(θ*^2)] M_r^{−1}    (43)

We know that L_11 = 0 by Proposition 1. Therefore the vector

    u ≡ (π − θ*^1) e + sin θ*^1 · e_1 − [π e_1, (π − θ) e_2 + sin θ · e_1] [α, β]ᵀ    (44)

is perpendicular to e_1. So if we compute the inner product between u and e_1⊥ (the unit vector in Π that is orthogonal to e_1, on the e_2 side), we get

    uᵀ e_1⊥ = (π − θ*^1) sin θ*^1 − [(π − θ) sin θ] β    (45)

Since e_2 = cos θ · e_1 + sin θ · e_1⊥, we have:

    L_12(e, {e_1, e_2}) = uᵀ e_2 = sin θ (uᵀ e_1⊥)    (46)

Note that Eqn. 45 is a function of two variables, θ*^1 and θ (θ*^2 is determined by θ*^1 and θ, depending on whether e is inside or outside Cone(e_1, e_2)). And we can verify it numerically.

Theorem 4 If Conjecture 1 is correct, then for a K = 2 ReLU network, (w_1, w_2) with w_1 ≠ w_2 is not a critical point if w*_1 and w*_2 are both inside Cone(w_1, w_2), or both outside it.

Proof If both w*_1 and w*_2 are inside Cone(w_1, w_2), then from Conjecture 1 we have

    L_12(θ*_1^k, θ*_2^k, θ) > 0    (47)

for k = 1, 2. Since K = 2, we can simply apply Thm. 3 to conclude that (w_1, w_2) is not a critical point. Similarly we prove the case where both w*_1 and w*_2 are outside Cone(w_1, w_2).

6 Convergence Analysis

6.1 Single ReLU case

In this subsection, we mainly deal with the following dynamics:

    E[∇_w J] = (N/2)(w − w*) + (N/2π) ( θ w* − ‖w*‖ sin θ · w/‖w‖ )    (48)
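Before analyzing Eqn. 48, here is a numeric sanity check (our addition) that (w − w*)ᵀ E[∇_w J] > 0 strictly inside the region ‖w − w*‖ < ‖w*‖ with w ≠ w*, which is exactly the statement V̇ < 0 proved in Theorem 5 below; the common factor N/2 is dropped.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
ws = rng.standard_normal(d)                      # teacher w* (arbitrary)

def g(w):
    """(2/N) * E[grad_w J], from Eqn. 48."""
    nw, nws = np.linalg.norm(w), np.linalg.norm(ws)
    th = np.arccos(np.clip(w @ ws / (nw * nws), -1.0, 1.0))
    return (w - ws) + (th * ws - nws * np.sin(th) * w / nw) / np.pi

ok = True
for _ in range(10_000):
    u = rng.standard_normal(d)
    u *= rng.uniform(0.01, 0.999) * np.linalg.norm(ws) / np.linalg.norm(u)
    w = ws + u                                   # strictly inside Omega, w != w*
    ok &= (w - ws) @ g(w) > 0                    # so dV/dt < 0 along w' = -E[grad J]
print(ok)
```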
Theorem 5 In the region {w⁰ : ‖w⁰ − w*‖ < ‖w*‖}, following the dynamics (Eqn. 48), the Lyapunov function V(w) = (1/2)‖w − w*‖² has V̇ < 0, and the system is asymptotically stable; thus w^t → w* as t → +∞.

Proof Denote Ω = {w : ‖w − w*‖ < ‖w*‖}. Note that

    V̇ = −(w − w*)ᵀ E[∇_w J] = −(N/2π) yᵀ M y    (49)

where y = [‖w‖, ‖w*‖]ᵀ and M is the following 2-by-2 matrix (θ is the angle between w and w*):

    M = [ π                                  −((2π − θ) cos θ + sin θ)/2
          −((2π − θ) cos θ + sin θ)/2         π − θ + sin θ cos θ          ]    (50)

Note that in Ω we always have θ ∈ [0, π/2), since ‖w − w*‖ < ‖w*‖ implies wᵀw* > ‖w‖²/2 > 0. In the following we show that M is positive definite for θ ∈ (0, π/2]. It suffices to show that M_11 > 0 and det(M) > 0. The first is trivial, while for the determinant a direct expansion gives:

    4 det(M) = 4π(π − θ + sin θ cos θ) − ((2π − θ) cos θ + sin θ)²    (51)
             = ((2π − θ)² − 1) sin²θ + θ sin 2θ − θ²    (52)

For θ ∈ (0, π/2] we have θ sin 2θ ≥ 0, and since sin θ ≥ 2θ/π on [0, π/2],

    ((2π − θ)² − 1) sin²θ ≥ ((3π/2)² − 1)(2/π)² θ² > θ²    (53)

so 4 det(M) > 0 and M is positive definite for θ ∈ (0, π/2]. When θ = 0, M(0) = π [1, −1; −1, 1] is positive semi-definite, with null eigenvector [1, 1]ᵀ, i.e. ‖w‖ = ‖w*‖. However, along θ = 0, the only w that satisfies ‖w‖ = ‖w*‖ is w = w*. Therefore V̇ = −(N/2π) yᵀ M y < 0 in Ω \ {w*}.

Note that although this region could be expanded to the entire open half-space H = {w : wᵀw* > 0}, it is not straightforward to prove convergence in H, since the trajectory might go outside H. On the other hand, Ω is the level set V < (1/2)‖w*‖², so a trajectory starting within Ω remains inside.

Figure 2: (a) Sampling strategy to maximize the probability of convergence. (b) Relationship between the sampling range r and the desired probability of success (1 − ɛ)/2.

Theorem 6 When K = 1, the dynamics (Eqn. 48) converges to w* with probability at least (1 − ɛ)/2, if the initial value w⁰ is sampled uniformly from B_r = {w : ‖w‖ ≤ r} with:

    r ≤ √(2π/(d+1)) · ɛ ‖w*‖    (55)

Proof Given a ball of radius r, we first compute the gap δ of the sphere cap (Fig. 2(b)).
First, for a point on the boundary sphere {‖w − w*‖ = ‖w*‖} at distance r from the origin, the law of cosines gives cos γ = r/2‖w*‖ (γ the angle to w*), so δ = r cos γ = r²/2‖w*‖. Then a sufficient condition for the probability argument to hold is to ensure that the volume V_shaded of the shaded area is at least (1 − ɛ)/2 · V_d(r), where V_d(r) is the volume of the d-dimensional ball of radius r. Since V_shaded ≥ (1/2) V_d(r) − δ V_{d−1}(r), it suffices to have:

    (1/2) V_d(r) − δ V_{d−1}(r) ≥ ((1 − ɛ)/2) V_d(r)    (56)
which gives

    δ V_{d−1}(r) / V_d(r) ≤ ɛ/2    (57)

Using δ = r²/2‖w*‖ and V_d(r) = V_d(1) r^d, we thus have:

    r ≤ ɛ (V_d(1)/V_{d−1}(1)) ‖w*‖    (58)

where V_d(1) is the volume of the d-dimensional unit ball. Since

    V_d(1) = π^{d/2} / Γ(d/2 + 1)    (59)

where Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt, we have

    V_{d−1}(1) / V_d(1) = Γ(d/2 + 1) / (√π Γ(d/2 + 1/2))    (60)

From Gautschi's inequality

    x^{1−s} < Γ(x + 1)/Γ(x + s) < (x + s)^{1−s},   x > 0, 0 < s < 1    (61)

with s = 1/2 and x = d/2 we have:

    (2/(d+1))^{1/2} < Γ(d/2 + 1/2)/Γ(d/2 + 1) < (2/d)^{1/2}    (62)

Therefore, it suffices to have

    r ≤ √(2π/(d+1)) · ɛ ‖w*‖    (63)

Note that this upper bound is tight when δ → 0 and d → +∞, since all the inequalities involved asymptotically become equalities.

6.2 Multiple ReLU case

Explanation of the symmetric dynamics. We first write down the dynamics to be studied:

    E[∇_{w_j} J] = Σ_{j'=1}^K ( E[F(e_j, w_{j'})] − E[F(e_j, w*_{j'})] )    (64)

We first define f(w_j, w_{j'}, w*_{j'}) = F(w_j/‖w_j‖, w_{j'}) − F(w_j/‖w_j‖, w*_{j'}). Therefore, the dynamics can be written as:

    E[∇_{w_j} J] = Σ_{j'} E[f(w_j, w_{j'}, w*_{j'})]    (65)

Suppose we have a finite group P = {P_j}_{j=1}^K which is a subgroup of the orthogonal group O(d), with P_1 the identity element. If w and w* have the following symmetry: w_j = P_j w_1 and w*_j = P_j w*_1, then the RHS of Eqn. 64 can be simplified as follows:

    E[∇_{w_j} J] = Σ_{j'} E[f(P_j w_1, P_{j'} w_1, P_{j'} w*_1)]
                 = Σ_{j''} E[f(P_j w_1, P_j P_{j''} w_1, P_j P_{j''} w*_1)]    ({P_j}_{j=1}^K is a group)
                 = P_j Σ_{j''} E[f(w_1, P_{j''} w_1, P_{j''} w*_1)]    (f(Pw, Pw', Pw*') = P f(w, w', w*') for orthogonal P and spherical Gaussian input)
                 = P_j E[∇_{w_1} J]    (66)
which means that if all w_j and w*_j are symmetric under the group action, then so are their expected gradients. Therefore, the trajectory {w^t} keeps such a cyclic structure. Instead of solving a system of K equations, we only need to solve one:

    E[∇_w J] = Σ_{j=1}^K E[f(w, P_j w, P_j w*)]    (67)

Theorem 7 For a bias-free two-layered ReLU network

    g(x; w) = Σ_j σ(w_jᵀ x)    (68)

that takes spherical Gaussian inputs, if the teacher's parameters {w*_j} form a set of orthonormal bases, then: (1) when the student parameters are initialized as [x_0, y_0, ..., y_0] under the basis of {w*_j}, where (x_0, y_0) ∈ Ω = {x ∈ (0, 1], y ∈ [0, 1], x > y}, the dynamics (Eqn. 64) converges to the teacher's parameters {w*_j} (i.e. (x, y) = (1, 0)); (2) when x_0 = y_0 ∈ (0, 1], it converges to a saddle point x = y = x*, where x* = (1/πK)(√(K−1) − arccos(1/√K) + π).

This theorem is quite complicated. We will start with a few lemmas and gradually come to the conclusion. First, if w⁰ = [x, y, y, ..., y] under the basis {w*_j}_{j=1}^K, then a simple computation shows that w^t also follows this pattern. Therefore, we only need to study the following 2D dynamics in x and y:

    (2π/N) E[∂_x J] = πx + (π − φ*)(K−1)y + (K−1) x sin φ* − (π − θ) − (sin θ + (K−1) sin φ) αx
    (2π/N) E[∂_y J] = πy + (π − φ*)(x + (K−2)y) + (K−1) y sin φ* − (π − φ) − (sin θ + (K−1) sin φ) αy    (69)

Here the symmetric factors (α ≡ 1/‖w_j‖, θ ≡ ∠(w_j, w*_j), φ ≡ ∠(w_j, w*_{j'}), φ* ≡ ∠(w_j, w_{j'}) for j' ≠ j) are defined as follows:

    α = (x² + (K−1)y²)^{−1/2},  cos θ = αx,  cos φ = αy,  cos φ* = α²(2xy + (K−2)y²)    (70)

Now we start a sequence of lemmas.

Lemma 3 For φ, θ and φ* defined in Eqn. 70, we have the following relations in the triangular region Ω_{ɛ0} = {(x, y) : x ≥ 0, y ≥ 0, x ≥ y + ɛ_0} (Fig. 3):
(1) φ, φ* ∈ [0, π/2] and θ ∈ [0, θ_0), where θ_0 = arccos(1/√K).
(2) 1 − cos φ* = α²(x − y)² and sin φ* = α(x − y) √(2 − α²(x − y)²).
(3) φ ≥ φ* (equality holds only when y = 0) and φ > θ.

Proof Propositions (1) and (2) follow from direct calculation. In particular, since cos θ = αx = 1/√(1 + (K−1)(y/x)²) and x > y ≥ 0, we have cos θ ∈ (1/√K, 1] and θ ∈ [0, θ_0).
For Proposition (3), φ = arccos(αy) > θ = arccos(αx) because x > y. Finally, for x > y > 0, we have

    cos φ* − cos φ = α²(2xy + (K−2)y²) − αy = αy ( α(2x + (K−2)y) − 1 ) > 0    (75)

The last inequality holds because (2x + (K−2)y)² − (x² + (K−1)y²) = 3x² + 4(K−2)xy + (K² − 5K + 5)y² > 0 for K ≥ 2 and x > y > 0, so α(2x + (K−2)y) > 1. Therefore φ > φ*. If y = 0 then cos φ = cos φ* = 0 and φ = φ*.
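The relations of Lemma 3 are easy to spot-check numerically (our addition): sample (x, y) with x > y ≥ 0, compute the angles from Eqn. 70, and verify (1)-(3).

```python
import numpy as np

rng = np.random.default_rng(3)
K = 5
ok = True
for _ in range(10_000):
    y = rng.uniform(0.0, 1.0)
    x = y + rng.uniform(1e-3, 1.0)            # x > y >= 0
    a = 1.0 / np.sqrt(x**2 + (K - 1) * y**2)  # alpha
    th  = np.arccos(np.clip(a * x, -1, 1))
    ph  = np.arccos(np.clip(a * y, -1, 1))
    phs = np.arccos(np.clip(a * a * (2*x*y + (K - 2) * y**2), -1, 1))  # phi*
    ok &= abs((1 - np.cos(phs)) - (a * (x - y))**2) < 1e-9  # relation (2)
    ok &= th < np.arccos(1 / np.sqrt(K)) + 1e-9             # relation (1)
    ok &= (phs <= ph + 1e-9) and (ph > th)                  # relation (3)
print(ok)   # True
```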
Figure 3: The region Ω_{ɛ0} considered in the analysis of Eqn. 69; the saddle point (x*, x*) lies on the diagonal x = y, and the teacher's parameters correspond to (1, 0).

Lemma 4 For the dynamics defined in Eqn. 69, there exists ɛ_0 > 0 so that the triangular region Ω_{ɛ0} = {(x, y) : x ≥ 0, y ≥ 0, x ≥ y + ɛ_0} (Fig. 3) is a convergent region. That is, the flow goes inwards on all three edges, and any trajectory starting in Ω_{ɛ0} stays inside.

Proof We discuss the three boundaries as follows:

Case 1: y = 0, 0 ≤ x ≤ 1, horizontal line. In this case, θ = 0, φ = π/2 and φ* = π/2. The y component of the dynamics on this line is:

    f_1 ≡ (2π/N) ∂_y J = (π/2)(x − 1) ≤ 0    (76)

so the flow ẏ = −∂_y J ≥ 0 points into the interior of Ω.

Case 2: x = 1, 0 ≤ y ≤ 1 − ɛ_0, vertical line. In this case α ≤ 1 and the x component of the dynamics is:

    f_2 ≡ (2π/N) ∂_x J = (K−1) [ (π − φ*) y − (α sin φ − sin φ*) ] + (θ − α sin θ)    (77)

Since α ≤ 1, we have α sin θ ≤ sin θ ≤ θ, so the second term is nonnegative. For the first term, we only need to check that (π − φ*) y − (α sin φ − sin φ*) ≥ 0. Substituting sin φ* = α(x − y)√(2 − α²(x − y)²) (Lemma 3) and sin φ = √(1 − α²y²), and using α(x − y) ≤ 1 and αy ≤ 1 (which hold at x = 1 since α ≤ 1 and x − y ≤ 1), a direct computation shows this is nonnegative. Hence f_2 ≥ 0 and the flow ẋ = −∂_x J points inwards at x = 1.

Case 3: x = y + ɛ, 0 ≤ y, diagonal line. We compute the inner product between the flow (−∂_x J, −∂_y J) and (1, −1), the inward normal of Ω_{ɛ0} at this line; up to a positive factor it equals

    f_3(y, ɛ) ≡ (2π/N) (∂_y J − ∂_x J) = (φ − θ) − ɛφ* + [ (K−1)(α sin φ − sin φ*) + α sin θ ] ɛ    (83)

Note that (φ − θ) = arccos(αy) − arccos(αx) ≥ 0 when x ≥ y, and φ* ≤ (π/2) sin φ* for φ* ∈ [0, π/2]. For y > 0, expanding sin φ* and sin φ via Lemma 3 then bounds the ɛ-terms from below.    (84)
For y = 0, αy = 0; in general y ≤ x gives α²y² ≤ 1/K, and αɛ ≤ 1. Therefore f_3 ≥ ɛα(K−1)(C_1 − ɛC_2) with constants C_1, C_2 > 0 depending only on K. With ɛ = ɛ_0 > 0 sufficiently small, f_3 > 0.

Lemma 5 (Reparametrization) Denote ɛ = x − y > 0. The terms αx, αy and αɛ involved in the trigonometric functions in Eqn. 69 have the following parametrization:

    α [x ; y] = (1/K) [ β + (K−1)√β₂ ; β − √β₂ ],   αɛ = √β₂    (85)

where β₂ = (K − β²)/(K − 1). The reverse transformation is given by β = √(K − (K−1)α²ɛ²). Here β ∈ [1, √K) and β₂ ∈ (0, 1]. In particular, the critical point (x, y) = (1, 0) corresponds to (β, ɛ) = (1, 1). As a result, all trigonometric functions in Eqn. 69 depend only on the single variable β. In particular, the following relationship is useful:

    β = cos θ + √(K−1) sin θ    (86)

Proof The transformation can be checked by simple algebraic manipulation. For example:

    α²(Ky + ɛ)² = α²( K(x² + (K−1)y²) − (K−1)ɛ² ) = K − (K−1)α²ɛ² = β²,  so  Kαy = β − αɛ = β − √β₂    (87)

To prove Eqn. 86, first notice that K cos θ = Kαx = β + (K−1)√β₂. Therefore (K cos θ − β)² = (K−1)²β₂ = (K−1)(K − β²), which gives β² − 2β cos θ + cos²θ − (K−1) sin²θ = 0. Solving this quadratic equation, and noting that θ ∈ [0, π/2] and β ≥ 1, we get:

    β = cos θ + √((K−1) sin²θ) = cos θ + √(K−1) sin θ    (88)

Lemma 6 After reparametrization (Eqn. 85), f_3(β, ɛ) ≥ 0 for ɛ ∈ (0, √β₂/β]. Furthermore, the equality holds only if (β, ɛ) = (1, 1), i.e. (y, ɛ) = (0, 1).

Proof Applying the parametrization (Eqn. 85) to Eqn. 83, and noting that αɛ = √β₂ = √β₂(β), we can write

    f_3 = h_1(β) − (φ* + (K−1) sin φ*) ɛ    (89)

where h_1(β) = (φ − θ) + √β₂ sin θ + (K−1)√β₂ sin φ (all the angles depend only on β). For fixed β, f_3 is thus monotonically decreasing in ɛ > 0. Therefore f_3(β, ɛ) ≥ f_3(β, ɛ*) for 0 < ɛ ≤ ɛ* ≡ √β₂/β. If we can prove f_3(β, ɛ*) ≥ 0, attaining zero only at the known critical point (β, ɛ) = (1, 1), the proof is complete. Denote f_3(β, ɛ*) = f_31 + f_32, where

    f_31(β, ɛ*) = (φ − θ) − ɛ*φ* + ɛ*α sin θ    (90)
    f_32(β, ɛ*) = (K−1)(α sin φ − sin φ*) ɛ*    (91)

For f_32, it suffices to prove that ɛ*(α sin φ − sin φ*) = √β₂ sin φ − (√β₂/β) sin φ* ≥ 0 (using ɛ*α = √β₂), which is equivalent to sin φ − sin φ*/β ≥ 0. But this is trivially true since φ ≥ φ* and β ≥ 1. Therefore, f_32 ≥ 0.
Note that the equality only holds when φ = φ* and β = 1, which corresponds to the horizontal line x ∈ (0, 1], y = 0. For f_31, since φ ≥ φ*, φ > θ and ɛ* ∈ (0, 1], we have:

    f_31 = ɛ*(φ − φ*) + (1 − ɛ*)(φ − θ) − ɛ*θ + √β₂ sin θ ≥ −ɛ*θ + √β₂ sin θ = ɛ*(β sin θ − θ)    (92)

using √β₂ = βɛ*. It thus reduces to showing that β sin θ − θ is nonnegative. Using Eqn. 86, we have:

    f_33(θ) = β sin θ − θ = (cos θ + √(K−1) sin θ) sin θ − θ    (93)

Note that f_33'(θ) = cos 2θ + √(K−1) sin 2θ − 1 = √K cos(2θ − θ_0) − 1, where θ_0 = arccos(1/√K). By Proposition (1) in Lemma 3, θ ∈ [0, θ_0), so cos(2θ − θ_0) ≥ cos θ_0 = 1/√K and f_33' ≥ 0. Since f_33(0) = 0, f_33 ≥ 0. Again the equality holds when θ = 0, φ = φ* and ɛ = 1, which is the critical point (β, ɛ) = (1, 1), i.e. (y, ɛ) = (0, 1).
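The reparametrization and the sign claim of Lemma 6 can be spot-checked numerically (our addition): reconstruct (x, y) from (β, ɛ) via Eqn. 85, evaluate f_3 via Eqn. 83, and confirm both Eqn. 86 and f_3 ≥ 0 for ɛ ≤ √β₂/β.

```python
import numpy as np

K = 3
ok = True
for beta in np.linspace(1.0 + 1e-6, np.sqrt(K) - 1e-6, 200):
    beta2 = (K - beta**2) / (K - 1)
    for frac in np.linspace(0.05, 1.0, 40):
        eps = frac * np.sqrt(beta2) / beta          # eps in (0, sqrt(beta2)/beta]
        a = np.sqrt(beta2) / eps                    # alpha, from alpha*eps = sqrt(beta2)
        x = (beta + (K - 1) * np.sqrt(beta2)) / (K * a)   # Eqn. 85
        y = (beta - np.sqrt(beta2)) / (K * a)
        th = np.arccos(np.clip(a * x, -1, 1))
        ph = np.arccos(np.clip(a * y, -1, 1))
        phs = np.arccos(np.clip(a * a * (2*x*y + (K - 2) * y**2), -1, 1))
        ok &= abs(beta - (np.cos(th) + np.sqrt(K - 1) * np.sin(th))) < 1e-9   # Eqn. 86
        f3 = (ph - th) - eps * phs + eps * ((K - 1) * (a * np.sin(ph) - np.sin(phs))
                                            + a * np.sin(th))                  # Eqn. 83
        ok &= f3 > -1e-9                            # Lemma 6: f3 >= 0 on this range
print(ok)
```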
Lemma 7 For the dynamics defined in Eqn. 69, the only critical point (∂_x J = 0 and ∂_y J = 0) within Ω_{ɛ0} is (y, ɛ) = (0, 1).

Proof We prove by contradiction. Suppose (β, ɛ) is a critical point other than w*. A necessary condition for this is f_3 = 0 (Eqn. 83). By Lemma 6, this requires ɛ > ɛ* = √β₂/β > 0, and then

    ɛ + Ky = β/α = (β/√β₂) ɛ > (β/√β₂) ɛ* = 1    (94)

so ɛ + Ky is strictly greater than 1. On the other hand, f_3 = 0 implies that

    (K−1)(α sin φ − sin φ*) + α sin θ = φ* − ɛ^{−1}(φ − θ)    (95)

Substituting into the y component of Eqn. 69 (with x = y + ɛ), we get:

    (2π/N) ∂_y J = (π − φ*)(ɛ + Ky − 1) + (φ − φ*) + ɛ^{−1}(φ − θ) y > 0    (96)

since π − φ* > 0, ɛ + Ky > 1, φ ≥ φ* and φ > θ. So the current point (β, ɛ) cannot be a critical point.

Lemma 8 Any trajectory in Ω_{ɛ0} converges to (y, ɛ) = (0, 1), following the dynamics defined in Eqn. 69.

Proof We have the Lyapunov function V = E[J], so that V̇ = −E[∇_w J]ᵀ E[∇_w J] ≤ 0. By Lemma 7, other than the optimal solution w*, there is no other symmetric critical point, so E[∇_w J] ≠ 0 and thus V̇ < 0. On the other hand, by Lemma 4, the triangular region Ω_{ɛ0} is convergent, in which the 2D dynamics is C¹. Therefore, any 2D solution curve ξ(t) stays within Ω_{ɛ0}. By the Poincaré-Bendixson theorem, when there is a unique critical point, the curve either converges to a limit cycle or to the critical point. However, a limit cycle is not possible since V is strictly decreasing along the curve. Therefore, ξ(t) converges to the unique critical point, which is (y, ɛ) = (0, 1), and so does the symmetric system (Eqn. 64).

Lemma 9 When x = y ∈ (0, 1], the 2D dynamics (Eqn. 69) reduces to the following 1D case:

    (2π/N) ∂_x J = πK(x − x*)    (97)

where x* = (1/πK)(√(K−1) − arccos(1/√K) + π). Furthermore, x* is a convergent critical point.

Proof The 1D system can be computed with simple algebraic manipulations (note that when x = y, φ* = 0 and θ = φ = arccos(1/√K)). Note that the 1D system is linear, with closed-form solution x_t = x* + C e^{−NKt/2}, and is thus convergent.
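The saddle value x* in Lemma 9 can be verified against the population-gradient formula (our addition): with an orthonormal teacher and all student weights equal to x*·(1, ..., 1) under the teacher basis, the expected gradient of Eqn. 64 vanishes.

```python
import numpy as np

K = 5
xs = (np.sqrt(K - 1) - np.arccos(1 / np.sqrt(K)) + np.pi) / (np.pi * K)  # x*

Ws = np.eye(K)                          # orthonormal teacher w*_j
W = np.full((K, K), xs)                 # all student columns equal x* * (1, ..., 1)

G = np.zeros((K, K))                    # (2pi/N) * E[grad_{w_j} J], via Eqns. 1 and 64
for j in range(K):
    e = W[:, j] / np.linalg.norm(W[:, j])
    for V, sgn in ((W, 1.0), (Ws, -1.0)):
        for jp in range(K):
            v = V[:, jp]; nv = np.linalg.norm(v)
            th = np.arccos(np.clip(e @ v / nv, -1.0, 1.0))
            G[:, j] += sgn * ((np.pi - th) * v + nv * np.sin(th) * e)

print(np.abs(G).max())   # ~0: x = y = x* is a critical point
```

By symmetry every column of G is the same multiple of (1, ..., 1); x* is precisely the value that zeros it out.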
Combining Lemma 8 and Lemma 9 yields Thm. 7.

7 Simulations

No theorem is provided.

8 Extension to multilayer ReLU network

Proposition 2 For a neural network with ReLU nonlinearity that uses the l2 loss to match a teacher network of the same size, the gradient inflow g_j for node j at layer c has the following form:

    g_j = Q_j (Q*_j u*_j − Q_j u_j)    (98)

where Q_j and Q*_j are N-by-N diagonal matrices. For any k ∈ [c+1], Q_k = Σ_{j∈[c]} w_{jk} D_j Q_j, and similarly for Q*_k. The gradient with respect to w_j (the parameters immediately under node j) is computed as:

    ∇_{w_j} J = X_cᵀ D_j g_j    (99)
Proof We prove by induction on layers. For the first layer, there is only one node with g = u* − u, therefore Q_j = Q*_j = I. Suppose the condition holds for all nodes j ∈ [c]. Then for node k ∈ [c+1], we have:

    g_k = Σ_j w_{jk} D_j g_j = Σ_j w_{jk} D_j Q_j (Q*_j u*_j − Q_j u_j)

Expanding u_j and u*_j in terms of the layer-(c+1) activations (u_j = D_j Σ_{k'} w_{jk'} u_{k'}, and similarly for u*_j with the teacher's weights), substituting into the sum and regrouping, then setting Q_k = Σ_j w_{jk} D_j Q_j and Q*_k = Σ_j w_{jk} D_j Q*_j (both diagonal matrices), we thus have:

    g_k = Q_k Q*_k u*_k − Q_k Q_k u_k = Q_k (Q*_k u*_k − Q_k u_k)    (100)
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational
More informationEconomics 204 Summer/Fall 2010 Lecture 10 Friday August 6, 2010
Economics 204 Summer/Fall 2010 Lecture 10 Friday August 6, 2010 Diagonalization of Symmetric Real Matrices (from Handout Definition 1 Let δ ij = { 1 if i = j 0 if i j A basis V = {v 1,..., v n } of R n
More informationMath 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.
Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,
More informationSystems of Linear ODEs
P a g e 1 Systems of Linear ODEs Systems of ordinary differential equations can be solved in much the same way as discrete dynamical systems if the differential equations are linear. We will focus here
More informationLinear Models Review
Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign
More informationInsights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling
Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling Master Thesis Michael Eigensatz Advisor: Joachim Giesen Professor: Mark Pauly Swiss Federal Institute of Technology
More informationL. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS
Rend. Sem. Mat. Univ. Pol. Torino Vol. 57, 1999) L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS Abstract. We use an abstract framework to obtain a multilevel decomposition of a variety
More informationScattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions
Chapter 3 Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions 3.1 Scattered Data Interpolation with Polynomial Precision Sometimes the assumption on the
More informationDot Products. K. Behrend. April 3, Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem.
Dot Products K. Behrend April 3, 008 Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem. Contents The dot product 3. Length of a vector........................
More informationThe Symmetric Space for SL n (R)
The Symmetric Space for SL n (R) Rich Schwartz November 27, 2013 The purpose of these notes is to discuss the symmetric space X on which SL n (R) acts. Here, as usual, SL n (R) denotes the group of n n
More informationReview of Some Concepts from Linear Algebra: Part 2
Review of Some Concepts from Linear Algebra: Part 2 Department of Mathematics Boise State University January 16, 2019 Math 566 Linear Algebra Review: Part 2 January 16, 2019 1 / 22 Vector spaces A set
More informationBasic Concepts in Matrix Algebra
Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1
More informationARCS IN FINITE PROJECTIVE SPACES. Basic objects and definitions
ARCS IN FINITE PROJECTIVE SPACES SIMEON BALL Abstract. These notes are an outline of a course on arcs given at the Finite Geometry Summer School, University of Sussex, June 26-30, 2017. Let K denote an
More informationOctober 25, 2013 INNER PRODUCT SPACES
October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal
More informationStable Process. 2. Multivariate Stable Distributions. July, 2006
Stable Process 2. Multivariate Stable Distributions July, 2006 1. Stable random vectors. 2. Characteristic functions. 3. Strictly stable and symmetric stable random vectors. 4. Sub-Gaussian random vectors.
More informationTEST CODE: MMA (Objective type) 2015 SYLLABUS
TEST CODE: MMA (Objective type) 2015 SYLLABUS Analytical Reasoning Algebra Arithmetic, geometric and harmonic progression. Continued fractions. Elementary combinatorics: Permutations and combinations,
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationZweier I-Convergent Sequence Spaces
Chapter 2 Zweier I-Convergent Sequence Spaces In most sciences one generation tears down what another has built and what one has established another undoes. In mathematics alone each generation builds
More informationLinear Algebra. Session 12
Linear Algebra. Session 12 Dr. Marco A Roque Sol 08/01/2017 Example 12.1 Find the constant function that is the least squares fit to the following data x 0 1 2 3 f(x) 1 0 1 2 Solution c = 1 c = 0 f (x)
More informationMath 322. Spring 2015 Review Problems for Midterm 2
Linear Algebra: Topic: Linear Independence of vectors. Question. Math 3. Spring Review Problems for Midterm Explain why if A is not square, then either the row vectors or the column vectors of A are linearly
More information1 Planar rotations. Math Abstract Linear Algebra Fall 2011, section E1 Orthogonal matrices and rotations
Math 46 - Abstract Linear Algebra Fall, section E Orthogonal matrices and rotations Planar rotations Definition: A planar rotation in R n is a linear map R: R n R n such that there is a plane P R n (through
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationBASIC NOTIONS. x + y = 1 3, 3x 5y + z = A + 3B,C + 2D, DC are not defined. A + C =
CHAPTER I BASIC NOTIONS (a) 8666 and 8833 (b) a =6,a =4 will work in the first case, but there are no possible such weightings to produce the second case, since Student and Student 3 have to end up with
More informationx 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7
Linear Algebra and its Applications-Lab 1 1) Use Gaussian elimination to solve the following systems x 1 + x 2 2x 3 + 4x 4 = 5 1.1) 2x 1 + 2x 2 3x 3 + x 4 = 3 3x 1 + 3x 2 4x 3 2x 4 = 1 x + y + 2z = 4 1.4)
More informationContinuous methods for numerical linear algebra problems
Continuous methods for numerical linear algebra problems Li-Zhi Liao (http://www.math.hkbu.edu.hk/ liliao) Department of Mathematics Hong Kong Baptist University The First International Summer School on
More informationChapter Two Elements of Linear Algebra
Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to
More informationIntroduction to Semidefinite Programming I: Basic properties a
Introduction to Semidefinite Programming I: Basic properties and variations on the Goemans-Williamson approximation algorithm for max-cut MFO seminar on Semidefinite Programming May 30, 2010 Semidefinite
More informationA geometric proof of the spectral theorem for real symmetric matrices
0 0 0 A geometric proof of the spectral theorem for real symmetric matrices Robert Sachs Department of Mathematical Sciences George Mason University Fairfax, Virginia 22030 rsachs@gmu.edu January 6, 2011
More informationSolutions to old Exam 3 problems
Solutions to old Exam 3 problems Hi students! I am putting this version of my review for the Final exam review here on the web site, place and time to be announced. Enjoy!! Best, Bill Meeks PS. There are
More informationHints and Selected Answers to Homework Sets
Hints and Selected Answers to Homework Sets Phil R. Smith, Ph.D. March 8, 2 Test I: Systems of Linear Equations; Matrices Lesson. Here s a linear equation in terms of a, b, and c: a 5b + 7c 2 ( and here
More informationCS 246 Review of Linear Algebra 01/17/19
1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationStability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games
Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,
More informationCO 250 Final Exam Guide
Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,
More informationON THE MATRIX EQUATION XA AX = X P
ON THE MATRIX EQUATION XA AX = X P DIETRICH BURDE Abstract We study the matrix equation XA AX = X p in M n (K) for 1 < p < n It is shown that every matrix solution X is nilpotent and that the generalized
More informationUNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems
UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction
More information1. < 0: the eigenvalues are real and have opposite signs; the fixed point is a saddle point
Solving a Linear System τ = trace(a) = a + d = λ 1 + λ 2 λ 1,2 = τ± = det(a) = ad bc = λ 1 λ 2 Classification of Fixed Points τ 2 4 1. < 0: the eigenvalues are real and have opposite signs; the fixed point
More informationMATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.
MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. Adjoint operator and adjoint matrix Given a linear operator L on an inner product space V, the adjoint of L is a transformation
More informationSolving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels
Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels Y.C. Hon and R. Schaback April 9, Abstract This paper solves the Laplace equation u = on domains Ω R 3 by meshless collocation
More informationThroughout these notes we assume V, W are finite dimensional inner product spaces over C.
Math 342 - Linear Algebra II Notes Throughout these notes we assume V, W are finite dimensional inner product spaces over C 1 Upper Triangular Representation Proposition: Let T L(V ) There exists an orthonormal
More informationNow I switch to nonlinear systems. In this chapter the main object of study will be
Chapter 4 Stability 4.1 Autonomous systems Now I switch to nonlinear systems. In this chapter the main object of study will be ẋ = f(x), x(t) X R k, f : X R k, (4.1) where f is supposed to be locally Lipschitz
More informationAppendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS
Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationIn particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with
Appendix: Matrix Estimates and the Perron-Frobenius Theorem. This Appendix will first present some well known estimates. For any m n matrix A = [a ij ] over the real or complex numbers, it will be convenient
More informationAlgebra Workshops 10 and 11
Algebra Workshops 1 and 11 Suggestion: For Workshop 1 please do questions 2,3 and 14. For the other questions, it s best to wait till the material is covered in lectures. Bilinear and Quadratic Forms on
More informationSolutions to Math 53 Math 53 Practice Final
Solutions to Math 5 Math 5 Practice Final 20 points Consider the initial value problem y t 4yt = te t with y 0 = and y0 = 0 a 8 points Find the Laplace transform of the solution of this IVP b 8 points
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationLinear Algebra Review
January 29, 2013 Table of contents Metrics Metric Given a space X, then d : X X R + 0 and z in X if: d(x, y) = 0 is equivalent to x = y d(x, y) = d(y, x) d(x, y) d(x, z) + d(z, y) is a metric is for all
More informationREVIEW OF DIFFERENTIAL CALCULUS
REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be
More informationMath 321: Linear Algebra
Math 32: Linear Algebra T Kapitula Department of Mathematics and Statistics University of New Mexico September 8, 24 Textbook: Linear Algebra,by J Hefferon E-mail: kapitula@mathunmedu Prof Kapitula, Spring
More informationInformation About Ellipses
Information About Ellipses David Eberly, Geometric Tools, Redmond WA 9805 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License. To view
More informationx 1 + x 2 2 x 1 x 2 1 x 2 2 min 3x 1 + 2x 2
Lecture 1 LPs: Algebraic View 1.1 Introduction to Linear Programming Linear programs began to get a lot of attention in 1940 s, when people were interested in minimizing costs of various systems while
More information0.1 Rational Canonical Forms
We have already seen that it is useful and simpler to study linear systems using matrices. But matrices are themselves cumbersome, as they are stuffed with many entries, and it turns out that it s best
More informationLagrange Multipliers
Optimization with Constraints As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each
More informationMATH 1120 (LINEAR ALGEBRA 1), FINAL EXAM FALL 2011 SOLUTIONS TO PRACTICE VERSION
MATH (LINEAR ALGEBRA ) FINAL EXAM FALL SOLUTIONS TO PRACTICE VERSION Problem (a) For each matrix below (i) find a basis for its column space (ii) find a basis for its row space (iii) determine whether
More informationMORE NOTES FOR MATH 823, FALL 2007
MORE NOTES FOR MATH 83, FALL 007 Prop 1.1 Prop 1. Lemma 1.3 1. The Siegel upper half space 1.1. The Siegel upper half space and its Bergman kernel. The Siegel upper half space is the domain { U n+1 z C
More informationThe Closed Form Reproducing Polynomial Particle Shape Functions for Meshfree Particle Methods
The Closed Form Reproducing Polynomial Particle Shape Functions for Meshfree Particle Methods by Hae-Soo Oh Department of Mathematics, University of North Carolina at Charlotte, Charlotte, NC 28223 June
More informationLecture 8: Linear Algebra Background
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationEigenvalues and Eigenvectors
/88 Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Eigenvalue Problem /88 Eigenvalue Equation By definition, the eigenvalue equation for matrix
More informationConvex Analysis and Optimization Chapter 2 Solutions
Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationGrothendieck s Inequality
Grothendieck s Inequality Leqi Zhu 1 Introduction Let A = (A ij ) R m n be an m n matrix. Then A defines a linear operator between normed spaces (R m, p ) and (R n, q ), for 1 p, q. The (p q)-norm of A
More informationMath 5630: Iterative Methods for Systems of Equations Hung Phan, UMass Lowell March 22, 2018
1 Linear Systems Math 5630: Iterative Methods for Systems of Equations Hung Phan, UMass Lowell March, 018 Consider the system 4x y + z = 7 4x 8y + z = 1 x + y + 5z = 15. We then obtain x = 1 4 (7 + y z)
More informationMostow Rigidity. W. Dison June 17, (a) semi-simple Lie groups with trivial centre and no compact factors and
Mostow Rigidity W. Dison June 17, 2005 0 Introduction Lie Groups and Symmetric Spaces We will be concerned with (a) semi-simple Lie groups with trivial centre and no compact factors and (b) simply connected,
More informationTutorials in Optimization. Richard Socher
Tutorials in Optimization Richard Socher July 20, 2008 CONTENTS 1 Contents 1 Linear Algebra: Bilinear Form - A Simple Optimization Problem 2 1.1 Definitions........................................ 2 1.2
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationMAT 419 Lecture Notes Transcribed by Eowyn Cenek 6/1/2012
(Homework 1: Chapter 1: Exercises 1-7, 9, 11, 19, due Monday June 11th See also the course website for lectures, assignments, etc) Note: today s lecture is primarily about definitions Lots of definitions
More informationw T 1 w T 2. w T n 0 if i j 1 if i = j
Lyapunov Operator Let A F n n be given, and define a linear operator L A : C n n C n n as L A (X) := A X + XA Suppose A is diagonalizable (what follows can be generalized even if this is not possible -
More informationLecture 1: Review of linear algebra
Lecture 1: Review of linear algebra Linear functions and linearization Inverse matrix, least-squares and least-norm solutions Subspaces, basis, and dimension Change of basis and similarity transformations
More informationMath 1060 Linear Algebra Homework Exercises 1 1. Find the complete solutions (if any!) to each of the following systems of simultaneous equations:
Homework Exercises 1 1 Find the complete solutions (if any!) to each of the following systems of simultaneous equations: (i) x 4y + 3z = 2 3x 11y + 13z = 3 2x 9y + 2z = 7 x 2y + 6z = 2 (ii) x 4y + 3z =
More information