Supp Materials: An Analytical Formula of Population Gradient for Two-Layered ReLU Network and its Applications in Convergence and Critical Point Analysis

Yuandong Tian
Facebook AI Research

March 8, 2017

1 Introduction

No theorem is provided.

2 Related Works

No theorem is provided.

3 Problem Definition

No theorem is provided.

4 The Analytical Formula

Here we list the detailed proofs for all the theorems.

[Figure 1: (a)-(b) The two cases in Thm. 1: panel (a) shows e and w around the origin O with θ ≥ 0; panel (b) shows the case θ < 0.]

4.1 One ReLU Case

Theorem 1 Suppose F(e, w) = Xᵀ D(e) D(w) X w, where e is a unit vector and X = [x_1, x_2, ..., x_N]ᵀ is the N-by-d sample matrix. If x_i ∼ N(0, I), then:

E[F(e, w)] = (N/2π) ((π − θ) w + ‖w‖ sin θ · e)   (1)

where θ ∈ [0, π] is the angle between e and w.

Proof Note that F can be written in the following form:

F(e, w) = Σ_{i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} x_i x_iᵀ w   (2)

where the x_i are the samples, so that X = [x_1, x_2, ..., x_N]ᵀ. We set up the axes related to e and w as in Fig. 1, while the remaining axes are perpendicular to this plane. We have an orthonormal set of bases: e, e_2 = (w/‖w‖ − e cos θ)/sin θ (and any set of bases that spans the rest of the space). In this coordinate system, any vector x = [r sin φ, r cos φ, x_3, ..., x_d]. Under these bases, the representations of e and w are [1, 0_{d−1}] and [‖w‖ cos θ, ‖w‖ sin θ, 0_{d−2}]. Note that here θ ∈ (−π, π]. The angle θ is positive when e chases after w, and negative otherwise.

Now we consider the quantity R(φ_0) = E[(1/N) Σ_{i : φ_i ∈ [0, φ_0]} x_i x_iᵀ]. If we take the expectation and use polar coordinates only in the first two dimensions, we have:

R(φ_0) = E[(1/N) Σ_{i : φ_i ∈ [0, φ_0]} x_i x_iᵀ] = E[x_i x_iᵀ | φ_i ∈ [0, φ_0]] P[φ_i ∈ [0, φ_0]]
       = ∫_0^{+∞} ∫_0^{φ_0} ∫...∫ [r sin φ, r cos φ, x_3, ..., x_d]ᵀ [r sin φ, r cos φ, x_3, ..., x_d] p(r) p(φ) Π_{k=3}^d p(x_k) · r dr dφ dx_3 ... dx_d

where p(r) = e^{−r²/2} and p(φ) = 1/2π. Note that R(φ_0) is a d-by-d matrix. The first 2-by-2 block can be computed in closed form (note that ∫_0^{+∞} r² p(r) r dr = 2). Every off-diagonal element outside the first 2-by-2 block is zero due to the symmetry of spherical Gaussian variables. Every diagonal element outside the first 2-by-2 block equals P[φ_i ∈ [0, φ_0]] = φ_0/2π. Finally, we have:

R(φ_0) = (1/4π) [ 2φ_0 − sin 2φ_0 , 1 − cos 2φ_0 , 0 ; 1 − cos 2φ_0 , 2φ_0 + sin 2φ_0 , 0 ; 0 , 0 , 2φ_0 I_{d−2} ]   (3)
       = (φ_0/2π) I_d + (1/4π) [ −sin 2φ_0 , 1 − cos 2φ_0 , 0 ; 1 − cos 2φ_0 , sin 2φ_0 , 0 ; 0 , 0 , 0 ]   (4)

With this equation, we can compute E[F(e, w)]. When θ ≥ 0, the condition {i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} is equivalent to {i : φ_i ∈ [θ, π]} (Fig. 1(a)). Using w = [‖w‖ cos θ, ‖w‖ sin θ, 0_{d−2}], we have:

E[F(e, w)] = N (R(π) − R(θ)) w   (5)
           = (N/4π) ( [ sin 2θ , 1 − cos 2θ , 0 ; 1 − cos 2θ , −sin 2θ , 0 ; 0 , 0 , 0 ] + 2(π − θ) I_d ) [‖w‖ cos θ, ‖w‖ sin θ, 0_{d−2}]ᵀ   (6)
           = (N/2π) ( (π − θ) w + ‖w‖ sin θ [1, 0_{d−1}]ᵀ )   (7)
           = (N/2π) ( (π − θ) w + ‖w‖ sin θ · e )   (8)

For θ < 0, the condition {i : x_iᵀe ≥ 0, x_iᵀw ≥ 0} is equivalent to {i : φ_i ∈ [0, π + θ]} (Fig. 1(b)), and similarly we get

E[F(e, w)] = N (R(π + θ) − R(0)) w = (N/2π) ((π + θ) w − ‖w‖ sin θ · e)   (9)

Notice that, by abuse of notation, the θ appearing in Eqn. 1 is the absolute value, and Eqn. 1 follows.
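The closed form in Eqn. 1 can be compared directly against a Monte Carlo estimate of F. The sketch below is our own sanity check (not part of the original derivation; all function and variable names are ours):

```python
import numpy as np

def F(e, w, X):
    # F(e, w) = X^T D(e) D(w) X w, where D(u) = diag(X u > 0)
    de = (X @ e > 0).astype(float)
    dw = (X @ w > 0).astype(float)
    return X.T @ (de * dw * (X @ w))

rng = np.random.default_rng(0)
d, N, trials = 8, 1000, 2000
w = rng.standard_normal(d)
e = rng.standard_normal(d)
e /= np.linalg.norm(e)

# empirical average of F over fresh Gaussian samples
emp = np.mean([F(e, w, rng.standard_normal((N, d))) for _ in range(trials)], axis=0)

# closed form of Eqn. 1
theta = np.arccos(np.clip(e @ w / np.linalg.norm(w), -1, 1))
pred = N / (2 * np.pi) * ((np.pi - theta) * w + np.linalg.norm(w) * np.sin(theta) * e)
print(np.linalg.norm(emp - pred) / np.linalg.norm(pred))  # small; shrinks with more trials
```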

5 Critical Point Analysis

Recall that we have: suppose F(e, w) = Xᵀ D(e) D(w) X w, where e is a unit vector and X = [x_1, x_2, ..., x_N]ᵀ is the N-by-d sample matrix. If x_i ∼ N(0, I), then:

E[F(e, w)] = (N/2π) ((π − θ) w + ‖w‖ sin θ · e)   (10)

where θ ∈ [0, π] is the angle between e and w. The expected gradient of g(x) = Σ_{j=1}^K σ(w_jᵀx) with respect to w_j is the following:

E[∇_{w_j} J] = Σ_{j′=1}^K ( E[F(e_j, w_{j′})] − E[F(e_j, w*_{j′})] )   (11)

where e_j = w_j/‖w_j‖. By solving Eqn. 64 (E[∇_{w_j} J] = 0, j = 1, ..., K), it is possible to identify all critical points of g(x). In general Eqn. 64 is highly nonlinear and cannot be solved efficiently; however, we show that in our particular case it is possible, since Eqn. 64 has interesting properties. First of all, the system of equations

E[∇_{w_j} J] = 0,  j = 1, ..., K   (12)

or

Σ_{j′=1}^K E[F(e_j, w_{j′})] = Σ_{j′=1}^K E[F(e_j, w*_{j′})],  j = 1, ..., K   (13)

is a linear combination of the w_j and w*_j, but with varying coefficients. We can rewrite the system as follows:

diag(a) E + B diag(w) E = diag(a*) E + B* W*   (14)

where B, B*, Θ and Θ* are K-by-K matrices, E, W and W* are K-by-d matrices, and a, a*, w and w* are K-dimensional vectors:

a = sin Θ · w,  a* = sin Θ* · w*   (15)
B = π − Θ,  B* = π − Θ*   (16)
E = [e_1, e_2, ..., e_K]ᵀ   (17)
W = [w_1, w_2, ..., w_K]ᵀ,  W* = [w*_1, w*_2, ..., w*_K]ᵀ   (18)

where θ_{j′}^j = ∠(w_j, w_{j′}) and θ_{j′}^{j*} = ∠(w_j, w*_{j′}), Θ = [θ_i^j] (the element at the i-th row and j-th column of Θ is θ_i^j), Θ* = [θ_i^{j*}], w = [‖w_1‖, ‖w_2‖, ..., ‖w_K‖]ᵀ and w* = [‖w*_1‖, ‖w*_2‖, ..., ‖w*_K‖]ᵀ.

Eqn. 14 already has interesting properties. The first thing we consider is whether critical points can fall outside the Principal Hyperplane Π, which is the plane spanned by the ground-truth weight vectors W*. The following shows that critical points outside Π must lie in a manifold:

Lemma 1 If {w_j} is a critical point satisfying Eqn. 14, then for any orthogonal transformation R with R|_Π = Id, {R w_j} is also a critical point.

Proof First of all, since R is an orthogonal transformation, it keeps all angles and magnitudes, so a, a*, B, B*, w and w* are invariant. For simplicity we write Y = diag(a) + B diag(w) − diag(a*), so that Eqn. 14 reads Y E − B* W* = 0; Y is also invariant under R. Since R|_Π = Id, we have W* Rᵀ = W*, and

Y (E Rᵀ) − B* W* = Y E Rᵀ − B* W* Rᵀ = (Y E − B* W*) Rᵀ = 0   (19)

Note that for d ≥ K + 2, there always exists R ≠ Id satisfying such a condition, which yields continuous critical points. Further, such transformations form a Lie group. Therefore we have:

Theorem 2 If d ≥ K + 2, then any critical point satisfying Eqn. 14 that lies outside Π must lie in a manifold.

The intuition is simple. For any out-of-plane critical point, pick a transformation that satisfies the condition of the theorem, and transform it to a different yet infinitesimally close critical point. Such a transformation always exists: on the (d − K)-dimensional complement of Π, if d − K is odd we can always pick a rotation whose fixed axis is not aligned with all K weights; if it is even, there is a rotation matrix without a fixed point.
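Since Eqns. 10-11 give the population gradient in closed form, it can be evaluated without any sampling. A minimal sketch of our own (EF and grad are hypothetical helper names, not from the original text):

```python
import numpy as np

def EF(e, w, N=1):
    # Thm. 1: E[F(e, w)] = N/(2 pi) ((pi - theta) w + ||w|| sin(theta) e)
    theta = np.arccos(np.clip(e @ w / np.linalg.norm(w), -1, 1))
    return N / (2 * np.pi) * ((np.pi - theta) * w + np.linalg.norm(w) * np.sin(theta) * e)

def grad(W, W_star):
    # Eqn. 11: E[grad_{w_j} J] = sum_{j'} E[F(e_j, w_{j'})] - E[F(e_j, w*_{j'})]
    G = np.zeros_like(W)
    for j, wj in enumerate(W):
        ej = wj / np.linalg.norm(wj)
        G[j] = sum(EF(ej, v) for v in W) - sum(EF(ej, v) for v in W_star)
    return G

W_star = np.eye(3)[:2]                            # K = 2 orthonormal teacher weights, d = 3
print(np.abs(grad(W_star.copy(), W_star)).max())  # ~0: the teacher is a critical point
```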

5.1 Characteristics within the Principal Plane

We can right-multiply by Eᵀ and turn the normal equation into a linear equation in the magnitudes of the weights. Note that we have:

E Eᵀ = cos Θ,  W Eᵀ = diag(w) cos Θ,  W* Eᵀ = diag(w*) cos Θ*   (20)

Therefore, Eqn. 14 becomes:

diag(a) cos Θ + B diag(w) cos Θ = diag(a*) cos Θ + B* diag(w*) cos Θ*   (21)

which is a homogeneous linear equation with respect to the magnitudes of the weights (note that a and a* are linear in the magnitudes). In particular, the (i, j) entries of the LHS and RHS of this equality are:

LHS_ij = cos θ_j^i ( Σ_{k=1}^K sin θ_i^k ‖w_k‖ ) + Σ_{k=1}^K (π − θ_i^k) ‖w_k‖ cos θ_j^k   (22)
RHS_ij = cos θ_j^i ( Σ_{k=1}^K sin θ_i^{k*} ‖w*_k‖ ) + Σ_{k=1}^K (π − θ_i^{k*}) ‖w*_k‖ cos θ_j^{k*}   (23)

Therefore, the following equation holds:

M w = M* w*   (24)

where M and M* are K²-by-K matrices. Each entry m_{ij,k}, which corresponds to the coefficient of the k-th weight magnitude in the (i, j) entry of Eqn. 21, is defined as:

m_{ij,k} = (π − θ_i^k) cos θ_j^k + sin θ_i^k cos θ_j^i   (25)
m*_{ij,k} = (π − θ_i^{k*}) cos θ_j^{k*} + sin θ_i^{k*} cos θ_j^i   (26)

Special case on the diagonal. For a diagonal element (i, i), cos θ_i^i = 1 and m_{ii,k} = h(θ_i^k), m*_{ii,k} = h(θ_i^{k*}), where

h(θ) = (π − θ) cos θ + sin θ.   (27)

Therefore, keeping only the diagonal elements, we arrive at the following subset of the constraints to be satisfied at any critical point:

M_r w = M*_r w*   (28)

where M_r = h(Θ) and M*_r = h(Θ*) are both K-by-K matrices. Note that if M_r is full rank, then we can solve for w from Eqn. 28 and plug it back into Eqn. 24 to check whether it is indeed a critical point.

Lemma 2 If w* ≠ 0 (no trivial ground-truth solution), and for a given (Θ, Θ*) there exists a row (say the l-th row) of M and M*, namely m_l and m*_l, satisfying (element-wise)

m*_l − M*_rᵀ M_r^{−ᵀ} m_l > 0  or  m*_l − M*_rᵀ M_r^{−ᵀ} m_l < 0   (29)

then (Θ, Θ*) cannot be a critical point.

Proof Suppose that given (Θ, Θ*) we form M_r and M*_r and compute w using Eqn. 28; then we have

(w*)ᵀ M*_rᵀ M_r^{−ᵀ} m_l = (M*_r w*)ᵀ M_r^{−ᵀ} m_l = (M_r w)ᵀ M_r^{−ᵀ} m_l = wᵀ m_l   (30)

Therefore, from the condition m*_l − M*_rᵀ M_r^{−ᵀ} m_l > 0 together with w* ≥ 0 but w* ≠ 0, we have

(w*)ᵀ (m*_l − M*_rᵀ M_r^{−ᵀ} m_l) = (w*)ᵀ m*_l − wᵀ m_l > 0   (31)

but this contradicts the necessary condition for (Θ, Θ*) to be a critical point (Eqn. 24). Similarly we can prove the other side.
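The matrices in this machinery are straightforward to build. The sketch below (our own code; angles is a hypothetical helper) constructs M_r = h(Θ) and M*_r = h(Θ*) and confirms the trivial consistency of Eqn. 28 at the ground truth, where Θ = Θ* makes both sides coincide:

```python
import numpy as np

h = lambda t: (np.pi - t) * np.cos(t) + np.sin(t)          # Eqn. 27

def angles(A, B):
    # Theta[i, j] = angle between row a_i and row b_j
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return np.arccos(np.clip(An @ Bn.T, -1.0, 1.0))

rng = np.random.default_rng(1)
W_star = rng.standard_normal((4, 6))                       # K = 4 teacher weights, d = 6
W = W_star.copy()                                          # evaluate at w = w*
Mr, Mrs = h(angles(W, W)), h(angles(W, W_star))            # M_r and M_r*
mags, mags_star = np.linalg.norm(W, axis=1), np.linalg.norm(W_star, axis=1)
print(np.allclose(Mr @ mags, Mrs @ mags_star))             # True: Eqn. 28 holds
```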

Separation property of Eqn. 29. Note that both the k-th element of m*_l and the k-th element of M*_rᵀ M_r^{−ᵀ} m_l in Eqn. 29 depend only on the k-th true weight vector w*_k (and on all of {w_j}). For m*_l, this can be seen from Eqn. 26, in which the k-th element is only related to the angles θ^{k*} between w*_k and {w_j}. For M*_rᵀ M_r^{−ᵀ} m_l, notice that the k-th column of M*_r (the k-th row of M*_rᵀ) is only related to w*_k, but not to the other ground-truth weight vectors.

This separation property makes the analysis much easier, as shown in the case K = 2. Thanks to it, we can consider the following function with respect to one (rather than K) ground-truth unit weight vector e* and all estimated unit vectors {e_l}:

L_ij(e*, {e_l}) = m*_ij − vᵀ M_r^{−ᵀ} m_ij   (32)

where v = [h(θ^1), h(θ^2), ..., h(θ^K)]ᵀ, θ^j = ∠(e*, w_j), and m*_ij = (π − θ^i) cos θ^j + sin θ^i cos θ_j^i (as in Eqn. 26).

Proposition 1 L_ij(e*, {e_l}) = 0 for any e* = e_l, 1 ≤ l ≤ K. In addition, L_ii(e*, {e_l}) ≡ 0.

Proof When e* = e_l, then θ^k = θ_l^k and v becomes the l-th row of M_r. Since M_r M_r^{−1} = I_{K×K}, vᵀ M_r^{−ᵀ} becomes a unit vector with only the l-th element equal to 1. Therefore, again using θ^k = θ_l^k, we have:

L_ij(e*, {e_l}) = m*_ij − m_{ij,l} = 0   (33)

For L_ii, by definition m_ii is the i-th column of M_r (note Θ is symmetric), so M_r^{−ᵀ} m_ii is a unit vector with only the i-th element equal to 1. Therefore

vᵀ M_r^{−ᵀ} m_ii = h(θ^i) = m*_ii   (34)

Then the previous lemma can be restated as the following:

Theorem 3 If w* ≠ 0 and, for a given parameter w, there is a pair (j, j′) so that L_{jj′}({θ_l^{k*}}, Θ) > 0 for all 1 ≤ k ≤ K, or < 0 for all 1 ≤ k ≤ K, then w cannot be a critical point.

5.2 ReLU network with two hidden nodes (K = 2)

For K = 2, we have the 4-by-2 matrix (the row order is (1,1), (1,2), (2,1), (2,2)):

M = [ (π − θ_1^1) cos θ_1^1 + sin θ_1^1 cos θ_1^1 , (π − θ_1^2) cos θ_1^2 + sin θ_1^2 cos θ_1^1 ;
      (π − θ_1^1) cos θ_2^1 + sin θ_1^1 cos θ_2^1 , (π − θ_1^2) cos θ_2^2 + sin θ_1^2 cos θ_2^1 ;
      (π − θ_2^1) cos θ_1^1 + sin θ_2^1 cos θ_1^2 , (π − θ_2^2) cos θ_1^2 + sin θ_2^2 cos θ_1^2 ;
      (π − θ_2^1) cos θ_2^1 + sin θ_2^1 cos θ_2^2 , (π − θ_2^2) cos θ_2^2 + sin θ_2^2 cos θ_2^2 ]   (35)

  = [ π , (π − θ) cos θ + sin θ ;
      π cos θ , (π − θ) + sin θ cos θ ;
      (π − θ) + sin θ cos θ , π cos θ ;
      (π − θ) cos θ + sin θ , π ]   (36)

since θ_1^2 = θ_2^1 = θ and θ_1^1 = θ_2^2 = 0. Similarly we can write M*:

M* = [ (π − θ_1^{1*}) cos θ_1^{1*} + sin θ_1^{1*} cos θ_1^1 , (π − θ_1^{2*}) cos θ_1^{2*} + sin θ_1^{2*} cos θ_1^1 ;
       (π − θ_1^{1*}) cos θ_2^{1*} + sin θ_1^{1*} cos θ_2^1 , (π − θ_1^{2*}) cos θ_2^{2*} + sin θ_1^{2*} cos θ_2^1 ;
       (π − θ_2^{1*}) cos θ_1^{1*} + sin θ_2^{1*} cos θ_1^2 , (π − θ_2^{2*}) cos θ_1^{2*} + sin θ_2^{2*} cos θ_1^2 ;
       (π − θ_2^{1*}) cos θ_2^{1*} + sin θ_2^{1*} cos θ_2^2 , (π − θ_2^{2*}) cos θ_2^{2*} + sin θ_2^{2*} cos θ_2^2 ]   (37)

In this case,

M_r = [ π , h(θ) ; h(θ) , π ],  M*_r = [ h(θ_1^{1*}) , h(θ_1^{2*}) ; h(θ_2^{1*}) , h(θ_2^{2*}) ]   (38)

Therefore, if we know θ = θ_1^2 = θ_2^1 together with θ_1^{1*}, θ_1^{2*}, θ_2^{1*} and θ_2^{2*}, then we can compute M and M* and solve a linear equation to get the magnitudes of w_1 and w_2, which collectively identify the critical points. Note that M is a 4-by-2 matrix, so a critical point occurs only if the system has singular structure.
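A numerical spot-check of the K = 2 construction (our own sketch): building M entry-by-entry from Eqn. 25 reproduces the closed form of Eqn. 36.

```python
import numpy as np

theta = 0.7                                   # angle between e1 and e2
E = [np.array([1.0, 0.0]), np.array([np.cos(theta), np.sin(theta)])]
ang = lambda a, b: np.arccos(np.clip(a @ b, -1, 1))

# generic entries (Eqn. 25), rows ordered (1,1), (1,2), (2,1), (2,2)
M = np.array([[(np.pi - ang(E[i], E[k])) * np.cos(ang(E[j], E[k]))
               + np.sin(ang(E[i], E[k])) * np.cos(ang(E[j], E[i]))
               for k in range(2)] for i in range(2) for j in range(2)])

# closed form of Eqn. 36
h = lambda t: (np.pi - t) * np.cos(t) + np.sin(t)
M36 = np.array([[np.pi, h(theta)],
                [np.pi * np.cos(theta), np.pi - theta + np.sin(theta) * np.cos(theta)],
                [np.pi - theta + np.sin(theta) * np.cos(theta), np.pi * np.cos(theta)],
                [h(theta), np.pi]])
print(np.allclose(M, M36))   # True
```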

Global Optimum. One special case is when θ_1^{2*} = θ_2^{1*} = θ_1^2 = θ_2^1 = π/2 and θ_1^{1*} = θ_2^{2*} = 0. In this case we have:

M = M* = [ π , 1 ; 0 , π/2 ; π/2 , 0 ; 1 , π ]   (39)

and thus ‖w_j‖ = ‖w*_j‖ is the unique solution.

When K = 2, the following conjecture is empirically correct.

Conjecture 1 If e* is in the interior of Cone(e_1, e_2), then L_12(θ^1, θ^2, θ) > 0. If e* is in the exterior, then L_12 < 0. If e* is on the boundary, then L_12 = 0. The same holds for L_21.

Remark Note that L_1j can be written as the following:

L_1j(e*, {e_1, e_2}) = m*_1j − [h(θ^1), h(θ^2)] M_r^{−1} m_1j   (40)
  = [(π − θ^1) e* + sin θ^1 e_1]ᵀ e_j   (41)
    − [α(θ^1, θ^2, θ), β(θ^1, θ^2, θ)] [ π e_1 , (π − θ) e_2 + sin θ e_1 ]ᵀ e_j   (42)

where

[α, β] = [α(θ^1, θ^2, θ), β(θ^1, θ^2, θ)] = [h(θ^1), h(θ^2)] M_r^{−1}   (43)

We know that L_11 = 0 by Proposition 1. Therefore the vector

u ≡ (π − θ^1) e* + sin θ^1 e_1 − [ π e_1 , (π − θ) e_2 + sin θ e_1 ] [α(θ^1, θ^2, θ), β(θ^1, θ^2, θ)]ᵀ   (44)

is perpendicular to e_1. So if we compute the inner product between u and e_1⊥ (the unit vector that lies in Π and is orthogonal to e_1), we get

uᵀ e_1⊥ = (π − θ^1) sin θ^1 − [(π − θ) sin θ] β   (45)

Since e_2 = cos θ e_1 + sin θ e_1⊥, we have:

L_12(e*, {e_1, e_2}) = uᵀ e_2 = sin θ (uᵀ e_1⊥)   (46)

Note that Eqn. 45 is a function of the two variables θ^1 and θ (θ^2 is determined by θ^1 and θ, depending on whether e* is inside or outside Cone(e_1, e_2)). And we can verify it numerically.

Theorem 4 If Conjecture 1 is correct, then for the K = 2 ReLU network, (w_1, w_2) with w_1 ≠ w_2 is not a critical point if both lie in Cone(w*_1, w*_2), or both lie outside it.

Proof If both w_1 and w_2 are inside Cone(w*_1, w*_2), then from Conjecture 1 we have

L_12(θ_1^{k*}, θ_2^{k*}, θ) > 0   (47)

for k = 1, 2. Since K = 2, we can simply apply Thm. 3 to conclude that (w_1, w_2) is not a critical point. Similarly we prove the case where both w_1 and w_2 are outside Cone(w*_1, w*_2).

6 Convergence Analysis

6.1 Single ReLU case

In this subsection, we mainly deal with the following dynamics:

E[∇_w J] = (N/2)(w − w*) + (N/2π) ( θ w* − (‖w*‖ sin θ / ‖w‖) w )   (48)
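Before analyzing Eqn. 48, it is easy to simulate. A forward-Euler sketch of our own (assuming N = 1) shows w converging to w* from inside the region studied in the next theorem:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
w_star = rng.standard_normal(d)
w = 0.1 * w_star + 0.01 * rng.standard_normal(d)   # start inside ||w - w*|| < ||w*||

def pop_grad(w):
    # Eqn. 48 with N = 1
    c = np.clip(w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star)), -1, 1)
    theta = np.arccos(c)
    return 0.5 * (w - w_star) + (1 / (2 * np.pi)) * (
        theta * w_star - np.linalg.norm(w_star) * np.sin(theta) * w / np.linalg.norm(w))

for t in range(2000):
    w -= 0.05 * pop_grad(w)        # Euler discretization of dw/dt = -E[grad J]
print(np.linalg.norm(w - w_star))  # -> ~0
```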

Theorem 5 In the region Ω = {w : w ≠ 0, ‖w − w*‖ < ‖w*‖}, following the dynamics (Eqn. 48), the Lyapunov function V(w) = (1/2)‖w − w*‖² has V̇ < 0, and the system is asymptotically stable; thus w^t → w* when t → +∞.

Proof Denote Ω = {w : w ≠ 0, ‖w − w*‖ < ‖w*‖}. Note that

V̇ = −(w − w*)ᵀ E[∇_w J] = −(N/2π) yᵀ M y   (49)

where y = [‖w*‖, ‖w‖]ᵀ and M is the following 2-by-2 matrix:

M = [ (1/2) sin 2θ + π − θ , −((2π − θ) cos θ + sin θ)/2 ; −((2π − θ) cos θ + sin θ)/2 , π ]   (50)

In the following we show that M is positive definite when θ ∈ (0, π/2]. It suffices to show that M_11 > 0, M_22 > 0 and det(M) > 0. The first two are trivial, while the last one is:

4 det(M) = 2π (sin 2θ + 2π − 2θ) − [(2π − θ) cos θ + sin θ]²   (51)
         = 2π (sin 2θ + 2π − 2θ) − [ (2π − θ)² cos²θ + (2π − θ) sin 2θ + sin²θ ]   (52)
         = (4π² − 1) sin²θ − 4πθ + 4πθ cos²θ − θ² cos²θ + θ sin 2θ   (53)
         = (4π² − 4πθ − 1) sin²θ + θ cos θ (2 sin θ − θ cos θ)   (54)

Note that 4π² − 4πθ − 1 ≥ 4π(π − θ) − 1 > 0 for θ ∈ [0, π/2], and g(θ) = 2 sin θ − θ cos θ ≥ 0 for θ ∈ [0, π/2], since g(0) = 0 and g′(θ) ≥ 0 in this region. Therefore, when θ ∈ (0, π/2], M is positive definite.

When θ = 0, M(0) = π [1, −1; −1, 1] is positive semi-definite, with null eigenvector [1, 1]ᵀ, i.e., ‖w‖ = ‖w*‖. However, along θ = 0, the only w that satisfies ‖w‖ = ‖w*‖ is w = w*. Therefore, V̇ = −(N/2π) yᵀ M y < 0 in Ω. Note that although this region could be expanded to the entire open half-space H = {w : wᵀw* > 0}, it is not straightforward to prove convergence in H, since the trajectory might go outside H. On the other hand, Ω is the level set V < (1/2)‖w*‖², so a trajectory starting within Ω remains inside.

[Figure 2: (a) Sampling strategy to maximize the probability of convergence: the convergent region is ‖w − w*‖ < ‖w*‖, and w⁰ is sampled from a ball of radius r around the origin. (b) Relationship between the sampling range r and the desired probability of success (1 − ɛ)/2.]
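The positive-definiteness claim above is easy to confirm numerically; a quick sketch of our own:

```python
import numpy as np

# Check that M(theta) of Eqn. 50 is positive definite on (0, pi/2].
for theta in np.linspace(1e-3, np.pi / 2, 200):
    off = -((2 * np.pi - theta) * np.cos(theta) + np.sin(theta)) / 2
    M = np.array([[0.5 * np.sin(2 * theta) + np.pi - theta, off],
                  [off, np.pi]])
    assert np.all(np.linalg.eigvalsh(M) > 0)
print("M positive definite on (0, pi/2]")
```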

Theorem 6 When K = 1, the dynamics in Eqn. 64 converges to w* with probability at least (1 − ɛ)/2 if the initial value w⁰ is sampled uniformly from B_r = {w : ‖w‖ ≤ r} with:

r ≤ ɛ √(2π/(d + 2)) ‖w*‖   (55)

Proof Given a ball of radius r, we first compute the gap δ of the sphere cap (Fig. 2(b)). First cos θ = r/(2‖w*‖), so δ = r cos θ = r²/(2‖w*‖). Then a sufficient condition for the probability argument to hold is to ensure that the volume V_shaded of the shaded area is greater than (1 − ɛ)/2 · V_d(r), where V_d(r) is the volume of the d-dimensional ball of radius r. Since V_shaded ≥ (1/2) V_d(r) − δ V_{d−1}(r), it suffices to have:

(1/2) V_d(r) − δ V_{d−1}(r) ≥ ((1 − ɛ)/2) V_d(r)   (56)

which gives

(V_{d−1}(r)/V_d(r)) δ ≤ ɛ/2   (57)

Using δ = r²/(2‖w*‖) and V_d(r) = V_d(1) r^d, we thus have:

r ≤ ɛ (V_d(1)/V_{d−1}(1)) ‖w*‖   (58)

where V_d(1) is the volume of the d-dimensional unit ball. Since

V_d(1) = π^{d/2} / Γ(d/2 + 1)   (59)

where Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt, we have

V_d(1)/V_{d−1}(1) = √π · Γ(d/2 + 1/2) / Γ(d/2 + 1)   (60)

From Gautschi's inequality

x^{1−s} < Γ(x + 1)/Γ(x + s) < (x + 1)^{1−s},  x > 0, 0 < s < 1   (61)

with s = 1/2 and x = d/2, we have:

(2/(d + 2))^{1/2} < Γ(d/2 + 1/2)/Γ(d/2 + 1) < (2/d)^{1/2}   (62)

Therefore, it suffices to have

r ≤ ɛ √(2π/(d + 2)) ‖w*‖   (63)

Note that this upper bound is tight when δ → 0 and d → +∞, since all the inequalities involved asymptotically become equalities.
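A Monte Carlo check of this bound (our own sketch): sample uniformly from B_r with r as in Eqn. 55 and measure how often the sample lands in the convergent region of Theorem 5.

```python
import numpy as np

rng = np.random.default_rng(3)
d, eps = 20, 0.2
w_star = np.zeros(d); w_star[0] = 1.0
r = eps * np.sqrt(2 * np.pi / (d + 2)) * np.linalg.norm(w_star)   # Eqn. 55

# uniform samples in the ball B_r: random direction times r * U^(1/d)
u = rng.standard_normal((200000, d))
u = u / np.linalg.norm(u, axis=1, keepdims=True) * (r * rng.random(200000)[:, None] ** (1 / d))

inside = np.linalg.norm(u - w_star, axis=1) < np.linalg.norm(w_star)  # region Omega
print(inside.mean(), (1 - eps) / 2)   # empirical probability exceeds the bound
```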

6.2 Multiple ReLU case

Explanation of Eqn. 8 in the main text. We first write down the dynamics to be studied:

E[∇_{w_j} J] = Σ_{j′=1}^K ( E[F(e_j, w_{j′})] − E[F(e_j, w*_{j′})] )   (64)

We first define f(w_j, w_{j′}, w*_{j′}) = F(w_j/‖w_j‖, w_{j′}) − F(w_j/‖w_j‖, w*_{j′}). Therefore, the dynamics can be written as:

E[∇_{w_j} J] = Σ_{j′} E[f(w_j, w_{j′}, w*_{j′})]   (65)

Suppose we have a finite group P = {P_j}, a subgroup of the orthogonal group O(d), with P_1 the identity element. If w and w* have the following symmetry: w_j = P_j w_1 and w*_j = P_j w*_1, then the RHS of Eqn. 64 can be simplified as follows:

E[∇_{w_j} J] = Σ_{j′} E[f(w_j, w_{j′}, w*_{j′})]
  = Σ_{j′} E[f(P_j w_1, P_{j′} w_1, P_{j′} w*_1)]
  = Σ_{j′} E[f(P_j w_1, P_j P_{j′} w_1, P_j P_{j′} w*_1)]    ({P_j}_{j=1}^K is a group)
  = P_j Σ_{j′} E[f(w_1, P_{j′} w_1, P_{j′} w*_1)]    (‖P w‖ = ‖w‖, ∠(P w, P w′) = ∠(w, w′))
  = P_j E[∇_{w_1} J]   (66)

which means that if all the w_j and w*_j are symmetric under the action of a cyclic group, then so is their expected gradient. Therefore, the trajectory {w^t} keeps this cyclic structure. Instead of solving a system of K equations, we only need to solve one:

E[∇_{w_1} J] = Σ_{j=1}^K E[f(w_1, P_j w_1, P_j w*_1)]   (67)

Theorem 7 For a bias-free two-layered ReLU network

g(x; w) = Σ_j σ(w_jᵀ x)   (68)

that takes spherical Gaussian inputs, if the teacher's parameters {w*_j} form a set of orthonormal bases, then: (1) when the student parameters are initialized as [x_0, y_0, ..., y_0] under the basis of w*, where (x_0, y_0) ∈ Ω = {x ∈ (0, 1], y ∈ [0, 1], x > y}, the dynamics (Eqn. 64) converges to the teacher's parameters {w*_j} (i.e., (x, y) = (1, 0)); (2) when x_0 = y_0 ∈ (0, 1], it converges to a saddle point x = y = (1/πK)(√(K − 1) − arccos(1/√K) + π).

This theorem is quite complicated. We will start with a few lemmas and gradually come to the conclusion. First, if w⁰ = [x, y, y, ..., y] under the bases {w*_j}_{j=1}^K, then from a simple computation we know that w^t also follows this pattern. Therefore, we only need to study the following 2D dynamics in x and y:

(2π/N) E[ ∂J/∂x ; ∂J/∂y ] = [ πx − (π − θ) + (π − φ)(K − 1) y ; πy − (π − φ*) + (π − φ)(x + (K − 2) y) ]
  − [ (K − 1)(α sin φ* − sin φ) + α sin θ ] [ x ; y ]   (69)

Here the symmetrical factors (α ≡ 1/‖w_j‖, θ ≡ θ_j^{j*}, φ ≡ θ_j^{j′}, φ* ≡ θ_j^{j′*}) are defined as follows:

α = (x² + (K − 1)y²)^{−1/2},  cos θ = αx,  cos φ* = αy,  cos φ = α²(2xy + (K − 2)y²)   (70)

Now we start a sequence of lemmas.

Lemma 3 For φ, θ and φ* defined in Eqn. 70:

α ≡ (x² + (K − 1)y²)^{−1/2}   (71)
cos θ ≡ αx   (72)
cos φ* ≡ αy   (73)
cos φ ≡ α²(2xy + (K − 2)y²)   (74)

we have the following relations in the triangular region Ω_{ɛ_0} = {(x, y) : x ≥ 0, y ≥ 0, x ≥ y + ɛ_0} (Fig. 3):
(1) φ, φ* ∈ [0, π/2] and θ ∈ [0, θ_0), where θ_0 = arccos(1/√K).
(2) cos φ = 1 − α²(x − y)² and sin φ = α(x − y) √(2 − α²(x − y)²).
(3) φ ≤ φ* (equality holds only when y = 0), and φ* > θ.

Proof Propositions (1) and (2) follow by direct calculation. In particular, note that since cos θ = αx = 1/√(1 + (K − 1)(y/x)²) and x > y ≥ 0, we have cos θ ∈ (1/√K, 1] and θ ∈ [0, θ_0). For Proposition (3), φ* = arccos αy > θ = arccos αx because x > y. Finally, for x > y > 0, we have

cos φ / cos φ* = α²(2xy + (K − 2)y²)/(αy) = α (2x + (K − 2)y) > 1   (75)

The final inequality holds because K ≥ 2 and x, y > 0, and thus (2x + (K − 2)y)² > 4x² + (K − 2)²y² > x² + (K − 1)y² = α^{−2}. Therefore φ < φ*. If y = 0, then φ = φ*.
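The relations in Lemma 3 can be spot-checked over a grid; a sketch of our own:

```python
import numpy as np

K = 5
for x in np.linspace(0.05, 1.0, 20):
    for y in np.linspace(0.0, 1.0, 20):
        if x <= y:
            continue                                                   # stay in Omega
        a = 1.0 / np.sqrt(x**2 + (K - 1) * y**2)
        theta = np.arccos(np.clip(a * x, -1, 1))
        phi_s = np.arccos(np.clip(a * y, -1, 1))                       # phi*
        phi = np.arccos(np.clip(a**2 * (2 * x * y + (K - 2) * y**2), -1, 1))
        assert abs(np.cos(phi) - (1 - a**2 * (x - y)**2)) < 1e-9       # Prop. (2)
        assert phi <= phi_s + 1e-9 and phi_s > theta - 1e-9            # Prop. (3)
        assert theta < np.arccos(1 / np.sqrt(K)) + 1e-9                # Prop. (1)
print("Lemma 3 verified on the grid")
```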

[Figure 3: The region Ω_{ɛ_0} = {(x, y) : x ≥ 0, y ≥ 0, x ≥ y + ɛ_0} considered in the analysis of Eqn. 69. The saddle point (x*, x*) lies on the diagonal x = y; the teacher's parameters correspond to (1, 0).]

Lemma 4 For the dynamics defined in Eqn. 69, there exists ɛ_0 > 0 so that the triangular region Ω_{ɛ_0} = {(x, y) : x ≥ 0, y ≥ 0, x ≥ y + ɛ_0} (Fig. 3) is a convergent region. That is, the flow goes inwards on all three edges, and any trajectory starting in Ω_{ɛ_0} stays inside.

Proof We discuss the three boundaries as follows:

Case 1: y = 0, 0 ≤ x ≤ 1, the horizontal line. In this case, θ = 0, φ = π/2 and φ* = π/2. The y component of the dynamics on this line is:

f_1 ≡ (2π/N) ∂_y J = (π/2)(x − 1) ≤ 0   (76)

So −∂_y J points to the interior of Ω.

Case 2: x = 1, 0 ≤ y ≤ 1, the vertical line. In this case, α ≤ 1 and the x component of the dynamics is:

f_2 ≡ −(2π/N) ∂_x J = −(π − φ)(K − 1) y − θ + (K − 1)(α sin φ* − sin φ) + α sin θ   (77)
   = −(K − 1) [ (π − φ) y − (α sin φ* − sin φ) ] + (α sin θ − θ)   (78)

Note that since α ≤ 1, α sin θ ≤ sin θ ≤ θ, so the second term is non-positive. For the first term, we only need to check whether (π − φ) y − (α sin φ* − sin φ) is nonnegative. Note that

(π − φ) y − (α sin φ* − sin φ)   (79)
  = (π − φ) y + α(x − y) √(2 − α²(x − y)²) − α √(1 − α²y²)   (80)
  = y [ π − φ − α √(2 − α²(x − y)²) ] + α [ x √(2 − α²(x − y)²) − √(1 − α²y²) ]   (81)

In Ω we have x − y ≤ 1; combined with α ≤ 1, we have α(x − y) ≤ 1 and αy ≤ 1. Since x = 1, the second term is nonnegative. For the first term, since α ≤ 1,

π − φ − α √(2 − α²(x − y)²) ≥ π − π/2 − √2 > 0   (82)

So (π − φ) y − (α sin φ* − sin φ) ≥ 0 and f_2 ≤ 0, pointing inwards.

Case 3: x = y + ɛ, 0 ≤ y ≤ 1, the diagonal line. We compute the inner product between (−∂_x J, −∂_y J) and (1, −1), the inward normal of Ω at this line. Using φ ≤ (π/2) sin φ for φ ∈ [0, π/2], and φ* − θ = arccos αy − arccos αx ≥ 0 when x ≥ y, we have:

f_3(y, ɛ) ≡ (2π/N)(∂_y J − ∂_x J) = φ* − θ − ɛφ + [ (K − 1)(α sin φ* − sin φ) + α sin θ ] ɛ   (83)
  ≥ ɛ [ (K − 1)(α sin φ* − sin φ) + α sin θ − (π/2) sin φ ]
  ≥ ɛ (K − 1) [ α sin φ* − (1 + π/(2(K − 1))) sin φ ]

Note that for y > 0:

α²y² = 1/((x/y)² + (K − 1)) = 1/((1 + ɛ/y)² + (K − 1)) ≤ 1/K   (84)

For y = 0, αy = 0 ≤ 1/√K. So we have α²y² ≤ 1/K, hence sin²φ* = 1 − α²y² ≥ 1 − 1/K. Moreover αɛ ≤ 1, so sin φ = αɛ √(2 − α²ɛ²) ≤ √2 αɛ. Therefore f_3 ≥ ɛα(K − 1)(C_1 − ɛ C_2), with C_1 = √(1 − 1/K) > 0 and C_2 = √2 (1 + π/(2(K − 1))) > 0. With ɛ = ɛ_0 > 0 sufficiently small, f_3 > 0.
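Lemma 4, together with Lemmas 7-8 below, predicts that trajectories started in Ω converge to (1, 0). A forward-Euler sketch of Eqn. 69 (our own code; the 2π/N prefactor is absorbed into the step size):

```python
import numpy as np

K = 4

def grad_xy(x, y):
    # the 2D population gradient of Eqn. 69
    a = 1.0 / np.sqrt(x**2 + (K - 1) * y**2)
    th = np.arccos(np.clip(a * x, -1, 1))
    ps = np.arccos(np.clip(a * y, -1, 1))                              # phi*
    ph = np.arccos(np.clip(a**2 * (2 * x * y + (K - 2) * y**2), -1, 1))
    c = (K - 1) * (a * np.sin(ps) - np.sin(ph)) + a * np.sin(th)
    gx = np.pi * x - (np.pi - th) + (np.pi - ph) * (K - 1) * y - c * x
    gy = np.pi * y - (np.pi - ps) + (np.pi - ph) * (x + (K - 2) * y) - c * y
    return gx, gy

x, y = 0.6, 0.3                    # inside Omega: x > y
for t in range(10000):
    gx, gy = grad_xy(x, y)
    x, y = x - 5e-3 * gx, y - 5e-3 * gy
print(x, y)                        # -> (1, 0), the teacher's parameters
```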

Lemma 5 (Reparametrization) Denote ɛ = x − y > 0. The terms αx, αy and αɛ involved in the trigonometric functions in Eqn. 69 have the following parametrization:

α [ y ; x ; ɛ ] = (1/K) [ β − β_2 ; β + (K − 1) β_2 ; K β_2 ]   (85)

where β_2 = √((K − β²)/(K − 1)). The reverse transformation is given by β = √(K − (K − 1) α²ɛ²). Here β ∈ [1, √K) and β_2 ∈ (0, 1]. In particular, the critical point (x, y) = (1, 0) corresponds to (β, ɛ) = (1, 1). As a result, all trigonometric functions in Eqn. 69 depend only on the single variable β. In particular, the following relationship is useful:

β = cos θ + √(K − 1) sin θ   (86)

Proof The transformation can be checked by simple algebraic manipulation. For example:

(1/(αK)) (β − β_2) = (1/K) ( √(K/α² − (K − 1)ɛ²) − ɛ ) = (1/K) ( √((Ky + ɛ)²) − ɛ ) = y   (87)

To prove Eqn. 86, first notice that K cos θ = K αx = β + (K − 1) β_2. Therefore (K cos θ − β)² − (K − 1)(K − β²) = 0, which gives β² − 2β cos θ + 1 − K sin²θ = 0. Solving this quadratic equation and noticing that β ≥ 1 and θ ∈ [0, π/2], we get:

β = cos θ + √(cos²θ − 1 + K sin²θ) = cos θ + √(K − 1) sin θ   (88)

Lemma 6 After the reparametrization (Eqn. 85), f_3(β, ɛ) ≥ 0 for ɛ ∈ (0, β_2/β]. Furthermore, the equality holds only if (β, ɛ) = (1, 1), i.e., (y, ɛ) = (0, 1).

Proof Applying the parametrization (Eqn. 85) to Eqn. 83, and noticing that αɛ = β_2 = β_2(β), we can write

f_3 = h_1(β) − (φ + (K − 1) sin φ) ɛ   (89)

where h_1(β) ≡ φ* − θ + (K − 1) β_2 sin φ* + β_2 sin θ depends only on β. When β is fixed, f_3 is now a monotonically decreasing function of ɛ > 0. Therefore, f_3(β, ɛ) ≥ f_3(β, ɛ*) for 0 < ɛ ≤ ɛ* ≡ β_2/β. If we can prove f_3(β, ɛ*) ≥ 0, with zero attained only at the known critical point (β, ɛ) = (1, 1), the proof is complete. Denote f_3(β, ɛ*) = f_31 + f_32, where

f_31(β, ɛ*) = φ* − θ − ɛ*φ + ɛ* α sin θ   (90)
f_32(β, ɛ*) = (K − 1)(α sin φ* − sin φ) ɛ*   (91)

For f_32 it suffices to prove that ɛ*(α sin φ* − sin φ) = β_2 sin φ* − (β_2/β) sin φ ≥ 0, which is equivalent to sin φ* − sin φ/β ≥ 0. But this is trivially true since φ* ≥ φ and β ≥ 1. Therefore f_32 ≥ 0. Note that the equality only holds when φ = φ* and β = 1, which corresponds to the horizontal line x ∈ (0, 1], y = 0. For f_31, since φ ≤ φ*, φ* > θ and ɛ* ∈ (0, 1], we have:

f_31 = ɛ*(φ* − φ) + (1 − ɛ*)(φ* − θ) − ɛ*θ + β_2 sin θ ≥ −ɛ*θ + β_2 sin θ = (β_2/β)(β sin θ − θ)   (92)

and it reduces to showing that β sin θ − θ is nonnegative. Using Eqn. 86, we have:

f_33(θ) = β sin θ − θ = sin θ cos θ + √(K − 1) sin²θ − θ   (93)

Note that f_33′ = cos 2θ + √(K − 1) sin 2θ − 1 = √K cos(2θ − θ_0) − 1, where θ_0 = arccos(1/√K). By Proposition (1) of Lemma 3, θ ∈ [0, θ_0). Therefore, f_33′ ≥ 0, and since f_33(0) = 0, f_33 ≥ 0. Again, the equality holds only when θ = 0, φ = φ* and ɛ* = 1, which is the critical point (β, ɛ) = (1, 1), or (y, ɛ) = (0, 1).
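The identities of Lemma 5 can be verified mechanically; a random spot-check of our own:

```python
import numpy as np

K = 6
rng = np.random.default_rng(4)
for _ in range(1000):
    y = rng.random()
    x = y + rng.random() * (1 - y) + 1e-6                     # ensure x > y
    a = 1.0 / np.sqrt(x**2 + (K - 1) * y**2)
    eps = x - y
    beta = np.sqrt(K - (K - 1) * (a * eps)**2)                # reverse map
    beta2 = np.sqrt((K - beta**2) / (K - 1))
    assert abs(a * eps - beta2) < 1e-9                        # alpha*eps = beta_2
    assert abs(a * y - (beta - beta2) / K) < 1e-9             # Eqn. 85
    assert abs(a * x - (beta + (K - 1) * beta2) / K) < 1e-9
    theta = np.arccos(np.clip(a * x, -1, 1))
    assert abs(beta - (np.cos(theta) + np.sqrt(K - 1) * np.sin(theta))) < 1e-9  # Eqn. 86
print("Lemma 5 identities verified")
```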

Lemma 7 For the dynamics defined in Eqn. 69, the only critical point (∂_x J = 0 and ∂_y J = 0) within Ω_{ɛ_0} is (y, ɛ) = (0, 1).

Proof We prove by contradiction. Suppose (β, ɛ) is a critical point other than w*. A necessary condition for this to hold is f_3 = 0 (Eqn. 83). By Lemma 6, ɛ > ɛ* = β_2/β > 0, and therefore

β − β_2/ɛ > β − β_2/ɛ* = 0   (94)

i.e., α = β_2/ɛ < β, so ɛ + Ky = (1/α)(αɛ + Kαy) = (1/α)(β_2 + β − β_2) = β/α is strictly greater than one. On the other hand, the condition f_3 = 0 implies that

(K − 1)(α sin φ* − sin φ) + α sin θ = −(1/ɛ)(φ* − θ) + φ   (95)

Using φ ∈ [0, π/2], φ ≤ φ* and φ* > θ, we have:

−(2π/N) ∂_y J = −(π − φ)(ɛ + (K − 1) y) − πy + (π − φ*) + [ (K − 1)(α sin φ* − sin φ) + α sin θ ] y
             = −(π − φ)(ɛ + Ky) + (π − φ*) − (1/ɛ)(φ* − θ) y < 0   (96)

since (π − φ)(ɛ + Ky) > (π − φ*) and (φ* − θ) y ≥ 0. So the current point (β, ɛ) cannot be a critical point.

Lemma 8 Any trajectory in Ω_{ɛ_0} converges to (x, y) = (1, 0), following the dynamics defined in Eqn. 69.

Proof We have the Lyapunov function V = E[J], so that V̇ = −E[∇_w Jᵀ ∇_w J] ≤ −E[∇_w J]ᵀ E[∇_w J] ≤ 0. By Lemma 7, other than the optimal solution w*, there is no other symmetric critical point, so E[∇_w J] ≠ 0 and thus V̇ < 0. On the other hand, by Lemma 4, the triangular region Ω_{ɛ_0} is convergent, and inside it the 2D dynamics is C¹ differentiable. Therefore, any 2D solution curve ξ(t) stays within Ω_{ɛ_0}. By the Poincaré-Bendixson theorem, when there is a unique critical point, the curve either converges to a limit cycle or to the critical point. However, a limit cycle is not possible since V is strictly monotonically decreasing along the curve. Therefore, ξ(t) converges to the unique critical point, which is (x, y) = (1, 0), and so does the symmetric system (Eqn. 64).

Lemma 9 When x = y ∈ (0, 1], the 2D dynamics (Eqn. 69) reduces to the following 1D dynamics:

(2π/N) ∂_x J = πK (x − x*)   (97)

where x* = (1/πK)(√(K − 1) − arccos(1/√K) + π). Furthermore, x* is a convergent critical point.

Proof The 1D system can be computed by simple algebraic manipulation (note that when x = y, we have φ = 0 and θ = φ* = arccos(1/√K)). Note that the 1D system is linear, and its closed-form solution is x^t = x* + C e^{−NKt/2}, and thus convergent.

Combining Lemma 8 and Lemma 9 yields Thm. 7.

7 Simulations

No theorem is provided.
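A one-line check of the saddle value in Lemma 9 (our own sketch): plugging x = y = x* into the symmetric dynamics of Eqn. 69 gives a vanishing gradient.

```python
import numpy as np

K = 4
x_star = (np.pi + np.sqrt(K - 1) - np.arccos(1 / np.sqrt(K))) / (np.pi * K)
x = y = x_star
a = 1.0 / np.sqrt(x**2 + (K - 1) * y**2)
th = np.arccos(np.clip(a * x, -1, 1))        # theta = phi* = arccos(1/sqrt(K))
ph = 0.0                                     # phi = 0 when x = y
c = (K - 1) * (a * np.sin(np.arccos(np.clip(a * y, -1, 1))) - np.sin(ph)) + a * np.sin(th)
gx = np.pi * x - (np.pi - th) + (np.pi - ph) * (K - 1) * y - c * x
print(gx)                                    # ~0 (machine precision)
```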

8 Extension to multilayer ReLU network

Proposition 2 For a neural network with ReLU nonlinearity that uses the l_2 loss to match a teacher network of the same size, the gradient inflow g_j for node j at layer c has the following form:

g_j = Q_j Σ_{j′} ( Q*_{j′} u*_{j′} − Q_{j′} u_{j′} )   (98)

where Q_j and Q*_j are N-by-N diagonal matrices. For any k ∈ [c + 1], Q_k = Σ_{j∈[c]} w_{jk} D_j Q_j, and similarly for Q*_k. The gradient with respect to w_j (the parameters immediately under node j) is computed as:

∇_{w_j} J = X_cᵀ D_j g_j   (99)

Proof We prove by induction on the layers. For the first layer, there is only one node with g = u* − u; therefore Q_j = Q*_j = I. Suppose the condition holds for all nodes j ∈ [c]. Then for node k ∈ [c + 1], we have:

g_k = Σ_{j∈[c]} w_{jk} D_j g_j = Σ_j w_{jk} D_j Q_j Σ_{j′} ( Q*_{j′} u*_{j′} − Q_{j′} u_{j′} )

Using u_{j′} = D_{j′} Σ_{k′∈[c+1]} w_{j′k′} u_{k′} (and similarly for u*_{j′}), and the fact that the diagonal matrices Q_{j′} and D_{j′} commute, we have:

Σ_{j′} Q_{j′} u_{j′} = Σ_{j′} Q_{j′} D_{j′} Σ_{k′} w_{j′k′} u_{k′} = Σ_{k′} ( Σ_{j′} w_{j′k′} D_{j′} Q_{j′} ) u_{k′}

and likewise for the starred terms. Setting Q_k = Σ_j w_{jk} D_j Q_j and Q*_k = Σ_j w*_{jk} D*_j Q*_j (both are diagonal matrices), we thus have:

g_k = Q_k ( Σ_{k′} Q*_{k′} u*_{k′} − Σ_{k′} Q_{k′} u_{k′} ) = Q_k Σ_{k′} ( Q*_{k′} u*_{k′} − Q_{k′} u_{k′} )   (100)
