Dissipativity Theory for Nesterov's Accelerated Method


Bin Hu, Laurent Lessard

University of Wisconsin–Madison, Madison, WI 53706, USA. Correspondence to: Bin Hu <bhu38@wisc.edu>. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Abstract

In this paper, we adapt the control-theoretic concept of dissipativity theory to provide a natural understanding of Nesterov's accelerated method. Our theory ties rigorous convergence rate analysis to the physically intuitive notion of energy dissipation. Moreover, dissipativity allows one to efficiently construct Lyapunov functions (either numerically or analytically) by solving a small semidefinite program. Using novel supply rate functions, we show how to recover known rate bounds for Nesterov's method, and we generalize the approach to certify both linear and sublinear rates in a variety of settings. Finally, we link the continuous-time version of dissipativity to recent works on algorithm analysis that use discretizations of ordinary differential equations.

1. Introduction

Nesterov's accelerated method (Nesterov, 2003) has garnered attention and interest in the machine learning community because of its fast global convergence rate guarantees. The original convergence rate proofs of Nesterov's accelerated method are derived using the method of estimate sequences, which has proven difficult to interpret. This observation motivated a sequence of recent works on new analyses and interpretations of Nesterov's accelerated method (Bubeck et al., 2015; Lessard et al., 2016; Su et al., 2016; Drusvyatskiy et al., 2016; Flammarion & Bach, 2015; Wibisono et al., 2016; Wilson et al., 2016).

Many of these recent papers rely on Lyapunov-based stability arguments. Lyapunov theory is an analogue to the principle of minimum energy and brings a physical intuition to convergence behaviors. When applying such proof techniques, one must construct a Lyapunov function, which is a nonnegative function of the algorithm's state (an "internal energy") that decreases along all admissible trajectories. Once a Lyapunov function is found, one can relate the rate of decrease of this internal energy to the rate of convergence of the algorithm.

The main challenge in applying Lyapunov's method is finding a suitable Lyapunov function. There are two main approaches for Lyapunov function constructions. The first approach adopts the integral quadratic constraint (IQC) framework (Megretski & Rantzer, 1997) from control theory and formulates a linear matrix inequality (LMI) whose feasibility implies the linear convergence of the algorithm (Lessard et al., 2016). Despite the generality of the IQC approach and the small size of the associated LMI, one must typically resort to numerical simulations to solve the LMI. The second approach seeks an ordinary differential equation (ODE) that can be appropriately discretized to yield the algorithm of interest. One can then gain intuition about the trajectories of the algorithm by examining trajectories of the continuous-time ODE (Su et al., 2016; Wibisono et al., 2016; Wilson et al., 2016). The work of Wilson et al. (2016) also establishes a general equivalence between Lyapunov functions and estimate sequence proofs.

In this paper, we bridge the IQC approach (Lessard et al., 2016) and the discretization approach (Wilson et al., 2016) by using dissipativity theory (Willems, 1972a;b).
The term "dissipativity" is borrowed from the notion of energy dissipation in physics, and the theory provides a general approach to the intuitive understanding and construction of Lyapunov functions. Dissipativity for quadratic Lyapunov functions in particular (Willems, 1972b) has seen widespread use in controls. In the sequel, we tailor dissipativity theory to the automated construction of Lyapunov functions, which are not necessarily quadratic, for the analysis of optimization algorithms. Our dissipation inequality leads to an LMI condition that is simpler than the one in Lessard et al. (2016) and hence more amenable to being solved analytically. When specialized to Nesterov's accelerated method, our LMI recovers the Lyapunov function proposed in Wilson et al. (2016). Finally, we extend our LMI-based approach to the sublinear convergence analysis of Nesterov's accelerated method in both discrete and continuous time domains. This complements the original LMI-based approach in Lessard et al. (2016), which mainly handles linear convergence rate analyses.

An LMI-based approach for sublinear rate analysis similar to ours was independently and simultaneously proposed by Fazlyab et al. (2017). While that work and the present work both draw connections to the continuous-time results mentioned above, different algorithms and function classes are emphasized. For example, Fazlyab et al. (2017) develops LMIs for gradient descent and proximal/projection-based variants with convex/quasi-convex objective functions. In contrast, the present work develops LMIs tailored to the analysis of discrete-time accelerated methods and Nesterov's method in particular.

2. Preliminaries

2.1. Notation

Let R and R+ denote the real and nonnegative real numbers, respectively. Let I_p and 0_p denote the p × p identity and zero matrices, respectively. The Kronecker product of two matrices is denoted A ⊗ B and satisfies the properties (A ⊗ B)^⊤ = A^⊤ ⊗ B^⊤ and (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) when the matrices have compatible dimensions. Matrix inequalities hold in the semidefinite sense unless otherwise indicated. A differentiable function f : R^p → R is m-strongly convex if f(x) ≥ f(y) + ∇f(y)^⊤(x − y) + (m/2)‖x − y‖² for all x, y ∈ R^p, and is L-smooth if ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ for all x, y ∈ R^p. Note that f is convex if f is 0-strongly convex. We use x* to denote a point satisfying ∇f(x*) = 0. When f is L-smooth and m-strongly convex, x* is unique.

2.2. Classical Dissipativity Theory

Consider a linear dynamical system governed by the state-space model

    \xi_{k+1} = A \xi_k + B w_k.    (1)

Here, ξ_k ∈ R^{n_ξ} is the state, w_k ∈ R^{n_w} is the input, A ∈ R^{n_ξ × n_ξ} is the state transition matrix, and B ∈ R^{n_ξ × n_w} is the input matrix. The input w_k can be physically interpreted as a driving force. Classical dissipativity theory describes how the internal energy stored in the state ξ_k evolves with time k as one applies the input w_k to drive the system. A key concept in dissipativity theory is the supply rate, which characterizes the energy change in ξ_k due to the driving force w_k. The supply rate is a function S : R^{n_ξ} × R^{n_w} → R that maps any state/input pair (ξ, w) to a scalar measuring the amount of energy delivered from w to the state ξ. Now we introduce the notion of dissipativity.

Definition 1 The dynamical system (1) is dissipative with respect to the supply rate S if there exists a function V : R^{n_ξ} → R+ such that V(ξ) ≥ 0 for all ξ ∈ R^{n_ξ} and

    V(\xi_{k+1}) - V(\xi_k) \le S(\xi_k, w_k)    (2)

for all k. The function V is called a storage function, which quantifies the energy stored in the state ξ. In addition, (2) is called the dissipation inequality.

The dissipation inequality (2) states that the change of the internal energy stored in ξ_k is equal to the difference between the supplied energy and the dissipated energy. Since there will always be some energy dissipating from the system, the change in the stored energy (which is exactly V(ξ_{k+1}) − V(ξ_k)) is always bounded above by the energy supplied to the system (which is exactly S(ξ_k, w_k)). A variant of (2) known as the exponential dissipation inequality states that for some 0 ≤ ρ < 1, we have

    V(\xi_{k+1}) - \rho^2 V(\xi_k) \le S(\xi_k, w_k),    (3)

which states that at least a fraction (1 − ρ²) of the internal energy will dissipate at every step. The dissipation inequality (3) provides a direct way to construct a Lyapunov function based on the storage function. It is often the case that we have prior knowledge about how the driving force w_k is related to the state ξ_k. Thus, we may know additional information about the supply rate function S(ξ_k, w_k). For example, if S(ξ_k, w_k) ≤ 0 for all k, then (3) directly implies that V(ξ_{k+1}) ≤ ρ² V(ξ_k), and the storage function V can serve as a Lyapunov function.
The condition S(ξ_k, w_k) ≤ 0 means that the driving force w_k does not inject any energy into the system and may even extract energy out of the system. Then, the internal energy will decrease no slower than the linear rate ρ² and approach a minimum value at equilibrium. An advantage of dissipativity theory is that for any quadratic supply rate, one can automatically construct the dissipation inequality using semidefinite programming. We now state a standard result from the controls literature.

Theorem 2 Consider the following quadratic supply rate with X ∈ R^{(n_ξ+n_w) × (n_ξ+n_w)} and X = X^⊤:

    S(\xi, w) := \begin{bmatrix} \xi \\ w \end{bmatrix}^\top X \begin{bmatrix} \xi \\ w \end{bmatrix}.    (4)

If there exists a matrix P ∈ R^{n_ξ × n_ξ} with P ⪰ 0 such that

    \begin{bmatrix} A^\top P A - \rho^2 P & A^\top P B \\ B^\top P A & B^\top P B \end{bmatrix} - X \preceq 0,    (5)

then the dissipation inequality (3) holds for all trajectories of (1) with V(ξ) := ξ^⊤ P ξ.

Proof. Based on the state-space model (1), we have

    V(\xi_{k+1}) = \xi_{k+1}^\top P \xi_{k+1} = (A\xi_k + Bw_k)^\top P (A\xi_k + Bw_k)
                 = \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top \begin{bmatrix} A^\top P A & A^\top P B \\ B^\top P A & B^\top P B \end{bmatrix} \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}.

Hence we can left and right multiply (5) by [ξ_k; w_k]^⊤ and [ξ_k; w_k], and directly obtain the desired conclusion.
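Since (5) is a small semidefinite constraint, it can be handed directly to an off-the-shelf SDP solver. The following sketch (assuming cvxpy and numpy are installed; the function name find_storage is ours, not from the paper) searches for a storage matrix P certifying the dissipation inequality (3) for given (A, B, X, ρ). Any P returned by the solver certifies (3); returning None signals that the LMI is infeasible at the requested rate.

```python
# A minimal sketch of the semidefinite program behind Theorem 2, assuming cvxpy
# and numpy are installed (the function name find_storage is ours). Given A, B,
# a symmetric supply-rate matrix X, and a rate rho, it searches for a storage
# matrix P >= 0 satisfying the LMI (5); any feasible P certifies the exponential
# dissipation inequality (3) with V(xi) = xi' P xi.
import numpy as np
import cvxpy as cp

def find_storage(A, B, X, rho):
    n = A.shape[0]
    assert X.shape == (n + B.shape[1],) * 2
    P = cp.Variable((n, n), symmetric=True)
    lmi = cp.bmat([[A.T @ P @ A - rho**2 * P, A.T @ P @ B],
                   [B.T @ P @ A, B.T @ P @ B]]) - X
    lmi = 0.5 * (lmi + lmi.T)  # the block matrix is symmetric; make it explicit
    prob = cp.Problem(cp.Minimize(0), [P >> 0, lmi << 0])
    prob.solve(solver=cp.SCS)
    return P.value if prob.status in ("optimal", "optimal_inaccurate") else None
```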

The left-hand side of (5) is linear in P, so (5) is a linear matrix inequality (LMI) for any fixed A, B, X, ρ. The set of P such that (5) holds is therefore a convex set and can be efficiently searched using interior point methods, for example. To apply dissipativity theory for linear convergence rate analysis, one typically follows two steps.

1. Choose a proper quadratic supply rate function S satisfying certain desired properties, e.g. S(ξ_k, w_k) ≤ 0.

2. Solve the LMI (5) to obtain a storage function V, which is then used to construct a Lyapunov function.

In step 2, the LMI obtained is typically very small, e.g. 2 × 2 or 3 × 3, so we can often solve the LMI analytically. For illustrative purposes, we rephrase the existing LMI analysis of the gradient descent method (Lessard et al., 2016, §4.4) using the notion of dissipativity.

2.3. Example: Dissipativity for Gradient Descent

There is an intrinsic connection between dissipativity theory and the IQC approach (Megretski et al., 2010; Seiler, 2015). The IQC analysis of the gradient descent method in Lessard et al. (2016) may be reframed using dissipativity theory. Then, the pointwise IQC (Lessard et al., 2016, Lemma 6) amounts to using a quadratic supply rate S with S ≤ 0. Specifically, assume f is L-smooth and m-strongly convex, and consider the gradient descent method

    x_{k+1} = x_k - \alpha \nabla f(x_k).    (6)

We have x_{k+1} − x* = x_k − x* − α∇f(x_k), where x* is the unique point satisfying ∇f(x*) = 0. Define ξ_k := x_k − x* and w_k := ∇f(x_k). Then the gradient descent method is modeled by (1) with A := I_p and B := −αI_p. Since w_k = ∇f(ξ_k + x*), we can define the following quadratic supply rate

    S(\xi_k, w_k) := \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top \begin{bmatrix} 2mL\, I_p & -(m+L) I_p \\ -(m+L) I_p & 2 I_p \end{bmatrix} \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}.    (7)

By co-coercivity, we have S(ξ_k, w_k) ≤ 0 for all k. This just restates Lessard et al. (2016, Lemma 6). Then, we can directly apply Theorem 2 to construct the dissipation inequality. We can parameterize P = p I_p and define the storage function as V(ξ_k) = p‖ξ_k‖² = p‖x_k − x*‖². The LMI (5) becomes

    \left( \begin{bmatrix} (1-\rho^2)p & -\alpha p \\ -\alpha p & \alpha^2 p \end{bmatrix} + \begin{bmatrix} -2mL & m+L \\ m+L & -2 \end{bmatrix} \right) \otimes I_p \preceq 0.

Hence for any 0 ≤ ρ < 1, we have p‖x_{k+1} − x*‖² ≤ ρ² p‖x_k − x*‖² if there exists p ≥ 0 such that

    \begin{bmatrix} (1-\rho^2)p - 2mL & -\alpha p + m + L \\ -\alpha p + m + L & \alpha^2 p - 2 \end{bmatrix} \preceq 0.    (8)

The LMI (8) is simple and can be analytically solved to recover the existing rate results for the gradient descent method. For example, we can choose (α, ρ, p) to be (1/L, 1 − m/L, L²) or (2/(L+m), (L−m)/(L+m), (L+m)²/2) to immediately recover the standard rate results in Polyak (1987).
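The two parameter choices above are also easy to confirm numerically. The sketch below (NumPy assumed; lmi_gd is our own helper name) evaluates the left-hand side of (8) and checks that its largest eigenvalue is nonpositive.

```python
# A minimal numerical check of the gradient-descent LMI (8), assuming NumPy is
# available. The two classical parameter choices quoted in the text should make
# the 2x2 matrix in (8) negative semidefinite (largest eigenvalue <= 0).
import numpy as np

def lmi_gd(m, L, alpha, rho, p):
    """Left-hand side of the LMI (8) for gradient descent."""
    return np.array([
        [(1 - rho**2) * p - 2 * m * L, -alpha * p + m + L],
        [-alpha * p + m + L, alpha**2 * p - 2],
    ])

m, L = 1.0, 10.0
for alpha, rho, p in [(1 / L, 1 - m / L, L**2),
                      (2 / (L + m), (L - m) / (L + m), (L + m)**2 / 2)]:
    eigmax = np.linalg.eigvalsh(lmi_gd(m, L, alpha, rho, p)).max()
    print(f"alpha={alpha:.4f}, rho={rho:.4f}: max eigenvalue = {eigmax:.2e}")
```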
Based on the example above, it is evident that choosing a proper supply rate is critical for the construction of a Lyapunov function. The supply rate (7) turns out to be inadequate for the analysis of Nesterov's accelerated method. For Nesterov's accelerated method, the dependence between the internal energy and the driving force is more complicated due to the presence of momentum terms. We will next develop a new supply rate that captures this complicated dependence. We will also make use of this new supply rate to recover the standard linear rate results for Nesterov's accelerated method.

3. Dissipativity for Accelerated Linear Rates

3.1. Dissipativity for Nesterov's Method

Suppose f is L-smooth and m-strongly convex with m > 0. Let x* be the unique point satisfying ∇f(x*) = 0. Now we consider Nesterov's accelerated method, which uses the following iteration rule to find x*:

    x_{k+1} = y_k - \alpha \nabla f(y_k),    (9a)
    y_k = (1+\beta) x_k - \beta x_{k-1}.    (9b)

We can rewrite (9) as

    \begin{bmatrix} x_{k+1} - x^\star \\ x_k - x^\star \end{bmatrix} = A \begin{bmatrix} x_k - x^\star \\ x_{k-1} - x^\star \end{bmatrix} + B w_k,    (10)

where w_k := ∇f(y_k) = ∇f((1 + β)x_k − βx_{k−1}). Also, A := Ã ⊗ I_p, B := B̃ ⊗ I_p, and Ã, B̃ are defined by

    \tilde{A} := \begin{bmatrix} 1+\beta & -\beta \\ 1 & 0 \end{bmatrix}, \qquad \tilde{B} := \begin{bmatrix} -\alpha \\ 0 \end{bmatrix}.    (11)

Hence, Nesterov's accelerated method (9) is in the form of (1) with ξ_k = [(x_k − x*); (x_{k−1} − x*)]. Nesterov's accelerated method can improve the convergence rate since the input w_k depends on both x_k and x_{k−1}, and drives the state in a specific direction, i.e. along (1 + β)x_k − βx_{k−1}. This leads to a supply rate that extracts energy out of the system significantly faster than with gradient descent.
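To make the state-space model (10)–(11) concrete, the following sketch (NumPy assumed; the quadratic test function is our own choice, not part of the analysis) builds A = Ã ⊗ I_p and B = B̃ ⊗ I_p and simulates a few iterations of (10).

```python
# A small simulation sketch of (10)-(11), assuming NumPy. We use a toy quadratic
# f(x) = 0.5 x' H x (so x* = 0 and grad f(y) = H y); nothing in the model
# requires this particular f.
import numpy as np

p = 2
H = np.diag([1.0, 10.0])                          # m = 1, L = 10 for this toy f
m, L = 1.0, 10.0
kappa = L / m
alpha = 1 / L
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

A_t = np.array([[1 + beta, -beta], [1.0, 0.0]])   # 2x2 block from (11)
B_t = np.array([[-alpha], [0.0]])                 # 2x1 block from (11)
A = np.kron(A_t, np.eye(p))
B = np.kron(B_t, np.eye(p))

xi = np.concatenate([np.ones(p), np.ones(p)])     # xi_0 = [x_0 - x*; x_{-1} - x*]
for k in range(50):
    xk, xkm1 = xi[:p], xi[p:]
    yk = (1 + beta) * xk - beta * xkm1            # (9b), written relative to x* = 0
    wk = H @ yk                                   # grad f(y_k) for the quadratic
    xi = A @ xi + B @ wk                          # one step of (10)
print("||x_50 - x*|| =", np.linalg.norm(xi[:p]))
```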

This is formally stated in the next lemma.

Lemma 3 Let f be L-smooth and m-strongly convex with m > 0. Let x* be the unique point satisfying ∇f(x*) = 0. Consider Nesterov's method (9), or equivalently (10). The following inequalities hold for all trajectories:

    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top X_1 \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x_k) - f(x_{k+1}), \qquad
    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top X_2 \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x^\star) - f(x_{k+1}),

where X_i = X̃_i ⊗ I_p for i = 1, 2, and X̃_i are defined by

    \tilde{X}_1 := \frac{1}{2}\begin{bmatrix} \beta^2 m & -\beta^2 m & -\beta \\ -\beta^2 m & \beta^2 m & \beta \\ -\beta & \beta & \alpha(2 - L\alpha) \end{bmatrix},    (12)

    \tilde{X}_2 := \frac{1}{2}\begin{bmatrix} (1+\beta)^2 m & -\beta(1+\beta) m & -(1+\beta) \\ -\beta(1+\beta) m & \beta^2 m & \beta \\ -(1+\beta) & \beta & \alpha(2 - L\alpha) \end{bmatrix}.    (13)

Given any 0 ≤ ρ ≤ 1, one can define the supply rate as (4) with the particular choice X := ρ²X_1 + (1 − ρ²)X_2. Then this supply rate satisfies the condition

    S(\xi_k, w_k) \le \rho^2 (f(x_k) - f(x^\star)) - (f(x_{k+1}) - f(x^\star)).    (14)

Proof. The proof is similar to the proof of (3.3)–(3.4) in Bubeck (2015), but Bubeck (2015, Lemma 3.6) must be modified to account for the strong convexity of f. See the supplementary material for a detailed proof.

The supply rate (14) captures how the driving force w_k impacts the future state x_{k+1}. The physical interpretation is that there is some amount of hidden energy in the system that takes the form of f(x_k) − f(x*). The supply rate condition (14) describes how the driving force is coupled with the hidden energy in the future. It says the delivered energy is bounded by a weighted decrease of the hidden energy. Based on this supply rate, one can search for a Lyapunov function using the following theorem.

Theorem 4 Let f be L-smooth and m-strongly convex with m > 0. Let x* be the unique point satisfying ∇f(x*) = 0. Consider Nesterov's accelerated method (9). For any rate 0 ≤ ρ < 1, set X̃ := ρ²X̃_1 + (1 − ρ²)X̃_2, where X̃_1 and X̃_2 are defined in (12)–(13). In addition, let Ã, B̃ be defined by (11). If there exists a matrix P̃ ∈ R^{2×2} with P̃ ⪰ 0 such that

    \begin{bmatrix} \tilde{A}^\top \tilde{P} \tilde{A} - \rho^2 \tilde{P} & \tilde{A}^\top \tilde{P} \tilde{B} \\ \tilde{B}^\top \tilde{P} \tilde{A} & \tilde{B}^\top \tilde{P} \tilde{B} \end{bmatrix} - \tilde{X} \preceq 0,    (15)

then set P := P̃ ⊗ I_p and define the Lyapunov function

    V_k := \begin{bmatrix} x_k - x^\star \\ x_{k-1} - x^\star \end{bmatrix}^\top P \begin{bmatrix} x_k - x^\star \\ x_{k-1} - x^\star \end{bmatrix} + f(x_k) - f(x^\star),    (16)

which satisfies V_{k+1} ≤ ρ²V_k for all k. Moreover, we have f(x_k) − f(x*) ≤ ρ^{2k} V_0 for Nesterov's method.

Proof. Take the Kronecker product of (15) and I_p, so that (5) holds with A := Ã ⊗ I_p, B := B̃ ⊗ I_p, and X := X̃ ⊗ I_p. Let the supply rate S be defined by (4). Then, define the quadratic storage function V(ξ_k) := ξ_k^⊤ P ξ_k and apply Theorem 2 to show V(ξ_{k+1}) − ρ²V(ξ_k) ≤ S(ξ_k, w_k). Based on the supply rate condition (14), we can define the Lyapunov function V_k := V(ξ_k) + f(x_k) − f(x*) and show V_{k+1} ≤ ρ²V_k. Finally, since P ⪰ 0, we have f(x_k) − f(x*) ≤ V_k ≤ ρ^{2k} V_0.

We can immediately recover the Lyapunov function proposed in Wilson et al. (2016, Theorem 6) by setting P̃ to

    \tilde{P} = \frac{1}{2}\begin{bmatrix} \sqrt{L} \\ \sqrt{m} - \sqrt{L} \end{bmatrix}\begin{bmatrix} \sqrt{L} & \sqrt{m} - \sqrt{L} \end{bmatrix}.    (17)

Clearly P̃ ⪰ 0. Now define κ := L/m. Given α = 1/L, β = (√κ − 1)/(√κ + 1), and ρ = √(1 − 1/√κ), it is straightforward to verify that the left side of the LMI (15) is negative semidefinite. Hence we can immediately construct a Lyapunov function using (16) to prove the linear rate ρ = √(1 − √(m/L)). Searching for analytic certificates such as (17) can either be carried out by directly analyzing the LMI, or by using numerical solutions to guide the search. For example, numerically solving (15) for any fixed L and m directly yields (17), which makes finding the analytical expression easy.
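The numerical route just mentioned can be reproduced with a few lines of semidefinite programming. The sketch below (cvxpy and numpy assumed; solve_lmi15 is our own name) solves (15) at the rate discussed above; the returned P can then be compared against the analytic certificate (17), keeping in mind that the solver may return any feasible point.

```python
# A sketch of numerically solving the Nesterov LMI (15), assuming cvxpy and
# numpy are installed. X1, X2 are the blocks from (12)-(13); A, B are from (11).
import numpy as np
import cvxpy as cp

def solve_lmi15(m, L):
    kappa = L / m
    alpha = 1 / L
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    rho2 = 1 - 1 / np.sqrt(kappa)                 # rate discussed in the text
    A = np.array([[1 + beta, -beta], [1.0, 0.0]])
    B = np.array([[-alpha], [0.0]])
    X1 = 0.5 * np.array([[beta**2 * m, -beta**2 * m, -beta],
                         [-beta**2 * m, beta**2 * m, beta],
                         [-beta, beta, alpha * (2 - L * alpha)]])
    X2 = 0.5 * np.array([[(1 + beta)**2 * m, -beta * (1 + beta) * m, -(1 + beta)],
                         [-beta * (1 + beta) * m, beta**2 * m, beta],
                         [-(1 + beta), beta, alpha * (2 - L * alpha)]])
    X = rho2 * X1 + (1 - rho2) * X2
    P = cp.Variable((2, 2), symmetric=True)
    lmi = cp.bmat([[A.T @ P @ A - rho2 * P, A.T @ P @ B],
                   [B.T @ P @ A, B.T @ P @ B]]) - X
    lmi = 0.5 * (lmi + lmi.T)                     # symmetrize for the SDP constraint
    cp.Problem(cp.Minimize(0), [P >> 0, lmi << 0]).solve(solver=cp.SCS)
    return P.value

print(solve_lmi15(m=1.0, L=10.0))
```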

3.2. Dissipativity Theory for More General Methods

We demonstrate the generality of the dissipativity theory on a more general variant of Nesterov's method. Consider the modified accelerated method

    x_{k+1} = (1+\beta)x_k - \beta x_{k-1} - \alpha \nabla f(y_k),    (18a)
    y_k = (1+\eta)x_k - \eta x_{k-1}.    (18b)

When β = η, we recover Nesterov's accelerated method. When η = 0, we recover the Heavy-ball method of Polyak (1987). We can rewrite (18) in state-space form (10) where w_k := ∇f(y_k) = ∇f((1 + η)x_k − ηx_{k−1}), A := Ã ⊗ I_p, B := B̃ ⊗ I_p, and Ã, B̃ are defined by

    \tilde{A} := \begin{bmatrix} 1+\beta & -\beta \\ 1 & 0 \end{bmatrix}, \qquad \tilde{B} := \begin{bmatrix} -\alpha \\ 0 \end{bmatrix}.

Lemma 5 Let f be L-smooth and m-strongly convex with m > 0. Let x* be the unique point satisfying ∇f(x*) = 0. Consider the general accelerated method (18). Define the state ξ_k := [(x_k − x*); (x_{k−1} − x*)] and the input w_k := ∇f(y_k) = ∇f((1 + η)x_k − ηx_{k−1}). Then the following inequalities hold for all trajectories:

    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top (X_1 + X_2) \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x_k) - f(x_{k+1}),    (19)

    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top (X_1 + X_3) \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x^\star) - f(x_{k+1}),    (20)

with X_i := X̃_i ⊗ I_p for i = 1, 2, 3, and X̃_i defined by

    \tilde{X}_1 := \frac{1}{2}\begin{bmatrix} -L\delta^2 & L\delta^2 & -(1-L\alpha)\delta \\ L\delta^2 & -L\delta^2 & (1-L\alpha)\delta \\ -(1-L\alpha)\delta & (1-L\alpha)\delta & \alpha(2-L\alpha) \end{bmatrix},

    \tilde{X}_2 := \frac{1}{2}\begin{bmatrix} \eta^2 m & -\eta^2 m & -\eta \\ -\eta^2 m & \eta^2 m & \eta \\ -\eta & \eta & 0 \end{bmatrix},

    \tilde{X}_3 := \frac{1}{2}\begin{bmatrix} (1+\eta)^2 m & -\eta(1+\eta) m & -(1+\eta) \\ -\eta(1+\eta) m & \eta^2 m & \eta \\ -(1+\eta) & \eta & 0 \end{bmatrix},

with δ := β − η. In addition, one can define the supply rate as (4) with X := X_1 + ρ²X_2 + (1 − ρ²)X_3. Then for all trajectories (ξ_k, w_k) of the general accelerated method (18), this supply rate satisfies the inequality

    S(\xi_k, w_k) \le \rho^2 (f(x_k) - f(x^\star)) - (f(x_{k+1}) - f(x^\star)).    (21)

Proof. A detailed proof is presented in the supplementary material. One mainly needs to modify the proof of Lemma 3 by taking the difference between β and η into account.

Based on the supply rate (21), we can immediately modify Theorem 4 to handle the more general algorithm (18). Although we do not have general analytical formulas for the convergence rate of (18), preliminary numerical results suggest that there is a family of (α, β, η) leading to the rate ρ = √(1 − √(m/L)), and the required value of P is quite different from (17). This indicates that our proposed LMI approach could go beyond the Lyapunov function (17); a sketch of such a numerical rate search is given after Remark 6 below.

Remark 6 It is noted in Lessard et al. (2016, §3) that searching over combinations of multiple IQCs may yield improved rate bounds. The same is true of supply rates. For example, we could include λ_1, λ_2 ≥ 0 as decision variables and search for a dissipation inequality with supply rate λ_1 S_1 + λ_2 S_2, where e.g. S_1 is (7) and S_2 is (14).
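The numerical search mentioned before Remark 6 can be sketched as follows (cvxpy and numpy assumed; function names are ours). For fixed (α, β, η) it builds the supply rate of Lemma 5 and bisects over ρ for the smallest rate at which the Theorem 4 style LMI is feasible; the bisection implicitly assumes feasibility is monotone in ρ.

```python
# Numerical rate search for the general accelerated method (18), assuming cvxpy
# and numpy are installed.
import numpy as np
import cvxpy as cp

def lmi_feasible(m, L, alpha, beta, eta, rho):
    delta = beta - eta
    A = np.array([[1 + beta, -beta], [1.0, 0.0]])
    B = np.array([[-alpha], [0.0]])
    X1 = 0.5 * np.array([[-L * delta**2,  L * delta**2, -(1 - L * alpha) * delta],
                         [ L * delta**2, -L * delta**2,  (1 - L * alpha) * delta],
                         [-(1 - L * alpha) * delta, (1 - L * alpha) * delta,
                          alpha * (2 - L * alpha)]])
    X2 = 0.5 * np.array([[ m * eta**2, -m * eta**2, -eta],
                         [-m * eta**2,  m * eta**2,  eta],
                         [-eta, eta, 0.0]])
    X3 = 0.5 * np.array([[ m * (1 + eta)**2, -m * eta * (1 + eta), -(1 + eta)],
                         [-m * eta * (1 + eta), m * eta**2, eta],
                         [-(1 + eta), eta, 0.0]])
    X = X1 + rho**2 * X2 + (1 - rho**2) * X3
    P = cp.Variable((2, 2), symmetric=True)
    lmi = cp.bmat([[A.T @ P @ A - rho**2 * P, A.T @ P @ B],
                   [B.T @ P @ A, B.T @ P @ B]]) - X
    lmi = 0.5 * (lmi + lmi.T)
    prob = cp.Problem(cp.Minimize(0), [P >> 0, lmi << 0])
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

def smallest_rate(m, L, alpha, beta, eta, tol=1e-3):
    lo, hi = 0.0, 1.0   # bisection assumes feasibility is monotone in rho
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if lmi_feasible(m, L, alpha, beta, eta, mid) else (mid, hi)
    return hi

m, L = 1.0, 100.0
beta = (np.sqrt(L / m) - 1) / (np.sqrt(L / m) + 1)
print(smallest_rate(m, L, 1 / L, beta, beta))   # eta = beta recovers Nesterov's method
```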

4. Dissipativity for Sublinear Rates

The LMI approach in Lessard et al. (2016) is tailored for the analysis of linear convergence rates for algorithms that are time-invariant (the A and B matrices in (1) do not change with k). We now show that dissipativity theory can be used to analyze the sublinear rates O(1/k) and O(1/k²) via slight modifications of the dissipation inequality.

4.1. Dissipativity for O(1/k) rates

The O(1/k) modification, which we present first, is very similar to the linear rate result.

Theorem 7 Suppose f has a finite minimum f*. Consider the LTI system (1) with a supply rate satisfying

    S(\xi_k, w_k) \le -(f(z_k) - f^\star)    (22)

for some sequence {z_k}. If there exists a nonnegative storage function V such that the dissipation inequality (2) holds over all trajectories of (ξ_k, w_k), then the following inequality holds over all trajectories as well:

    \sum_{k=0}^{K} (f(z_k) - f^\star) \le V(\xi_0).    (23)

In addition, we have the sublinear convergence rate

    \min_{0 \le k \le K} (f(z_k) - f^\star) \le \frac{V(\xi_0)}{K+1}.    (24)

If f(z_{k+1}) ≤ f(z_k) for all k, then (24) implies that f(z_K) − f* ≤ V(ξ_0)/(K + 1) for all K.

Proof. By the supply rate condition (22) and the dissipation inequality (2), we immediately get V(ξ_{k+1}) − V(ξ_k) + f(z_k) − f* ≤ 0. Summing the above inequality from k = 0 to K and using V ≥ 0 yields the desired result.

To address the sublinear rate analysis, the critical step is to choose an appropriate supply rate. If f is L-smooth and convex, this is easily done. Consider the gradient method (6) and define the quantities ξ_k := x_k − x*, A := I_p, and B := −αI_p as in Section 2.3. Since f is L-smooth and convex, define the quadratic supply rate

    S(\xi_k, w_k) := \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top \begin{bmatrix} 0_p & -\tfrac{1}{2} I_p \\ -\tfrac{1}{2} I_p & \tfrac{1}{2L} I_p \end{bmatrix} \begin{bmatrix} \xi_k \\ w_k \end{bmatrix},

which satisfies S(ξ_k, w_k) ≤ f* − f(x_k) for all k (co-coercivity). Then we can directly apply the LMI (5) with ρ = 1 to construct the dissipation inequality. Setting P = pI_p and defining the storage function as V(ξ_k) := p‖ξ_k‖² = p‖x_k − x*‖², the LMI (5) becomes

    \left( \begin{bmatrix} 0 & -\alpha p \\ -\alpha p & \alpha^2 p \end{bmatrix} + \begin{bmatrix} 0 & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2L} \end{bmatrix} \right) \otimes I_p \preceq 0,

which is equivalent to

    \begin{bmatrix} 0 & -\alpha p + \tfrac{1}{2} \\ -\alpha p + \tfrac{1}{2} & \alpha^2 p - \tfrac{1}{2L} \end{bmatrix} \preceq 0.    (25)

Due to the (1,1) entry being zero, (25) holds if and only if

    -\alpha p + \tfrac{1}{2} = 0 \ \Longleftrightarrow\ p = \tfrac{1}{2\alpha}, \qquad \alpha^2 p - \tfrac{1}{2L} \le 0 \ \Longleftrightarrow\ \alpha \le \tfrac{1}{L}.

We can choose α = 1/L and the bound (24) becomes

    \min_{0 \le k \le K} (f(x_k) - f^\star) \le \frac{L \|x_0 - x^\star\|^2}{2(K+1)}.

Since gradient descent has monotonically nonincreasing iterates, that is f(x_{k+1}) ≤ f(x_k) for all k, we immediately recover the standard O(1/k) rate result.

4.2. Dissipativity for O(1/k²) rates

Certifying an O(1/k) rate for the gradient method required solving a single LMI (5). However, this is not the case for the O(1/k²) rate analysis of Nesterov's accelerated method. Nesterov's algorithm has parameters that depend on k, so the analysis is more involved. We will begin with the general case and then specialize to Nesterov's algorithm. Consider the dynamical system

    \xi_{k+1} = A_k \xi_k + B_k w_k.    (26)

The state matrix A_k and input matrix B_k change with the time step k, and hence (26) is referred to as a linear time-varying (LTV) system. The analysis of LTV systems typically requires a time-dependent supply rate such as

    S_k(\xi, w) := \begin{bmatrix} \xi \\ w \end{bmatrix}^\top X_k \begin{bmatrix} \xi \\ w \end{bmatrix}.    (27)

If there exists a sequence {P_k} with P_k ⪰ 0 such that

    \begin{bmatrix} A_k^\top P_{k+1} A_k - P_k & A_k^\top P_{k+1} B_k \\ B_k^\top P_{k+1} A_k & B_k^\top P_{k+1} B_k \end{bmatrix} - X_k \preceq 0    (28)

for all k, then we have V_{k+1}(ξ_{k+1}) − V_k(ξ_k) ≤ S_k(ξ_k, w_k) with the time-dependent storage function defined as V_k(ξ_k) := ξ_k^⊤ P_k ξ_k. This is a standard approach for dissipation inequality constructions of LTV systems and can be proved using the same proof technique as in Theorem 2. Note that we need (28) to simultaneously hold for all k. This leads to an infinite number of LMIs in general. Now we consider Nesterov's accelerated method for a convex L-smooth objective function f (Nesterov, 2003):

    x_{k+1} = y_k - \alpha \nabla f(y_k),    (29a)
    y_k = (1+\beta_k) x_k - \beta_k x_{k-1}.    (29b)

It is known that (29) achieves a rate of O(1/k²) when α := 1/L and β_k is defined recursively as follows:

    \zeta_0 = 0, \qquad \zeta_{k+1} = \frac{1 + \sqrt{1 + 4\zeta_k^2}}{2}, \qquad \beta_k = \frac{\zeta_k - 1}{\zeta_{k+1}}.

The sequence {ζ_k} satisfies ζ_k² − ζ_k = ζ_{k−1}². We now present a dissipativity theory for the sublinear rate analysis of Nesterov's accelerated method. Rewrite (29) as

    \begin{bmatrix} x_{k+1} - x^\star \\ x_k - x^\star \end{bmatrix} = A_k \begin{bmatrix} x_k - x^\star \\ x_{k-1} - x^\star \end{bmatrix} + B_k w_k,    (30)

where w_k := ∇f(y_k) = ∇f((1 + β_k)x_k − β_kx_{k−1}), A_k := Ã_k ⊗ I_p, B_k := B̃_k ⊗ I_p, and Ã_k, B̃_k are given by

    \tilde{A}_k := \begin{bmatrix} 1+\beta_k & -\beta_k \\ 1 & 0 \end{bmatrix}, \qquad \tilde{B}_k := \begin{bmatrix} -\alpha \\ 0 \end{bmatrix}.

Hence, Nesterov's accelerated method (29) is in the form of (26) with ξ_k := [(x_k − x*); (x_{k−1} − x*)]. The O(1/k²) rate analysis of Nesterov's method (29) requires the following time-dependent supply rate.

Lemma 8 Let f be L-smooth and convex. Let x* be a point satisfying ∇f(x*) = 0. In addition, set f* := f(x*). Consider Nesterov's method (29), or equivalently (30). The following inequalities hold for all trajectories and for all k:

    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top M_k \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x_k) - f(x_{k+1}), \qquad
    \begin{bmatrix} \xi_k \\ w_k \end{bmatrix}^\top N_k \begin{bmatrix} \xi_k \\ w_k \end{bmatrix} \le f(x^\star) - f(x_{k+1}),

where M_k := M̃_k ⊗ I_p, N_k := Ñ_k ⊗ I_p, and M̃_k, Ñ_k are defined by

    \tilde{M}_k := \frac{1}{2}\begin{bmatrix} 0 & 0 & -\beta_k \\ 0 & 0 & \beta_k \\ -\beta_k & \beta_k & \tfrac{1}{L} \end{bmatrix},    (31)

    \tilde{N}_k := \frac{1}{2}\begin{bmatrix} 0 & 0 & -(1+\beta_k) \\ 0 & 0 & \beta_k \\ -(1+\beta_k) & \beta_k & \tfrac{1}{L} \end{bmatrix}.    (32)

Given any nondecreasing sequence {µ_k}, one can define the supply rate as (27) with the particular choice X_k := µ_k M_k + (µ_{k+1} − µ_k)N_k for all k. Then this supply rate satisfies the condition

    S_k(\xi_k, w_k) \le \mu_k (f(x_k) - f^\star) - \mu_{k+1} (f(x_{k+1}) - f^\star).    (33)

Proof. The proof is very similar to the proof of Lemma 3 with the extra condition m = 0. A detailed proof is presented in the supplementary material.
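The momentum schedule in (29) is driven entirely by the scalar sequence ζ_k, so its basic properties are easy to check directly. The short sketch below (plain Python) computes ζ_k and β_k and numerically verifies the relation ζ_{k+1}² − ζ_{k+1} = ζ_k² together with the linear growth ζ_k ≥ k/2 used in the rate argument.

```python
# Sanity checks on the scalar sequence driving the momentum schedule (29).
import math

K = 20
zeta = [0.0]                      # zeta_0 = 0
for k in range(K):
    zeta.append((1 + math.sqrt(1 + 4 * zeta[k] ** 2)) / 2)
beta = [(zeta[k] - 1) / zeta[k + 1] for k in range(1, K)]   # beta_k = (zeta_k - 1)/zeta_{k+1}

for k in range(K):
    assert abs(zeta[k + 1] ** 2 - zeta[k + 1] - zeta[k] ** 2) < 1e-9
    assert zeta[k] >= k / 2 - 1e-12                          # linear growth of zeta_k
print("beta_1..beta_5:", [round(b, 4) for b in beta[:5]])
```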

Theorem 9 Consider the LTV dynamical system (26). If there exist matrices {P_k} with P_k ⪰ 0 and a nondecreasing sequence of nonnegative scalars {µ_k} such that

    \begin{bmatrix} A_k^\top P_{k+1} A_k - P_k & A_k^\top P_{k+1} B_k \\ B_k^\top P_{k+1} A_k & B_k^\top P_{k+1} B_k \end{bmatrix} - \mu_k M_k - (\mu_{k+1} - \mu_k) N_k \preceq 0    (34)

for all k, then we have V_{k+1}(ξ_{k+1}) − V_k(ξ_k) ≤ S_k(ξ_k, w_k) with the storage function V_k(ξ_k) := ξ_k^⊤ P_k ξ_k and the supply rate (27) using X_k := µ_k M_k + (µ_{k+1} − µ_k)N_k for all k. In addition, if this supply rate satisfies (33), we have

    f(x_k) - f^\star \le \frac{\mu_0 (f(x_0) - f^\star) + V_0(\xi_0)}{\mu_k}.    (35)

Proof. Based on the state-space model (26), we can left and right multiply (34) by [ξ_k; w_k]^⊤ and [ξ_k; w_k], and directly obtain the dissipation inequality. Combining this dissipation inequality with (33), we can show

    V_{k+1}(\xi_{k+1}) + \mu_{k+1}(f(x_{k+1}) - f^\star) \le V_k(\xi_k) + \mu_k(f(x_k) - f^\star).

Summing the above inequality as in the proof of Theorem 7 and using the fact that P_k ⪰ 0 for all k yields the result.

We are now ready to show the O(1/k²) rate result for Nesterov's accelerated method. Set µ_k := ζ_{k−1}² and

    P_k := \frac{L}{2}\begin{bmatrix} \zeta_{k-1}^2 & \zeta_{k-1}(1-\zeta_{k-1}) \\ \zeta_{k-1}(1-\zeta_{k-1}) & (1-\zeta_{k-1})^2 \end{bmatrix} \otimes I_p.

Note that P_k ⪰ 0 and µ_{k+1} − µ_k = ζ_k. It is straightforward to verify that this choice of {P_k, µ_k} makes the left side of (34) the zero matrix, and hence (35) holds. Using the fact that ζ_k ≥ k/2 (easily proved by induction), we have µ_k ≥ (k − 1)²/4 and the O(1/k²) rate for Nesterov's method follows.

Remark 10 Theorem 9 is quite general. The infinite family of LMIs (34) can also be applied for linear rate analysis and collapses down to the single LMI (5) in that case. To apply (34) to linear rate analysis, one needs to slightly modify (31)–(32) such that the strong convexity parameter m is incorporated into the formulas of M and N. By setting µ_k := ρ^{−2k} and P_k := ρ^{−2k}P, the LMI (34) is the same for all k and we recover (5). This illustrates how the infinite number of LMIs (34) can collapse to a single LMI under special circumstances.
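Because (34) must hold for every k, it is convenient to have a small helper that assembles its left-hand side one step at a time. The sketch below (NumPy assumed; the helper names are ours) works with the 2 × 2 and 3 × 3 blocks, i.e. with the Kronecker factor I_p dropped, builds M̃_k, Ñ_k from (31)–(32), and returns the largest eigenvalue of the left-hand side of (34), so that a candidate certificate {P_k, µ_k} can be checked step by step.

```python
# Per-step bookkeeping helper for the time-varying LMI (34), assuming NumPy.
import numpy as np

def M_tilde(beta_k, L):
    # Block form of (31)
    return 0.5 * np.array([[0.0, 0.0, -beta_k],
                           [0.0, 0.0,  beta_k],
                           [-beta_k, beta_k, 1.0 / L]])

def N_tilde(beta_k, L):
    # Block form of (32)
    return 0.5 * np.array([[0.0, 0.0, -(1 + beta_k)],
                           [0.0, 0.0,  beta_k],
                           [-(1 + beta_k), beta_k, 1.0 / L]])

def lmi34_max_eig(Ak, Bk, Pk, Pk1, mu_k, mu_k1, beta_k, L):
    """Largest eigenvalue of the left side of (34); it should be <= 0."""
    top = np.hstack([Ak.T @ Pk1 @ Ak - Pk, Ak.T @ Pk1 @ Bk])
    bot = np.hstack([Bk.T @ Pk1 @ Ak, Bk.T @ Pk1 @ Bk])
    lhs = np.vstack([top, bot]) - mu_k * M_tilde(beta_k, L) \
          - (mu_k1 - mu_k) * N_tilde(beta_k, L)
    return np.linalg.eigvalsh(lhs).max()
```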
5. Continuous-time Dissipation Inequality

Finally, we briefly discuss dissipativity theory for the continuous-time ODEs used in optimization research. Note that dissipativity theory was first introduced in Willems (1972a;b) in the context of continuous-time systems. We denote continuous-time variables in upper case. Consider a continuous-time state-space model

    \dot{\Lambda}(t) = A(t)\Lambda(t) + B(t)W(t),    (36)

where Λ(t) is the state, W(t) is the input, and Λ̇(t) denotes the time derivative of Λ(t). In continuous time, the supply rate is a function S : R^{n_Λ} × R^{n_W} × R+ → R that assigns a scalar to each possible state and input pair. Here, we allow S to also depend on time t ∈ R+. To simplify our exposition, we will omit the explicit time dependence (t) from our notation.

Definition 11 The dynamical system (36) is dissipative with respect to the supply rate S if there exists a function V : R^{n_Λ} × R+ → R+ such that V(Λ, t) ≥ 0 for all Λ ∈ R^{n_Λ} and t ≥ 0, and

    \dot{V}(\Lambda, t) \le S(\Lambda, W, t)    (37)

for every trajectory of (36). Here, V̇ denotes the Lie derivative (or total derivative); it accounts for Λ's dependence on t. The function V is called a storage function, and (37) is a (continuous-time) dissipation inequality.

For any given quadratic supply rate, one can automatically construct the continuous-time dissipation inequality using semidefinite programs. The following result is standard in the controls literature.

Theorem 12 Suppose X(t) ∈ R^{(n_Λ+n_W) × (n_Λ+n_W)} and X(t) = X(t)^⊤ for all t. Consider the quadratic supply rate

    S(\Lambda, W, t) := \begin{bmatrix} \Lambda \\ W \end{bmatrix}^\top X(t) \begin{bmatrix} \Lambda \\ W \end{bmatrix} \quad \text{for all } t.    (38)

If there exists a family of matrices P(t) ∈ R^{n_Λ × n_Λ} with P(t) ⪰ 0 such that

    \begin{bmatrix} A^\top P + P A + \dot{P} & P B \\ B^\top P & 0 \end{bmatrix} - X \preceq 0 \quad \text{for all } t,    (39)

then we have V̇(Λ, t) ≤ S(Λ, W, t) with the storage function defined as V(Λ, t) := Λ^⊤ P(t) Λ.

Proof. Based on the state-space model (36), we can apply the product rule for total derivatives and obtain

    \dot{V}(\Lambda, t) = \dot{\Lambda}^\top P \Lambda + \Lambda^\top P \dot{\Lambda} + \Lambda^\top \dot{P} \Lambda
                        = \begin{bmatrix} \Lambda \\ W \end{bmatrix}^\top \begin{bmatrix} A^\top P + P A + \dot{P} & P B \\ B^\top P & 0 \end{bmatrix} \begin{bmatrix} \Lambda \\ W \end{bmatrix}.

Hence we can left and right multiply (39) by [Λ; W]^⊤ and [Λ; W] and obtain the desired conclusion.

The algebraic structure of the LMI (39) is simpler than that of its discrete-time counterpart (5) because, for given P, the continuous-time LMI is linear in A and B rather than being quadratic. This may explain why continuous-time ODEs are sometimes more amenable to analytic approaches than their discretized counterparts. We demonstrate the utility of (39) on the continuous-time limit of Nesterov's accelerated method in Su et al. (2016):

    \ddot{Y} + \frac{3}{t}\dot{Y} + \nabla f(Y) = 0,    (40)

which we rewrite as (36) with Λ := [Ẏ; Y − x*], W := ∇f(Y), x* a point satisfying ∇f(x*) = 0, and A, B defined by

    A(t) := \begin{bmatrix} -\tfrac{3}{t} I_p & 0_p \\ I_p & 0_p \end{bmatrix}, \qquad B(t) := \begin{bmatrix} -I_p \\ 0_p \end{bmatrix}.

Suppose f is convex and set f* := f(x*). Su et al. (2016, Theorem 3) constructs the Lyapunov function

    \mathcal{V}(Y, t) := t^2 (f(Y) - f^\star) + 2\left\| Y + \tfrac{t}{2}\dot{Y} - x^\star \right\|^2

to show that V̇ ≤ 0 and then directly demonstrate an O(1/t²) rate for the ODE (40). To illustrate the power of the dissipation inequality, we use the LMI (39) to recover this Lyapunov function. Denote G(Y, t) := t²(f(Y) − f*). Note that convexity implies f(Y) − f* ≤ ∇f(Y)^⊤(Y − x*), which we rewrite as

    t^2 (f(Y) - f^\star) \le \begin{bmatrix} \dot{Y} \\ Y - x^\star \\ W \end{bmatrix}^\top \begin{bmatrix} 0_p & 0_p & 0_p \\ 0_p & 0_p & \tfrac{t^2}{2} I_p \\ 0_p & \tfrac{t^2}{2} I_p & 0_p \end{bmatrix} \begin{bmatrix} \dot{Y} \\ Y - x^\star \\ W \end{bmatrix}.

Since Ġ(Y, t) = 2t(f(Y) − f*) + t²∇f(Y)^⊤Ẏ, we have

    \dot{G} \le \begin{bmatrix} \dot{Y} \\ Y - x^\star \\ W \end{bmatrix}^\top \begin{bmatrix} 0_p & 0_p & \tfrac{t^2}{2} I_p \\ 0_p & 0_p & t I_p \\ \tfrac{t^2}{2} I_p & t I_p & 0_p \end{bmatrix} \begin{bmatrix} \dot{Y} \\ Y - x^\star \\ W \end{bmatrix}.

Now choose the supply rate S as (38) with X(t) given by

    X(t) := -\begin{bmatrix} 0_p & 0_p & \tfrac{t^2}{2} I_p \\ 0_p & 0_p & t I_p \\ \tfrac{t^2}{2} I_p & t I_p & 0_p \end{bmatrix}.

Clearly S(Λ, W, t) ≤ −Ġ(Y, t). Now we can choose

    P(t) := \begin{bmatrix} \tfrac{t^2}{2} I_p & t I_p \\ t I_p & 2 I_p \end{bmatrix}.

Substituting P and X into (39), the left side of (39) becomes identically zero. Therefore, V̇(Λ, t) ≤ S(Λ, W, t) ≤ −Ġ(Y, t) with the storage function V(Λ, t) := Λ^⊤ P(t) Λ. By defining the Lyapunov function V(Y, t) := V(Λ, t) + G(Y, t), we immediately obtain V̇ ≤ 0 and also recover the same Lyapunov function used in Su et al. (2016).

Remark 13 As in the discrete-time case, the infinite family of LMIs (39) can also be reduced to a single LMI for the linear rate analysis of continuous-time ODEs. For further discussion on the topic of continuous-time exponential dissipation inequalities, see Hu & Seiler (2016).
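The cancellation claimed in the example above can be confirmed symbolically. The sketch below (SymPy assumed) works with the per-coordinate blocks of A(t), B(t), P(t), and X(t), i.e. with the Kronecker factor I_p dropped, and checks that the left-hand side of (39) simplifies to the zero matrix.

```python
# Symbolic check of the continuous-time example, assuming SymPy is installed.
import sympy as sp

t = sp.symbols('t', positive=True)
A = sp.Matrix([[-3 / t, 0], [1, 0]])      # block of A(t)
B = sp.Matrix([[-1], [0]])                # block of B(t)
P = sp.Matrix([[t**2 / 2, t], [t, 2]])    # block of P(t)
Pdot = P.diff(t)
X = sp.Matrix([[0, 0, -t**2 / 2],
               [0, 0, -t],
               [-t**2 / 2, -t, 0]])       # block of X(t)

upper = (A.T * P + P * A + Pdot).row_join(P * B)
lower = (B.T * P).row_join(sp.zeros(1, 1))
lhs = upper.col_join(lower) - X
print(sp.simplify(lhs))                   # expect the 3x3 zero matrix
```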
6. Conclusion and Future Work

In this paper, we developed new notions of dissipativity theory for the understanding of Nesterov's accelerated method. Our approach enjoys advantages of both the IQC framework (Lessard et al., 2016) and the discretization approach (Wilson et al., 2016) in the sense that our proposed LMI condition is simple enough for analytical rate analysis of Nesterov's method and can also be easily generalized to more complicated algorithms. Our approach also gives an intuitive interpretation of the convergence behavior of Nesterov's method using an energy dissipation perspective.

One potential application of our dissipativity theory is the design of accelerated methods that are robust to gradient noise. This is similar to the algorithm design work in Lessard et al. (2016, §6). However, compared with the IQC approach in Lessard et al. (2016), our dissipativity theory leads to smaller LMIs. This can be beneficial since smaller LMIs are generally easier to solve analytically. In addition, the IQC approach in Lessard et al. (2016) is only applicable to strongly-convex objective functions, while our dissipativity theory may facilitate the design of robust algorithms for weakly-convex objective functions. The dissipativity framework may also lead to the design of adaptive or time-varying algorithms.

Acknowledgements

Both authors would like to thank the anonymous reviewers for helpful suggestions that improved the clarity and quality of the final manuscript. This material is based upon work supported by the National Science Foundation under Grant No. Both authors also acknowledge support from the Wisconsin Institute for Discovery, the College of Engineering, and the Department of Electrical and Computer Engineering at the University of Wisconsin–Madison.

References

Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.

Bubeck, S., Lee, Y., and Singh, M. A geometric alternative to Nesterov's accelerated gradient descent. arXiv preprint, 2015.

Drusvyatskiy, D., Fazel, M., and Roy, S. An optimal first order method based on optimal quadratic averaging. arXiv preprint, 2016.

Fazlyab, M., Ribeiro, A., Morari, M., and Preciado, V. M. Analysis of optimization algorithms via integral quadratic constraints: Nonstrongly convex problems. arXiv preprint, 2017.

Flammarion, N. and Bach, F. From averaging to acceleration, there is only a step-size. In COLT, 2015.

Hu, B. and Seiler, P. Exponential decay rate conditions for uncertain linear systems using integral quadratic constraints. IEEE Transactions on Automatic Control, 61, 2016.

Lessard, L., Recht, B., and Packard, A. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95, 2016.

Megretski, A. and Rantzer, A. System analysis via integral quadratic constraints. IEEE Transactions on Automatic Control, 42:819–830, 1997.

Megretski, A., Jönsson, U., Kao, C. Y., and Rantzer, A. Control Systems Handbook, chapter: Integral Quadratic Constraints. CRC Press, 2010.

Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2003.

Polyak, B. T. Introduction to Optimization. Optimization Software, 1987.

Seiler, P. Stability analysis with dissipation inequalities and integral quadratic constraints. IEEE Transactions on Automatic Control, 60(6), 2015.

Su, W., Boyd, S., and Candès, E. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. Journal of Machine Learning Research, 17(153):1–43, 2016.

Wibisono, A., Wilson, A., and Jordan, M. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351–E7358, 2016.

Willems, J. C. Dissipative dynamical systems Part I: General theory. Archive for Rational Mechanics and Analysis, 45(5):321–351, 1972a.

Willems, J. C. Dissipative dynamical systems Part II: Linear systems with quadratic supply rates. Archive for Rational Mechanics and Analysis, 45(5):352–393, 1972b.

Wilson, A., Recht, B., and Jordan, M. A Lyapunov analysis of momentum methods in optimization. arXiv preprint arXiv:1611.02635, 2016.


More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

Inertial Game Dynamics

Inertial Game Dynamics ... Inertial Game Dynamics R. Laraki P. Mertikopoulos CNRS LAMSADE laboratory CNRS LIG laboratory ADGO'13 Playa Blanca, October 15, 2013 ... Motivation Main Idea: use second order tools to derive efficient

More information

Research Article An Equivalent LMI Representation of Bounded Real Lemma for Continuous-Time Systems

Research Article An Equivalent LMI Representation of Bounded Real Lemma for Continuous-Time Systems Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 28, Article ID 67295, 8 pages doi:1.1155/28/67295 Research Article An Equivalent LMI Representation of Bounded Real Lemma

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Stability of linear time-varying systems through quadratically parameter-dependent Lyapunov functions

Stability of linear time-varying systems through quadratically parameter-dependent Lyapunov functions Stability of linear time-varying systems through quadratically parameter-dependent Lyapunov functions Vinícius F. Montagner Department of Telematics Pedro L. D. Peres School of Electrical and Computer

More information

Convergence rates for distributed stochastic optimization over random networks

Convergence rates for distributed stochastic optimization over random networks Convergence rates for distributed stochastic optimization over random networs Dusan Jaovetic, Dragana Bajovic, Anit Kumar Sahu and Soummya Kar Abstract We establish the O ) convergence rate for distributed

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Research Article Convex Polyhedron Method to Stability of Continuous Systems with Two Additive Time-Varying Delay Components

Research Article Convex Polyhedron Method to Stability of Continuous Systems with Two Additive Time-Varying Delay Components Applied Mathematics Volume 202, Article ID 689820, 3 pages doi:0.55/202/689820 Research Article Convex Polyhedron Method to Stability of Continuous Systems with Two Additive Time-Varying Delay Components

More information

Hybrid Systems Course Lyapunov stability

Hybrid Systems Course Lyapunov stability Hybrid Systems Course Lyapunov stability OUTLINE Focus: stability of an equilibrium point continuous systems decribed by ordinary differential equations (brief review) hybrid automata OUTLINE Focus: stability

More information

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT Multiresolution analysis by infinitely differentiable compactly supported functions N. Dyn A. Ron School of of Mathematical Sciences Tel-Aviv University Tel-Aviv, Israel Computer Sciences Department University

More information

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function ECE7850: Hybrid Systems:Theory and Applications Lecture Note 7: Switching Stabilization via Control-Lyapunov Function Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio

More information

Analysis of periodic solutions in tapping-mode AFM: An IQC approach

Analysis of periodic solutions in tapping-mode AFM: An IQC approach Analysis of periodic solutions in tapping-mode AFM: An IQC approach A. Sebastian, M. V. Salapaka Department of Electrical and Computer Engineering Iowa State University Ames, IA 5004 USA Abstract The feedback

More information

THE HARTMAN-GROBMAN THEOREM AND THE EQUIVALENCE OF LINEAR SYSTEMS

THE HARTMAN-GROBMAN THEOREM AND THE EQUIVALENCE OF LINEAR SYSTEMS THE HARTMAN-GROBMAN THEOREM AND THE EQUIVALENCE OF LINEAR SYSTEMS GUILLAUME LAJOIE Contents 1. Introduction 2 2. The Hartman-Grobman Theorem 2 2.1. Preliminaries 2 2.2. The discrete-time Case 4 2.3. The

More information

Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays

Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays International Journal of Automation and Computing 7(2), May 2010, 224-229 DOI: 10.1007/s11633-010-0224-2 Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays

More information

Stationary Probabilities of Markov Chains with Upper Hessenberg Transition Matrices

Stationary Probabilities of Markov Chains with Upper Hessenberg Transition Matrices Stationary Probabilities of Marov Chains with Upper Hessenberg Transition Matrices Y. Quennel ZHAO Department of Mathematics and Statistics University of Winnipeg Winnipeg, Manitoba CANADA R3B 2E9 Susan

More information

Auxiliary signal design for failure detection in uncertain systems

Auxiliary signal design for failure detection in uncertain systems Auxiliary signal design for failure detection in uncertain systems R. Nikoukhah, S. L. Campbell and F. Delebecque Abstract An auxiliary signal is an input signal that enhances the identifiability of a

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

Global convergence of the Heavy-ball method for convex optimization

Global convergence of the Heavy-ball method for convex optimization Noname manuscript No. will be inserted by the editor Global convergence of the Heavy-ball method for convex optimization Euhanna Ghadimi Hamid Reza Feyzmahdavian Mikael Johansson Received: date / Accepted:

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Intelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera

Intelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera Intelligent Control Module I- Neural Networks Lecture 7 Adaptive Learning Rate Laxmidhar Behera Department of Electrical Engineering Indian Institute of Technology, Kanpur Recurrent Networks p.1/40 Subjects

More information

Composite nonlinear models at scale

Composite nonlinear models at scale Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)

More information