Proximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 2016)

Instructor: Wotao Yin
April 29, 2017

Given a function f, the proximal operator maps an input point x to the minimizer of f restricted to a small proximity of x. The proximal operator has a simple definition yet has long been a powerful tool in optimization, giving rise to a variety of proximal algorithms such as the proximal-point algorithm, the prox-gradient algorithm, as well as many algorithms involving linearization and/or splitting.

The proximal operator of a convex function involves the subgradients of the function, which need not be differentiable. Hence, proximal algorithms handle both differentiable and nondifferentiable functions. In comparison, Newton's algorithm requires a C^2 function, and gradient algorithms need C^1 functions. Along with convex duality, proximal algorithms can solve problems with linear constraints. In fact, the Method of Multipliers, or the Augmented Lagrangian Method, is a special case of the proximal-point algorithm.

What is unique about the proximal operator is its implicitness. It computes a subgradient of the function at the output point; the subgradient and the output point determine each other. (In comparison, in Newton's algorithm and gradient algorithms, the Hessian and the gradient are evaluated at the input point and determine the output point.) Hence, computing the proximal operator may require inverting a matrix or evaluating a so-called resolvent. While this can be a disadvantage for some functions, a large number of extremely useful functions such as the l1, l2, and l∞ norms have closed-form proximal operators!

Because the proximal operator is implicit, it is very stable. For example, the proximal-point algorithm converges for any positive step size. The implicitness also makes proximal algorithms popular choices for certain nonconvex problems with structure.

Proximal algorithms are often used for optimization problems with structures such as large sums, block separability, linear constraints, as well as linear transforms. Coordinate descent and operator splitting techniques often decompose a problem into simple subproblems that are easily solved by proximal algorithms. Therefore, the proximal operator gives rise to parallel and distributed algorithms. The proximal operator also has interesting mathematical properties.

It is a generalization of projection and has the soft-projection interpretation. Just as the projections onto complementary linear subspaces produce an orthogonal decomposition of a point, the proximal operators of a convex function and its convex conjugate yield the Moreau decomposition of a point. Furthermore, the proximal operator provides an optimality condition: a function is minimized at a point if, and only if, the proximal operator of the function evaluated at the point returns the same point. It is also a common part of the fixed-point optimality conditions of more complicated optimization problems.

For convex functions, the proximal operator enjoys the important firmly nonexpansive property, which plays a central role in monotone operator theory and operator splitting methods. The property leads to sequence convergence and lets us assemble simple operators into an algorithm that solves difficult problems. Consequently, proximal operators are frequently used to handle simple or structured functions in operator-splitting algorithms.

Notation

To ensure that the proximal operator is well defined and gives a unique output, we consider functions f : R^n → R ∪ {∞} that are proper (not everywhere ∞), closed (the epigraph is a closed set or, equivalently, all level sets are closed), and convex. Except for the examples, the results of this chapter generalize from R^n to a possibly infinite-dimensional Hilbert space. Including ∞ in the range of f saves us from writing the constraint x ∈ dom f. The set of minimizers of f is denoted by argmin f := {x ∈ R^n : f(x) = min_y f(y)}.

Definition

Definition 1. Given a proper closed convex function f : R^n → R ∪ {∞}, the proximal operator, scaled by λ > 0, is the mapping from R^n to R^n defined by

    prox_{λf}(x) := argmin_{y ∈ R^n} f(y) + (1/(2λ)) ‖y − x‖^2.    (1)

Lemma 1. For any λ > 0 and x, prox_{λf}(x) exists and is unique if f is proper closed convex.

Proof. Since f is proper closed convex, it is lower bounded by an affine function; therefore F(y) := f(y) + (1/(2λ)) ‖y − x‖^2 is coercive, i.e., lim_{‖y‖→∞} F(y) = ∞. Take a minimizing sequence such that lim_{m→∞} F(y^m) = inf_{y ∈ R^n} F(y). Since F is coercive, this sequence is bounded, so it has a subsequence converging to a cluster point, say y^{m_k} → y*. Since F is closed,

    F(y*) ≤ lim_{k→∞} F(y^{m_k}) = inf_{y ∈ R^n} F(y).

Therefore y* is a minimizer of F, and it is unique since F is strongly convex.
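Definition (1) can be sanity-checked numerically. Below is a minimal Matlab sketch (not part of the original notes; the test function and numbers are arbitrary choices): for f(y) = (1/2)‖y‖^2, setting the gradient of the objective in (1) to zero gives the closed form prox_{λf}(x) = x/(1 + λ), which we compare against a generic numerical minimization of (1).

    % Numerical sanity check of Definition 1 (a sketch; f, lambda, x are arbitrary choices).
    % For f(y) = 0.5*||y||^2, the optimality condition of (1) gives prox_{lambda f}(x) = x/(1+lambda).
    lambda = 0.7;
    x      = [3; -1; 2];
    f      = @(y) 0.5*norm(y)^2;                    % a simple proper closed convex test function
    F      = @(y) f(y) + norm(y - x)^2/(2*lambda);  % the objective in (1)
    y_num  = fminsearch(F, x);                      % generic numerical minimization
    y_cls  = x/(1 + lambda);                        % closed form derived by hand
    fprintf('gap between numerical and closed-form prox: %.2e\n', norm(y_num - y_cls));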

We use λf as the proximal subscript, but we prefer separating f and λ in the minimization problem in (1). In fact, the definition does not change if, instead of (1), we let

    prox_{λf}(x) := argmin_{y ∈ R^n} λ f(y) + (1/2) ‖y − x‖^2.

However, the separation yields the Moreau envelope, or Moreau-Yosida approximation:

    f̂(x) := min_{y ∈ R^n} f(y) + (1/(2λ)) ‖y − x‖^2.

The function f̂ approximates f but is everywhere differentiable, even if f is not. It is easy to show

    ∇f̂(x) = (1/λ) (x − prox_{λf}(x)).    (2)

Exercise: Prove (2). (Add existence and uniqueness.) ([Add a 1D illustration of f(x), f(y) + (1/(2λ))‖y − x‖^2, and f̂(x).])

Soft projection

Let C be a nonempty closed convex set. Recall the indicator function

    δ_C(x) := 0, if x ∈ C;  ∞, otherwise.

Let λ > 0. By definition,

    prox_{λδ_C}(x) = argmin_y δ_C(y) + (1/(2λ)) ‖y − x‖^2 = argmin{‖y − x‖ : y ∈ C} = proj_C(x).

The proximal operator of C's indicator function is just the projection onto C. The scaling parameter λ does not make any difference. The indicator function has the special property argmin δ_C = dom δ_C = C. In general, a proper convex function f satisfies argmin f ⊆ dom f.
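To see the soft-projection identity prox_{λδ_C} = proj_C on a concrete set, here is a small Matlab sketch (not from the notes; the box and the test point are arbitrary). It takes C to be a box [l, u], whose projection is a componentwise clip, and checks the variational characterization of the projection; the scaling λ plays no role.

    % prox of the indicator of a box C = [l,u] is the projection onto C, i.e., a componentwise
    % clip, for every lambda > 0 (a sketch; the box and the test point are arbitrary).
    l = [-1; -2;  0];   u = [1; 0; 3];
    x = [ 2; -3;  1.5];
    p = min(max(x, l), u);             % proj_C(x) = prox_{lambda*delta_C}(x) for any lambda > 0
    % variational characterization of a projection: <x - p, c - p> <= 0 for every c in C
    for t = 1:5
        c = l + rand(3,1).*(u - l);    % a random point of C
        fprintf('<x - p, c - p> = %+.3f (should be <= 0)\n', (x - p)'*(c - p));
    end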

Suppose that argmin f is nonempty. For a given x ∈ R^n:

    as λ ↑ ∞, prox_{λf}(x) approaches proj_{argmin f}(x);
    as λ ↓ 0, prox_{λf}(x) approaches proj_{dom f}(x).

(Add a 2D illustration.)

Examples

Linear function

Given a ∈ R^n and b ∈ R, let f(x) := ⟨a, x⟩ + b. The proximal operator of this linear function is

    prox_{λf}(x) := argmin_{y ∈ R^n} (a^T y + b) + (1/(2λ)) ‖y − x‖^2.

The first-order optimality condition is obtained by differentiating the minimization objective in y, yielding

    a + (1/λ)(prox_{λf}(x) − x) = 0  ⟺  prox_{λf}(x) = x − λa.    (3)

The proximal map moves the input point λ units in the direction −a.

As an application, let f^(1) be the linear (1st-order) approximation of a differentiable function f at the point x, namely,

    f^(1)(y) := f(x) + ⟨∇f(x), y − x⟩.

Following (3), we obtain

    prox_{λ f^(1)}(x) = x − λ ∇f(x),

which is the gradient descent step with step size λ.

Quadratic function

Can we recover Newton's algorithm with the proximal operator of the quadratic approximation? We will see it almost does! Let A ∈ R^{n×n} be a symmetric positive semi-definite matrix and b ∈ R^n be a vector. Consider the quadratic function

    f(x) := (1/2)⟨x, Ax⟩ + ⟨b, x⟩.

The proximal operator of f is

    prox_{λf}(x) := argmin_{y ∈ R^n} (1/2)⟨y, Ay⟩ + ⟨b, y⟩ + (1/(2λ)) ‖y − x‖^2.

By differentiating the minimization objective in y, we obtain the first-order optimality condition:

    (Ay + b) + (1/λ)(y − x) = 0
    ⟺ (λA + I) y = x − λb
    ⟺ (λA + I) y = (λA + I) x − λ(Ax + b)
    ⟺ y = x − (A + (1/λ) I)^{-1} (Ax + b).

Therefore,

    prox_{λf}(x) = x − (A + (1/λ) I)^{-1} (Ax + b).    (4)

Consider the least-squares problem

    minimize_x (1/2) ‖Bx − c‖_2^2,

where B ∈ R^{m×n} and c ∈ R^m. By letting A = B^T B and b = −B^T c, we recover from (4) the iterative refinement algorithm:

    x^{k+1} ← x^k − (A + (1/λ) I)^{-1} (A x^k + b).

As another application, let us take a C^2 function f and define its quadratic (2nd-order) approximation at a point x:

    f^(2)(y) := f(x) + ⟨∇f(x), y − x⟩ + (1/2)⟨y − x, ∇^2 f(x)(y − x)⟩.

With A = ∇^2 f(x) and b = ∇f(x) − ∇^2 f(x) x, we simplify f^(2) as

    f^(2)(y) = (1/2)⟨y, Ay⟩ + ⟨b, y⟩ + c,

where c collects all y-independent terms, which do not affect the evaluation of the proximal operator. Following (4), we get

    prox_{λ f^(2)}(x) = x − (∇^2 f(x) + (1/λ) I)^{-1} ∇f(x).

The iteration x^{k+1} ← prox_{λ f^(2)}(x^k) recovers the modified-Hessian Newton algorithm, which is also known as the Levenberg-Marquardt method.
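Formula (4) is easy to check numerically. A small Matlab sketch (not from the notes; A, b, x, λ are random): the gradient of the objective defining prox_{λf} should vanish at the point returned by the closed form.

    % Verify the quadratic prox (4): prox_{lambda f}(x) = x - (A + I/lambda)\(A*x + b),
    % by checking the first-order optimality condition (a sketch; the data are random).
    n = 5;  lambda = 0.3;
    M = randn(n);  A = M*M';                   % random symmetric positive semidefinite A
    b = randn(n,1);  x = randn(n,1);
    y = x - (A + eye(n)/lambda) \ (A*x + b);   % formula (4)
    g = A*y + b + (y - x)/lambda;              % gradient of f(y) + ||y - x||^2/(2*lambda) at y
    fprintf('norm of gradient at the prox point: %.2e (should be ~0)\n', norm(g));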

l1 norm

Let f(x) = ‖x‖_1. The point y* = prox_{λf}(x) must minimize ‖y‖_1 + (1/(2λ)) ‖y − x‖^2. Therefore, it satisfies

    0 ∈ ∂‖y*‖_1 + (1/λ)(y* − x)  ⟺  x − y* ∈ λ ∂‖y*‖_1.    (5)

Recall that, since ‖·‖_1 is separable, its subdifferential simplifies to

    ∂‖y‖_1 = ∂|y_1| × ··· × ∂|y_n|.

Therefore, condition (5) reduces to the component-wise conditions

    x_i − y_i* ∈ λ ∂|y_i*|,  i = 1, ..., n,

which mean that the graph of (x_i − y_i) intersects that of λ ∂|y_i|. From the plot, we observe that

    y_i* = x_i − λ, if x_i > λ;
    y_i* = x_i + λ, if x_i < −λ;
    y_i* = 0, otherwise.

Since y_i* preserves the sign of x_i and reduces its absolute value by λ, and y_i* = 0 if |x_i| ≤ λ, we also have

    y_i* = sign(x_i) max{0, |x_i| − λ}.

Putting this together for all i = 1, ..., n yields

    prox_{λ‖·‖_1}(x) = y* = sign(x) max{0, |x| − λ}, applied component-wise.

Therefore, the l1 proximal operator earns the names shrinkage and soft-thresholding. In Matlab, the computation can be written in one line as

    y = sign(x).*max(0,abs(x)-lambda);

l2 norm

Let f(x) = ‖x‖_2,

which is a non-separable function. Recall its subdifferential

    ∂f(x) = { x/‖x‖_2 },        if x ≠ 0;
    ∂f(x) = { p : ‖p‖_2 ≤ 1 },  otherwise.

Since f is differentiable at all but one point, we can apply the assumption trick. Let λ > 0. First assume that y* = prox_{λ‖·‖_2}(x) ≠ 0. Then y* must satisfy

    0 = y*/‖y*‖_2 + (1/λ)(y* − x).    (6)

Use polar coordinates x = (r_x, θ_x), where r_x = ‖x‖_2 and θ_x is the direction of x (in R^2, θ_x = tan^{-1}(x_2/x_1)). From (6), y*/‖y*‖_2 and x − y* must have the same angle. Since y*/‖y*‖_2 and y* have the same angle, their angle must equal the angle of x or its negative. Therefore, it must hold that y* = αx for some α ∈ R. Substituting this into (6) yields

    0 = sign(α) + (1/λ)(α − 1) r_x.

Hence,

    α = (r_x − λ)/r_x, if r_x > λ;  α = 0, otherwise,

and y* = αx. We have assumed y* ≠ 0, but it is easy to verify that when r_x ≤ λ, (1/λ)(x − 0) ∈ {p : ‖p‖_2 ≤ 1} and thus y* = prox_{λ‖·‖_2}(x) = 0. Therefore,

    prox_{λ‖·‖_2}(x) = ((‖x‖_2 − λ)/‖x‖_2) x, if ‖x‖_2 > λ;  0, otherwise.    (7)

In Matlab, the computation can be written in one line as

    y = (max(0,norm(x)-lambda)/(norm(x)+eps))*x;

where eps is added to avoid division by zero.

l_{p,q} norm

This norm is used to impose properties on a group of variables. For a matrix A ∈ R^{m×n}, its l_{p,q} norm is

    ‖A‖_{p,q} = ( Σ_{j=1}^n ( Σ_{i=1}^m |a_{ij}|^p )^{q/p} )^{1/q}.

The most common example is the l_{2,1} norm, used in the Group Lasso model,

    ‖A‖_{2,1} = Σ_{j=1}^n ( Σ_{i=1}^m |a_{ij}|^2 )^{1/2} = Σ_{j=1}^n ‖A_j‖_2,

where A_j is the jth column of A. Therefore, ‖A‖_{2,1} is separable into the sum of the l2 norms of its columns. To take advantage of this property, we write X = [X_1 X_2 ··· X_n], where X_i ∈ R^m is the ith column of X. Then,

    prox_{λ‖·‖_{2,1}}(X) := argmin_{Y ∈ R^{m×n}} ‖Y‖_{2,1} + (1/(2λ)) ‖Y − X‖_F^2 = [prox_{λ‖·‖_2}(X_1), ..., prox_{λ‖·‖_2}(X_n)].

Here, each prox_{λ‖·‖_2}(X_i), i = 1, ..., n, is given by (7).

l∞ norm

prox_{λ‖·‖_∞} can be derived in two ways, either directly from the definition of the proximal operator and ‖·‖_∞, or following the Moreau decomposition in the next section. TODO: add the direct approach, which is based on ∂‖·‖_∞. It should reduce to the problem of finding t = ‖y*‖_∞ such that

    Σ_{i : |x_i| > t} (|x_i| − t) = λ.    (8)

Given the solution t to (8),

    prox_{λ‖·‖_∞}(x) = sign(x) min{|x|, t}, component-wise.

Unitary-invariant matrix norms

We call a matrix norm |||·||| unitary-invariant if |||X||| = |||UXV||| for any matrix X ∈ C^{m×n} and unitary matrices U ∈ C^{m×m}, V ∈ C^{n×n}. Since singular values are rotationally (unitarily) invariant, all singular-value based matrix norms are rotationally (unitarily) invariant. Common examples are

    the nuclear norm |||·|||_*: the l1 norm, or sum, of the singular values;
    the Frobenius norm |||·|||_F: the l2 norm of the singular values;
    the l2-operator norm |||·|||_2: the l∞ norm, or maximum, of the singular values.

They are called Schatten-p norms for p = 1, 2, ∞, respectively. Let |||·||| be a unitary-invariant matrix norm and ‖·‖ be its corresponding norm of the singular values. Consider the matrix proximal operator

    Y* = prox_{λ|||·|||}(X) := argmin_Y |||Y||| + (1/(2λ)) ‖Y − X‖_F^2.

One can show that the solution Y* must share the singular-value factors, which are unitary matrices, with the input matrix X. Hence, the steps to compute prox_{λ|||·|||}(X) are:

1. Apply the SVD to X: X = U diag(σ) V^*, where σ is the vector of the singular values of X and diag(σ) is its diagonal matrix;
2. Compute the vector proximal operator: σ^+ = prox_{λ‖·‖}(σ);
3. Form the solution: Y* = U diag(σ^+) V^*.
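As an illustration of these three steps, here is a Matlab sketch (not from the notes; X and λ are arbitrary) for the nuclear norm, whose corresponding vector norm is l1, so step 2 is soft-thresholding of the singular values.

    % prox of the nuclear norm via the three steps above (a sketch; X and lambda are arbitrary):
    % SVD, soft-threshold the singular values (the l1 prox), then reassemble.
    X = randn(6, 4);  lambda = 0.5;
    [U, S, V]  = svd(X, 'econ');
    sigma      = diag(S);
    sigma_plus = max(0, sigma - lambda);         % prox_{lambda*||.||_1} applied to sigma >= 0
    Y          = U * diag(sigma_plus) * V';      % Y = prox of the nuclear norm at X
    fprintf('largest singular value shrinks from %.3f to %.3f\n', sigma(1), max(svd(Y)));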

Moreau decomposition

Let f : R^n → R ∪ {∞} be a proper closed convex function and λ > 0. Let f* be the convex conjugate, or the Legendre transform, of f:

    f*(u) = sup_v ⟨v, u⟩ − f(v).

In this chapter, we leave this definition unexplained. Another chapter is dedicated to convex duality and the properties of convex conjugacy. The Moreau decomposition applies to any point x ∈ R^n and decomposes it as

    x = y + λz,  where  y = prox_{λf}(x)  and  z = prox_{λ^{-1} f*}(λ^{-1} x).    (9)

Complementary linear subspaces

Let S be a linear subspace of R^n and S⊥ be its complementary (orthogonal) subspace. Then, with f = δ_S (whose conjugate is f* = δ_{S⊥}), (9) reduces to

    x = proj_S(x) + proj_{S⊥}(x).
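The decomposition (9) can be checked numerically. A Matlab sketch (not from the notes; x and λ are arbitrary): take f = ‖·‖_1, whose conjugate is the indicator of the unit l∞ ball (derived in the next section), so the second prox is simply a componentwise clip.

    % Numerical check of the Moreau decomposition (9) (a sketch; x and lambda are arbitrary).
    % With f = ||.||_1, f* is the indicator of the unit l-infinity ball, so
    % prox_{f*/lambda}(x/lambda) is the componentwise clip of x/lambda to [-1, 1].
    lambda = 0.8;
    x      = 2*randn(6,1);
    y      = sign(x).*max(0, abs(x) - lambda);    % y = prox_{lambda*||.||_1}(x), soft-thresholding
    z      = min(max(x/lambda, -1), 1);           % z = prox_{f*/lambda}(x/lambda), a projection
    fprintf('|| x - (y + lambda*z) || = %.2e (should be 0)\n', norm(x - (y + lambda*z)));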

Cone and polar cone

...

l_p norm and l_q ball

Let p, q ∈ [1, ∞] be such that 1/p + 1/q = 1. By definition,

    (‖·‖_p)*(u) = sup_v ⟨v, u⟩ − ‖v‖_p = 0, if ‖u‖_q ≤ 1;  ∞, otherwise.

Hence,

    (‖·‖_p)* = δ_{‖·‖_q ≤ 1}(·).    (10)

Let us compute the projection onto the l_q ball of radius α > 0:

    B_q^α := {x : ‖x‖_q ≤ α}.

Obviously, B_q^α = α {x : ‖x‖_q ≤ 1}, and thus δ_{B_q^α}(·) = δ_{‖·‖_q ≤ 1}(·/α) = (α‖·‖_p)*(·), using the conjugate scaling rule (αf)*(u) = α f*(u/α). Therefore, by applying the Moreau decomposition (9) with (10) and λ = 1, we obtain

    proj_{B_q^α}(x) = prox_{δ_{B_q^α}}(x) = prox_{(α‖·‖_p)*}(x) = x − prox_{α‖·‖_p}(x) = x − α prox_{‖·‖_p}(x/α),

where the last equality uses the scaling identity prox_{α‖·‖_p}(x) = α prox_{‖·‖_p}(x/α). Applying this identity to the l1, l2, and l∞ balls, we arrive at

1. proj_{‖·‖_1 ≤ α}(x) = x − α prox_{‖·‖_∞}(x/α);
2. proj_{‖·‖_2 ≤ α}(x) = x − α prox_{‖·‖_2}(x/α);
3. proj_{‖·‖_∞ ≤ α}(x) = x − α prox_{‖·‖_1}(x/α).

Projection to box constraints

Two vectors l ∈ ({−∞} ∪ R)^n and u ∈ (R ∪ {∞})^n define the box set

    S = [l, u] = {x ∈ R^n : l_i ≤ x_i ≤ u_i} ⊆ R^n.

...
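Identity 3 above can be tested directly. A Matlab sketch (not from the notes; α and x are arbitrary): the right-hand side uses the l1 prox (soft-thresholding), and it should agree with the obvious projection onto the l∞ ball, which, like the box projection just discussed, is a componentwise clip.

    % Check identity 3: the projection onto the l-infinity ball of radius alpha equals
    % x - alpha*prox_{||.||_1}(x/alpha)  (a sketch; alpha and x are arbitrary).
    alpha = 1.5;
    x     = 3*randn(7,1);
    soft  = @(v, t) sign(v).*max(0, abs(v) - t);    % prox of t*||.||_1
    p1    = x - alpha*soft(x/alpha, 1);             % right-hand side of identity 3
    p2    = min(max(x, -alpha), alpha);             % direct projection: clip to [-alpha, alpha]
    fprintf('|| p1 - p2 || = %.2e (should be 0)\n', norm(p1 - p2));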

Projection to subspaces and affine sets

    S = {x ∈ R^n : x_1 = x_2 = ··· = x_n}.  ...

    S = {x ∈ R^n : ⟨a, x⟩ + b = 0}.  ...

    S = {x ∈ R^n : ⟨a, x⟩ + b = 0, ⟨c, x⟩ + d = 0}.  ...

Total variation

Consider x ∈ R^n, ... In two or more dimensional spaces, ... graph-cut, max-flow ...

Function under linear transform

Let A ∈ R^{m×n}. Consider the function

    h(x) = f(Ax).

Assume that AA^T = I. If A is a square matrix, A is called an orthogonal, or orthonormal, matrix; if A is rectangular, we call it a frame (the frame case AA^T = αI is treated in Exercise 9). Here we take A square orthogonal, so that A^T A = I as well; then rank(A) = m, ‖y − x‖_2 = ‖Ay − Ax‖_2 for any x, y, and the linear transform T : x ↦ Ax is surjective. Therefore,

    prox_{λh}(y) = argmin_{x ∈ R^n} f(Ax) + (1/(2λ)) ‖x − y‖^2
                 = argmin_{x ∈ R^n} f(Ax) + (1/(2λ)) ‖Ax − Ay‖^2
                 = A^T ( argmin_{z ∈ R^m} f(z) + (1/(2λ)) ‖z − Ay‖^2 )    (z := Ax)
                 = A^T prox_{λf}(Ay).

Here, the change of variable from x ∈ R^n to z := Ax, z ∈ R^m, relies on the fact that T : x ↦ Ax is surjective. From the solution z*, we can recover the solution x* since A^T z* = A^T A x* = x*.

Proximable functions

Definition 2. A function f : R^n → R ∪ {∞} is proximable if prox_{γf} can be computed in O(n) or O(n polylog(n)) time.

Some common proximable functions are

1. norms: l1, l2, l_{2,1}, l∞;
2. separable functions and indicator functions of separable constraints;
3. the standard simplex: {x ∈ R^n : 1^T x = 1, x ≥ 0};
TODO: add more.

This section studies part 2 and some summative proximable functions.

Separable sum

Proposition 1. For a separable function

    f(x) = Σ_{i=1}^n f_i(x_i),

it holds that

    prox_{λf}(x) = ( prox_{λf_1}(x_1), ..., prox_{λf_n}(x_n) ).

Summative proximable functions

In general, even if f and g are both proximable, h = f + g may not be proximable. In operator splitting algorithms, one may then have to deal with f and g in two subproblems. However, there are exceptions, which we call summative proximable functions.

Definition 3. We call h = f + g a summative proximable function if

    prox_{λh}(x) = prox_{λf}( prox_{λg}(x) ),  ∀x ∈ R^n, λ > 0.

Examples of summative proximable functions are:

1. In R, let f : R → R be convex and be minimized at 0 (i.e., 0 ∈ ∂f(0)). Then the function |·| + f is summative proximable. An example is the elastic net regularization function ‖x‖_1 + (μ/2)‖x‖_2^2, which is component-wise separable (see the Matlab sketch after this list).

2. In R^n, if g is a homogeneous function (g(αx) = α g(x) for α ≥ 0), then the function ‖·‖_2 + g is summative proximable. Special cases of homogeneous functions include norms (e.g., the l_p norm, p ∈ [1, ∞]) and the indicator functions δ_{x ≥ 0} and δ_{x ≤ 0}.

3. In R^n, let f be a prox-monotone function and g be the 1D total variation

    g(x) = Σ_{i=1}^{n−1} |x_{i+1} − x_i|;

then f + g is summative proximable. (A function f is prox-monotone if, for every x ∈ R^n and all i, j: x_i > x_j implies (prox_f(x))_i ≥ (prox_f(x))_j, x_i = x_j implies (prox_f(x))_i = (prox_f(x))_j, and x_i < x_j implies (prox_f(x))_i ≤ (prox_f(x))_j.) Examples of prox-monotone functions include the l1, l2, and l∞ norms and δ_{x ≥ u}, u ∈ R^n. The function α‖x‖_1 + g(x) is called the Fused Lasso regularizer.
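Example 1 can be made concrete. A Matlab sketch (not from the notes; μ, λ, and x are arbitrary): with f = (μ/2)(·)^2 and g = |·|, the composition prox_{λf} ∘ prox_{λg} is soft-thresholding followed by a scaling, and one can check componentwise that the result satisfies the subgradient optimality condition of prox_{λ(f+g)}.

    % Summative prox of the elastic net h(x) = |x| + (mu/2)*x^2, componentwise (a sketch).
    % Composition: prox of lambda*|.| (soft-threshold), then prox of lambda*(mu/2)(.)^2 (scaling).
    mu = 2;  lambda = 0.5;
    x  = [-3; -0.2; 0.1; 1.7];
    y  = sign(x).*max(0, abs(x) - lambda) / (1 + lambda*mu);   % prox_{lambda f}(prox_{lambda g}(x))
    % optimality of prox_{lambda h}: 0 in lambda*d|y_i| + lambda*mu*y_i + (y_i - x_i)
    for i = 1:numel(x)
        if y(i) ~= 0
            res = lambda*sign(y(i)) + lambda*mu*y(i) + y(i) - x(i);
            fprintf('i = %d, y_i nonzero: residual %+.2e (should be 0)\n', i, res);
        else
            fprintf('i = %d, y_i = 0: |x_i| = %.2f <= lambda = %.2f\n', i, abs(x(i)), lambda);
        end
    end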

Proximal fixed-point optimality conditions

Theorem 1. Let λ > 0. The point x* ∈ R^n is a minimizer of a proper closed convex function f if, and only if, x* = prox_{λf}(x*).

Proof. "⟹": Let x* ∈ argmin f(x). Then for any x ∈ R^n,

    f(x) + (1/(2λ)) ‖x − x*‖^2 ≥ f(x*) = f(x*) + (1/(2λ)) ‖x* − x*‖^2.

Thus, x* = argmin_x f(x) + (1/(2λ)) ‖x − x*‖^2 = prox_{λf}(x*).

"⟸": Let x* = prox_{λf}(x*). By the subgradient optimality condition,

    0 ∈ ∂f(x*) + (1/λ)(x* − x*) = ∂f(x*).

Thus, 0 ∈ ∂f(x*), and x* ∈ argmin f(x).

Proximal-point algorithm

The proximal-point algorithm (PPA) refers to the iteration

    x^{k+1} ← prox_{λf}(x^k),    (11)

where λ > 0 is the step size. Although it is seldom used as an algorithm to minimize f, it recovers the Method of Multipliers (Augmented Lagrangian Method) and others. The step size can vary with the iteration in a closed interval, namely, λ_k ∈ [l, u] for 0 < l ≤ u < ∞.

Subgradient-descent interpretation

Although a negative subgradient may not be a descent direction, we will see that, in the PPA, a subgradient evaluated at the new point x^{k+1} ensures function-value descent. The PPA iteration (11) satisfies

    x^{k+1} = prox_{λf}(x^k)  ⟺  0 ∈ λ ∂f(x^{k+1}) + x^{k+1} − x^k  ⟺  x^{k+1} = x^k − λ ∇̃f(x^{k+1}),

where ∇̃f(x^{k+1}) ∈ ∂f(x^{k+1}) is a subgradient. It is uniquely determined by prox_{λf}(x^k), even if ∂f(x^{k+1}) has more than one element. Let us compare f(x^k) and f(x^{k+1}). By the definition of the subgradient,

    f(x^k) ≥ f(x^{k+1}) + ⟨∇̃f(x^{k+1}), x^k − x^{k+1}⟩ = f(x^{k+1}) + (1/λ) ‖x^{k+1} − x^k‖^2.

Hence, unless ‖x^{k+1} − x^k‖ = 0, in which case ∇̃f(x^{k+1}) = 0 and thus x^{k+1} is optimal, the function value always decreases.
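A minimal PPA run in Matlab (a sketch, not from the notes; the starting point and step size are arbitrary): for f = ‖·‖_1, iteration (11) is repeated soft-thresholding, so f(x^k) decreases monotonically and the iterates reach the minimizer 0 after finitely many steps, for any step size λ > 0.

    % Proximal-point algorithm (11) on f(x) = ||x||_1 (a sketch; x0 and lambda are arbitrary).
    % Each step is a soft-thresholding; f(x^k) decreases and x^k -> 0 = argmin f for any lambda > 0.
    lambda = 0.4;
    x = [3; -2; 0.5];
    for k = 1:10
        x = sign(x).*max(0, abs(x) - lambda);    % x^{k+1} = prox_{lambda*||.||_1}(x^k)
        fprintf('k = %2d,  f(x^k) = %.4f\n', k, norm(x, 1));
    end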

Dual interpretation

Let y^{k+1} = ∇̃f(x^{k+1}) ∈ ∂f(x^{k+1}); then we have

    y^{k+1} ∈ ∂f(x^k − λ y^{k+1}).

Therefore, computing prox_{λf}(x^k) is equivalent to solving for a subgradient at the descent destination.

Diminishing Tikhonov-regularization interpretation

In the PPA iteration,

    x^{k+1} = argmin_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − x^k‖_2^2 ).

The second term can be considered a regularization term, which keeps x^{k+1} close to x^k. Because the regularization is not anchored at a fixed point but uses the current point x^k, the amount of regularization goes away as x^k converges.

Bregman iterative regularization interpretation

Bregman iterative regularization refers to the iteration

    x^{k+1} ← argmin_{x ∈ R^n} (1/λ) D_r(x; x^k) + f(x),    (12)

where D_r(·; x^k) is the Bregman distance (or Bregman divergence) induced by a proper closed convex (possibly nonsmooth) function r. Specifically,

    D_r(x; x^k) := r(x) − r(x^k) − ⟨p, x − x^k⟩,

where p ∈ ∂r(x^k) is a given subgradient. Since p determines the Bregman distance, we sometimes write D_r(x; x^k, p). By the definition of the subgradient, D_r(x; x^k) ≥ 0 for all x ∈ R^n, and it tends to be smaller when x is closer to x^k. Although it is called a distance, it generally violates the conditions of a mathematical distance. The following Bregman distances are often used.

1. The squared Euclidean distance D(x; y) = ‖x − y‖_2^2 is induced by r(·) = ‖·‖_2^2.

2. The Kullback-Leibler divergence

    D(x; y) = Σ_{i=1}^n ( x_i log(x_i / y_i) − x_i + y_i )

is induced by r(x) = Σ_{i=1}^n ( x_i log(x_i) − x_i ). This Bregman distance measures the difference between two probability densities x, y.

3. The l1 Bregman distance is induced by r(·) = ‖·‖_1. It is used in compressed sensing. The total variation Bregman distance is used in image reconstruction. Note that, due to the existence of multiple subgradients, these two Bregman distances are not defined until a subgradient is specified. Typically, one picks 0 at the beginning and then the subgradient that appears in the optimality condition of the previous iteration.

The PPA iteration is a special case of (12) corresponding to the convex function r(·) = (1/2) ‖·‖_2^2.

Convergence of the Proximal-Point Algorithm

Several more complicated algorithms (including the alternating direction method of multipliers, or ADMM, and a variety of primal-dual methods) are special cases of the PPA. They correspond to particular proximal or resolvent operators. Hence, analyzing the convergence of the PPA is fundamental to the study of first-order optimization algorithms and operator splitting methods. The analysis approach that we take below underlies the analysis of many other algorithms.

Let us assume that f is proper closed convex and argmin f is nonempty (but possibly non-unique). We first study the convergence of the sequence {x^k}.

Definition 4. Consider an operator T : R^n → R^n.

1. The operator R := I − T is called the (fixed-point) residual operator, and R(x) = x − T(x) is called the (fixed-point) residual at x.

2. The operator T is called firmly nonexpansive if

    ‖T(x) − T(y)‖^2 ≤ ‖x − y‖^2 − ‖R(x) − R(y)‖^2,  ∀x, y ∈ dom T.    (13)

3. The operator T is called strictly contractive, or α-contractive for α ∈ (0, 1), if

    ‖T(x) − T(y)‖ ≤ α ‖x − y‖,  ∀x, y ∈ dom T.

Through simple algebra, one can show that T is firmly nonexpansive if, and only if, R = I − T is firmly nonexpansive. We introduce the firmly nonexpansive operator because it leads to sequence convergence and because the proximal operator of a convex function is firmly nonexpansive.
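Inequality (13) can be observed numerically before it is proved. A Matlab sketch (not from the notes; λ and the test points are arbitrary) with T = prox_{λ‖·‖_1}, i.e., soft-thresholding:

    % Numerical check of firm nonexpansiveness (13) for T = prox_{lambda*||.||_1} (a sketch).
    lambda = 0.6;
    T = @(v) sign(v).*max(0, abs(v) - lambda);    % soft-thresholding
    R = @(v) v - T(v);                            % fixed-point residual operator
    for t = 1:5
        x = randn(5,1);  y = randn(5,1);
        slack = norm(x - y)^2 - norm(R(x) - R(y))^2 - norm(T(x) - T(y))^2;
        fprintf('slack in (13): %+.3e (should be >= 0)\n', slack);
    end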

Proposition 2. For any proper closed convex function f and λ > 0, prox_{λf} is firmly nonexpansive. If f is also strongly convex, then prox_{λf} is contractive.

Proof. (add proof)

As long as f is convex and has a minimizer (not necessarily unique), the PPA converges to a minimizer.

Theorem 2. For any proper closed convex function f : R^n → R ∪ {∞} that has a minimizer and any λ > 0, the PPA iteration (11) produces a sequence {x^k} that converges to some x* ∈ argmin f.

Proof. Pick an arbitrary x* ∈ argmin f. Applying (13) with T = prox_{λf}, x = x^k, and y = x* yields

    ‖x^{k+1} − x*‖^2 ≤ ‖x^k − x*‖^2 − ‖prox_{λf}(x^k) − x^k‖^2,    (14)

from which we conclude:

1. ‖x^k − x*‖ ≤ ‖x^0 − x*‖ for all k; thus, the sequence (x^k)_{k≥0} is bounded and has a subsequence (x^j)_{j∈J} ⊆ (x^k)_{k≥0} such that

    x̄ := lim_{j∈J} x^j exists;    (15)

2. summing (14) in a telescopic fashion gives Σ_{k=0}^∞ ‖prox_{λf}(x^k) − x^k‖^2 < ∞ and, thus,

    lim_k ‖prox_{λf}(x^k) − x^k‖ = 0.    (16)

Since prox_{λf}(x) is continuous in x, so is ‖prox_{λf}(x) − x‖. Based on this continuity, from (15) and (16),

    ‖prox_{λf}(x̄) − x̄‖ = lim_{j∈J} ‖prox_{λf}(x^j) − x^j‖ = 0.

Therefore, prox_{λf}(x̄) − x̄ = 0 and, by Theorem 1, x̄ ∈ argmin f. Recall that x* ∈ argmin f is arbitrary. By letting x* = x̄ in (14), we get

    ‖x^{k+1} − x̄‖^2 ≤ ‖x^k − x̄‖^2.

For each k ≥ 0, define j_k := max{j ∈ J : j ≤ k}. As j_k ≤ k,

    ‖x^k − x̄‖ ≤ ‖x^{j_k} − x̄‖.

Because {j_k : k ≥ 0} ⊆ J and j_k → ∞, we get

    lim_k ‖x^k − x̄‖ ≤ lim_k ‖x^{j_k} − x̄‖ = lim_{j∈J} ‖x^j − x̄‖ = 0.

TODO: add convergence rates:

1. ‖x^{k+1} − x^k‖ is monotonically nonincreasing;
2. ‖x^{k+1} − x^k‖^2 = o(1/k);
3. ‖x^{k+1} − x^k‖^2 = o(1/k^2), using the monotonicity and summability of f(x^k) − f*.

On the other hand, if f is strongly convex, then the PPA converges to its unique minimizer at a linear (geometric) rate. This is a direct result of the Banach Fixed-Point Theorem.

Proximal operators of nonconvex functions

In general, a nonconvex function does not have a (convex) subdifferential, and the minimization problem defining the proximal operator may have more than one stationary point. The proximal operator of a nonconvex function is, therefore, computed on a case-by-case basis.

l0 norm

The l0 function counts the number of nonzeros of the input, that is,

    ‖x‖_0 := |{i : x_i ≠ 0}|.

Given a vector x ∈ R^n, sort its components by magnitude so that |x_[1]| ≥ |x_[2]| ≥ ··· ≥ |x_[n]|, where x_[i] is the component of x with the ith largest magnitude (not necessarily equal to x_i). Let us compute

    prox_{λ l0}(x) := argmin_{y ∈ R^n} ‖y‖_0 + (1/(2λ)) ‖y − x‖_2^2.

Since the value of a nonzero component y_i does not affect ‖y‖_0, we only need to decide the set of nonzero components in the solution y*. If y_i* is nonzero, it should equal x_i. In addition, suppose that there are p nonzero components in y*; then ‖y*‖_0 is fixed at p and, thus, (1/(2λ)) ‖y* − x‖_2^2 reaches its minimum if we identify the p largest-magnitude components of x, copy them to the corresponding components of y*, and threshold the remaining n − p components to 0. Therefore, the problem reduces to finding p. For p = ‖y*‖_0 = 0, 1, ..., n, the values of

    f_p := min_{y : ‖y‖_0 = p} ‖y‖_0 + (1/(2λ)) ‖y − x‖_2^2

are, respectively,

    f_0 = 0 + (1/(2λ)) Σ_{i=1}^n |x_[i]|^2,
    f_1 = 1 + (1/(2λ)) Σ_{i=2}^n |x_[i]|^2,
    ...
    f_{n−1} = (n − 1) + (1/(2λ)) |x_[n]|^2,
    f_n = n + 0.

The difference f_i − f_{i−1} = 1 − (1/(2λ)) |x_[i]|^2, i = 1, ..., n, is monotonically nondecreasing. Let

    i* := max{ i : f_i − f_{i−1} ≤ 0 } = max{ i : |x_[i]| ≥ √(2λ) },

and, if the maximum is not attained (the set is empty), let i* = 0. Then the minimal value is

    f_{i*} = f_0 + Σ_{i=1}^{i*} ( f_i − f_{i−1} ) = i* + (1/(2λ)) Σ_{i=i*+1}^n |x_[i]|^2,

which is attained at y* = prox_{λ l0}(x), where

    y_i* = 0, if |x_i| < √(2λ);
    y_i* ∈ {0, x_i}, if |x_i| = √(2λ);
    y_i* = x_i, otherwise.

Therefore, prox_{λ l0} is also called the (hard) thresholding operator.
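The resulting operator is simple to implement. A Matlab sketch (not from the notes; λ and x are arbitrary): keep the entries with magnitude above √(2λ) and zero out the rest (entries exactly at √(2λ) may go either way).

    % Hard thresholding: prox_{lambda*||.||_0} keeps entries with |x_i| > sqrt(2*lambda) and
    % zeros the rest (a sketch; entries exactly at sqrt(2*lambda) may be kept or dropped).
    lambda = 0.5;
    x = [2; -0.3; 1.1; -0.9; 0.05];
    y = x .* (abs(x) > sqrt(2*lambda));            % one minimizer of ||y||_0 + ||y-x||^2/(2*lambda)
    f = @(y) nnz(y) + norm(y - x)^2/(2*lambda);    % the objective being minimized
    fprintf('f(y) = %.4f   (compare f(x) = %.4f and f(0) = %.4f)\n', f(y), f(x), f(zeros(size(x))));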

l_{1/2} and l_{2/3} quasi-norms

(TODO)

Uncovered topics

    Proximal-based operator-splitting algorithms, such as the prox-gradient algorithm (a future topic)
    Dual proximal algorithms (a future topic)
    Analysis of existence, continuity, boundedness, etc. (references: ...)

History notes

Von Neumann showed that any unitary-invariant matrix norm can be written as |||X||| = g(σ_X), where σ_X is the vector of singular values of X and g is a symmetric gauge function. (TODO: add more)

Exercises

1. Consider the function r : R → R ∪ {∞} given by

    r(x) = −log x, if x > 0;  ∞, otherwise.

Show that for all y ∈ R and λ > 0,

    prox_{λr}(y) = ( y + √(y^2 + 4λ) ) / 2.

2. Consider the function r : R → R ∪ {∞} given by

    r(x) = x, if x ≥ 0;  0, otherwise.

Derive the formula of prox_{λr}(y), where y ∈ R and λ > 0.

3. Consider the function r : R → R ∪ {∞} given by

    r(x) = x, if x ≥ 0;  ∞, otherwise.

Derive the formula of prox_{λr}(y), where y ∈ R and λ > 0.

4. Consider the weighted 1-norm

    ‖x‖_{1,w} = Σ_{i=1}^n w_i |x_i|.

Derive the formula of prox_{λr}(y) for r = ‖·‖_{1,w}, where y ∈ R^n and λ > 0.

5. Consider the weighted 2-norm

    ‖x‖_{2,w} = ( Σ_{i=1}^n w_i |x_i|^2 )^{1/2}.

Derive the formula of prox_{λr}(y) for r = ‖·‖_{2,w}, where y ∈ R^n and λ > 0.

6. Given a proper function g : R → R ∪ {∞} and its proximal mapping prox_{λg}, derive the proximal mapping prox_{γf} of the function f(x) = α g(x/β), where α, β > 0 are given.

7. Given a proper function g : R^n → R ∪ {∞} and its proximal mapping prox_{λg} for all λ > 0, derive the proximal mapping prox_{γf} of the function

    f(x) = g(x) + (1/(2α)) ‖x − x^0‖_2^2,

where α > 0 and x^0 ∈ R^n are given.

8. Given a proper function g : R → R ∪ {∞} and its proximal mapping prox_{λg} for all λ > 0, derive the proximal mapping prox_{γf} of the function f : R^n → R ∪ {∞}:

    f(x) = g(x_1 + x_2 + ··· + x_n).

9. Given a proper function g : R^m → R ∪ {∞} and its proximal mapping prox_{λg} for all λ > 0, derive the proximal mapping of the function f : R^n → R ∪ {∞}:

    f(x) = g(Ax + b),

where A ∈ R^{m×n} and b ∈ R^m are given and AA^T = αI for some α > 0.

10. Define the set D := {x ∈ R^n : x_1 = x_2 = ··· = x_n}. Given a proper function g : R → R ∪ {∞} and its proximal mapping prox_{λg} for all λ > 0, derive the proximal mapping prox_{γf} of the function f : R^n → R ∪ {∞}:

    f(x) = δ_D(x) + Σ_{i=1}^n g(x_i).

11. Let f, r : R^n → R ∪ {∞} be proper closed convex functions, and assume f is continuously differentiable. Show that

    x* ∈ argmin ( r(x) + f(x) )  ⟺  x* = prox_{λr}( x* − λ ∇f(x*) ),

where λ > 0. Is this still true if r is nonconvex?
