Proximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 2016)
Instructor: Wotao Yin. April 29, 2017.

Given a function $f$, the proximal operator maps an input point $x$ to the minimizer of $f$ restricted to a small proximity to $x$. The proximal operator has a simple definition yet has long been a powerful tool in optimization, giving rise to a variety of proximal algorithms such as the proximal-point algorithm, the prox-gradient algorithm, as well as many algorithms involving linearization and/or splitting. The proximal operator of a convex function involves the subgradients of the function, so the function does not need to be differentiable. Hence, proximal algorithms handle both differentiable and nondifferentiable functions. In comparison, Newton's algorithm requires a $C^2$ function, and gradient algorithms need $C^1$ functions. Along with convex duality, proximal algorithms can solve problems with linear constraints. In fact, the Method of Multipliers, or the Augmented Lagrangian Method, is a special case of the proximal-point algorithm.

What is unique about the proximal operator is its implicitness: it computes the subgradient of a function at the output point, and the subgradient and the output point determine each other. (In comparison, in Newton's algorithm and in gradient algorithms, the Hessian and the gradient are evaluated at the input point and determine the output point.) Hence, computing the proximal operator may require inverting a matrix or evaluating a so-called resolvent. While this can be a disadvantage for some functions, a large number of extremely useful functions, such as the $\ell_1$, $\ell_2$, $\ell_\infty$ norms, have closed-form proximal operators! Because the proximal operator is implicit, it is very stable; for example, the proximal-point algorithm converges for any positive step size. The implicitness also makes proximal algorithms popular choices for certain structured nonconvex problems.
Proximal algorithms are often used for optimization problems with structures such as large sums, block separability, linear constraints, and linear transforms. Coordinate descent and operator splitting techniques often decompose a problem into simple subproblems that are easily solved by proximal algorithms; therefore, the proximal operator gives rise to parallel and distributed algorithms.

The proximal operator also has interesting mathematical properties. It is a generalization of the projection and has a soft-projection interpretation. Just as the projections onto complementary linear subspaces produce an orthogonal decomposition of a point, the proximal operators of a convex function and its convex conjugate yield the Moreau decomposition of a point. Furthermore, the proximal operator provides an optimality condition: a function is minimized at a point if, and only if, the proximal operator of the function evaluated at that point returns the same point. It is also a common part of the fixed-point optimality conditions for more complicated optimization problems. For convex functions, the proximal operator enjoys the important firmly nonexpansive property, which plays a central role in monotone operator theory and the operator splitting method. This property leads to sequence convergence and lets us assemble simple operators into an algorithm that solves difficult problems. Consequently, proximal operators are frequently used to handle simple or structured functions in operator-splitting algorithms.

Notation

To ensure the proximal operator is well defined and gives a unique output, we consider functions $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ that are proper (not everywhere $\infty$), closed (the epigraph is a closed set; equivalently, all level sets are closed), and convex. Except in examples, the results of this chapter generalize from $\mathbb{R}^n$ to a possibly infinite-dimensional Hilbert space. Including $\infty$ in the range of $f$ lets us avoid writing the constraint $x \in \operatorname{dom} f$. The set of minimizers of $f$ is denoted by $\operatorname{argmin} f = \{x \in \mathbb{R}^n : f(x) = \min_y f(y)\}$.

Definition

Definition 1. Given a proper closed convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ and a scaling parameter $\lambda > 0$, the proximal operator is the mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$ defined by
$$\operatorname{prox}_{\lambda f}(x) := \operatorname*{argmin}_{y \in \mathbb{R}^n}\; f(y) + \frac{1}{2\lambda}\|y - x\|^2. \qquad (1)$$

Lemma 1. For any $\lambda > 0$ and $x$, $\operatorname{prox}_{\lambda f}(x)$ exists and is unique if $f$ is proper closed convex.
Proof. Since $f$ is proper closed convex, it is lower bounded by an affine function; therefore, $F(y) := f(y) + \frac{1}{2\lambda}\|y - x\|^2$ is coercive, i.e., $\lim_{\|y\| \to \infty} F(y) = \infty$. Take a minimizing sequence $(y_m)$ such that $\lim_{m \to \infty} F(y_m) = \inf_{y \in \mathbb{R}^n} F(y)$. Since $F$ is coercive, this sequence is bounded, so it has a subsequence that converges to a cluster point, say $y_{m_k} \to y^\star$. Since $F$ is closed,
$$F(y^\star) \le \lim_{k \to \infty} F(y_{m_k}) = \inf_{y \in \mathbb{R}^n} F(y).$$
Therefore, $y^\star$ is a minimizer of $F$; it is unique since $F$ is strongly convex. ∎

We use $\lambda f$ as the proximal subscript, but we prefer separating $f$ and $\lambda$ in the minimization problem in (1). In fact, the definition does not change if, instead of (1), we let
$$\operatorname{prox}_{\lambda f}(x) := \operatorname*{argmin}_{y}\; \lambda f(y) + \frac{1}{2}\|y - x\|^2.$$
However, the separated form yields the Moreau envelope, or Moreau–Yosida approximation:
$$\hat f(x) := \min_{y \in \mathbb{R}^n}\; f(y) + \frac{1}{2\lambda}\|y - x\|^2.$$
The function $\hat f$ approximates $f$ but is everywhere differentiable, even if $f$ is not. It is easy to show that
$$\nabla \hat f(x) = \frac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda f}(x)\big). \qquad (2)$$
Exercise: prove (2), including existence and uniqueness. (Add a 1D illustration of $f(x)$, $f(y) + \frac{1}{2\lambda}\|y - x\|^2$, and $\hat f(x)$.)

Soft projection

Let $C$ be a nonempty closed convex set. Recall the indicator function
$$\delta_C(x) := \begin{cases} 0, & \text{if } x \in C,\\ \infty, & \text{otherwise.}\end{cases}$$
Let $\lambda > 0$. By definition,
$$\operatorname{prox}_{\lambda\delta_C}(x) = \operatorname*{argmin}_{y}\; \delta_C(y) + \frac{1}{2\lambda}\|y - x\|^2 = \operatorname*{argmin}\{\|y - x\| : y \in C\} = \operatorname{proj}_C(x).$$
The proximal operator of $C$'s indicator function is just the projection onto $C$; the scaling parameter $\lambda$ does not make any difference. The indicator function has the special property $\operatorname{argmin} \delta_C = \operatorname{dom} \delta_C = C$. In general, a proper convex function $f$ satisfies $\operatorname{argmin} f \subseteq \operatorname{dom} f$. Suppose that $\operatorname{argmin} f$ is nonempty. For a given $x \in \mathbb{R}^n$,
- as $\lambda \uparrow \infty$, $\operatorname{prox}_{\lambda f}(x)$ approaches $\operatorname{proj}_{\operatorname{argmin} f}(x)$;
- as $\lambda \downarrow 0$, $\operatorname{prox}_{\lambda f}(x)$ approaches $\operatorname{proj}_{\operatorname{dom} f}(x)$.

(Add a 2D illustration.)

Examples

Linear function

Given $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, let
$$f(x) := \langle a, x\rangle + b.$$
The proximal operator of this linear function is
$$\operatorname{prox}_{\lambda f}(x) = \operatorname*{argmin}_{y \in \mathbb{R}^n}\; (a^T y + b) + \frac{1}{2\lambda}\|y - x\|^2.$$
The first-order optimality condition is obtained by differentiating the minimization objective in $y$, yielding
$$a + \frac{1}{\lambda}\big(\operatorname{prox}_{\lambda f}(x) - x\big) = 0 \iff \operatorname{prox}_{\lambda f}(x) = x - \lambda a. \qquad (3)$$
The proximal map subtracts $\lambda a$ from the input point, i.e., it moves $\lambda$ units along the negative normal direction. As an application, let $f^{(1)}$ be the linear (1st-order) approximation of a differentiable function $f$ at the point $x$, namely,
$$f^{(1)}(y) := f(x) + \langle \nabla f(x),\, y - x\rangle.$$
Following (3), we obtain
$$\operatorname{prox}_{\lambda f^{(1)}}(x) = x - \lambda \nabla f(x),$$
which is the gradient descent step with step size $\lambda$.

Quadratic function

Can we recover Newton's algorithm with the proximal operator of the quadratic approximation? We will see that it almost does! Let $A \in \mathbb{R}^{n \times n}$ be a symmetric positive semidefinite matrix and $b \in \mathbb{R}^n$ be a vector. Consider the quadratic function
$$f(x) := \tfrac{1}{2}\langle x, Ax\rangle - \langle b, x\rangle.$$
The proximal operator of $f$ is
$$\operatorname{prox}_{\lambda f}(x) = \operatorname*{argmin}_{y \in \mathbb{R}^n}\; \tfrac{1}{2}\langle y, Ay\rangle - \langle b, y\rangle + \frac{1}{2\lambda}\|y - x\|^2.$$
By differentiating the minimization objective in $y$, we obtain the first-order optimality condition:
$$(Ay - b) + \frac{1}{\lambda}(y - x) = 0 \;\iff\; y = (\lambda A + I)^{-1}(\lambda b + x) \;\iff\; y = x + (A + \lambda^{-1} I)^{-1}(b - Ax).$$
Therefore,
$$\operatorname{prox}_{\lambda f}(x) = x + (A + \lambda^{-1} I)^{-1}(b - Ax). \qquad (4)$$
Consider the least-squares problem
$$\text{minimize}_x\; \tfrac{1}{2}\|Bx - c\|_2^2,$$
where $B \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^m$. By letting $A = B^T B$ and $b = B^T c$, we recover from (4) the iterative refinement algorithm:
$$x^{k+1} \leftarrow x^k + (A + \lambda^{-1} I)^{-1}(b - Ax^k).$$
As another application, let us take a $C^2$ function $f$ and define its quadratic (2nd-order) approximation at a point $x$:
$$f^{(2)}(y) := f(x) + \langle \nabla f(x),\, y - x\rangle + \tfrac{1}{2}\langle y - x,\, \nabla^2 f(x)(y - x)\rangle.$$
With $A = \nabla^2 f(x)$ and $b = \nabla^2 f(x)\,x - \nabla f(x)$, we simplify $f^{(2)}$ to
$$f^{(2)}(y) = \tfrac{1}{2}\langle y, Ay\rangle - \langle b, y\rangle + c,$$
where $c$ collects all $y$-independent terms, which do not affect the evaluation of the proximal operator. Following (4), we get
$$\operatorname{prox}_{\lambda f^{(2)}}(x) = x - \big(\nabla^2 f(x) + \lambda^{-1} I\big)^{-1}\nabla f(x).$$
The iteration $x^{k+1} \leftarrow \operatorname{prox}_{\lambda f^{(2)}}(x^k)$ recovers the modified-Hessian Newton algorithm, which is also known as the Levenberg–Marquardt method.
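The quadratic-prox formula (4) can be sanity-checked numerically. The following NumPy sketch is ours (function name and test data are not from the notes); it evaluates the formula and verifies the first-order optimality condition:

```python
import numpy as np

def prox_quadratic(x, A, b, lam):
    """Prox of f(x) = 0.5*<x, Ax> - <b, x>: x + (A + lam^{-1} I)^{-1} (b - A x)."""
    n = x.shape[0]
    return x + np.linalg.solve(A + np.eye(n) / lam, b - A @ x)

# Least-squares instance: f(x) = 0.5*||Bx - c||^2 with A = B^T B, b = B^T c.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 5))
c = rng.standard_normal(8)
A, b = B.T @ B, B.T @ c
x = rng.standard_normal(5)
y = prox_quadratic(x, A, b, lam=0.5)

# The minimizer y must satisfy (A y - b) + (y - x)/lam = 0.
assert np.allclose((A @ y - b) + (y - x) / 0.5, 0)
```

Iterating `prox_quadratic` on the same data is exactly the iterative refinement update above.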
$\ell_1$-norm

Let $f(x) = \|x\|_1$. The point $y^\star = \operatorname{prox}_{\lambda f}(x)$ must minimize $\|y\|_1 + \frac{1}{2\lambda}\|y - x\|^2$. Therefore, it satisfies
$$0 \in \partial\|y^\star\|_1 + \frac{1}{\lambda}(y^\star - x) \;\iff\; x - y^\star \in \lambda\,\partial\|y^\star\|_1. \qquad (5)$$
Recall that, since $\|\cdot\|_1$ is separable, its subdifferential simplifies to $\partial\|y\|_1 = \partial|y_1| \times \cdots \times \partial|y_n|$. Therefore, condition (5) reduces to the component-wise conditions
$$x_i - y_i^\star \in \lambda\,\partial|y_i^\star|, \quad i = 1, \dots, n,$$
which mean that the graph of $x_i - y_i$ intersects that of $\lambda\,\partial|y_i|$. From the plot, we observe that
$$y_i^\star = \begin{cases} x_i - \lambda, & \text{if } x_i > \lambda,\\ x_i + \lambda, & \text{if } x_i < -\lambda,\\ 0, & \text{otherwise.}\end{cases}$$
Since $y_i^\star$ preserves the sign of $x_i$ and reduces its absolute value by $\lambda$, and $y_i^\star = 0$ if $|x_i| \le \lambda$, we also have $y_i^\star = \operatorname{sign}(x_i)\max\{0, |x_i| - \lambda\}$. Putting this together for all $i = 1, \dots, n$ yields
$$\operatorname{prox}_{\lambda\|\cdot\|_1}(x) = y^\star = \operatorname{sign}(x)\max\{0, |x| - \lambda\}, \quad \text{applied component-wise.}$$
Therefore, the $\ell_1$ proximal operator earns the names shrinkage and soft-thresholding. In Matlab, the computation can be written in one line as

y = sign(x).*max(0,abs(x)-lambda);

$\ell_2$ norm

Let $f(x) = \|x\|_2$,
which is a non-separable function. Recall its subdifferential
$$\partial f(x) = \begin{cases} \big\{x/\|x\|_2\big\}, & \text{if } x \ne 0,\\ \{p : \|p\|_2 \le 1\}, & \text{otherwise.}\end{cases}$$
Since $f$ is differentiable at all but one point, we can apply the assumption trick. Let $\lambda > 0$. First, assume that $y^\star = \operatorname{prox}_{\lambda\|\cdot\|_2}(x) \ne 0$. Then, $y^\star$ must satisfy
$$0 = \frac{y^\star}{\|y^\star\|_2} + \frac{1}{\lambda}(y^\star - x). \qquad (6)$$
Write $x$ in polar coordinates $(r_x, \theta_x)$, where $r_x = \|x\|_2$. From (6), $x - y^\star$ is a positive multiple of $y^\star/\|y^\star\|_2$; since $y^\star/\|y^\star\|_2$ and $y^\star$ have the same angle, the angle of $y^\star$ must equal the angle of $x$, or its negative. Therefore, it must hold that $y^\star = \alpha x$ for some $\alpha \in \mathbb{R}$. Substituting this into (6) yields
$$0 = \operatorname{sign}(\alpha) + \frac{(\alpha - 1)\,r_x}{\lambda}.$$
Hence,
$$\alpha = \begin{cases} \dfrac{r_x - \lambda}{r_x}, & \text{if } r_x > \lambda,\\ 0, & \text{otherwise,}\end{cases}$$
and $y^\star = \alpha x$. We have assumed $y^\star \ne 0$, but it is easy to verify that when $r_x \le \lambda$, $\frac{1}{\lambda}(x - 0) \in \{p : \|p\|_2 \le 1\}$ and thus $y^\star = \operatorname{prox}_{\lambda\|\cdot\|_2}(x) = 0$. Therefore,
$$\operatorname{prox}_{\lambda\|\cdot\|_2}(x) = \begin{cases} \dfrac{\|x\|_2 - \lambda}{\|x\|_2}\,x, & \text{if } \|x\|_2 > \lambda,\\ 0, & \text{otherwise.}\end{cases} \qquad (7)$$
In Matlab, the computation can be written in one line as

y = (max(0,norm(x)-lambda)/(norm(x)+eps))*x;

where eps is added to avoid division by zero.

$\ell_{p,q}$ norm

This norm is used to impose properties on groups of variables. For a matrix $A \in \mathbb{R}^{m \times n}$, its $\ell_{p,q}$ norm is
$$\|A\|_{p,q} = \Bigg(\sum_{j=1}^n \Big(\sum_{i=1}^m |a_{ij}|^p\Big)^{q/p}\Bigg)^{1/q}.$$
The most common example is the $\ell_{2,1}$ norm, used in the Group Lasso model:
$$\|A\|_{2,1} = \sum_{j=1}^n \Big(\sum_{i=1}^m |a_{ij}|^2\Big)^{1/2} = \sum_{j=1}^n \|A_j\|_2,$$
where $A_j$ is the $j$th column of $A$. Therefore, $\|A\|_{2,1}$ separates into the sum of the $\ell_2$ norms of the columns. To take advantage of this property, we write $X = [X_1\; X_2\; \cdots\; X_n]$, where $X_i \in \mathbb{R}^m$ is the $i$th column of $X$. Then,
$$\operatorname{prox}_{\lambda\|\cdot\|_{2,1}}(X) = \operatorname*{argmin}_{Y \in \mathbb{R}^{m \times n}}\; \|Y\|_{2,1} + \frac{1}{2\lambda}\|Y - X\|_F^2 = \big[\operatorname{prox}_{\lambda\|\cdot\|_2}(X_1), \dots, \operatorname{prox}_{\lambda\|\cdot\|_2}(X_n)\big].$$
Here, each $\operatorname{prox}_{\lambda\|\cdot\|_2}(X_i)$, $i = 1, \dots, n$, is given by (7).

$\ell_\infty$ norm

$\operatorname{prox}_{\lambda\|\cdot\|_\infty}$ can be derived in two ways: either directly from the definition of the proximal operator and $\|\cdot\|_\infty$, or following the Moreau decomposition in the next section. TODO: add the direct approach, which is based on $\partial\|\cdot\|_\infty$. It should reduce to the problem of finding $t = \|y^\star\|_\infty$ such that
$$\sum_{i :\, |x_i| > t} \big(|x_i| - t\big) = \lambda. \qquad (8)$$
Given the solution $t$ to (8),
$$\operatorname{prox}_{\lambda\|\cdot\|_\infty}(x) = \operatorname{sign}(x)\min\{|x|, t\}, \quad \text{component-wise.}$$

Unitary-invariant matrix norms

We call a matrix norm $|||\cdot|||$ unitary-invariant if $|||X||| = |||UXV|||$ for any matrix $X \in \mathbb{C}^{m \times n}$ and unitary matrices $U \in \mathbb{C}^{m \times m}$, $V \in \mathbb{C}^{n \times n}$. Since singular values are rotationally (unitarily) invariant, all singular-value-based matrix norms are unitary-invariant. Common examples are
- the nuclear norm $|||\cdot|||_*$: the $\ell_1$ norm, or sum, of the singular values;
- the Frobenius norm $|||\cdot|||_F$: the $\ell_2$ norm of the singular values;
- the $\ell_2$-operator norm $|||\cdot|||_2$: the $\ell_\infty$ norm, or maximum, of the singular values.
They are called the Schatten-$p$ norms for $p = 1, 2, \infty$, respectively. Let $|||\cdot|||$ be a unitary-invariant matrix norm and $\|\cdot\|$ be its corresponding norm on the singular values. Consider the matrix proximal operator
$$Y^\star = \operatorname{prox}_{\lambda|||\cdot|||}(X) := \operatorname*{argmin}_{Y}\; |||Y||| + \frac{1}{2\lambda}\|Y - X\|_F^2.$$
One can show that the solution $Y^\star$ must share the singular-vector factors, which are unitary matrices, with the input matrix $X$. Hence, the steps to compute $\operatorname{prox}_{\lambda|||\cdot|||}(X)$ are:
1. apply the SVD to $X$: $X = U\operatorname{diag}(\sigma)V^*$, where $\sigma$ is the vector of singular values of $X$ and $\operatorname{diag}(\sigma)$ is its diagonal matrix;
2. compute the vector proximal operator: $\sigma^\star = \operatorname{prox}_{\lambda\|\cdot\|}(\sigma)$;
3. form the solution: $Y^\star = U\operatorname{diag}(\sigma^\star)V^*$.

Moreau decomposition

Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be a proper closed convex function and $\lambda > 0$. Let $f^*$ be the convex conjugate, or the Legendre transform, of $f$:
$$f^*(u) = \sup_v\; \langle v, u\rangle - f(v). \qquad (9)$$
In this chapter, we leave this definition unexplained; another chapter is dedicated to convex duality and the properties of convex conjugacy. The Moreau decomposition applies to any point $x \in \mathbb{R}^n$ and decomposes it as
$$x = y + \lambda z, \quad \text{where } y = \operatorname{prox}_{\lambda f}(x),\; z = \operatorname{prox}_{\lambda^{-1} f^*}(\lambda^{-1} x).$$

Complementary linear subspaces

Let $S$ be a linear subspace of $\mathbb{R}^n$ and $S^\perp$ be its complementary (orthogonal) subspace. Then, ... the decomposition reduces to
$$x = \operatorname{proj}_S(x) + \operatorname{proj}_{S^\perp}(x).$$
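The three SVD-based steps above can be sketched in NumPy for the nuclear norm, whose singular-value norm is the $\ell_1$ norm, so step 2 is soft-thresholding (this sketch and its names are ours, not from the notes):

```python
import numpy as np

def prox_l1(v, lam):
    # Soft-thresholding: prox of lam*||.||_1, derived earlier in these notes.
    return np.sign(v) * np.maximum(0.0, np.abs(v) - lam)

def prox_nuclear(X, lam):
    """Prox of lam*|||X|||_* by the three steps: SVD, soft-threshold
    the singular values, then reassemble with the same unitary factors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(prox_l1(s, lam)) @ Vt

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))
Y = prox_nuclear(X, lam=0.8)

# The singular values of the output are the soft-thresholded ones of the input.
sX = np.linalg.svd(X, compute_uv=False)
sY = np.linalg.svd(Y, compute_uv=False)
assert np.allclose(sY, np.maximum(0.0, sX - 0.8))
```

This operator is often called singular-value thresholding in the low-rank recovery literature.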
Cone and polar cone

...

$\ell_p$-norm and $\ell_q$-ball

Let $p, q \in [1, \infty]$ be such that $\frac{1}{p} + \frac{1}{q} = 1$. By definition,
$$\|\cdot\|_p^*(u) = \sup_v\; \langle v, u\rangle - \|v\|_p = \begin{cases} 0, & \text{if } \|u\|_q \le 1,\\ \infty, & \text{otherwise.}\end{cases}$$
Hence,
$$\|\cdot\|_p^*(\cdot) = \delta_{\|\cdot\|_q \le 1}(\cdot). \qquad (10)$$
Let us compute the projection onto the $\ell_q$-ball of radius $\alpha > 0$:
$$B_q^\alpha := \{x : \|x\|_q \le \alpha\}.$$
Obviously, $B_q^\alpha = \alpha\,\{x : \|x\|_q \le 1\}$, and thus $\delta_{B_q^\alpha}(\cdot) = \delta_{\|\cdot\|_q \le 1}(\cdot/\alpha)$. Therefore, by applying the Moreau decomposition with (10) and $\lambda = \alpha$, we obtain
$$\operatorname{proj}_{B_q^\alpha}(x) = \operatorname{prox}_{\delta_{B_q^\alpha}}(x) = x - \alpha\,\operatorname{prox}_{\|\cdot\|_p}(x/\alpha).$$
Applying this identity to the $\ell_1$, $\ell_2$, $\ell_\infty$ balls, we arrive at
1. $\operatorname{proj}_{\|\cdot\|_1 \le \alpha}(x) = x - \alpha\,\operatorname{prox}_{\|\cdot\|_\infty}(x/\alpha)$;
2. $\operatorname{proj}_{\|\cdot\|_2 \le \alpha}(x) = x - \alpha\,\operatorname{prox}_{\|\cdot\|_2}(x/\alpha)$;
3. $\operatorname{proj}_{\|\cdot\|_\infty \le \alpha}(x) = x - \alpha\,\operatorname{prox}_{\|\cdot\|_1}(x/\alpha)$.

Projection to box constraints

Two vectors $l \in (\{-\infty\} \cup \mathbb{R})^n$ and $u \in (\mathbb{R} \cup \{\infty\})^n$ define the box set
$$S = [l, u] = \{x \in \mathbb{R}^n : l_i \le x_i \le u_i\} \subseteq \mathbb{R}^n. \quad \dots$$
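The third ball-projection identity above can be verified numerically: projecting onto an $\ell_\infty$-ball via the $\ell_1$ prox must agree with component-wise clipping. A minimal NumPy sketch (names and test data are ours):

```python
import numpy as np

def prox_l1(v, lam):
    # Soft-thresholding: prox of lam*||.||_1.
    return np.sign(v) * np.maximum(0.0, np.abs(v) - lam)

def proj_linf_ball(x, alpha):
    """Projection onto {x : ||x||_inf <= alpha} via the Moreau identity
    proj(x) = x - alpha * prox_{||.||_1}(x / alpha)."""
    return x - alpha * prox_l1(x / alpha, 1.0)

x = np.array([3.0, -0.5, 1.2, -4.0])
p = proj_linf_ball(x, alpha=1.0)

# The l_inf-ball projection is component-wise clipping to [-alpha, alpha].
assert np.allclose(p, np.clip(x, -1.0, 1.0))
```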
Projection to subspaces and affine sets

$S = \{x \in \mathbb{R}^n : x_1 = x_2 = \cdots = x_n\}$. ...

$S = \{x \in \mathbb{R}^n : \langle a, x\rangle + b = 0\}$. ...

$S = \{x \in \mathbb{R}^n : \langle a, x\rangle + b = 0,\; \langle c, x\rangle + d = 0\}$. ...

Total variation

Consider $x \in \mathbb{R}^n$, ... In two or more dimensional spaces, ... graph-cut, max-flow ...

Function under a linear transform

Let $A \in \mathbb{R}^{m \times n}$. Consider the function
$$h(x) = f(Ax).$$
Assume that $AA^T = I$. Here, if $A$ is a square matrix, $A$ is called an orthogonal or orthonormal matrix; if $A$ is rectangular, we call it a frame. Since $AA^T = I$, we have $\operatorname{rank}(A) = m$ and $\|y - x\|_2 = \|Ay - Ax\|_2$ for any $x, y$, and the linear transform $T : x \mapsto Ax$ is surjective. Therefore,
$$\operatorname{prox}_{\lambda h}(y) = \operatorname*{argmin}_{x \in \mathbb{R}^n}\; f(Ax) + \frac{1}{2\lambda}\|y - x\|^2 = \operatorname*{argmin}_{x \in \mathbb{R}^n}\; f(Ax) + \frac{1}{2\lambda}\|Ay - Ax\|^2$$
and, with the change of variable $z := Ax$,
$$= A^T\Big(\operatorname*{argmin}_{z \in \mathbb{R}^m}\; f(z) + \frac{1}{2\lambda}\|Ay - z\|^2\Big) = A^T\operatorname{prox}_{\lambda f}(Ay).$$
Here, the change of variable from $x \in \mathbb{R}^n$ to $z := Ax \in \mathbb{R}^m$ relies on the fact that $T : x \mapsto Ax$ is surjective. From the solution $z^\star$, we can recover the solution $x^\star = A^T z^\star$, since $A^T z^\star = A^T A x^\star = x^\star$.

Proximable functions

Definition 2. A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is proximable if $\operatorname{prox}_{\gamma f}$ can be computed in $O(n)$ or $O(n\,\mathrm{polylog}(n))$ time.

Common proximable functions include
1. norms: $\ell_1$, $\ell_2$, $\ell_{2,1}$, $\ell_\infty$;
2. separable functions and indicator functions of separable constraints;
3. the indicator function of the standard simplex $\{x \in \mathbb{R}^n : \mathbf{1}^T x = 1,\; x \ge 0\}$;
TODO: add more.

This section studies part 2 and some summative proximable functions.

Separable sum

Proposition 1. For a separable function
$$f(x) = \sum_{i=1}^n f_i(x_i),$$
it holds that
$$\operatorname{prox}_{\lambda f}(x) = \big(\operatorname{prox}_{\lambda f_1}(x_1), \dots, \operatorname{prox}_{\lambda f_n}(x_n)\big).$$

Summative proximable functions

In general, even if $f$ and $g$ are both proximable, $h = f + g$ may not be proximable. In operator splitting algorithms, one may then have to deal with $f$ and $g$ in two subproblems. However, there are exceptions, which we call summative proximable functions.

Definition 3. We call $h = f + g$ a summative proximable function if
$$\operatorname{prox}_{\lambda h}(x) = \operatorname{prox}_{\lambda f}\big(\operatorname{prox}_{\lambda g}(x)\big) \quad \forall x \in \mathbb{R}^n,\; \lambda > 0.$$

Examples of summative proximable functions:
1. In $\mathbb{R}$, let $f : \mathbb{R} \to \mathbb{R}$ be convex and satisfy $f(0) = 0$. Then, the function $f + |\cdot|$ is summative proximable. An example is the elastic-net regularization function $\|x\|_1 + \frac{1}{2\mu}\|x\|_2^2$, which is component-wise separable.
2. In $\mathbb{R}^n$, if $g$ is a homogeneous function, i.e., $g(\alpha x) = \alpha\,g(x)$ for all $\alpha \ge 0$, then the function $\|\cdot\|_2 + g$ is summative proximable. Special cases of homogeneous functions include norms (e.g., the $\ell_p$-norm, $p \in [1, \infty]$) and the indicator functions $\delta_{\{x \ge 0\}}$ and $\delta_{\{x \le 0\}}$.
3. In $\mathbb{R}^n$, let $f$ be a prox-monotone function, meaning that for all $x \in \mathbb{R}^n$ and $i, j = 1, \dots, n$, $\operatorname{prox}_f$ satisfies
$$x_i \ge x_j \;\Rightarrow\; \operatorname{prox}_f(x)_i \ge \operatorname{prox}_f(x)_j \quad\text{and}\quad x_i = x_j \;\Rightarrow\; \operatorname{prox}_f(x)_i = \operatorname{prox}_f(x)_j,$$
and let $g$ be the 1D total variation $g(x) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|$; then $f + g$ is summative proximable. Examples of prox-monotone functions include $\ell_1$, $\ell_2$, $\ell_\infty$, and the box indicator $\delta_{[l,u]}$, $l, u \in \mathbb{R}^n$. The function $\alpha\|x\|_1 + g(x)$ is called the Fused Lasso regularizer.
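A minimal numerical check of Definition 3, taking $f = \|\cdot\|_2$ and the homogeneous $g = \|\cdot\|_1$ as in example 2 (this sketch, its names, and the test point are ours): we form the composition and verify the optimality condition of $h = \|\cdot\|_2 + \|\cdot\|_1$ at the output, which certifies that the composition is indeed the prox of the sum at this point.

```python
import numpy as np

def prox_l1(v, lam):
    return np.sign(v) * np.maximum(0.0, np.abs(v) - lam)

def prox_l2(v, lam):
    n = np.linalg.norm(v)
    return (max(0.0, n - lam) / n) * v if n > 0 else 0.0 * v

# Composition prox_{lam f} ∘ prox_{lam g} with f = ||.||_2, g = ||.||_1.
lam = 0.3
x = np.array([2.0, -1.5, 1.0])
y = prox_l2(prox_l1(x, lam), lam)

# Optimality of h = ||.||_2 + ||.||_1 at y (all components nonzero here):
# x - y must equal lam * (y/||y||_2 + sign(y)).
assert np.allclose(x - y, lam * (y / np.linalg.norm(y) + np.sign(y)))
```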
Proximal fixed-point optimality conditions

Theorem 1. Let $\lambda > 0$. The point $x^\star \in \mathbb{R}^n$ is a minimizer of a proper closed convex function $f$ if, and only if, $x^\star = \operatorname{prox}_{\lambda f}(x^\star)$.

Proof. "$\Rightarrow$": Let $x^\star \in \operatorname{argmin} f$. Then, for any $x \in \mathbb{R}^n$,
$$f(x) + \frac{1}{2\lambda}\|x - x^\star\|^2 \ge f(x^\star) = f(x^\star) + \frac{1}{2\lambda}\|x^\star - x^\star\|^2.$$
Thus, $x^\star = \operatorname*{argmin}_x f(x) + \frac{1}{2\lambda}\|x - x^\star\|^2 = \operatorname{prox}_{\lambda f}(x^\star)$.
"$\Leftarrow$": Let $x^\star = \operatorname{prox}_{\lambda f}(x^\star)$. By the subgradient optimality condition,
$$0 \in \partial f(x^\star) + \frac{1}{\lambda}(x^\star - x^\star) = \partial f(x^\star).$$
Thus, $0 \in \partial f(x^\star)$, and $x^\star \in \operatorname{argmin} f$. ∎

Proximal-point algorithm

The proximal-point algorithm (PPA) refers to the iteration
$$x^{k+1} \leftarrow \operatorname{prox}_{\lambda f}(x^k), \qquad (11)$$
where $\lambda > 0$ is the step size. Although it is seldom used as an algorithm to minimize $f$, it recovers the Method of Multipliers (Augmented Lagrangian Method) and others. The step size can vary with the iteration within a closed interval, namely, $\lambda_k \in [l, u]$ for $0 < l \le u < \infty$.

Subgradient-descent interpretation

Although a negative subgradient may not be a descent direction, we will see that, in the PPA, the subgradient evaluated at the new point $x^{k+1}$ ensures function-value descent. The PPA iteration (11) satisfies
$$x^{k+1} = \operatorname{prox}_{\lambda f}(x^k) \;\iff\; 0 \in \lambda\,\partial f(x^{k+1}) + x^{k+1} - x^k \;\iff\; x^{k+1} = x^k - \lambda\,\tilde\nabla f(x^{k+1}),$$
where $\tilde\nabla f(x^{k+1}) \in \partial f(x^{k+1})$ is a subgradient. It is uniquely determined by $\operatorname{prox}_{\lambda f}(x^k)$ even if $\partial f(x^{k+1})$ has more than one element. Let us compare $f(x^k)$ and $f(x^{k+1})$. By the definition of the subgradient,
$$f(x^k) \ge f(x^{k+1}) + \big\langle \tilde\nabla f(x^{k+1}),\, x^k - x^{k+1}\big\rangle = f(x^{k+1}) + \frac{1}{\lambda}\|x^{k+1} - x^k\|^2.$$
Hence, unless $\|x^{k+1} - x^k\| = 0$, in which case $\tilde\nabla f(x^{k+1}) = 0$ and thus $x^{k+1}$ is optimal, the function value is always decreased.
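The descent property of the PPA can be observed on the simple example $f(x) = |x|$, whose proximal operator is scalar soft-thresholding; any positive step size works. This sketch and its names are ours:

```python
import numpy as np

def prox_abs(x, lam):
    # Prox of lam*|.|: soft-thresholding in one dimension.
    return np.sign(x) * max(0.0, abs(x) - lam)

# PPA on f(x) = |x|, whose minimizer is 0.
lam, x = 0.25, 3.1
values = [abs(x)]
for _ in range(20):
    x = prox_abs(x, lam)
    values.append(abs(x))

assert x == 0.0                                         # reaches the minimizer
assert all(a >= b for a, b in zip(values, values[1:]))  # monotone descent
```

Note that a subgradient step on $|x|$ with fixed step size would oscillate around 0, while the (implicit) proximal step lands exactly on the minimizer and stays there.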
Dual interpretation

Let $y^{k+1} = \tilde\nabla f(x^{k+1}) \in \partial f(x^{k+1})$; then we have
$$y^{k+1} \in \partial f(x^k - \lambda y^{k+1}).$$
Therefore, computing $\operatorname{prox}_{\lambda f}(x^k)$ is equivalent to solving for a subgradient at the descent destination.

Diminishing Tikhonov-regularization interpretation

The PPA iteration reads
$$x^{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n}\Big(f(x) + \frac{1}{2\lambda}\|x - x^k\|_2^2\Big).$$
The second term can be considered a regularization term, which keeps $x^{k+1}$ close to $x^k$. Because the regularization is not anchored at a fixed point but uses the current point $x^k$, the amount of regularization goes away as $x^k$ converges.

Bregman iterative regularization interpretation

Bregman iterative regularization refers to the iteration
$$x^{k+1} \leftarrow \operatorname*{argmin}_{x \in \mathbb{R}^n}\; \frac{1}{\lambda} D_r(x; x^k) + f(x), \qquad (12)$$
where $D_r(\cdot\,; x^k)$ is the Bregman distance (or Bregman divergence) function induced by a proper closed convex (possibly nonsmooth) function $r$. Specifically,
$$D_r(x; x^k) := r(x) - r(x^k) - \langle p,\, x - x^k\rangle,$$
where $p \in \partial r(x^k)$ is a given subgradient. Since $p$ determines the Bregman distance, we sometimes write $D_r(x; x^k, p)$. By the definition of the subgradient, $D_r(x; x^k) \ge 0$ for all $x \in \mathbb{R}^n$, and it tends to be smaller when $x$ is closer to $x^k$. Although it is called a distance, it generally violates the conditions of a mathematical distance.

The following Bregman distances are often used.
1. The squared Euclidean distance $D(x; y) = \frac{1}{2}\|x - y\|_2^2$ is induced by $r(\cdot) = \frac{1}{2}\|\cdot\|_2^2$.
2. The Kullback–Leibler divergence
$$D(x; y) = \sum_{i=1}^n \Big(x_i \log\frac{x_i}{y_i} - x_i + y_i\Big)$$
is induced by $r(x) = \sum_{i=1}^n \big(x_i\log(x_i) - x_i\big)$. This Bregman distance measures the difference between two probability densities $x, y$.
3. The $\ell_1$ Bregman distance is induced by $r(\cdot) = \|\cdot\|_1$; it is used in compressed sensing. The total-variation Bregman distance is used in image reconstruction. Note that, due to the existence of multiple subgradients, these two Bregman distances are not defined until a subgradient is specified. Typically, one picks $p = 0$ at the beginning and, afterwards, the subgradient that appears in the optimality condition of the previous iteration.

The PPA iteration is a special case of (12) corresponding to the convex function $r(\cdot) = \frac{1}{2}\|\cdot\|_2^2$.

Convergence of the proximal-point algorithm

Several more complicated algorithms (including the alternating direction method of multipliers, or ADMM, and a variety of primal-dual methods) are special cases of the PPA. They correspond to particular proximal or resolvent operators. Hence, analyzing the convergence of the PPA is fundamental to the study of first-order optimization algorithms and operator splitting methods. The analysis approach that we take below underlies the analysis of many other algorithms.

Let us assume that $f$ is proper closed convex and $\operatorname{argmin} f$ is nonempty (but possibly not a singleton). We first study the convergence of the sequence $\{x^k\}$.

Definition 4. Consider an operator $T : \mathbb{R}^n \to \mathbb{R}^n$.
1. The operator $R := I - T$ is called the (fixed-point) residual operator, and $R(x) = x - T(x)$ is called the (fixed-point) residual at $x$.
2. The operator $T$ is called firmly nonexpansive if
$$\|T(x) - T(y)\|^2 \le \|x - y\|^2 - \|R(x) - R(y)\|^2 \quad \forall x, y \in \operatorname{dom} T. \qquad (13)$$
3. The operator $T$ is called strictly contractive, or $\alpha$-contractive for $\alpha \in (0, 1)$, if
$$\|T(x) - T(y)\| \le \alpha\|x - y\| \quad \forall x, y \in \operatorname{dom} T.$$
Through simple algebra, one can show that $T$ is firmly nonexpansive if, and only if, $R = I - T$ is firmly nonexpansive. We introduce the firmly nonexpansive operator because it leads to sequence convergence, and the proximal operator of a convex function is firmly nonexpansive.
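The firm-nonexpansiveness inequality in Definition 4 can be spot-checked numerically for a concrete proximal operator, e.g., soft-thresholding (the prox of $\lambda\|\cdot\|_1$). This sketch is ours:

```python
import numpy as np

def T(v, lam=0.7):
    # Prox of lam*||.||_1 (soft-thresholding): firmly nonexpansive.
    return np.sign(v) * np.maximum(0.0, np.abs(v) - lam)

rng = np.random.default_rng(2)
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    Rx, Ry = x - T(x), y - T(y)       # fixed-point residuals R = I - T
    lhs = np.linalg.norm(T(x) - T(y)) ** 2 + np.linalg.norm(Rx - Ry) ** 2
    assert lhs <= np.linalg.norm(x - y) ** 2 + 1e-12
```

The inequality holds coordinate-wise here because both $T$ and $R = I - T$ are monotone nondecreasing in each scalar coordinate, so the cross term in $\|x - y\|^2 = \|T(x)-T(y) + R(x)-R(y)\|^2$ is nonnegative.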
Proposition 2. For any proper closed convex function $f$ and $\lambda > 0$, $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive. If $f$ is also strongly convex, then $\operatorname{prox}_{\lambda f}$ is contractive.

Proof. (To be added.)

As long as $f$ is convex and has a minimizer (not necessarily unique), the PPA converges to a minimizer.

Theorem 2. For any proper closed convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ that has a minimizer and any $\lambda > 0$, the PPA iteration (11) produces a sequence $\{x^k\}$ that converges to some $x^\star \in \operatorname{argmin} f$.

Proof. Pick an arbitrary $x^\star \in \operatorname{argmin} f$. Applying (13) with $T = \operatorname{prox}_{\lambda f}$, $x = x^k$, and $y = x^\star$ yields
$$\|x^{k+1} - x^\star\|^2 \le \|x^k - x^\star\|^2 - \|\operatorname{prox}_{\lambda f}(x^k) - x^k\|^2, \qquad (14)$$
from which we conclude:
1. $\|x^k - x^\star\| \le \|x^0 - x^\star\|$ for all $k$; thus, the sequence $(x^k)_{k \ge 0}$ is bounded and has a convergent subsequence $(x^j)_{j \in J} \subseteq (x^k)_{k \ge 0}$ with limit
$$\bar x := \lim_{j \in J} x^j; \qquad (15)$$
2. summing (14) in a telescopic fashion gives $\sum_{k=0}^\infty \|\operatorname{prox}_{\lambda f}(x^k) - x^k\|^2 < \infty$ and, thus,
$$\lim_{k}\|\operatorname{prox}_{\lambda f}(x^k) - x^k\| = 0. \qquad (16)$$
Since $\operatorname{prox}_{\lambda f}(x)$ is continuous in $x$, so is $\|\operatorname{prox}_{\lambda f}(x) - x\|$. Based on this continuity, from (15) and (16),
$$\|\operatorname{prox}_{\lambda f}(\bar x) - \bar x\| = \lim_{j \in J}\|\operatorname{prox}_{\lambda f}(x^j) - x^j\| = 0.$$
Therefore, $\operatorname{prox}_{\lambda f}(\bar x) - \bar x = 0$ and, by Theorem 1, $\bar x \in \operatorname{argmin} f$. Recall that $x^\star \in \operatorname{argmin} f$ is arbitrary. By letting $x^\star = \bar x$ in (14), we get $\|x^{k+1} - \bar x\|^2 \le \|x^k - \bar x\|^2$. For each $k \ge 0$, define $j_k = \max\{j \in J : j \le k\}$. As $j_k \le k$, $\|x^k - \bar x\| \le \|x^{j_k} - \bar x\|$. Because $\{j_k : k \ge 0\} = J$, we get
$$\lim_k \|x^k - \bar x\| \le \lim_k \|x^{j_k} - \bar x\| = \lim_{j \in J}\|x^j - \bar x\| = 0. \qquad \blacksquare$$

TODO: add convergence rates:
1. $\|x^{k+1} - x^k\|$ is monotonically nonincreasing;
2. $\|x^{k+1} - x^k\|^2 = o(1/k)$;
3. $\|x^{k+1} - x^k\|^2 = o(1/k^2)$, using the monotonicity and summability of $f(x^k) - f^\star$.

On the other hand, if $f$ is strongly convex, then the PPA converges to its unique minimizer at a linear (geometric) rate. This is a direct result of the Banach fixed-point theorem.

Proximal operators of nonconvex functions

In general, a nonconvex function does not have a subdifferential, and the minimization problem defining the proximal operator may have more than one stationary point. The proximal operator of a nonconvex function is, therefore, computed on a case-by-case basis.

$\ell_0$ norm

The $\ell_0$ function counts the number of nonzeros of the input, that is,
$$\|x\|_0 := \big|\{i : x_i \ne 0\}\big|.$$
Given a vector $x \in \mathbb{R}^n$, sort its components by magnitude so that $|x_{[1]}| \ge |x_{[2]}| \ge \cdots \ge |x_{[n]}|$, where $x_{[i]}$ is the component of $x$ with the $i$th largest absolute value (not necessarily equal to $x_i$). Let us compute
$$\operatorname{prox}_{\lambda\ell_0}(x) := \operatorname*{argmin}_{y \in \mathbb{R}^n}\; \|y\|_0 + \frac{1}{2\lambda}\|y - x\|_2^2.$$
Since the value of a nonzero component $y_i$ does not affect $\|y\|_0$, we only need to decide the set of nonzero components of the solution $y^\star$; if $y_i^\star$ is nonzero, it must equal $x_i$. In addition, suppose that $y^\star$ has $p$ nonzero components; then $\|y^\star\|_0 = p$ is fixed, and $\frac{1}{2\lambda}\|y^\star - x\|_2^2$ reaches its minimum if we identify the $p$ largest-magnitude components of $x$, copy them into the corresponding components of $y^\star$, and threshold the remaining $n - p$ components to 0. Therefore, the problem is simplified to figuring out $p$. For $p = \|y^\star\|_0 = 0, 1, \dots, n$, the values of
$$f_p = \min_{y :\, \|y\|_0 = p}\; \|y\|_0 + \frac{1}{2\lambda}\|y - x\|_2^2$$
are, respectively,
$$f_0 = \frac{1}{2\lambda}\sum_{i=1}^n x_{[i]}^2, \quad f_1 = 1 + \frac{1}{2\lambda}\sum_{i=2}^n x_{[i]}^2, \quad \dots, \quad f_{n-1} = (n-1) + \frac{1}{2\lambda}x_{[n]}^2, \quad f_n = n + 0.$$
The difference
$$f_i - f_{i-1} = 1 - \frac{1}{2\lambda}x_{[i]}^2, \quad i = 1, \dots, n,$$
is monotonically increasing in $i$. Let
$$i^\star = \max\{i : f_i - f_{i-1} < 0\} = \max\big\{i : |x_{[i]}| > \sqrt{2\lambda}\big\}$$
and, if the maximum is not attained (the set is empty), let $i^\star = 0$. Then, the minimal value $f^\star$ is
$$f_{i^\star} = f_0 + \sum_{i=1}^{i^\star}(f_i - f_{i-1}) = i^\star + \frac{1}{2\lambda}\sum_{i=i^\star+1}^n x_{[i]}^2,$$
which is attained at
$$y^\star = \operatorname{prox}_{\lambda\ell_0}(x), \quad \text{where } y_i^\star = \begin{cases} 0, & \text{if } |x_i| < \sqrt{2\lambda},\\ 0 \text{ or } x_i, & \text{if } |x_i| = \sqrt{2\lambda},\\ x_i, & \text{otherwise.}\end{cases}$$
Therefore, $\operatorname{prox}_{\lambda\ell_0}$ is also called the (hard) thresholding operator.

$\ell_{1/2}$ and $\ell_{2/3}$ quasi-norms

(TODO)

Uncovered topics

- Proximal-based operator-splitting algorithms, such as the prox-gradient algorithm (a future topic)
- Dual proximal algorithms (a future topic)
- Analysis of existence, continuity, boundedness, etc. (references: ...)

History notes

In 1937, von Neumann showed that any unitary-invariant matrix norm can be written as $|||X||| = g(\sigma_X)$, where $\sigma_X$ is the vector of singular values of $X$ and $g$ is a symmetric gauge function. (TODO: add more)
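The hard-thresholding operator derived in the $\ell_0$ section can be sketched in NumPy as follows (names and test data are ours; at the tie $|x_i| = \sqrt{2\lambda}$, where both 0 and $x_i$ are optimal, this sketch arbitrarily chooses 0):

```python
import numpy as np

def prox_l0(x, lam):
    """Hard thresholding: keep components with |x_i| > sqrt(2*lam), zero the rest."""
    y = x.copy()
    y[np.abs(x) <= np.sqrt(2.0 * lam)] = 0.0
    return y

x = np.array([3.0, -0.5, 1.1, -2.2, 0.9])
lam = 0.5                      # threshold sqrt(2*lam) = 1.0
y = prox_l0(x, lam)

def objective(z):
    return np.count_nonzero(z) + np.sum((z - x) ** 2) / (2 * lam)

assert np.allclose(y, [3.0, 0.0, 1.1, -2.2, 0.0])
# The thresholded point beats, e.g., x itself and the zero vector.
assert objective(y) <= objective(x) and objective(y) <= objective(np.zeros(5))
```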
Exercises

1. Consider the function $r : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ given by
$$r(x) = \begin{cases} -\log x, & \text{if } x > 0,\\ \infty, & \text{otherwise.}\end{cases}$$
Show that for all $y \in \mathbb{R}$ and $\lambda > 0$,
$$\operatorname{prox}_{\lambda r}(y) = \frac{y + \sqrt{y^2 + 4\lambda}}{2}.$$

2. Consider the function $r : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ given by
$$r(x) = \begin{cases} x, & \text{if } x \ge 0,\\ 0, & \text{otherwise.}\end{cases}$$
Derive the formula of $\operatorname{prox}_{\lambda r}(y)$, where $y \in \mathbb{R}$ and $\lambda > 0$.

3. Consider the function $r : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ given by
$$r(x) = \begin{cases} x, & \text{if } x \ge 0,\\ \infty, & \text{otherwise.}\end{cases}$$
Derive the formula of $\operatorname{prox}_{\lambda r}(y)$, where $y \in \mathbb{R}$ and $\lambda > 0$.

4. Consider the weighted 1-norm
$$\|x\|_{1,w} = \sum_{i=1}^n w_i |x_i|.$$
Derive the formula of $\operatorname{prox}_{\lambda r}(y)$ for $r = \|\cdot\|_{1,w}$, where $y \in \mathbb{R}^n$ and $\lambda > 0$.

5. Consider the weighted 2-norm
$$\|x\|_{2,w} = \Big(\sum_{i=1}^n w_i |x_i|^2\Big)^{1/2}.$$
Derive the formula of $\operatorname{prox}_{\lambda r}(y)$ for $r = \|\cdot\|_{2,w}$, where $y \in \mathbb{R}^n$ and $\lambda > 0$.

6. Given a proper function $g : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ and its proximal mapping $\operatorname{prox}_{\lambda g}$, derive the proximal mapping $\operatorname{prox}_{\gamma f}$ for the function $f(x) = \alpha\,g(x/\beta)$, where $\alpha, \beta > 0$ are given.
7. Given a proper function $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ and its proximal mapping $\operatorname{prox}_{\lambda g}$ for all $\lambda > 0$, derive the proximal mapping $\operatorname{prox}_{\gamma f}$ of the function
$$f(x) = g(x) + \frac{1}{2\alpha}\|x - x^0\|_2^2,$$
where $\alpha > 0$ and $x^0 \in \mathbb{R}^n$ are given.

8. Given a proper function $g : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ and its proximal mapping $\operatorname{prox}_{\lambda g}$ for all $\lambda > 0$, derive the proximal mapping $\operatorname{prox}_{\gamma f}$ of the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$:
$$f(x) = g(x_1 + x_2 + \cdots + x_n).$$

9. Given a proper function $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ and its proximal mapping $\operatorname{prox}_{\lambda g}$ for all $\lambda > 0$, derive the proximal mapping of the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$:
$$f(x) = g(Ax + b),$$
where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ are given and $AA^T = \alpha I$ for some $\alpha > 0$.

10. Define the set $D = \{x \in \mathbb{R}^n : x_1 = x_2 = \cdots = x_n\}$. Given a proper function $g : \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ and its proximal mapping $\operatorname{prox}_{\lambda g}$ for all $\lambda > 0$, derive the proximal mapping $\operatorname{prox}_{\gamma f}$ of the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$:
$$f(x) = \delta_D(x) + \sum_{i=1}^n g(x_i).$$

11. Let $f, r : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be proper closed convex functions, and assume $f$ is continuously differentiable. Show that
$$x^\star \in \operatorname*{argmin}\big(r(x) + f(x)\big) \;\iff\; x^\star = \operatorname{prox}_{\lambda r}\big(x^\star - \lambda \nabla f(x^\star)\big),$$
where $\lambda > 0$. Is this still true if $r$ is nonconvex?
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationPart 1a: Inner product, Orthogonality, Vector/Matrix norm
Part 1a: Inner product, Orthogonality, Vector/Matrix norm September 19, 2018 Numerical Linear Algebra Part 1a September 19, 2018 1 / 16 1. Inner product on a linear space V over the number field F A map,
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationConvex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE
Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the
More informationConvex Functions. Pontus Giselsson
Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum
More informationC*-algebras - a case study
- a case study Definition Suppose that H is a Hilbert space. A C -algebra is an operator-norm closed -subalgebra of BpHq. are closed under ultraproducts and subalgebras so they should be captured by continous
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationFunctional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...
Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................
More informationSplitting methods for decomposing separable convex programs
Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques
More informationIterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem
Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationA Brief Review on Convex Optimization
A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review
More informationProximal Methods for Optimization with Spasity-inducing Norms
Proximal Methods for Optimization with Spasity-inducing Norms Group Learning Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology
More informationFunctional Analysis I
Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker
More informationCSCI : Optimization and Control of Networks. Review on Convex Optimization
CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one
More informationAn introduction to some aspects of functional analysis
An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms
More informationMonotone Operator Splitting Methods in Signal and Image Recovery
Monotone Operator Splitting Methods in Signal and Image Recovery P.L. Combettes 1, J.-C. Pesquet 2, and N. Pustelnik 3 2 Univ. Pierre et Marie Curie, Paris 6 LJLL CNRS UMR 7598 2 Univ. Paris-Est LIGM CNRS
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationConvex Optimization & Lagrange Duality
Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationComputational Statistics and Optimisation. Joseph Salmon Télécom Paristech, Institut Mines-Télécom
Computational Statistics and Optimisation Joseph Salmon http://josephsalmon.eu Télécom Paristech, Institut Mines-Télécom Plan Duality gap and stopping criterion Back to gradient descent analysis Forward-backward
More informationOptimality Conditions for Nonsmooth Convex Optimization
Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationMath 273a: Optimization Convex Conjugacy
Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper
More informationSubgradient Projectors: Extensions, Theory, and Characterizations
Subgradient Projectors: Extensions, Theory, and Characterizations Heinz H. Bauschke, Caifang Wang, Xianfu Wang, and Jia Xu April 13, 2017 Abstract Subgradient projectors play an important role in optimization
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationMATH 260 Class notes/questions January 10, 2013
MATH 26 Class notes/questions January, 2 Linear transformations Last semester, you studied vector spaces (linear spaces) their bases, dimension, the ideas of linear dependence and linear independence Now
More informationExistence and Approximation of Fixed Points of. Bregman Nonexpansive Operators. Banach Spaces
Existence and Approximation of Fixed Points of in Reflexive Banach Spaces Department of Mathematics The Technion Israel Institute of Technology Haifa 22.07.2010 Joint work with Prof. Simeon Reich General
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationChapter 2 Convex Analysis
Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,
More informationChapter 2: Preliminaries and elements of convex analysis
Chapter 2: Preliminaries and elements of convex analysis Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-14-15.shtml Academic year 2014-15
More information3.10 Lagrangian relaxation
3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the
More informationNonsmooth optimization: conditioning, convergence, and semi-algebraic models
Nonsmooth optimization: conditioning, convergence, and semi-algebraic models Adrian Lewis ORIE Cornell International Congress of Mathematicians Seoul, August 2014 1/16 Outline I Optimization and inverse
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 17 LECTURE 5 1 existence of svd Theorem 1 (Existence of SVD) Every matrix has a singular value decomposition (condensed version) Proof Let A C m n and for simplicity
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationMath Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012.
Math 5620 - Introduction to Numerical Analysis - Class Notes Fernando Guevara Vasquez Version 1990. Date: January 17, 2012. 3 Contents 1. Disclaimer 4 Chapter 1. Iterative methods for solving linear systems
More informationB553 Lecture 5: Matrix Algebra Review
B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations
More informationProposition 42. Let M be an m n matrix. Then (32) N (M M)=N (M) (33) R(MM )=R(M)
RODICA D. COSTIN. Singular Value Decomposition.1. Rectangular matrices. For rectangular matrices M the notions of eigenvalue/vector cannot be defined. However, the products MM and/or M M (which are square,
More informationLinear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016
Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationLagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems
Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of
More informationDivision of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45
Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationGEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect
GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, 2018 BORIS S. MORDUKHOVICH 1 and NGUYEN MAU NAM 2 Dedicated to Franco Giannessi and Diethard Pallaschke with great respect Abstract. In
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationDedicated to Michel Théra in honor of his 70th birthday
VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationConstrained optimization
Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained
More informationGEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III
GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III CONVEX ANALYSIS NONLINEAR PROGRAMMING THEORY NONLINEAR PROGRAMMING ALGORITHMS
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More informationDual methods for the minimization of the total variation
1 / 30 Dual methods for the minimization of the total variation Rémy Abergel supervisor Lionel Moisan MAP5 - CNRS UMR 8145 Different Learning Seminar, LTCI Thursday 21st April 2016 2 / 30 Plan 1 Introduction
More informationSparse Optimization Lecture: Basic Sparse Optimization Models
Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm
More informationCSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming
More informationConvex Optimization Notes
Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =
More informationOn nonexpansive and accretive operators in Banach spaces
Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 3437 3446 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa On nonexpansive and accretive
More informationA SHORT INTRODUCTION TO BANACH LATTICES AND
CHAPTER A SHORT INTRODUCTION TO BANACH LATTICES AND POSITIVE OPERATORS In tis capter we give a brief introduction to Banac lattices and positive operators. Most results of tis capter can be found, e.g.,
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationConvex Optimization. Lecture 12 - Equality Constrained Optimization. Instructor: Yuanzhang Xiao. Fall University of Hawaii at Manoa
Convex Optimization Lecture 12 - Equality Constrained Optimization Instructor: Yuanzhang Xiao University of Hawaii at Manoa Fall 2017 1 / 19 Today s Lecture 1 Basic Concepts 2 for Equality Constrained
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More informationConvex Feasibility Problems
Laureate Prof. Jonathan Borwein with Matthew Tam http://carma.newcastle.edu.au/drmethods/paseky.html Spring School on Variational Analysis VI Paseky nad Jizerou, April 19 25, 2015 Last Revised: May 6,
More informationNumerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??
Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement
More information