Stochastic Gradient Descent Algorithms for Resource Allocation


Amrit Singh Bedi
Supervisor: Dr. Ketan Rajawat
Department of Electrical Engineering, Indian Institute of Technology Kanpur, Uttar Pradesh

Outline
- Gradient descent algorithm
- Subgradient descent algorithm
- Stochastic subgradient algorithm
- Incremental stochastic subgradient algorithm
- Ergodic stochastic algorithm
- Applications in wireless communication and the smart grid
- Future work

Introduction

The standard convex optimization problem is

    minimize    f(x)
    subject to  g(x) ≤ 0
                x ∈ X

- x is the optimization variable
- f(x) is the objective function
- g(x) is the constraint function
- X represents the convex domain for x

First-order methods
- Only the first-order derivative is required
- Every iteration is inexpensive; no second derivative is needed
- Useful for large-scale optimization problems
- Easily extended to settings with uncertainty
- Useful for making optimal decisions on-the-fly

Gradient Descent Algorithm [1]

Motivation: very useful for large-scale problems, and much faster per iteration.

Definition: If f : Rⁿ → R, the gradient is given by

    ∇f(x₁, x₂, …, xₙ) := (∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ)

Algorithm:

    x^(t+1) = x^(t) − ε^(t) ∇f(x^(t))

where ε^(t) is the step size for the algorithm.

[1] Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
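The update above can be sketched in a few lines. The objective f(x, y) = x² + 10y² is a hypothetical example (not from the slides); its gradient is L-Lipschitz with L = 20, so any constant step ε < 2/L = 0.1 drives the iterates to the minimizer (0, 0).

```python
def grad(x):
    # Gradient of the example objective f(x, y) = x^2 + 10 y^2.
    return [2.0 * x[0], 20.0 * x[1]]

def gradient_descent(x0, eps=0.05, iters=200):
    # Constant-step gradient descent: x^(t+1) = x^(t) - eps * grad f(x^(t)).
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return x

x_star = gradient_descent([3.0, -2.0])  # converges toward (0, 0)
```

With ε = 0.05 < 2/L the iterates contract geometrically, while ε > 0.1 would make the y-coordinate diverge, illustrating the step-size condition in the theorem below.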

Convergence analysis
- The convergence behaviour of the algorithm is governed by ε^(t)
- Too small a value of ε^(t) makes the algorithm converge slowly
- Too large a value can make the algorithm overshoot and diverge

A simple convergence analysis for a constant step size follows.

Definition: A function f : Rⁿ → R has an L-Lipschitz continuous gradient if and only if

    ‖∇f(x) − ∇f(y)‖₂ ≤ L ‖x − y‖₂  for all x, y ∈ Rⁿ

Implications:
- A Lipschitz continuous gradient is denoted f ∈ C_L
- The speed at which the gradient varies is bounded
- The objective function has bounded curvature

Lemma: Let f ∈ C_L. Then the following upper bound holds:

    f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2) ‖y − x‖₂²

Theorem: If f ∈ C_L and f* = min_x f(x) > −∞, then the algorithm with constant step size ε < 2/L converges to a stationary point, in the sense that

    ‖∇f(x^(t))‖² → 0  as t → ∞.

Proof: Using the lemma with y = x^(t+1), we get

    f(x^(t+1)) ≤ f(x^(t)) + ∇f(x^(t))ᵀ(x^(t+1) − x^(t)) + (L/2) ‖x^(t+1) − x^(t)‖²
              = f(x^(t)) − ε ‖∇f(x^(t))‖² + (ε²L/2) ‖∇f(x^(t))‖²
              = f(x^(t)) − ε (1 − (ε/2)L) ‖∇f(x^(t))‖²

Rearranging,

    ‖∇f(x^(t))‖² ≤ ( f(x^(t)) − f(x^(t+1)) ) / ( ε (1 − (ε/2)L) )

and summing over t = 0, …, T − 1,

    Σ_{t=0}^{T−1} ‖∇f(x^(t))‖² ≤ ( f(x^(0)) − f(x^(T)) ) / ( ε (1 − (ε/2)L) )
                               ≤ ( f(x^(0)) − f* ) / ( ε (1 − (ε/2)L) )

Since f* > −∞, the sum on the left must converge as T → ∞, and therefore

    ‖∇f(x^(t))‖² → 0  as t → ∞.

Bound on ‖x − x*‖₂

In a similar way, if the function f is assumed to be strongly convex, i.e., ∇²f(x) ⪰ mI, then we can bound the term ‖x − x*‖₂ as follows:

    ‖x − x*‖₂ ≤ (2/m) ‖∇f(x)‖₂

which follows by substituting y = x* in the inequality

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2) ‖y − x‖₂²

Convergence rate

The smoothness of the objective controls the convergence rate of gradient-based methods.

    Convex objective f(x)           Iterations to ε-accuracy
    Nondifferentiable               O(1/ε²)
    Differentiable                  O(1/ε)
    Smooth (Lipschitz gradient)     O(1/√ε)
    Strongly convex                 O(log(1/ε))

Contrast with Newton's method

In general, when minimizing an n-dimensional objective function:
- Gradient descent requires more iterations, but each one is fast (only first derivatives are computed)
- Newton's method requires fewer iterations, but each one is slow (second derivatives are computed too)

Recent results:
- Accelerated methods are proposed in [2]
- An O(1/k) gradient method for network resource allocation is proposed in [3]

[2] Tseng P. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim.
[3] Beck A, Nedic A, Ozdaglar A, Teboulle M. An O(1/k) Gradient Method for Network Resource Allocation Problems. IEEE TCNS.

Subgradient Methods

The subgradient method [4] is a simple algorithm for minimizing a nondifferentiable convex function.
- Applies directly to nondifferentiable objective functions
- In contrast to the gradient method, the function value may increase

Definition: A vector g is a subgradient of f(·) at x if

    f(y) ≥ f(x) + gᵀ(y − x)  for all y

[4] Boyd S, Mutapcic A. Subgradient methods. Lecture notes of EE364b, Stanford University, Winter Quarter 2006-07.

Figure: a one-dimensional example; g₁, g₂, and g₃ are subgradients at x₁ and x₂.

Algorithm

For unconstrained convex problems, the algorithm is

    x^(t+1) = x^(t) − ε^(t) g^(t)

Here, g^(t) is any subgradient of f at x^(t) and ε^(t) > 0 is the step size.

Since it is not a descent method, it is common to keep track of the best point found so far:

    f_best^(t) := min{ f(x^(1)), f(x^(2)), …, f(x^(t)) }
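As a sketch of the update and the best-value bookkeeping, consider f(x) = |x| (a hypothetical one-dimensional example, not from the slides); sign(x) is a valid subgradient everywhere, taking the value 0 at x = 0.

```python
def subgrad_abs(x):
    # A subgradient of |x|: sign(x), taking 0 at x = 0.
    return (x > 0) - (x < 0)

def subgradient_method(x0, eps=0.01, iters=500):
    # Constant-step subgradient method with best-value tracking,
    # since the function value is not guaranteed to decrease.
    x, f_best = x0, abs(x0)
    for _ in range(iters):
        x = x - eps * subgrad_abs(x)
        f_best = min(f_best, abs(x))
    return f_best

best = subgradient_method(1.0)  # f_best approaches f* = 0 up to O(eps)
```

With a constant step the iterates eventually oscillate around the minimizer at a scale set by ε, consistent with the ε-dependent gap in the convergence analysis.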

Convergence Analysis

The method is guaranteed to converge to within some range of the optimal value:

    lim_{t→∞} f_best^(t) − f* < ε̄

where ε̄ is a function of the step size.

Assumptions for the proof:
- There exists a minimizer of f, say x*
- The norm of the subgradient is bounded, i.e., ‖g^(t)‖₂ ≤ G for all t

Convergence Analysis

Consider the Euclidean distance to the optimal point:

    ‖x^(t+1) − x*‖₂² = ‖x^(t) − ε g^(t) − x*‖₂²
                     ≤ ‖x^(t) − x*‖₂² − 2ε ( f(x^(t)) − f* ) + ε²G²

Summation over t yields

    ‖x^(t+1) − x*‖₂² ≤ ‖x^(1) − x*‖₂² − 2ε Σ_{i=1}^{t} ( f(x^(i)) − f* ) + ε²TG²

so that, with ‖x^(1) − x*‖₂ ≤ R,

    2ε Σ_{i=1}^{t} ( f(x^(i)) − f* ) ≤ R² + ε²TG²

and hence

    f_best^(t) − f* ≤ ( R² + ε²TG² ) / ( 2εT )

A simple example is given in [5].

Projected subgradient method

Consider the constrained optimization problem

    minimize    f(x)
    subject to  x ∈ X

The projected subgradient algorithm [4] for this problem is

    x^(t+1) = P_X[ x^(t) − ε^(t) g^(t) ]

[4] Boyd S, Mutapcic A. Subgradient methods. Lecture notes of EE364b, Stanford University, Winter Quarter 2006-07.
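A minimal sketch of the projected update, on the hypothetical problem of minimizing f(x) = (x − 3)² over X = [0, 1] (not from the slides); the projection P_X is a simple clamp and the constrained minimizer is x* = 1.

```python
def project(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box X = [lo, hi].
    return max(lo, min(hi, x))

def projected_subgradient(x0, eps=0.1, iters=100):
    # x^(t+1) = P_X[x^(t) - eps * g^(t)], with g the gradient of (x - 3)^2.
    x = x0
    for _ in range(iters):
        g = 2.0 * (x - 3.0)
        x = project(x - eps * g)
    return x

x_star = projected_subgradient(0.0)  # converges to the boundary point 1.0
```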

- Simple to implement and applicable to a wide variety of problems
- Subgradient methods were first introduced in the mid-sixties by N. Z. Shor [6]
- An extensive treatment of these subgradient methods is provided in the books [7, 17]
- Nemirovski and Yudin [8] derived the worst-case complexity bound for reaching an ε-solution: O(1/ε²) for Lipschitz continuous nonsmooth problems and O(1/√ε) for smooth problems with Lipschitz continuous gradient
- Combined with primal or dual decomposition techniques, they sometimes yield simple distributed algorithms [9]

[6] Shor NZ. Minimization Methods for Non-differentiable Functions. Springer.
[7] Shor NZ. Nondifferentiable Optimization and Polynomial Problems. Springer Science & Business Media.
[8] Blair C. Problem Complexity and Method Efficiency in Optimization (AS Nemirovsky and DB Yudin). SIAM Review.
[9] Palomar DP, Chiang M. A tutorial on decomposition methods for network utility maximization. IEEE JSAC.

Dual Descent Algorithms [10, 11]

Consider the primal problem

    minimize    f₀(x)
    subject to  f_i(x) ≤ 0,  i = 1, …, m.

The Lagrangian is given by

    L(x, λ) = f₀(x) + Σ_{i=1}^{m} λ_i f_i(x)

We assume that x*(λ) is the unique minimizer of the Lagrangian over x.

[10] Kelly F. Charging and rate control for elastic traffic. European Transactions on Telecommunications.
[11] Bertsekas DP, Nedic A, Ozdaglar AE. Convex Analysis and Optimization.

The dual function is

    g(λ) = inf_x L(x, λ) = f₀(x*(λ)) + Σ_{i=1}^{m} λ_i f_i(x*(λ))    (for λ ⪰ 0)

The dual problem is

    maximize    g(λ)
    subject to  λ ⪰ 0

Assuming Slater's condition holds, we have x* = x*(λ*).

Applying the projected subgradient method on the dual, we get

    x^(t)     = argmin_x ( f₀(x) + Σ_{i=1}^{m} λ_i^(t) f_i(x) )
    λ_i^(t+1) = [ λ_i^(t) + α_t f_i(x^(t)) ]₊
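The two-step dual iteration can be sketched on a toy problem (a hypothetical example, not from the slides): minimize x² subject to 1 − x ≤ 0, whose optimum is x* = 1 with multiplier λ* = 2; the inner argmin has the closed form x(λ) = λ/2.

```python
def dual_descent(lam0=0.0, alpha=0.1, iters=500):
    # Dual subgradient ascent for: minimize x^2  s.t.  1 - x <= 0.
    lam = lam0
    for _ in range(iters):
        x = lam / 2.0                              # argmin_x x^2 + lam * (1 - x)
        lam = max(0.0, lam + alpha * (1.0 - x))    # lam^(t+1) = [lam^(t) + alpha f_1(x^(t))]_+
    return lam / 2.0, lam

x_star, lam_star = dual_descent()  # approaches (1.0, 2.0)
```

Because the problem satisfies Slater's condition, the primal optimum is recovered as x*(λ*), exactly as stated above.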

Motivation

Motivation for solving the dual problem:
- The dual is a convex optimization problem
- The dual may have smaller dimension than the primal
- If the duality gap is zero, the primal optimum can be derived from the dual
- If the duality gap is nonzero, the dual provides a lower bound for the primal

Recent results and scope

Recent convergence results in this direction are discussed in [12], where an averaging scheme is applied:
- Provides convergence-rate estimates for the approximate solutions (under Slater's qualification)
- Bounds the amount of feasibility violation
- Provides upper and lower bounds on the primal objective function

Accelerated dual descent for network flow optimization is proposed in [13].

[12] Nedic A, Ozdaglar A. Approximate primal solutions and rate analysis for dual subgradient methods. SIAM Journal on Optimization.
[13] Zargham M, Ribeiro A, Ozdaglar A, Jadbabaie A. Accelerated dual descent for network flow optimization. IEEE TAC.

Standard Convex Stochastic Optimization Problem

    minimize    E[f₀(x, w)]
    subject to  E[f_i(x, w)] ≤ 0,  i = 1, 2, …, m.
                x ∈ X

- x is the optimization variable; w is the random variable
- f_i(x, w) is convex in x for each realization of w
- X represents the box constraints

History and Motivation
- The history of stochastic algorithms dates back to the adaptive filtering work of Robbins and Monro [14] and Widrow and Stearns [15]
- Extensively studied in the context of LMS and RLS algorithms [16]
- Stochastic subgradient methods with detailed analysis are discussed in [18, 19]

[14] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics.
[15] Widrow B, Stearns SD. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
[16] Sayed AH. Adaptive Filters. John Wiley & Sons.
[18] Ermoliev Y. Stochastic quasigradient methods and their application to system optimization. Stochastics.

Different applications

Stochastic gradient and subgradient methods have been widely applied to
- neural networks [20],
- parameter tracking [21],
- large-scale machine learning [22, 23], and
- resource allocation problems [24, 25, 27, 28, 34].

[20] Bottou L. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes.
[21] Kushner HJ, Yang J. Analysis of adaptive step size SA algorithms for parameter tracking. Proc. of the IEEE Conference on Decision and Control.
[22] Bottou L. Large-scale machine learning with stochastic gradient descent. In Proc. of COMPSTAT.
[24] Alaei S, Hajiaghayi M, Liaghat V. The online stochastic generalized assignment problem. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques.
[27] Wang X, Gao N. Stochastic resource allocation over fading multiple access and broadcast channels. IEEE TIT.

Relation with the LMS algorithm

The goal is to minimize E[|e(t)|²]. Utilizing steepest descent, we get

    h(t+1) = h(t) + μ E[u(t) e(t)]

When the statistics are not known, replacing the expectation by its instantaneous approximation gives the LMS update

    h(t+1) = h(t) + μ u(t) e(t)

which is nothing but a stochastic gradient descent algorithm.
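The two updates above differ only in whether the expectation is available. A minimal LMS sketch, identifying a hypothetical 2-tap filter h_true from synthetic noiseless input/output pairs (all data below are assumptions for illustration):

```python
import random

def lms_identify(h_true, mu=0.05, steps=2000, seed=0):
    # LMS adaptive filter: h(t+1) = h(t) + mu * u(t) * e(t),
    # with e(t) = d(t) - h(t)^T u(t) and d(t) generated by h_true.
    rng = random.Random(seed)
    h = [0.0] * len(h_true)
    for _ in range(steps):
        u = [rng.gauss(0.0, 1.0) for _ in h_true]          # input vector u(t)
        d = sum(ht * ut for ht, ut in zip(h_true, u))      # desired output d(t)
        e = d - sum(hi * ui for hi, ui in zip(h, u))       # error e(t)
        h = [hi + mu * ui * e for hi, ui in zip(h, u)]     # stochastic gradient step
    return h

h_hat = lms_identify([1.0, -0.5])  # h_hat approaches h_true
```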

Stochastic Subgradient Methods
- Stochastic subgradient algorithms [29] are a generalization of the (sub)gradient ones
- They use noisy subgradients and a more limited set of step-size rules

Definition: A noisy (unbiased) subgradient of f(·) at x ∈ dom f is a vector g̃ such that

    f(y) ≥ f(x) + (E[g̃])ᵀ(y − x)  for all y

This noisy subgradient can be written as g̃ = g + ν, where g ∈ ∂f(x) and ν is a zero-mean random vector.

If x in the problem is also random (specifically in resource allocation problems), then g̃ is said to be a noisy subgradient of f(·) at x if, for all y,

    f(y) ≥ f(x) + (E[g̃ | x])ᵀ(y − x)

[29] Weng L, Chen Y. Stochastic Subgradient Methods.

Algorithm

For unconstrained minimization of a convex function f : Rⁿ → R, the stochastic subgradient update is given as

    x^(t+1) = x^(t) − ε^(t) g̃^(t)

where ε^(t) > 0 is the t-th step size, and g̃^(t) is a stochastic subgradient satisfying

    E[g̃^(t) | x^(t)] = g^(t) ∈ ∂f(x^(t))

In this algorithm, similar to the subgradient one,

    f_best^(t) := min{ f(x^(1)), f(x^(2)), …, f(x^(t)) }
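A sketch of the update with an unbiased noisy gradient, on a hypothetical problem (not from the slides): minimize f(x) = E[(x − w)²] with w drawn uniformly from {1, 2, 3}, so the minimizer is the mean E[w] = 2. The sampled gradient 2(x − w) satisfies E[g̃ | x] = 2(x − 2).

```python
import random

def stochastic_gradient(steps=5000, eps=0.01, seed=1):
    # x^(t+1) = x^(t) - eps * g~^(t), where g~^(t) = 2 * (x - w_t) is an
    # unbiased estimate of the true gradient 2 * (x - E[w]).
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        w = rng.choice([1.0, 2.0, 3.0])
        x -= eps * 2.0 * (x - w)
    return x

x_star = stochastic_gradient()  # hovers near the minimizer 2.0
```

With a constant step the iterate fluctuates around the optimum at a scale governed by ε and the noise variance, matching the (R² + ε²TG²)/(2εT) bound below.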

Convergence analysis

Assumptions:
- There exists a minimizer of f, say x*
- There exists G such that E[‖g̃^(t)‖₂²] ≤ G² for all t
- There exists R such that ‖x^(1) − x*‖₂ ≤ R

Results:
- Convergence in expectation:

    E[f_best^(t)] → f*  as t → ∞.

- Convergence in probability:

    lim_{t→∞} Prob( f_best^(t) ≥ f* + α ) = 0  for any α > 0.

Proof:

    E[ ‖x^(t+1) − x*‖₂² | x^(t) ] = E[ ‖x^(t) − ε g̃^(t) − x*‖₂² | x^(t) ]
        ≤ ‖x^(t) − x*‖₂² − 2ε E[ g̃^(t)ᵀ(x^(t) − x*) | x^(t) ] + ε²G²
        ≤ ‖x^(t) − x*‖₂² − 2ε ( f(x^(t)) − f* ) + ε²G²

The above inequality holds almost surely. Now take expectations to get

    E[ ‖x^(t+1) − x*‖₂² ] ≤ E[ ‖x^(t) − x*‖₂² ] − 2ε ( E[f(x^(t))] − f* ) + ε²G²

Taking the summation over t = 1, 2, …, T, we get

    E[f_best^(T)] − f* = E[ min_{i=1,…,T} f(x^(i)) ] − f* ≤ ( R² + ε²TG² ) / ( 2εT )

For convergence in probability, use Markov's inequality:

    Prob( f_best^(t) − f* ≥ α ) ≤ E[ f_best^(t) − f* ] / α  for any α > 0.

The RHS goes to zero as t → ∞, and hence so does the LHS.

Incremental Stochastic Subgradient Algorithms [30]

A problem of recent interest in distributed networks is
- to design decentralized algorithms that minimize a sum of functions,
- where each component function is known only to a particular agent.

Consider a network of m agents, indexed by i = 1, 2, …, m. The aim is to solve the following optimization problem:

    minimize    f(x) = Σ_{i=1}^{m} f_i(x)
    subject to  x ∈ X

- x ∈ Rⁿ is the decision parameter vector
- X is a closed and convex subset of Rⁿ
- f_i is a convex function from Rⁿ to R known only to agent i

[30] Ram SS, Nedic A, Veeravalli VV. Incremental stochastic subgradient algorithms for convex optimization. SIAM Journal on Optimization.

Algorithms

Cyclic incremental subgradient algorithm: in a network where agents are connected in a directed ring structure, the updates are

    z_{0,t+1} = z_{m,t} = x_t
    z_{i,t+1} = P_X[ z_{i−1,t+1} − α^(t+1) ( ∇f_i(z_{i−1,t+1}) + ε_{i,t+1} ) ]

Randomized incremental subgradient algorithm: the agent i that updates is selected randomly according to a distribution. Formally, the updates are

    x^(t+1) = P_X[ x^(t) − α^(t+1) ( ∇f_{s(t+1)}(x^(t)) + ε_{s(t+1),t+1} ) ]

The integer s(t+1) is the index of the agent that performs the update at time t+1.
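A minimal sketch of the cyclic variant, under assumptions not in the slides: three agents hold f_i(x) = (x − t_i)² with hypothetical targets t_i ∈ {1, 2, 3}, noiseless subgradients (ε_{i,t} = 0), and X = [0, 10]; the minimizer of the sum is the mean of the targets.

```python
def cyclic_incremental(targets=(1.0, 2.0, 3.0), alpha=0.01, cycles=2000):
    # Each cycle passes the iterate around the ring; agent i applies
    # z_i = P_X[z_{i-1} - alpha * grad f_i(z_{i-1})] with f_i(x) = (x - t_i)^2.
    x = 0.0
    for _ in range(cycles):
        z = x
        for t in targets:
            z = z - alpha * 2.0 * (z - t)     # local subgradient step
            z = max(0.0, min(10.0, z))        # projection onto X = [0, 10]
        x = z                                 # x_{t+1} = z_{m,t+1}
    return x

x_star = cyclic_incremental()  # settles near the mean target 2.0, with O(alpha) bias
```

The small constant-step bias away from the exact minimizer is what the error bound on the next slides quantifies.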

Convergence analysis

Assumptions:
- The set X ⊆ Rⁿ is closed and convex.
- The function f_i : Rⁿ → R is convex for each i ∈ {1, 2, …, m}.
- There exist scalar sequences μ_t and ν_t such that

    E[ ‖ε_{i,t}‖ | F_t^{i−1} ] ≤ μ_t
    E[ ‖ε_{i,t}‖² | F_t^{i−1} ] ≤ ν_t²

- For every i, ‖∇f_i(x)‖ ≤ C_i for all x ∈ X.

Results (constant step size α)

Bound on iterates:

    E[ ‖x^(t+1) − y‖² | F_t^m ] ≤ ‖x^(t) − y‖² − 2α ( f(x_t) − f(y) )
        + 2αμ_{t+1} Σ_{i=1}^{m} E[ ‖z_{i−1,t+1} − y‖₂ | F_t^m ]
        + α² ( mν_{t+1} + Σ_{i=1}^{m} C_i )²

Error bound: with probability 1,

    inf_{t≥0} f(x^(t)) ≤ f* + mμ max_{x,y∈X} ‖x − y‖ + (α/2) ( mν + Σ_{i=1}^{m} C_i )²

Ergodic Stochastic Optimization Algorithm [31]

Stochastic resource allocation problem:

    (x*, {p_t*}_{t∈N}) = argmax  f₀(x)
                   s.t.  E[ s_t(p_t, x) ] ≥ 0
                         x ∈ X,  p_t ∈ P_t

- f₀ : Rⁿ → R is a concave function
- s_t : Rⁿ × R^p → R^k is a random function, indexed by time t
- x ∈ Rⁿ is the optimization variable
- p_t ∈ R^p is the online policy to be determined for all t ∈ N

[31] Ribeiro A. Ergodic stochastic optimization algorithms for wireless communication and networking. IEEE TSP, Dec. 2010.

ESO algorithm

The ergodic stochastic optimization (ESO) algorithm consists of the iterative application of the following steps:

Primal iteration:

    (x_t, p_t) = argmax_{x∈X, p∈P_t}  f₀(x) + λ_tᵀ s_t(p, x)

Dual iteration:

    λ_{t+1} = [ λ_t − ε s_t(p_t, x_t) ]₊
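A minimal sketch of the two ESO steps on a hypothetical problem (all problem data here are assumptions, not from the slides): maximize f₀(x) = log(1 + x) subject to E[c_t p_t − x] ≥ 0, with allocation p_t ∈ [0, 1], x ∈ [0, 10], and a random channel c_t ~ Uniform[0, 2]. Both primal argmax steps then have closed forms.

```python
import random

def eso(eps=0.01, steps=20000, seed=2):
    # Primal: x_t = argmax_x log(1 + x) - lam * x   ->  x_t = clip(1/lam - 1);
    #         p_t = argmax_{p in [0,1]} lam * c_t * p  ->  p_t = 1 when lam * c_t > 0.
    # Dual:   lam_{t+1} = [lam_t - eps * s_t]_+  with  s_t = c_t * p_t - x_t.
    rng = random.Random(seed)
    lam, x_avg = 1.0, 0.0
    for t in range(steps):
        x = max(0.0, min(10.0, 1.0 / lam - 1.0)) if lam > 0 else 10.0
        c = rng.uniform(0.0, 2.0)
        p = 1.0 if lam * c > 0 else 0.0
        lam = max(0.0, lam - eps * (c * p - x))
        x_avg += (x - x_avg) / (t + 1)        # ergodic (running) average of x_t
    return x_avg, lam

x_avg, lam = eso()  # the ergodic average of x_t approaches E[c_t] = 1
```

The dual variable settles where the time-averaged constraint slack vanishes, which is the almost-sure feasibility property established for ESO.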

Convergence results

The following asymptotic results are established in [31].

Almost sure feasibility:

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} s_τ(p_τ, x_τ) ≥ 0

Almost sure near-optimality:

    lim_{t→∞} f₀(x̄_t) ≥ P − εG²/2

where x̄_t denotes the ergodic average of the iterates and P the optimal value.

Recent results

A recent technique for large-scale machine learning problems is proposed in [32]. The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as a solution alternative that relies on:
- the use of local stochastic averaging gradients;
- the determination of descent steps as differences of consecutive stochastic averaging gradients.

Strong convexity of the local functions and Lipschitz continuity of the local gradients are shown to guarantee linear convergence in expectation of the sequence generated by DSA.

Future scope: almost sure convergence-rate results for stochastic dual descent are not yet available.

[32] Mokhtari A, Ribeiro A. DSA: Decentralized Double Stochastic Averaging Gradient Algorithm. arXiv preprint.

Applications
- Resource allocation in OFDM networks [26, 27, 28]
- Load shedding in the smart grid with real-time pricing (RTP) [33, 34]

[26] Wang X, Giannakis GB. Resource allocation for wireless multiuser OFDM networks. IEEE TIT.
[27] Wang X, Gao N. Stochastic resource allocation over fading multiple access and broadcast channels. IEEE TIT.
[33] Gatsis N, Marques AG. A stochastic approximation approach to load shedding in power networks. IEEE ICASSP.

Resource allocation in OFDM networks: modeling preliminaries

Utility-based resource allocation

Optimization problem:

    max_{r̄ ⪰ r_th, (α,p)∈F}  U(r̄)
    s.t.  r̄_j ≤ E[ Σ_{k=1}^{K} c_{j,k}(α^t_{j,k}, p^t_{j,k}) ]  ∀j    (assign μ_j)
          E[ Σ_{j=1}^{J} Σ_{k=1}^{K} p^t_{j,k} ] ≤ P                  (assign λ)

Explanation

- α^t_{j,k} ≥ 0 and p^t_{j,k} ≥ 0 are the time-sharing fraction of a slot and the average transmit power allocated, with

    Σ_{j=1}^{J} α^t_{j,k} ≤ 1,  k = 1, …, K.

- Assuming AWGN of unit variance at the receiver and unit sub-bandwidth, the maximum achievable rate is

    c^t_{j,k} = α^t_{j,k} log( 1 + γ^t_{j,k} p^t_{j,k} / α^t_{j,k} )  if α^t_{j,k} > 0,  and 0 if α^t_{j,k} = 0.

- The set F is

    F := { (α, p) : α_{j,k} ≥ 0, p_{j,k} ≥ 0, Σ_{j=1}^{J} α^t_{j,k} ≤ 1, E[ Σ_{j=1}^{J} Σ_{k=1}^{K} p_{j,k} ] ≤ P }


Offline solution

Forming the Lagrangian yields two separate primal subproblems (across r̄ and (α, p)):

Subproblem I:

    max_{r̄ ⪰ r_th}  U(r̄) − μᵀ r̄

Subproblem II:

    max_{(α,p)∈F}  λP + E[ Σ_{j=1}^{J} Σ_{k=1}^{K} ( μ_j c^t_{j,k}(α^t_{j,k}, p^t_{j,k}) − λ p^t_{j,k} ) ]

Its solution provides r̄*(μ) and { α*_{j,k}(λ, μ), p*_{j,k}(λ, μ), ∀j, k }.

Dual optimum using the projected gradient algorithm

The dual iterations are as follows:

    λ[i+1]   = [ λ[i] + β ( E[ Σ_{j=1}^{J} Σ_{k=1}^{K} p^t_{j,k}(λ[i], μ[i]) ] − P ) ]₊
    μ_j[i+1] = [ μ_j[i] + β ( r̄_j(μ[i]) − E[ Σ_{k=1}^{K} r^t_{j,k}(λ[i], μ[i]) ] ) ]₊

Online version

Primal updates: with γ[t], λ̂[t], and μ̂[t] available per slot, the AP schedules according to the allocations α^t(λ̂[t], μ̂[t], γ[t]) and p^t(λ̂[t], μ̂[t], γ[t]).

Dual updates:

    λ̂[t+1]   = [ λ̂[t] + β ( Σ_{j=1}^{J} Σ_{k=1}^{K} p^t_{j,k}(λ̂[t], μ̂[t]) − P ) ]₊
    μ̂_j[t+1] = [ μ̂_j[t] + β ( r̄_j(μ̂[t]) − Σ_{k=1}^{K} r^t_{j,k}(λ̂[t], μ̂[t]) ) ]₊

Future work
- Almost sure convergence results for the incremental stochastic algorithms.
- Convergence results for the cyclo-stationary case are not available.

114 Thank you 51/65

115 References [1] Boyd S, Vandenberghe L. Convex optimization. Cambridge university press; 2004 Mar 8. [2] Tseng P. On accelerated proximal gradient methods for convex-concave optimization Submitted to SIAM J. Optim [3] Beck A, Nedic A, Ozdaglar A, Teboulle M. An Gradient Method for Network Resource Allocation Problems. IEEE TCNS [4] Boyd S, Mutapcic A. Subgradient methods. Lecture notes of EE364b, Stanford University, Winter Quarter. 2006;2007. [5] Boyd S, Xiao L, Mutapcic A. Subgradient methods. lecture notes of EE392o, Stanford University, Autumn Quarter Oct 1;2004: [6] Shor NZ. Minimization Methods for Non-differentiable Functions. Springer Series in Computational Mathematics. Springer, /65

116 References [7] Shor NZ. Nondifferentiable Optimization and Polynomial Problems. Springer Science & Business Media; [8] Blair C. Problem Complexity and Method Efficiency in Optimization (AS Nemirovsky and DB Yudin). SIAM Review [9] Palomar DP, Chiang M. A tutorial on decomposition methods for network utility maximization. IEEE JSAC [10] Kelly F. Charging and rate control for elastic traffic. European transactions on Telecommunications Jan 1. [11] Bertsekas DP, Nedi A, Ozdaglar AE. Convex analysis and optimization. [12] Nedic A, Ozdaglar A. Approximate primal solutions and rate analysis for dual subgradient methods. SIAM Journal on Opt [13] Zargham M, Ribeiro A, Ozdaglar A, Jadbabaie A. Accelerated dual descent for network flow optimization. IEEE TAC /65

117 References [14] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, 1951. [15] Widrow B, Stearns SD. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall; 1985. [16] Sayed AH. Adaptive Filters. John Wiley & Sons; 2008. [17] Bertsekas DP. Nonlinear Programming. Athena Scientific; 1999. [18] Ermoliev Y. Stochastic quasigradient methods and their application to system optimization. Stochastics: An International Journal of Probability and Stochastic Processes, 1983. [19] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming: an overview. In Proceedings of the 34th IEEE Conference on Decision and Control, 1995. [20] Bottou L. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 1991. 54/65

118 References [21] Kushner HJ, Yang J. Analysis of adaptive step size SA algorithms for parameter tracking. In Proceedings of the 33rd IEEE Conference on Decision and Control, 1994. [22] Bottou L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT 2010. Physica-Verlag HD. [23] Moulines E, Bach FR. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in Neural Information Processing Systems, 2011. [24] Alaei S, Hajiaghayi M, Liaghat V. The online stochastic generalized assignment problem. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, 2013. Springer Berlin Heidelberg. [25] Legrain A, Jaillet P. Stochastic online bipartite resource allocation problems. CIRRELT; 2013. 55/65

119 References [26] Wang X, Giannakis GB. Resource allocation for wireless multiuser OFDM networks. IEEE Transactions on Information Theory, 2011. [27] Wang X, Gao N. Stochastic resource allocation over fading multiple access and broadcast channels. IEEE Transactions on Information Theory. [28] Ribeiro A. Optimal resource allocation in wireless communication and networking. EURASIP Journal on Wireless Communications and Networking, 2012. [29] Weng L, Chen Y. Stochastic subgradient methods. Lecture notes, UC Irvine. [30] Ram SS, Nedić A, Veeravalli VV. Incremental stochastic subgradient algorithms for convex optimization. SIAM Journal on Optimization, 2009. 56/65

120 References [31] Ribeiro A. Ergodic stochastic optimization algorithms for wireless communication and networking. IEEE Transactions on Signal Processing, 2010. [32] Mokhtari A, Ribeiro A. DSA: decentralized double stochastic averaging gradient algorithm. arXiv preprint, 2015. [33] Gatsis N, Marques AG. A stochastic approximation approach to load shedding in power networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014. [34] Deng R, Yang Z, Chen J, Chow MY. Load scheduling with price uncertainty and temporally-coupled constraints in smart grids. IEEE Transactions on Power Systems, 2014. 57/65

121 Load Shedding in Smart Grid System model [Figure: the load serving entity (LSE) buys energy from the grid at the posted price, draws on a renewable source and an energy storage (battery), and serves the demand of users 1, 2, 3, …, K.] 58/65

122 System variables System parameters: π^t: actual demand minus procured energy at slot t; w^t: renewable energy produced at slot t; a^t: real-time energy price at slot t; r^t: state of charge of the battery at the end of slot t. 59/65

123 System variables System parameters: π^t: actual demand minus procured energy at slot t; w^t: renewable energy produced at slot t; a^t: real-time energy price at slot t; r^t: state of charge of the battery at the end of slot t. Optimization variables: s_k^t: amount of load shed for user k at slot t; b^t: amount of energy bought at slot t; e_out^t: energy discharged from the battery at slot t. [33] Gatsis N, Marques AG. A stochastic approximation approach to load shedding in power networks. IEEE ICASSP, 2014. [34] Deng R, Yang Z. Load scheduling with price uncertainty and temporally-coupled constraints in smart grids. IEEE TPS, 2014. 59/65

124 Problem formulation The system variables must satisfy the following relation:
π^t − w^t ≤ Σ_{k=1}^K s_k^t + b^t + η_dis e_out^t   (74)
60/65

125 Problem formulation The system variables must satisfy the following relation:
π^t − w^t ≤ Σ_{k=1}^K s_k^t + b^t + η_dis e_out^t   (74)
Battery dynamics equations:
r^t = r^{t−1} + e_in^t − e_out^t,  t = 1, …, T   (75a)
0 ≤ r^t ≤ R,  t = 1, …, T   (75b)
e_in^t = η_ch min{ e_in^max, [π^t − w^t]_− };  0 ≤ e_out^t ≤ e_out^max   (75c)
where [x]_− := max{−x, 0}.
60/65
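The battery recursion (75) can be simulated directly. This sketch uses illustrative parameter values and a simple greedy discharge rule in place of the optimized e_out^t; only the recursion itself follows the slides.

```python
# Sketch of the battery dynamics (75): r^t = r^{t-1} + e_in^t - e_out^t,
# charging only on renewable surplus (capped by e_in_max and scaled by eta_ch),
# state of charge clipped to [0, R]. Parameter values are illustrative.
eta_ch, e_in_max, R = 0.9, 2.0, 10.0
r = 5.0                                  # initial state of charge
for net in [-3.0, 1.0, -0.5, 4.0]:       # net = pi^t - w^t (demand minus renewables)
    e_in = eta_ch * min(e_in_max, max(-net, 0.0))  # charge on surplus, per (75c)
    e_out = min(r, max(net, 0.0))        # greedy discharge to cover the deficit
    r = min(max(r + e_in - e_out, 0.0), R)
    print(r)                             # final state of charge is about 2.25
```

The greedy discharge is only a stand-in: in the formulation above, e_out^t is an optimization variable priced by a dual multiplier rather than chosen myopically.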

126 Optimization problem
min_{ {s^t}, {b^t}, {e_out^t}, {ŝ_k} }  Σ_{k=1}^K J_k(ŝ_k) + lim_{T→∞} (1/T) Σ_{t=1}^T a^t b^t   (76a)
subject to
lim_{T→∞} (1/T) Σ_{t=1}^T s_k^t ≤ ŝ_k,  k = 1, …, K   (assign σ_k)   (76b)
lim_{T→∞} (1/T) Σ_{t=1}^T e_out^t = lim_{T→∞} (1/T) Σ_{t=1}^T e_in^t   (assign ρ)   (76c)
(74), (75),  0 ≤ b^t ≤ b^max  ∀t   (76d)
0 ≤ s_k^t ≤ s_k^max  ∀t, k   (76e)
61/65

128 Offline solution average primal variables:
ŝ_k(σ_k) = argmin_s { J_k(s) − σ_k s }   (78)
63/65
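For a concrete instance of the scalar minimization in (78), assume a quadratic disutility J_k(s) = c_k s²/2 — an illustrative choice, not specified on the slides — and clip the unconstrained minimizer to the feasible range [0, s_k^max]:

```python
# Closed form of (78) for the hypothetical quadratic J_k(s) = 0.5 * c_k * s^2:
# setting J_k'(s) = sigma_k gives s = sigma_k / c_k, then project onto [0, s_max].
def s_hat(sigma_k, c_k=2.0, s_max=4.0):
    return min(max(sigma_k / c_k, 0.0), s_max)

print(s_hat(3.0))   # 1.5
print(s_hat(10.0))  # 4.0 (hits the cap s_max)
```

The point is that the average primal variable decouples per user and reduces to a one-dimensional problem parameterized by the multiplier σ_k.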

129 Offline solution average primal variables:
ŝ_k(σ_k) = argmin_s { J_k(s) − σ_k s }   (78)
instantaneous primal variables
63/65

130 Offline solution average primal variables:
ŝ_k(σ_k) = argmin_s { J_k(s) − σ_k s }   (78)
instantaneous primal variables If π^t − w^t ≤ 0, there is an instantaneous energy surplus, which is stored in the battery: e_in^t = η_ch min{ e_in^max, w^t − π^t }, while e_out^t(·), b^t(·), s_k^t(·) are 0 ∀k.
63/65

131 Offline solution average primal variables:
ŝ_k(σ_k) = argmin_s { J_k(s) − σ_k s }   (78)
instantaneous primal variables If π^t − w^t ≤ 0, there is an instantaneous energy surplus, which is stored in the battery: e_in^t = η_ch min{ e_in^max, w^t − π^t }, while e_out^t(·), b^t(·), s_k^t(·) are 0 ∀k. If π^t − w^t > 0, there is an instantaneous energy deficit, and the optimization variables are found by solving
min_{ s_k^t, b^t, e_out^t ∈ S }  a^t b^t + ρ e_out^t + Σ_{k=1}^K σ_k s_k^t   (79)
subject to  π^t − w^t ≤ b^t + η_dis e_out^t + Σ_{k=1}^K s_k^t   (80)
63/65
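The per-slot problem (79)-(80) is a small linear program: one coupling constraint plus box constraints, so filling the deficit in merit order (cheapest marginal cost first) is optimal. A sketch with hypothetical prices and capacities, measuring each option in units of deficit covered (so battery discharge costs ρ/η_dis per delivered unit and supplies at most η_dis e_out^max units):

```python
# Greedy merit-order solution of the per-slot LP (79)-(80).
# All prices, capacities, and the function name are illustrative.
def per_slot(deficit, a, rho, sigma, eta_dis, b_max, e_max, s_max):
    # (marginal cost per unit of deficit covered, capacity in those units, label)
    options = [(a, b_max, "grid"), (rho / eta_dis, eta_dis * e_max, "battery")]
    options += [(sig, s_max, f"shed_{k}") for k, sig in enumerate(sigma)]
    options.sort(key=lambda opt: opt[0])   # cheapest source first
    alloc = {}
    for cost, cap, name in options:
        take = min(cap, deficit)
        if take > 0:
            alloc[name] = take
        deficit -= take
        if deficit <= 1e-12:
            break
    return alloc

res = per_slot(5.0, a=1.0, rho=1.8, sigma=[2.5], eta_dis=0.9,
               b_max=3.0, e_max=2.0, s_max=4.0)
print(res)  # grid fills first (3.0), then battery (~1.8), shedding covers ~0.2
```

The "battery" entry is the delivered amount η_dis e_out^t; divide by η_dis to recover e_out^t itself. A generic LP solver would give the same answer, but the merit-order view makes the role of the dual prices ρ and σ_k explicit.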

132 Online version Challenges with the offline technique: the main obstacle is finding the optimal σ_k*, ρ*; knowledge of the joint distribution of {w^t, a^t} is required; the algorithm is computationally expensive. Merits of stochastic approximation: only the current samples w^t, a^t are required; it is computationally efficient. 64/65

133 Online algorithm With µ_σ > 0 and µ_ρ > 0 denoting constant step sizes, Dual updates:
ρ^{t+1} = [ ρ^t − µ_ρ ( η_ch e_in^t(d^t) − e_out^t(d^t) ) ]_+   (81)
σ_k^{t+1} = [ σ_k^t + µ_σ ( s_k^t(d^t) − ŝ_k(σ_k^t) ) ]_+   (82)
The primal variables are calculated as in the offline case, replacing ρ with ρ^t and σ_k with σ_k^t. 65/65
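One step of the online dual updates can be sketched as follows. The sign convention assumed here raises σ_k when the realized shedding exceeds its average target and lowers ρ when charging outpaces discharging; d^t stands for the random state observed in slot t, and all numeric values in the usage example are illustrative.

```python
# One step of the per-slot dual updates with constant step sizes mu_rho, mu_sigma.
# e_in, e_out, s are the slot-t realizations; s_hat holds the average targets.
def dual_step(rho, sigma, e_in, e_out, s, s_hat, eta_ch, mu_rho, mu_sigma):
    rho = max(rho - mu_rho * (eta_ch * e_in - e_out), 0.0)
    sigma = [max(sig + mu_sigma * (s[k] - s_hat[k]), 0.0)
             for k, sig in enumerate(sigma)]
    return rho, sigma

rho, sigma = dual_step(1.0, [2.0], e_in=1.0, e_out=0.5, s=[1.0], s_hat=[0.4],
                       eta_ch=0.9, mu_rho=0.1, mu_sigma=0.1)
print(rho, sigma)  # charging exceeded discharging, so rho drops; sigma_0 rises
```

Running this step each slot, with the primal variables recomputed from the current ρ^t and σ_k^t as in the offline case, gives the full online algorithm; no distributional knowledge of {w^t, a^t} is needed.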


More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines - October 2014 Big data revolution? A new

More information

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization Meisam Razaviyayn meisamr@stanford.edu Mingyi Hong mingyi@iastate.edu Zhi-Quan Luo luozq@umn.edu Jong-Shi Pang jongship@usc.edu

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Dynamic Power Allocation and Routing for Time Varying Wireless Networks

Dynamic Power Allocation and Routing for Time Varying Wireless Networks Dynamic Power Allocation and Routing for Time Varying Wireless Networks X 14 (t) X 12 (t) 1 3 4 k a P ak () t P a tot X 21 (t) 2 N X 2N (t) X N4 (t) µ ab () rate µ ab µ ab (p, S 3 ) µ ab µ ac () µ ab (p,

More information

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers Decentralized Quadratically Approimated Alternating Direction Method of Multipliers Aryan Mokhtari Wei Shi Qing Ling Alejandro Ribeiro Department of Electrical and Systems Engineering, University of Pennsylvania

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

More information

Stochastic Variational Inference

Stochastic Variational Inference Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call

More information

Dynamic Network Utility Maximization with Delivery Contracts

Dynamic Network Utility Maximization with Delivery Contracts Dynamic Network Utility Maximization with Delivery Contracts N. Trichakis A. Zymnis S. Boyd August 31, 27 Abstract We consider a multi-period variation of the network utility maximization problem that

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Distributed Scheduling Algorithms for Optimizing Information Freshness in Wireless Networks

Distributed Scheduling Algorithms for Optimizing Information Freshness in Wireless Networks Distributed Scheduling Algorithms for Optimizing Information Freshness in Wireless Networks Rajat Talak, Sertac Karaman, and Eytan Modiano arxiv:803.06469v [cs.it] 7 Mar 208 Abstract Age of Information

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

High Order Methods for Empirical Risk Minimization

High Order Methods for Empirical Risk Minimization High Order Methods for Empirical Risk Minimization Alejandro Ribeiro Department of Electrical and Systems Engineering University of Pennsylvania aribeiro@seas.upenn.edu IPAM Workshop of Emerging Wireless

More information

An interior-point stochastic approximation method and an L1-regularized delta rule

An interior-point stochastic approximation method and an L1-regularized delta rule Photograph from National Geographic, Sept 2008 An interior-point stochastic approximation method and an L1-regularized delta rule Peter Carbonetto, Mark Schmidt and Nando de Freitas University of British

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Proximal and First-Order Methods for Convex Optimization

Proximal and First-Order Methods for Convex Optimization Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,

More information

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

Optimal Power Control in Decentralized Gaussian Multiple Access Channels 1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017

More information

Information geometry of mirror descent

Information geometry of mirror descent Information geometry of mirror descent Geometric Science of Information Anthea Monod Department of Statistical Science Duke University Information Initiative at Duke G. Raskutti (UW Madison) and S. Mukherjee

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information