
Universitatea Politehnica Bucureşti
Facultatea de Automatică şi Calculatoare
Departamentul de Automatică şi Ingineria Sistemelor

TEZĂ DE ABILITARE (Habilitation Thesis)

Metode de Descreştere pe Coordonate pentru Optimizare Rară
(Coordinate Descent Methods for Sparse Optimization)

Ion Necoară
2013


Contents

1 Rezumat (Summary)
  1.1 Contributions of this thesis
  1.2 Main publications on coordinate descent algorithms
2 Summary
  2.1 Contributions of this thesis
  2.2 Main publications on coordinate descent algorithms
3 Random coordinate descent methods for linearly constrained smooth optimization
  Introduction; Problem formulation; Previous work; Random block coordinate descent method; Convergence rate in expectation; Design of probabilities; Comparison with full projected gradient method; Convergence rate for strongly convex case; Convergence rate in probability; Random pairs sampling; Generalizations (parallel coordinate descent algorithm; optimization problems with general equality constraints); Applications (recovering approximate primal solutions from full dual gradient)
4 Random coordinate descent methods for singly linearly constrained smooth optimization
  Introduction; Problem formulation; Random block coordinate descent method; Convergence rate in expectation; Choices for probabilities; Worst case analysis between (RCD) and full projected gradient; Convergence rate in probability; Convergence rate for strongly convex case; Extensions (generalization of algorithm (RCD) to more than 2 blocks; extension to different local norms); Numerical experiments
5 Random coordinate descent method for linearly constrained composite optimization
  Introduction; Problem formulation; Previous work; Random coordinate descent algorithm; Convergence rate in expectation; Convergence rate for strongly convex functions; Convergence rate in probability; Generalizations; Complexity analysis and comparison with other approaches; Numerical experiments (support vector machine; Chebyshev center of a set of points; randomly generated problems with l1-regularization term)
6 Random coordinate descent methods for nonconvex composite optimization
  Introduction; Unconstrained minimization of composite objective functions (problem formulation; a 1-random coordinate descent algorithm; convergence; linear convergence for objective functions with error bound); Constrained minimization of composite objective functions (problem formulation; a 2-random coordinate descent algorithm; convergence); Constrained minimization of smooth objective functions; Numerical experiments
7 Distributed random coordinate descent methods for composite optimization
  Introduction; Problem formulation (motivating practical applications); Distributed and parallel coordinate descent method; Sublinear convergence for smooth convex minimization; Linear convergence for error bound convex minimization; Conditions for generalized error bound functions (Case 1: f strongly convex and Ψ convex; Case 2: Ψ indicator function of a polyhedral set; Case 3: Ψ polyhedral function; Case 4: dual formulation); Convergence analysis under sparsity conditions; Distributed implementation; Comparison with other approaches; Numerical simulations
8 Parallel coordinate descent algorithm for separable constraints optimization: application to MPC
  Introduction; Parallel coordinate descent algorithm (PCDM) for separable constraints minimization (parallel block-coordinate descent method); Application of PCDM to distributed suboptimal MPC (MPC for networked systems: terminal cost and no end constraints; distributed synthesis for a terminal cost; stability of the MPC scheme; distributed implementation of the MPC scheme based on PCDM); Numerical results (quadruple tank process; implementation of the MPC scheme using MPI; implementation of the MPC scheme using a Siemens S7-1200 PLC; implementation of the MPC scheme for random networked systems)
9 Future work
  Huge-scale sparse optimization: theory, algorithms and applications; Optimization based control for distributed networked systems; Optimization based control for resource-constrained embedded systems (each with methodology and capacity to generate results)
Bibliography

Chapter 1
Rezumat (Summary)

1.1 Contributions of this thesis

The main optimization problem considered in this thesis has the following form:

  min_{x ∈ R^n}  F(x)  ( = f(x) + Ψ(x) )
  s.t.:  Ax = b,                                                        (1.1)

where f is a smooth function (with Lipschitz continuous gradient), Ψ is a simple regularization function (minimizing the sum of this function and a quadratic term is easy) and the matrix A ∈ R^{m×n} is usually sparse, with a structure given by a graph associated with the problem. Another characteristic of the problem is its very large dimension, i.e. n is of the order of millions or even billions. We also assume that the decision variable x can be decomposed into (block) components x = [x_1^T x_2^T ... x_N^T]^T, where x_i ∈ R^{n_i} and Σ_i n_i = n. Note that this optimization problem is very general and appears in many engineering applications:

- Ψ is the indicator function of a convex set X which can usually be written as a Cartesian product X = X_1 × X_2 × ... × X_N, with X_i ⊆ R^{n_i}. This problem is known as the separable optimization problem with linear coupling constraints and appears in many applications from distributed control and estimation [13, 6, 65, 100, 11], network optimization [9, 8, 98, 110, 11], computer vision [10, 44], etc.

- Ψ is either the indicator function of a convex set X = X_1 × X_2 × ... × X_N or the ℓ1-norm ‖x‖_1 (used to obtain a sparse solution), and the matrix A = 0. This problem appears in distributed model predictive control [61, 103], image processing [14, 1, 47, 105], classification [99, 13, 14], data mining [16, 86, 119], etc.

- Ψ is the indicator function of a convex set X = X_1 × X_2 × ... × X_N and A = a^T, i.e. we have a single linear coupling constraint. This problem appears in page ranking (the Google problem) [59, 76], control [39, 83, 84, 104], learning [16-18, 109, 111], truss topology design [4], etc.

We observe that (1.1) falls into the class of large-scale optimization problems with sparse data and/or sparse solutions. The standard approach for solving the very large optimization problem (1.1) is based on decomposition. Decomposition methods are an efficient tool for solving this type of problem because they allow splitting the original large problem into small subproblems which are then coordinated by a

master problem. Decomposition methods fall into two classes: primal and dual decomposition. In primal decomposition methods the original problem is treated directly, while in dual methods the coupling constraints are moved into the cost using Lagrange multipliers, after which the dual problem is solved. In my research activity over the last 7 years I have developed and analyzed algorithms belonging to both classes of decomposition methods. To the best of my knowledge, I was among the first researchers to use smoothing techniques in dual decomposition in order to obtain faster convergence rates for the proposed dual algorithms (see the papers [64, 65, 71, 7, 90, 91, 110]). However, in this thesis I have chosen to present my most recent results on primal decomposition methods, namely coordinate descent methods (see the papers [59-61, 65, 67, 70, 84]). The main contributions of this thesis, by chapters, are the following:

Chapter 3: In this chapter we develop random coordinate descent methods for minimizing very large convex optimization problems with linear coupling constraints and an objective function with coordinate-wise Lipschitz continuous gradient. Since the optimization problem contains coupling constraints, we have to define an algorithm that updates two (block) components per iteration. We prove that these methods obtain an ϵ-approximate solution, in expectation of the objective function values, in at most O(1/ϵ) iterations. On the other hand, the numerical complexity of each iteration is much lower than that of methods based on the full gradient. We also focus on the optimal choice of the probabilities that makes these algorithms converge fast and show that this leads to solving a sparse, small SDP. The analysis of the convergence rate in probability is also given in this chapter. For strongly convex objective functions we show that the new algorithms converge linearly. We also extend the main algorithm, which updates a single pair of (block) components per iteration, to a parallel algorithm that updates several (block) components per iteration, and show that for this parallel version the convergence rate depends linearly on the number of (block) components updated. Numerical tests confirm that, on large optimization problems for which computing one component of the gradient is numerically cheap, the newly proposed methods are much more efficient than methods based on the full gradient. This chapter is based on the articles [67, 68].

Chapter 4: In this chapter we develop random coordinate descent methods for minimizing multi-agent convex optimization problems with an objective function with coordinate-wise Lipschitz continuous gradient and a single coupling constraint. Due to the presence of the coupling constraint in the optimization problem, the presented algorithms update two coordinates per iteration. For such methods we prove that an ϵ-approximate solution, in expectation of the objective function values, can be obtained in at most O(1/(λ_2(Q) ϵ)) iterations, where λ_2(Q) is the second smallest eigenvalue of a matrix Q defined in terms of the chosen probabilities and the number of blocks. On the other hand, the numerical complexity per iteration of our methods is much lower than that of full-gradient methods, and each iteration can be computed in a distributed way. We also analyze the optimal choice of the probabilities and show that this analysis leads to solving a sparse SDP. For the developed methods we also present convergence rates in probability. In the case of strongly convex functions we show that the new algorithms converge linearly. We also present a parallel version of the main algorithm, in which several (block) components are updated per iteration, and for which we also derive the convergence rate. The developed algorithms were implemented in Matlab for solving the Google problem, and the simulation results show their superiority over methods

based on full gradient information. This chapter is based on the papers [58, 59, 69].

Chapter 5: In this chapter we propose a variant of a random coordinate descent method for solving convex optimization problems with a composite objective function (the sum of a convex function with coordinate-wise Lipschitz continuous gradient and a convex function with simple structure but possibly nondifferentiable) and linear coupling constraints. If the smooth part of the objective function has coordinate-wise Lipschitz continuous gradient, then the proposed method randomly chooses two (block) components and obtains an ϵ-approximate solution, in expectation of the objective function values, in O(N^2/ϵ) iterations, where N is the number of (block) components. For optimization problems in which evaluating one component of the gradient is numerically cheap, the proposed method is more efficient than full-gradient methods. The analysis of the convergence rate in probability is also given in this chapter. For strongly convex objective functions we show that the new algorithms converge linearly. The proposed algorithm was implemented in C code and tested on real SVM data and on the problem of finding the Chebyshev center of a set of points. The numerical experiments confirm that on large problems our method is more efficient than methods based on the full gradient or greedy coordinate descent methods. This chapter is based on the paper [70].

Chapter 6: In this chapter we analyze new random coordinate descent methods for solving nonconvex optimization problems with a composite objective function: the sum of a nonconvex function with coordinate-wise Lipschitz continuous gradient and a convex, possibly nondifferentiable function with simple structure. We address both the unconstrained case and the case with linear coupling constraints. For optimization problems with the structure defined above we propose random coordinate descent methods and analyze their convergence properties. In the general case, we prove asymptotic convergence of the sequences generated by the new algorithms to stationary points and a sublinear convergence rate, in expectation, for a certain optimality measure. Moreover, if the objective function satisfies an error bound condition, we derive local linear convergence in expectation of the objective function values. We also present numerical experiments evaluating the practical performance of the proposed algorithms on the well-known eigenvalue complementarity problem. The numerical experiments show that on large problems our method is more efficient than methods based on the full gradient. This chapter is based on the papers [84, 85].

Chapter 7: In this chapter we propose a distributed version of a random coordinate descent method for minimizing a composite objective function: the sum of a smooth, partially separable convex function and a fully separable, convex, but possibly nondifferentiable function. Under the Lipschitz gradient assumption on the smooth part, this method has a sublinear convergence rate. A linear convergence rate is obtained for a newly introduced class of objective functions satisfying a generalized error bound condition. We show that this new class of functions includes functions already studied in the literature, such as strongly convex functions and functions satisfying the classical error bound condition. We also prove that the theoretical estimates of the convergence rates depend linearly on the number of (block) components chosen at random and on a measure of separability of the objective function. The proposed algorithm was implemented in C code and tested on the constrained lasso problem. The numerical experiments confirm that parallelization can substantially accelerate the convergence rate of the classical coordinate descent method. This chapter is based on the paper [60].

Chapter 8: In this chapter we propose a parallel coordinate descent algorithm for solving convex optimization problems with separable constraints, which arise for example in distributed model predictive control (MPC) for linear networked systems. Our algorithm is based on parallel updates of the coordinates and has a very simple iteration. We prove linear (sublinear) convergence rates for the sequence generated by the new algorithm under standard assumptions on the objective function. Moreover, the algorithm uses local information to update the components of the decision variable and is therefore suitable for distributed implementation. Since it also has a low iteration complexity, it is well suited for embedded control. We propose an MPC scheme based on this algorithm, in which every subsystem in the network can compute feasible and stabilizing inputs using cheap and distributed computations. The proposed control method was implemented on a Siemens PLC for the control of a real four-tank plant. This chapter is based on the paper [61].

1.2 Main publications on coordinate descent algorithms

The results presented in this thesis have been accepted for publication in top ISI journals or prestigious conferences. Part of the results (Chapter 7) have recently been submitted to journals. We list below the publications on which this thesis is based.

Articles in ISI journals

I. Necoara and D. Clipici, Distributed random coordinate descent methods for composite minimization, partially accepted in SIAM Journal on Optimization, pp. 1-40, December 2013.
A. Patrascu and I. Necoara, Random coordinate descent methods for l0 regularized convex optimization, accepted in IEEE Transactions on Automatic Control, to appear, 2014.
I. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Transactions on Automatic Control, vol. 58, no. 8, 2013.
A. Patrascu, I. Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, Journal of Global Optimization, 2014.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Optimization and Applications, vol. 57, no. 2, 2014.
I. Necoara, D. Clipici, Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC, Journal of Process Control, vol. 23, no. 3, 2013.
I. Necoara, V. Nedelcu and I. Dumitrache, Parallel and distributed optimization methods for estimation and control in networks, Journal of Process Control, vol. 21, no. 5, 2011.

Articles in preparation

I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, Technical Report, University Politehnica Bucharest, June 2014.

Conference articles

I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, The Fourth International Conference on Continuous Optimization, Lisbon, 2013.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, The Fourth International Conference on Continuous Optimization, Lisbon, 2013.
I. Necoara, D. Clipici, A computationally efficient parallel coordinate descent algorithm for MPC implementation on a PLC, in Proceedings of the 12th European Control Conference, 2013.
A. Patrascu, I. Necoara, A random coordinate descent algorithm for large-scale sparse nonconvex optimization, in Proceedings of the 12th European Control Conference, 2013.
I. Necoara, Suboptimal distributed MPC based on a block-coordinate descent method with feasibility and stability guarantees, in Proceedings of the 51st IEEE Conference on Decision and Control, 2012.
I. Necoara, A random coordinate descent method for large-scale resource allocation problems, in Proceedings of the 51st IEEE Conference on Decision and Control, 2012.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for singly linear constrained smooth optimization, in Proceedings of the 20th Symposium on Mathematical Theory of Networks and Systems, 2012.

Chapter 2
Summary

2.1 Contributions of this thesis

The main optimization problem of interest considered in this thesis has the following form:

  min_{x ∈ R^n}  F(x)  ( = f(x) + Ψ(x) )
  s.t.:  Ax = b,                                                        (2.1)

where f is a smooth function (i.e. with Lipschitz continuous gradient), Ψ is a simple convex function (i.e. minimizing the sum of this function and a quadratic term is easy) and the matrix A ∈ R^{m×n} is usually sparse according to some graph structure. Another characteristic of this problem is its very large dimension, i.e. n is very large; in particular, we deal with millions or even billions of variables. We further assume that the decision variable x can be decomposed into (block) components x = [x_1^T x_2^T ... x_N^T]^T, where x_i ∈ R^{n_i} and Σ_i n_i = n. Note that this problem is very general and appears in many engineering applications:

- Ψ is the indicator function of some convex set X that can usually be written as a Cartesian product X = X_1 × X_2 × ... × X_N, where X_i ⊆ R^{n_i}. This problem is known in the literature as the separable optimization problem with linear coupling constraints and appears in many applications from distributed control and estimation [13, 6, 65, 100, 11], network optimization [9, 8, 98, 110, 11], computer vision [10, 44], etc.

- Ψ is either the indicator function of some convex set X = X_1 × X_2 × ... × X_N or the ℓ1-norm ‖x‖_1 (in order to induce sparsity in the solution) and the matrix A = 0 (a concrete instance of this case is written out below). This problem appears in distributed model predictive control [61, 103], image processing [14, 1, 47, 105], classification [99, 13, 14], data mining [16, 86, 119], etc.

- Ψ is the indicator function of some convex set X = X_1 × X_2 × ... × X_N and A = a^T, i.e. a single linear coupling constraint. This problem appears in page ranking (also known as the Google problem) [59, 76], control [39, 83, 84, 104], learning [16-18, 109, 111], truss topology design [4], etc.

We notice that (2.1) belongs to the class of large-scale optimization problems with sparse data/solutions.
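For concreteness, one instance covered by the second case above (Ψ = λ‖·‖_1 and A = 0) is ℓ1-regularized least squares; the data C, d and the weight λ in this small sketch are generic placeholders rather than notation used later in the thesis:

```latex
\min_{x \in \mathbb{R}^n} \;
  \underbrace{\tfrac{1}{2}\,\|Cx - d\|^2}_{f(x)}
  \;+\;
  \underbrace{\lambda \|x\|_1}_{\Psi(x)},
\qquad C \in \mathbb{R}^{m \times n},\; d \in \mathbb{R}^m,\; \lambda > 0.
```

Here f has Lipschitz continuous gradient with constant ‖C‖^2 and the proximal step on Ψ reduces to coordinate-wise soft thresholding, so the problem fits the composite model (2.1) with no coupling constraint.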

The standard approach for solving the large-scale optimization problem (2.1) is to use decomposition. Decomposition methods represent a powerful tool for solving this type of problem due to their ability to divide the original large-scale problem into smaller subproblems which are coordinated by a master problem. Decomposition methods can be divided into two main classes: primal and dual decomposition. While in primal decomposition methods the optimization problem is solved using the original formulation and variables, in dual decomposition the constraints are moved into the cost using Lagrange multipliers and the dual problem is solved. In the last 7 years I have pursued both approaches in my research. To my knowledge, I am one of the first researchers to have used smoothing techniques in Lagrangian dual decomposition in order to obtain faster convergence rates for the corresponding algorithms (see e.g. the papers [64, 65, 71, 7, 90, 91, 110]). In this thesis, however, I have opted to present some of my recent results on primal decomposition, namely coordinate descent methods (see e.g. the papers [59-61, 65, 67, 70, 84]). The main contributions of this thesis, by chapters, are as follows:

Chapter 3: In this chapter we develop random (block) coordinate descent methods for minimizing large-scale convex problems with linearly coupled constraints and prove that they obtain in expectation an ϵ-accurate solution in at most O(1/ϵ) iterations. Since we have coupled constraints in the problem, we need to devise an algorithm that randomly updates two (block) components per iteration. However, the numerical complexity per iteration of the new methods is usually much cheaper than that of methods based on full gradient information. We focus on how to choose the probabilities so as to make the randomized algorithm converge as fast as possible, and we arrive at solving sparse SDPs. An analysis of the rate of convergence in probability is also provided. For strongly convex functions the new methods converge linearly. We also extend the main algorithm, where we update two (block) components per iteration, to a parallel random coordinate descent algorithm, where we update more than two (block) components per iteration, and we show that for this parallel version the convergence rate depends linearly on the number of (block) components updated. Numerical tests confirm that on large optimization problems with cheap coordinate derivatives the new methods are much more efficient than methods based on the full gradient. This chapter is based on papers [67, 68].

Chapter 4: In this chapter we develop randomized block-coordinate descent methods for minimizing multi-agent convex optimization problems with a single linear coupled constraint over networks and prove that they obtain in expectation an ϵ-accurate solution in at most O(1/(λ_2(Q) ϵ)) iterations, where λ_2(Q) is the second smallest eigenvalue of a matrix Q that is defined in terms of the probabilities and the number of blocks. However, the computational complexity per iteration of our methods is much lower than that of a method based on full gradient information, and each iteration can be computed in a completely distributed way. We focus on how to choose the probabilities so as to make these randomized algorithms converge as fast as possible, and we arrive at solving a sparse SDP. An analysis of the rate of convergence in probability is also provided. For strongly convex functions our distributed algorithms converge linearly. We also extend the main algorithm to a parallel random coordinate descent method and to problems with more general linearly coupled constraints, for which we also derive the rate of convergence.
The new algorithms were implemented in Matlab and applied to the Google problem (a standard formulation of which is recalled below), and the simulation results show the superiority of our approach compared to methods based on the full gradient. This chapter is based on papers [58, 59, 69].
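For reference, the Google (PageRank) problem mentioned above can be cast in the singly constrained form treated in this chapter by looking for a stationary distribution of a column-stochastic link matrix E; the reformulation below is a standard one and is given only as an illustration, not necessarily the exact formulation used in Chapter 4:

```latex
\min_{x \in \mathbb{R}^N} \; \tfrac{1}{2}\,\|E x - x\|^2
\qquad \text{s.t.:} \quad e^T x = 1, \;\; x \ge 0,
```

so that the smooth part is f(x) = ½‖Ex − x‖^2, Ψ is the indicator function of the nonnegative orthant and e^T x = 1 is the single linear coupling constraint.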

Chapter 5: In this chapter we propose a variant of the random coordinate descent method for solving linearly constrained convex optimization problems with composite objective functions. If the smooth part of the objective function has Lipschitz continuous gradient, then we prove that our method obtains an ϵ-optimal solution in O(N^2/ϵ) iterations, where N is the number of blocks. For the class of problems with cheap coordinate derivatives we show that the new method is faster than methods based on full-gradient information. An analysis of the rate of convergence in probability is also provided. For strongly convex functions our method converges linearly. The proposed algorithm was implemented in C code and tested on real SVM data and on the problem of finding the Chebyshev center of a set of points. Extensive numerical tests confirm that on very large problems our method is much more numerically efficient than methods based on full gradient information or coordinate descent methods based on greedy index selection. This chapter is based on paper [70].

Chapter 6: In this chapter we analyze several new methods for solving nonconvex optimization problems with the objective function formed as a sum of two terms: one is nonconvex and smooth, and the other is convex but simple, with known structure. Further, we consider both cases: unconstrained and linearly constrained nonconvex problems. For optimization problems with the above structure, we propose random coordinate descent algorithms and analyze their convergence properties. For the general case, when the objective function is nonconvex and composite, we prove asymptotic convergence of the sequences generated by our algorithms to stationary points and a sublinear rate of convergence in expectation for some optimality measure. Additionally, if the objective function satisfies an error bound condition, we derive a local linear rate of convergence for the expected values of the objective function. We also present extensive numerical experiments on the eigenvalue complementarity problem for evaluating the performance of our algorithms in comparison with state-of-the-art methods. From the numerical experiments we observe that on large optimization problems the new methods are much more efficient than methods based on the full gradient. This chapter is based on papers [84, 85].

Chapter 7: In this chapter we propose a distributed version of a randomized (block) coordinate descent method for minimizing the sum of a partially separable smooth convex function and a fully separable non-smooth convex function. Under the assumption of block Lipschitz continuity of the gradient of the smooth function, this method is shown to have a sublinear convergence rate. A linear convergence rate is obtained for the newly introduced class of generalized error bound functions. We prove that the new class of generalized error bound functions encompasses both global/local error bound functions and smooth strongly convex functions. We also show that the theoretical estimates on the convergence rate depend on the number of blocks chosen randomly and on a natural measure of separability of the objective function. The new algorithm was implemented in C code and tested on the constrained lasso problem. Numerical experiments show that by parallelization we can substantially accelerate the rate of convergence of the classical random coordinate descent method. This chapter is based on paper [60].

Chapter 8: In this chapter we propose a parallel coordinate descent algorithm for solving smooth convex optimization problems with separable constraints that may arise, e.g., in distributed model predictive control (MPC) for linear networked systems. Our algorithm is based on block coordinate descent updates performed in parallel and has a very simple iteration. We prove (sub)linear rates of convergence for the new algorithm under standard assumptions for smooth convex optimization.
Further, our algorithm uses local information and is thus suitable for distributed implementations. Moreover, it has low iteration complexity, which makes it appropriate for embedded control. An MPC scheme based on this new parallel algorithm is derived, for which every subsystem in the network can compute feasible and stabilizing control inputs using distributed and cheap computations. To ensure stability of the MPC scheme, we use a terminal cost formulation derived from a distributed synthesis. The proposed control method was implemented on a Siemens PLC

for controlling a four-tank process. This chapter is based on paper [61].

2.2 Main publications on coordinate descent algorithms

Most of the material presented in this thesis has been published, or accepted for publication, in top journals or conference proceedings. Some of the material (Chapter 7) has been submitted for publication recently. We detail below the main publications from this thesis.

Articles in ISI journals

I. Necoara and D. Clipici, Distributed random coordinate descent methods for composite minimization, partially accepted in SIAM Journal on Optimization, pp. 1-40, December 2013.
A. Patrascu and I. Necoara, Random coordinate descent methods for l0 regularized convex optimization, accepted in IEEE Transactions on Automatic Control, to appear, 2014.
I. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Transactions on Automatic Control, vol. 58, no. 8, 2013.
A. Patrascu, I. Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, Journal of Global Optimization, 2014.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Optimization and Applications, vol. 57, no. 2, 2014.
I. Necoara, D. Clipici, Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC, Journal of Process Control, vol. 23, no. 3, 2013.
I. Necoara, V. Nedelcu and I. Dumitrache, Parallel and distributed optimization methods for estimation and control in networks, Journal of Process Control, vol. 21, no. 5, 2011.

Articles in preparation

I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, Technical Report, University Politehnica Bucharest, June 2014.

Articles in conferences

I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, The Fourth International Conference on Continuous Optimization, Lisbon, 2013.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, The Fourth International Conference on Continuous Optimization, Lisbon, 2013.

I. Necoara, D. Clipici, A computationally efficient parallel coordinate descent algorithm for MPC implementation on a PLC, in Proceedings of the 12th European Control Conference, 2013.
A. Patrascu, I. Necoara, A random coordinate descent algorithm for large-scale sparse nonconvex optimization, in Proceedings of the 12th European Control Conference, 2013.
I. Necoara, Suboptimal distributed MPC based on a block-coordinate descent method with feasibility and stability guarantees, in Proceedings of the 51st IEEE Conference on Decision and Control, 2012.
I. Necoara, A random coordinate descent method for large-scale resource allocation problems, in Proceedings of the 51st IEEE Conference on Decision and Control, 2012.
I. Necoara, A. Patrascu, A random coordinate descent algorithm for singly linear constrained smooth optimization, in Proceedings of the 20th Symposium on Mathematical Theory of Networks and Systems, 2012.

Chapter 3
Random coordinate descent methods for linearly constrained smooth optimization

In this chapter we develop random block coordinate descent methods for minimizing large-scale convex problems with linearly coupled constraints and prove that they obtain in expectation an ϵ-accurate solution in at most O(1/ϵ) iterations. Since we have coupled constraints in the problem, we need to devise an algorithm that updates randomly two (block) components per iteration. However, the numerical complexity per iteration of the new methods is usually much cheaper than that of methods based on full gradient information. We focus on how to choose the probabilities so as to make the randomized algorithm converge as fast as possible, and we arrive at solving sparse SDPs. An analysis of the rate of convergence in probability is also provided. For strongly convex functions the new methods converge linearly. We also extend the main algorithm, where we update two (block) components per iteration, to a parallel random coordinate descent algorithm, where we update more than two (block) components per iteration, and we show that for this parallel version the convergence rate depends linearly on the number of (block) components updated. Numerical tests confirm that on large optimization problems with cheap coordinate derivatives the new methods are much more efficient than methods based on the full gradient. This chapter is based on papers [67, 68].

3.1 Introduction

The performance of a network composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool for solving large network optimization problems: e.g. resource allocation [3, 34, 11], telecommunications [8, 110], coordination in multi-agent systems [11], estimation in sensor networks [65], distributed control [65], image processing [1], traffic equilibrium problems [8], network flow [8] and other areas [5, 89, 10]. In this chapter we propose efficient distributed algorithms with cheap iterations for solving large separable convex problems with linearly coupled constraints that arise in network applications. For a centralized setup and problems of moderate size there exist many iterative algorithms to solve such problems, e.g. Newton, quasi-Newton or projected gradient methods. However, the problems that we consider in this chapter have the following features: the size of the data is very large, so that usual methods based on whole gradient computations are prohibitive. Moreover, the incomplete structure of information (e.g. the data are distributed over all the nodes of the network, so that at a given time we need to work only with the data available then) may also be an obstacle for

whole gradient computations. In this case, an appropriate way to approach these problems is through (block) coordinate descent methods.

(Block) coordinate descent methods, early variants of which can be traced back to a paper of Schwartz from 1870 [101], have recently become popular in the optimization community due to their low cost per iteration and good scalability properties. Much of this work is motivated by problems in networked systems, largely since such systems are a popular framework with which we can model different problems in a wide range of applications [5, 76, 77, 89, 93, 110, 10]. The main differences between the variants of coordinate descent methods lie in the criterion for choosing, at each iteration, the coordinate over which we minimize the objective function, and in the complexity of this choice. Two classical criteria often used in these algorithms are the cyclic [8] and the greedy coordinate search [107], which differ significantly in the amount of computation required to choose the appropriate index. For cyclic coordinate search, estimates on the rate of convergence were given recently in [6], while for greedy coordinate search (e.g. the Gauss-Southwell rule) the convergence rate is given e.g. in [107]. Another interesting approach is based on a random choice rule, where the coordinate search is random. Recent complexity results on random coordinate descent methods for smooth convex functions were obtained by Nesterov in [76]. The extension to composite functions was given in [93]. However, in most of the previous work the authors considered optimization models where the constraint set is decoupled (i.e. characterized by a Cartesian product).

In this chapter we develop random block coordinate descent methods suited for large optimization problems in networks where the information cannot be gathered centrally, but rather is distributed over the network. Moreover, we focus on optimization problems with linearly coupled constraints (i.e. the constraint set is coupled). Due to the coupling in the constraints we introduce a 2-block variant of the random coordinate descent method, which at each iteration involves the closed form solution of an optimization problem with respect to only 2 block variables, while keeping all the other variables fixed. We prove for the new algorithm an expected convergence rate of order O(1/k) for the function values, where k is the number of iterations. We focus on how to design the probabilities so as to make this algorithm converge as fast as possible, and we prove that this problem can be recast as a sparse SDP. We also show that for functions with cheap coordinate derivatives the new method is faster than schemes based on full gradient information or on greedy coordinate descent. An analysis of the rate of convergence in probability is also provided. For strongly convex functions we prove that the new method converges linearly. We also extend the algorithm to a scheme where we can choose more than 2 (block) components per iteration, and we show that the number of components appears directly in the convergence rate of this algorithm. While the most obvious benefit of randomization is that it can lead to faster algorithms, in worst-case complexity analysis and/or in numerical implementation, there are also other benefits of our algorithm that are at least as important.
For example, the use of randomization leads to a simpler algorithm that is easier to analyze, produces a more robust output and can often be organized to exploit modern computational architectures (e.g. distributed and parallel computer architectures).

The chapter is organized as follows. In Section 3.2 we introduce the optimization model analyzed in this chapter and the main assumptions. In Section 3.4 we present and analyze a random 2-block coordinate descent method for solving our optimization problem. We derive the convergence rate in expectation in Section 3.5, where we also provide means to choose the probability distribution. We also compare in Section 3.6 our algorithm with the full projected gradient method and other existing methods and show that on problems with cheap coordinate derivatives our method has better arithmetic complexity. In Sections 3.7 and 3.8 we analyze the convergence rate for strongly

convex functions and in probability, respectively. In Section 3.10 we extend our algorithm to more than a pair of indexes and analyze the convergence rate of the new scheme.

3.2 Problem formulation

We work in the space R^n composed of column vectors. For x, y ∈ R^n we denote the standard Euclidean inner product ⟨x, y⟩ = Σ_{i=1}^n x_i y_i and the Euclidean norm ‖x‖ = ⟨x, x⟩^{1/2}. We use the same notation ⟨·,·⟩ and ‖·‖ for spaces of different dimension. The inner product on the space of symmetric matrices is denoted by ⟨W_1, W_2⟩ = trace(W_1 W_2) for all symmetric matrices W_1, W_2. We decompose the full space R^{Nn} = Π_{i=1}^N R^n. We also define the corresponding partition of the identity matrix: I_{Nn} = [U_1 ... U_N], where U_i ∈ R^{Nn×n}. Then for any x ∈ R^{Nn} we write x = Σ_i U_i x_i. We denote by e ∈ R^N the vector with all entries 1 and by e_i ∈ R^N the vector with all entries zero except for component i, which is equal to 1. Furthermore, we define U = [I_n ... I_n] = e^T ⊗ I_n ∈ R^{n×Nn} and V_i = [0 ... U_i ... 0] = e_i^T ⊗ U_i ∈ R^{Nn×Nn}, where ⊗ denotes the Kronecker product. Given a vector ν = [ν_1 ... ν_N]^T ∈ R^N, we define the vector ν^p = [ν_1^p ... ν_N^p]^T for any integer p, and diag(ν) denotes the diagonal matrix with the entries ν_i on the diagonal. For a positive semidefinite matrix W ∈ R^{N×N} we consider the following order on its eigenvalues, 0 ≤ λ_1 ≤ λ_2 ≤ ... ≤ λ_N, and the notation ‖x‖_W^2 = x^T W x for any x.

We consider large network optimization problems where each agent in the network is associated with a local variable so that their sum is fixed, and we need to minimize a separable convex objective function:

  f^* = min_{x_i ∈ R^n}  f_1(x_1) + ... + f_N(x_N)
        s.t.:  x_1 + ... + x_N = 0.                                     (3.1)

Optimization problems with linearly coupled constraints (3.1) arise in many areas such as resource allocation in economic systems [34] or distributed computer systems [45], in signal processing [1], in traffic equilibrium and network flow [8] or in distributed control [65]. For problem (3.1) we associate a network composed of several nodes V = {1, ..., N} that can exchange information according to a communication graph G = (V, E), where E denotes the set of edges, i.e. (i, j) ∈ E ⊆ V × V models that node j sends information to node i. We assume that the graph G is undirected and connected. The local information structure imposed by the graph G should be considered as part of the problem formulation. Note that constraints of the form α_1 x_1 + ... + α_N x_N = b, where α_i ∈ R, can be easily handled in our framework by a change of coordinates.

The goal of this chapter is to devise a distributed algorithm that iteratively solves the convex problem (3.1) by passing the estimate of the optimizer only between neighboring nodes. There is great interest in designing such distributed algorithms, since centralized algorithms scale poorly with the number of nodes and are less resilient to failure of the central node.

Let us define the extended subspace S = { x ∈ R^{Nn} : Σ_{i=1}^N x_i = 0 }, whose orthogonal complement is the subspace T = { u ∈ R^{Nn} : u_1 = ... = u_N }. We also use the notation x = [x_1^T ... x_N^T]^T = Σ_{i=1}^N U_i x_i ∈ R^{Nn} and f(x) = f_1(x_1) + ... + f_N(x_N).
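To make the model concrete, consider the toy instance f_i(x_i) = ½‖x_i − a_i‖^2 (so that ∇f_i(x_i) = x_i − a_i and L_i = 1), for which the minimizer of (3.1) is the Euclidean projection of a = [a_1^T ... a_N^T]^T onto the subspace S, namely x_i^* = a_i − (1/N) Σ_j a_j. The snippet below is a minimal illustration added here, not code from the thesis; note that at the optimum the block gradients coincide, which is exactly the optimality condition stated further below.

```python
import numpy as np

# Toy instance of (3.1): f_i(x_i) = 0.5 * ||x_i - a_i||^2, hence grad f_i(x_i) = x_i - a_i
# and L_i = 1.  For this choice the minimizer is the projection of a onto the subspace S.
N, n = 5, 3
a = np.random.default_rng(0).standard_normal((N, n))

x_star = a - a.mean(axis=0)                  # x_i* = a_i - (1/N) * sum_j a_j

print(np.allclose(x_star.sum(axis=0), 0))    # feasibility: the blocks sum to zero
g = x_star - a                               # block gradients at the optimum
print(np.allclose(g, g[0]))                  # they all coincide (and equal -mean(a))
```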

The basic assumption considered in this chapter is the following:

Assumption 3.2.1 We assume that the functions f_i are convex and have Lipschitz continuous gradient, with Lipschitz constants L_i > 0, i.e.:

  ‖∇f_i(x_i) − ∇f_i(y_i)‖ ≤ L_i ‖x_i − y_i‖   ∀ x_i, y_i ∈ R^n, i ∈ V.                  (3.2)

From the Lipschitz property of the gradient (3.2), the following inequality holds (see e.g. [75]):

  f_i(x_i + d_i) ≤ f_i(x_i) + ⟨∇f_i(x_i), d_i⟩ + (L_i/2) ‖d_i‖^2   ∀ x_i, d_i ∈ R^n.     (3.3)

The following inequality, which is central in our derivations below, is a straightforward consequence of (3.3) and holds for all x ∈ R^{Nn} and d_i, d_j ∈ R^n:

  f(x + U_i d_i + U_j d_j) ≤ f(x) + ⟨∇f_i(x_i), d_i⟩ + (L_i/2) ‖d_i‖^2 + ⟨∇f_j(x_j), d_j⟩ + (L_j/2) ‖d_j‖^2.   (3.4)

We denote by X^* the set of optimal solutions of problem (3.1). The optimality conditions for optimization problem (3.1) become: x^* is an optimal solution of the convex problem (3.1) if and only if

  Σ_{i=1}^N x_i^* = 0,   ∇f_i(x_i^*) = ∇f_j(x_j^*)   ∀ i ≠ j ∈ V.

3.3 Previous work

We briefly review some well-known methods from the literature for solving the optimization model (3.1). In [3, 11] distributed weighted gradient methods were proposed to solve a similar problem as in (3.1); in particular, the authors in [11] consider strongly convex functions f_i with positive definite Hessians. These papers propose a class of center-free algorithms (in these papers the term center-free refers to the absence of a coordinator) with the following iteration:

  x_i^{k+1} = x_i^k − Σ_{j ∈ N_i} w_ij ( ∇f_i(x_i^k) − ∇f_j(x_j^k) )   ∀ i ∈ V, k ≥ 0,   (3.5)

where N_i denotes the set of neighbors of node i in the graph G. Under the strong convexity assumption, and provided that the weights w_ij are chosen as the solution of a certain SDP, a linear rate of convergence is obtained. Note however that this method requires at each iteration the computation of the full gradient, and the iteration complexity is O(N(n + n_f)), where n_f is the number of operations for evaluating the gradient of any function f_i, for all i ∈ V.
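To fix ideas, the following is a minimal sketch (written for this text, not taken from [3, 11] or from the thesis) of one iteration of (3.5), assuming symmetric nonnegative weights w_ij = w_ji so that the coupling constraint Σ_i x_i = 0 is preserved; in contrast with the coordinate descent methods developed below, it needs all block gradients at every iteration.

```python
import numpy as np

def center_free_step(x, grads, W, neighbors):
    """One iteration of the center-free weighted gradient method (3.5), as a sketch.

    x         : current feasible iterate, shape (N, n), with x.sum(axis=0) == 0
    grads     : grads[i](x_i) returns the gradient of f_i at x_i
    W         : (N, N) symmetric nonnegative weight matrix, W[i, j] > 0 only on edges
    neighbors : neighbors[i] = list of neighbors of node i in the graph G
    """
    N = x.shape[0]
    g = np.array([grads[i](x[i]) for i in range(N)])  # requires the full gradient
    x_new = x.copy()
    for i in range(N):
        for j in neighbors[i]:
            # symmetric weights make the pairwise terms cancel in the sum over i,
            # so the updated iterate still satisfies x_1 + ... + x_N = 0
            x_new[i] -= W[i, j] * (g[i] - g[j])
    return x_new
```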

In [107] Tseng studied optimization problems with linearly coupled constraints and composite objective functions of the form f + h, where h is a convex nonsmooth function, and developed a block coordinate descent method based on the Gauss-Southwell choice rule. The principal requirement of this method is that at each iteration a subset of indexes I is chosen according to the Gauss-Southwell rule, and then the update direction is a solution of the following QP problem:

  d_H(x; I) = arg min_{s : Σ_{j ∈ I} s_j = 0}  ⟨∇f(x), s⟩ + (1/2) ‖s‖_H^2 + h(x + s),

where H is a positive definite matrix chosen at the initial step of the algorithm. Using this direction and choosing an appropriate step size α_k, the next iterate is defined as x^{k+1} = x^k + α_k d_H(x^k; I_k). The total complexity per iteration of this method is O(Nn + n_f). In [107], the authors proved, for the particular case of a single linear constraint and a nonsmooth part h of the objective function that is piecewise linear and separable, that after k iterations a sublinear convergence rate of order O(NnLR_0^2/k) is attained for the function values, where L = max_{i ∈ V} L_i and R_0 is the Euclidean distance from the starting iterate to the set of optimal solutions. In [5] a 2-coordinate descent method is developed for minimizing a smooth function subject to a single linear equality constraint and additional bound constraints on the decision variables. In the convex case, when all the variables are lower bounded but not upper bounded, the author shows that the sequence of function values converges at a sublinear rate O(NnLR_0^2/k), while the complexity per iteration is at least O(Nn + n_f). A random coordinate descent algorithm for an optimization model with smooth objective function and separable constraints was analyzed by Nesterov in [76], where a complete rate analysis is provided. The main feature of his randomized algorithm is the cheap iteration complexity, of order O(n_f + n + ln N), while still keeping a sublinear rate of convergence. The generalization of this algorithm to composite objective functions has been studied in [89, 93]. However, none of these papers studied the application of random coordinate descent algorithms to smooth convex problems with linearly coupled constraints. In this chapter we develop a random coordinate descent method for this type of optimization model, as described in (3.1).

3.4 Random block coordinate descent method

In this section we devise a randomized block coordinate descent algorithm for solving the separable convex problem (3.1) and analyze its convergence. We present a distributed method where only neighbors need to communicate with each other. At a certain iteration, having a feasible estimate x ∈ S of the optimizer, we choose randomly a pair (i, j) ∈ E with probability p_ij > 0. Since we assume an undirected graph G = (V, E) associated to problem (3.1) (the generalization of the present scheme to directed graphs is straightforward), we consider p_ij = p_ji. We assume that the graph G is connected. For a feasible x ∈ S and a randomly chosen pair of indexes (i, j), with i < j, we define the next feasible iterate x^+ ∈ R^{Nn} as follows:

  x^+ = x + U_i d_i + U_j d_j.

Derivation of the directions d_i and d_j is based on inequality (3.4):

  f(x^+) ≤ f(x) + ⟨∇f_i(x_i), d_i⟩ + (L_i/2) ‖d_i‖^2 + ⟨∇f_j(x_j), d_j⟩ + (L_j/2) ‖d_j‖^2.   (3.6)

Minimizing the right hand side of inequality (3.6), while additionally imposing feasibility of the next iterate x^+ (i.e. we require d_i + d_j = 0), we arrive at the following local minimization problem:

  [d_i^T d_j^T]^T = arg min_{s_i, s_j ∈ R^n : s_i + s_j = 0}  ⟨∇f_i(x_i), s_i⟩ + ⟨∇f_j(x_j), s_j⟩ + (L_i/2) ‖s_i‖^2 + (L_j/2) ‖s_j‖^2,

which has the closed form solution

  d_i = − (1/(L_i + L_j)) ( ∇f_i(x_i) − ∇f_j(x_j) ),   d_j = −d_i.   (3.7)

More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Definiţie. Pr(X a) - probabilitatea ca X să ia valoarea a ; Pr(a X b) - probabilitatea ca X să ia o valoare în intervalul a,b.

Definiţie. Pr(X a) - probabilitatea ca X să ia valoarea a ; Pr(a X b) - probabilitatea ca X să ia o valoare în intervalul a,b. Variabile aleatoare Definiţie Se numeşte variabilă aleatoare pe un spaţiu fundamental E şi se notează prin X, o funcţie definită pe E cu valori în mulţimea numerelor reale. Unei variabile aleatoare X i

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Asynchronous Non-Convex Optimization For Separable Problem

Asynchronous Non-Convex Optimization For Separable Problem Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization Meisam Razaviyayn meisamr@stanford.edu Mingyi Hong mingyi@iastate.edu Zhi-Quan Luo luozq@umn.edu Jong-Shi Pang jongship@usc.edu

More information

Lecture 23: November 21

Lecture 23: November 21 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Cristalul cu N atomi = un sistem de N oscilatori de amplitudini mici;

Cristalul cu N atomi = un sistem de N oscilatori de amplitudini mici; Curs 8 Caldura specifica a retelei Cristalul cu N atomi = un sistem de N oscilatori de amplitudini mici; pentru tratarea cuantica, se inlocuieste tratamentul clasic al oscilatorilor cuplati, cu cel cuantic

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Semidefinite Programming Basics and Applications

Semidefinite Programming Basics and Applications Semidefinite Programming Basics and Applications Ray Pörn, principal lecturer Åbo Akademi University Novia University of Applied Sciences Content What is semidefinite programming (SDP)? How to represent

More information

COMPARATIVE DISCUSSION ABOUT THE DETERMINING METHODS OF THE STRESSES IN PLANE SLABS

COMPARATIVE DISCUSSION ABOUT THE DETERMINING METHODS OF THE STRESSES IN PLANE SLABS 74 COMPARATIVE DISCUSSION ABOUT THE DETERMINING METHODS OF THE STRESSES IN PLANE SLABS Codrin PRECUPANU 3, Dan PRECUPANU,, Ștefan OPREA Correspondent Member of Technical Sciences Academy Gh. Asachi Technical

More information

Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications

Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

Linear Convergence under the Polyak-Łojasiewicz Inequality

Linear Convergence under the Polyak-Łojasiewicz Inequality Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini and Mark Schmidt The University of British Columbia LCI Forum February 28 th, 2017 1 / 17 Linear Convergence of Gradient-Based

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Metode numerice de aproximare. a zerourilor unor operatori. şi de rezolvare a inegalităţilor variaţionale. cu aplicaţii

Metode numerice de aproximare. a zerourilor unor operatori. şi de rezolvare a inegalităţilor variaţionale. cu aplicaţii Facultatea de Matematică şi Informatică Universitatea Babeş-Bolyai Erika Nagy Metode numerice de aproximare a zerourilor unor operatori şi de rezolvare a inegalităţilor variaţionale cu aplicaţii Rezumatul

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Teoreme de compresie-extensie de tip Krasnoselskii şi aplicaţii (Rezumatul tezei de doctorat)

Teoreme de compresie-extensie de tip Krasnoselskii şi aplicaţii (Rezumatul tezei de doctorat) Teoreme de compresie-extensie de tip Krasnoselskii şi aplicaţii (Rezumatul tezei de doctorat) Sorin Monel Budişan Coordonator ştiinţi c: Prof. dr. Radu Precup Cuprins Introducere 1 1 Generaliz¼ari ale

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Rezultate în Teoria Punctului Fix şi Procese Iterative cu Aplicaţii

Rezultate în Teoria Punctului Fix şi Procese Iterative cu Aplicaţii Rezultate în Teoria Punctului Fix şi Procese Iterative cu Aplicaţii Asist. drd. Adrian Sorinel Ghiura Departamentul de Matematică & Informatică Universitatea Politehnica din Bucureşti REZUMATUL TEZEI DE

More information

FORMULELE LUI STIRLING, WALLIS, GAUSS ŞI APLICAŢII

FORMULELE LUI STIRLING, WALLIS, GAUSS ŞI APLICAŢII DIDACTICA MATHEMATICA, Vol. 34), pp. 53 67 FORMULELE LUI STIRLING, WALLIS, GAUSS ŞI APLICAŢII Eugenia Duca, Emilia Copaciu şi Dorel I. Duca Abstract. In this paper are presented the Wallis, Stirling, Gauss

More information

AN APPROACH TO THE NONLINEAR LOCAL PROBLEMS IN MECHANICAL STRUCTURES

AN APPROACH TO THE NONLINEAR LOCAL PROBLEMS IN MECHANICAL STRUCTURES U.P.B. Sci. Bull., Series D, Vol. 74, Iss. 3, 2012 ISSN 1454-2358 AN APPROACH TO THE NONLINEAR LOCAL PROBLEMS IN MECHANICAL STRUCTURES Marius-Alexandru GROZEA 1, Anton HADĂR 2 Acest articol prezintă o

More information

Solving Dual Problems

Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Pentru clasa a X-a Ştiinţele naturii-sem II

Pentru clasa a X-a Ştiinţele naturii-sem II Pentru clasa a X-a Ştiinţele naturii-sem II Reprezentarea algoritmilor. Pseudocod. Principiile programării structurate. Structuri de bază: structura liniară structura alternativă structura repetitivă Algoritmi

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

More information

GENERATOARE DE SEMNAL DIGITALE

GENERATOARE DE SEMNAL DIGITALE Technical University of Iasi, Romania Faculty of Electronics and Telecommunications Signals, Circuits and Systems laboratory Prof. Victor Grigoras Cuprins Clasificarea generatoarelor Filtre reursive la

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Parallel Coordinate Optimization

Parallel Coordinate Optimization 1 / 38 Parallel Coordinate Optimization Julie Nutini MLRG - Spring Term March 6 th, 2018 2 / 38 Contours of a function F : IR 2 IR. Goal: Find the minimizer of F. Coordinate Descent in 2D Contours of a

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

SIMULAREA DECIZIEI FINANCIARE

SIMULAREA DECIZIEI FINANCIARE SIMULAREA DECIZIEI FINANCIARE Conf. univ. dr. Nicolae BÂRSAN-PIPU T5.1 TEMA 5 DISTRIBUŢII DISCRETE T5. Cuprins T5.3 5.1 Variabile aleatoare discrete 5. Distribuţia de probabilitate a unei variabile aleatoare

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

Gradient Methods Using Momentum and Memory

Gradient Methods Using Momentum and Memory Chapter 3 Gradient Methods Using Momentum and Memory The steepest descent method described in Chapter always steps in the negative gradient direction, which is orthogonal to the boundary of the level set

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Fast Linear Iterations for Distributed Averaging 1

Fast Linear Iterations for Distributed Averaging 1 Fast Linear Iterations for Distributed Averaging 1 Lin Xiao Stephen Boyd Information Systems Laboratory, Stanford University Stanford, CA 943-91 lxiao@stanford.edu, boyd@stanford.edu Abstract We consider

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Lecture 5: September 12

Lecture 5: September 12 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS

More information

ALGORITMI DE OPTIMIZARE IN INGINERIE ELECTRICA. Sef lucrari ing. Alin-Iulian DOLAN

ALGORITMI DE OPTIMIZARE IN INGINERIE ELECTRICA. Sef lucrari ing. Alin-Iulian DOLAN ALGORITMI DE OPTIMIZARE IN INGINERIE ELECTRICA Sef lucrari ing. Alin-Iulian DOLAN PROBLEME DE OPTIMIZARE OPTIMIZAREA gasirea celei mai bune solutii ale unei probleme, constand in minimizarea (maximizarea)

More information

Acta Technica Napocensis: Civil Engineering & Architecture Vol. 54 No.1 (2011)

Acta Technica Napocensis: Civil Engineering & Architecture Vol. 54 No.1 (2011) 1 Technical University of Cluj-Napoca, Faculty of Civil Engineering. 15 C Daicoviciu Str., 400020, Cluj-Napoca, Romania Received 25 July 2011; Accepted 1 September 2011 The Generalised Beam Theory (GBT)

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information