Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology of China


1 Network Newton
Aryan Mokhtari, Qing Ling, and Alejandro Ribeiro
University of Pennsylvania and University of Science and Technology of China
Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, November 4, 2014

2 Distributed optimization
Network with n nodes. Each node i has access to a local function f_i(x).
Nodes collaborate to minimize the global objective f(x) = ∑_{i=1}^n f_i(x).
Example: each node samples a subset of the data to train a classifier.
[Figure: 10-node network with local functions f_1(x), ..., f_10(x)]
Nodes can operate (train or estimate) locally but would benefit from sharing information.
The cost of aggregating all functions at a central location is large, in both communication and computation.
Instead, nodes perform recursive exchanges with neighbors j ∈ N_i to aggregate global information.

3 Methods for distributed optimization
Replicate the common variable at each node: f(x_1, ..., x_n) = ∑_{i=1}^n f_i(x_i).
Enforce equality between neighbors, x_i = x_j, and thus between all nodes.
[Figure: 10-node network; the build replaces the common variable x with local copies, f_1(x_1), ..., f_10(x_10)]
Methods operate recursively to enforce equality asymptotically; they differ in how:
- Distributed gradient descent, via recursive averaging [Nedic, Ozdaglar 09]
- Distributed dual descent, via prices [Rabbat et al 05]
- Distributed ADMM, via prices [Schizas et al 08]
All are first-order methods, so convergence times are not always reasonable.

5 (Approximate) Network Newton (NN)
Reinterpret distributed gradient descent (DGD) as a penalty method.
The Newton step for the objective plus penalty requires global coordination.
Approximate it with local operations by truncating the Taylor series of the Hessian inverse:
- The Hessian is neighbor sparse; the kth term of the series is k-hop neighbor sparse.
- NN-K aggregates information from the K-hop neighborhood of each node.
[Figure: 10-node network; successive builds highlight the 1-, 2-, and 3-hop neighborhoods used by NN-1, NN-2, and NN-3]
NN-K always converges at least linearly and exhibits a quadratic convergence phase over a range of iterations.

9 Decentralized Gradient Descent (DGD)
Problem in distributed form:
    min_{x_1,...,x_n} ∑_{i=1}^n f_i(x_i)   s.t.  x_i = x_j  for all j ∈ N_i
With nonnegative doubly stochastic weights W = [w_ij], the DGD update at node i is
    x_{i,t+1} = ∑_{j ∈ N_i ∪ {i}} w_ij x_{j,t} − α ∇f_i(x_{i,t})
i.e., an average of the local and neighboring variables plus a local gradient descent step.
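To make the update concrete, here is a minimal per-node sketch of one DGD iteration; the container names (w, neighbors, grad_f) are hypothetical, not from the slides.

```python
def dgd_step(i, x, w, neighbors, grad_f, alpha):
    """x_{i,t+1} = sum over j in {i} union N_i of w_ij * x_{j,t}, minus alpha * grad f_i(x_{i,t})."""
    new_xi = w[i][i] * x[i]
    for j in neighbors[i]:
        new_xi = new_xi + w[i][j] * x[j]
    return new_xi - alpha * grad_f[i](x[i])
```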

10 Decentralized Gradient Descent (DGD)
Rewrite DGD in vector form with the aggregate variable y := [x_1; ... ; x_n]:
    y_{t+1} = W̃ y_t − α h(y_t)
Block weight matrix W̃ := W ⊗ I and gradient h(y) := [∇f_1(x_1); ... ; ∇f_n(x_n)].
Reordering terms, the vector-form DGD update becomes
    y_{t+1} = y_t − [ (I − W̃) y_t + α h(y_t) ]
This is gradient descent on the function
    F(y) := (1/2) y^T (I − W̃) y + α ∑_{i=1}^n f_i(x_i)
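The identity between the two forms is easy to check numerically. The sketch below builds W̃ = W ⊗ I and verifies that one vector-form DGD step equals one gradient step on F; the stacked vector y and the list of gradient oracles are an assumed test harness, not the authors' code.

```python
import numpy as np

def dgd_vector_step(y, W, alpha, grad_f, p):
    """One step y_{t+1} = W_tilde y_t - alpha*h(y_t), checked against y_t - grad F(y_t)."""
    n = W.shape[0]
    W_tilde = np.kron(W, np.eye(p))                       # block weight matrix W kron I
    h = np.concatenate([grad_f[i](y[i*p:(i+1)*p]) for i in range(n)])
    grad_F = (np.eye(n * p) - W_tilde) @ y + alpha * h    # gradient of the penalized objective F
    assert np.allclose(W_tilde @ y - alpha * h, y - grad_F)
    return y - grad_F
```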

11 DGD as a penalty method (reconsidering the mystery of DGD)
Why do gradient descent on F(y) := (1/2) y^T (I − W̃) y + α ∑_{i=1}^n f_i(x_i)?
The weight matrix W is constructed such that null(I − W) = span(1).
Thus null(I − W̃) = span(1 ⊗ I), and (I − W̃) y = 0 if and only if x_i = x_j for all i, j.
The same is true of (I − W̃)^{1/2}, so the problem in distributed form is equivalent to
    min_y ∑_{i=1}^n f_i(x_i)   s.t.  (I − W̃)^{1/2} y = 0
DGD is a penalty method for this (equivalent) problem:
- Squared norm penalty (1/2) ‖(I − W̃)^{1/2} y‖² with coefficient 1/α.
- It converges to the wrong solution, but not far from the right one if α is small.
Gradient descent on F(y) works. Why not use Newton steps on F(y)?
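As a sanity check on the null-space property, the following sketch builds doubly stochastic weights for a small ring graph (Metropolis-Hastings weights are an assumed choice; the slides do not specify a construction) and verifies that the only zero eigenvalue of I − W has the all-ones eigenvector.

```python
import numpy as np

def metropolis_weights(adj):
    # Doubly stochastic weights: w_ij = 1/(1 + max(deg_i, deg_j)) for neighbors, w_ii = 1 - sum_j w_ij.
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Example: ring of 5 nodes; I - W has a simple zero eigenvalue with a constant eigenvector.
adj = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
W = metropolis_weights(adj)
eigvals, eigvecs = np.linalg.eigh(np.eye(5) - W)
print(np.isclose(eigvals[0], 0.0), np.allclose(eigvecs[:, 0], eigvecs[0, 0]))
```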

12 Newton method for the penalized objective function
Penalized objective function: F(y) = (1/2) y^T (I − W̃) y + α ∑_{i=1}^n f_i(x_i)
To implement Newton on F(y) we need the Hessians H_t = I − W̃ + α G_t,
where G_t is block diagonal with blocks G_{ii,t} = ∇²f_i(x_{i,t}).
The Hessian H_t has the sparsity pattern of W̃, i.e., the sparsity pattern of the graph.
It can be computed with local information plus exchanges with neighbors.
The Newton step, however, depends on the Hessian inverse: d_t := −H_t^{-1} g_t.
The inverse of H_t is, in general, neither block sparse nor locally computable.
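For reference, here is a centralized sketch of the Hessian H_t = I − W̃ + α G_t that exact Newton would need; the helper signature and the list of Hessian oracles are assumptions for illustration.

```python
import numpy as np

def penalized_hessian(y, W, alpha, hess_f, p):
    """H_t = I - W_tilde + alpha * G_t, with W_tilde = W kron I and G_t block diagonal."""
    n = W.shape[0]
    H = np.eye(n * p) - np.kron(W, np.eye(p))             # graph-sparse part I - W_tilde
    for i in range(n):
        H[i*p:(i+1)*p, i*p:(i+1)*p] += alpha * hess_f[i](y[i*p:(i+1)*p])   # add alpha * G_t blocks
    return H

# The exact Newton step d_t = -solve(H_t, g_t) needs the whole matrix, hence global coordination.
```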

13 Network Newton Hessian approximation
Define the block diagonal matrix D_t := α G_t + 2(I − diag(W̃)).
Define the block graph-sparse matrix B := I − 2 diag(W̃) + W̃.
Split the Hessian as H_t = D_t − B = D_t^{1/2} ( I − D_t^{-1/2} B D_t^{-1/2} ) D_t^{1/2}.
Use the Taylor series (I − X)^{-1} = ∑_{k=0}^∞ X^k to write the Hessian inverse as
    H_t^{-1} = D_t^{-1/2} ∑_{k=0}^∞ ( D_t^{-1/2} B D_t^{-1/2} )^k D_t^{-1/2}
Define the NN-K step d_t^{(K)} := −Ĥ_t^{(K)-1} g_t by truncating the sum at the Kth term:
    Ĥ_t^{(K)-1} := D_t^{-1/2} ∑_{k=0}^{K} ( D_t^{-1/2} B D_t^{-1/2} )^k D_t^{-1/2}
D_t^{-1/2} B D_t^{-1/2} is graph sparse, so ( D_t^{-1/2} B D_t^{-1/2} )^k is k-hop neighborhood sparse.
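A centralized sketch of the truncated-series approximation follows, assuming D_t, B, and g_t have already been formed as above; it is meant to show the truncation itself, not the distributed implementation (which comes on the next slide). Larger K keeps more terms, i.e., trades extra communication rounds for a better curvature approximation.

```python
import numpy as np

def sym_inv_sqrt(D):
    # D^{-1/2} for a symmetric positive definite (block diagonal) D, via an eigendecomposition.
    vals, vecs = np.linalg.eigh(D)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def nn_k_direction(D, B, g, K):
    # Hhat_K^{-1} = D^{-1/2} * sum_{k=0}^{K} (D^{-1/2} B D^{-1/2})^k * D^{-1/2};  d^(K) = -Hhat_K^{-1} g.
    Dis = sym_inv_sqrt(D)
    X = Dis @ B @ Dis
    partial_sum = np.zeros_like(D)
    Xk = np.eye(D.shape[0])
    for _ in range(K + 1):
        partial_sum += Xk
        Xk = Xk @ X
    return -(Dis @ partial_sum @ Dis) @ g
```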

14 Distributed computation of the NN-K step
Recursion for the NN-k steps: define d_t^{(0)} = −D_t^{-1} g_t and, for all other k,
    d_t^{(k+1)} = D_t^{-1} B d_t^{(k)} − D_t^{-1} g_t
Since D_t is block diagonal, the recursion can be written componentwise as
    d_{i,t}^{(k+1)} = D_{ii,t}^{-1} ∑_{j=1}^n B_{ij} d_{j,t}^{(k)} − D_{ii,t}^{-1} g_{i,t}
Since B is graph sparse, B_{ij} = 0 unless i and j are neighbors (or i = j), so
    d_{i,t}^{(k+1)} = D_{ii,t}^{-1} ∑_{j ∈ N_i ∪ {i}} B_{ij} d_{j,t}^{(k)} − D_{ii,t}^{-1} g_{i,t}
The local piece of the NN-(k+1) step is therefore a function of:
- local matrices, local gradient components, and the local piece of the NN-k step;
- the pieces of the NN-k step of neighboring nodes, which can be exchanged.
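In code, the componentwise recursion at node i might look like the following sketch; the block containers D_blocks and B_blocks are hypothetical names for the local p x p matrices D_ii and B_ij.

```python
import numpy as np

def nn_inner_step(i, d_prev, g_i, D_blocks, B_blocks, neighbors):
    """d^(k+1)_i = D_ii^{-1} ( sum over j in {i} union N_i of B_ij d^(k)_j  -  g_i )."""
    acc = B_blocks[(i, i)] @ d_prev[i]
    for j in neighbors[i]:
        acc += B_blocks[(i, j)] @ d_prev[j]   # only the neighbors' pieces of d^(k) are needed
    return np.linalg.solve(D_blocks[i], acc - g_i)
```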

15 NN-K algorithm at node i
(0) Initialize at x_{i,0}. Repeat for t = 0, 1, ...
(1) Exchange the local iterate x_{i,t} with neighboring nodes j ∈ N_i.
(2) Compute the local gradient component g_{i,t} = (1 − w_ii) x_{i,t} − ∑_{j∈N_i} w_ij x_{j,t} + α ∇f_i(x_{i,t}).
(3) Initialize the NN step computation with the NN-0 step d_{i,t}^{(0)} = −D_{ii,t}^{-1} g_{i,t}.
(4) Repeat for k = 0, 1, ..., K − 1:
(5)   Exchange the local element d_{i,t}^{(k)} of the NN-k step with neighbors j ∈ N_i.
(6)   Compute the local component of the NN-(k+1) step: d_{i,t}^{(k+1)} = D_{ii,t}^{-1} ∑_{j ∈ N_i ∪ {i}} B_{ij} d_{j,t}^{(k)} − D_{ii,t}^{-1} g_{i,t}.
(7) Update the local iterate: x_{i,t+1} = x_{i,t} + ε d_{i,t}^{(K)}.
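Below is a runnable toy simulation of the full NN-K loop on local quadratics over a ring graph. The graph, weights, data, and parameter values are illustrative assumptions, not the experimental setup reported on the later slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K, alpha, eps = 10, 3, 2, 0.01, 1.0

# Ring graph with uniform doubly stochastic weights (w_ij = 1/3); any W with the assumed
# properties (doubly stochastic, w_ii >= delta > 0, connected graph) would do.
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
W = np.zeros((n, n))
for i in range(n):
    for j in neighbors[i]:
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 / 3.0

# Local quadratics f_i(x) = 0.5 x'A_i x + b_i'x, so grad f_i(x) = A_i x + b_i and Hess f_i = A_i.
A = [np.diag(rng.uniform(1.0, 10.0, p)) for _ in range(n)]
b = [rng.normal(size=p) for _ in range(n)]
x = [np.zeros(p) for _ in range(n)]

for t in range(100):
    # Step (2): local gradient components of the penalized objective F.
    g = [(1 - W[i, i]) * x[i] - sum(W[i, j] * x[j] for j in neighbors[i])
         + alpha * (A[i] @ x[i] + b[i]) for i in range(n)]
    # Local blocks D_ii,t = alpha*Hess f_i + 2(1 - w_ii) I; here B_ii = (1 - w_ii) I, B_ij = w_ij I.
    D = [alpha * A[i] + 2.0 * (1.0 - W[i, i]) * np.eye(p) for i in range(n)]
    # Steps (3)-(6): inner recursion d^(0)_i = -D_ii^{-1} g_i, d^(k+1)_i = D_ii^{-1}(sum_j B_ij d^(k)_j - g_i).
    d = [np.linalg.solve(D[i], -g[i]) for i in range(n)]
    for _ in range(K):
        d = [np.linalg.solve(D[i], (1 - W[i, i]) * d[i]
             + sum(W[i, j] * d[j] for j in neighbors[i]) - g[i]) for i in range(n)]
    # Step (7): local update with stepsize eps.
    x = [x[i] + eps * d[i] for i in range(n)]

# The penalty method converges to a neighborhood of the consensus optimum x*, controlled by alpha.
x_star = -np.linalg.solve(sum(A), sum(b))
print(np.mean([np.linalg.norm(xi - x_star) for xi in x]))
```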

16 Assumptions
Assumption 1. The local objective functions f_i(x) are twice differentiable, and the Hessians have bounded eigenvalues: mI ⪯ ∇²f_i(x) ⪯ MI.
Assumption 2. The local Hessians are Lipschitz continuous: ‖∇²f_i(x) − ∇²f_i(x̂)‖ ≤ L ‖x − x̂‖.
Assumption 3. The local weights w_ii are bounded: 0 < δ ≤ w_ii < 1 for i = 1, ..., n. The upper bound is implied by the connectivity condition.

17 Linear convergence of NN-K
Theorem. For a specific choice of stepsize ε, the sequence F(y_t) converges to the optimal value F(y*) at least linearly with constant 0 < 1 − ζ_α < 1, i.e.,
    F(y_t) − F(y*) ≤ (1 − ζ_α)^t ( F(y_0) − F(y*) )
The stepsize ε is the minimum of 1 and a constant that depends on problem parameters.
There is a trade-off between convergence rate and accuracy:
- A larger α yields a smaller factor 1 − ζ_α and thus faster convergence.
- A smaller α yields a more accurate limit, closer to the solution of the original problem.

18 Superlinear convergence lemma
Lemma. For specific values of Γ_1 and Γ_2, the sequence of weighted gradient norms satisfies
    ‖D_t^{1/2} g_{t+1}‖ ≤ (1 − ε + ε ρ^{K+1}) [ 1 + Γ_1 (1 − ζ_α)^{(t−1)/4} ] ‖D_{t−1}^{1/2} g_t‖ + ε² Γ_2 ‖D_{t−1}^{1/2} g_t‖²
where ρ < 1.
So ‖D_t^{1/2} g_{t+1}‖ is upper bounded by linear and quadratic terms of ‖D_{t−1}^{1/2} g_t‖, similar to the convergence analysis of Newton's method with constant stepsize ε.
For t large enough, the term Γ_1 (1 − ζ_α)^{(t−1)/4} vanishes.
There must therefore be intervals in which the quadratic term dominates the linear term, and the rate of convergence is quadratic in those intervals.

19 Quadratic phase of NN-K convergence
Theorem. Define η_t := (1 − ε + ε ρ^{K+1}) ( 1 + Γ_1 (1 − ζ)^{(t−1)/4} ) and t_0 := argmin_t { t : η_t < 1 }. Then for all t ≥ t_0, if
    η_t (1 − η_t) / (ε² Γ_2)  <  ‖D_{t−1}^{1/2} g_t‖  <  (1 − η_t) / (ε² Γ_2),
then
    ‖D_t^{1/2} g_{t+1}‖ ≤ ( ε² Γ_2 / (1 − η_t) ) ‖D_{t−1}^{1/2} g_t‖².
That is, ‖D_{t−1}^{1/2} g_t‖ converges quadratically while it remains in the specified interval.

20 Numerical results
Convergence path for f(x) := ∑_{i=1}^{100} ( x^T A_i x / 2 + b_i^T x ).
Condition number 10^3, α = 10^{-2}, the graph is d-regular with d = 4, ε = 1.
Error metric: e_t = (1/n) ∑_{i=1}^n ‖x_{i,t} − x*‖ / ‖x*‖.
[Figure: error versus number of iterations for DGD, NN-0, NN-1, and NN-2]
DGD is slower than all versions of NN-K.
NN-K with larger K converges faster in terms of the number of iterations.
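For completeness, here is a small sketch of the error metric used in these plots, assuming the same quadratic local costs; x* is then the minimizer of the aggregate quadratic.

```python
import numpy as np

def relative_error(x_nodes, A, b):
    """e_t = (1/n) * sum_i ||x_{i,t} - x*|| / ||x*||, with x* = -(sum_i A_i)^{-1} sum_i b_i."""
    x_star = -np.linalg.solve(sum(A), sum(b))
    return np.mean([np.linalg.norm(xi - x_star) / np.linalg.norm(x_star) for xi in x_nodes])
```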

21 Numerical results
Number of information exchanges required to reach accuracy e = 10^{-2}.
Setup: n = 100, d ∈ {4, ..., 10}, and condition number in {10^2, 10^3, 10^4}.
[Figure: empirical distributions of the number of information exchanges for DGD and for the NN-K variants (NN-0 and higher)]
The different versions of NN-K have almost similar performance.
DGD is slower than all versions of NN-K by an order of magnitude.

22 Numerical results
Convergence of NN-K and DGD with decreasing α: divide α by 10 whenever the algorithm has converged.
[Figure: error versus number of iterations for DGD, NN-0, NN-1, and NN-2 for two initial penalty coefficients; panel (b) uses α_0 = 10^{-1}]
Exact convergence is achieved by decreasing α.
A larger initial value of α leads to faster convergence for both algorithms.

23 Conclusions
Introduced a network optimization formulation: each agent has a local cost function f_i and the global cost is f = ∑_{i=1}^n f_i.
Network Newton is proposed as a second-order distributed method; it approximates the Newton step by truncating the Taylor series of the Hessian inverse.
Linear convergence is established, and quadratic convergence in a specific interval is shown.
Numerical results show that NN converges faster than DGD.
