Distributed online optimization over jointly connected digraphs
Distributed online optimization over jointly connected digraphs
David Mateos-Núñez, Jorge Cortés
University of California, San Diego
Southern California Optimization Day, UC San Diego, May 23
Overview: Distributed online optimization
- Distributed optimization
- Online optimization
- Case study: medical diagnosis
Distributed online optimization

Why distributed?
- information is distributed across a group of agents
- agents need to interact to optimize performance

Why online?
- information becomes incrementally available
- need an adaptive solution
Machine learning in healthcare

Medical findings, symptoms:
- age factor
- amnesia before impact
- deterioration in GCS score
- open skull fracture
- loss of consciousness
- vomiting

Any acute brain finding revealed on Computerized Tomography? (-1 = not present, 1 = present)

The Canadian CT Head Rule for patients with minor head injury
Binary classification
- feature vector of patient $s$: $w^s = ((w^s)_1, \ldots, (w^s)_{d-1})$
- true diagnosis: $y^s \in \{-1, 1\}$
- wanted weights: $x = (x_1, \ldots, x_d)$
- predictor: $h(x, w^s) = x^\top (w^s, 1)$
- margin: $m^s(x) = y^s\, h(x, w^s)$

Given the data set $\{w^s\}_{s=1}^{P}$, estimate $x \in \mathbb{R}^d$ by solving
$$\min_{x \in \mathbb{R}^d} f(x) = \min_{x \in \mathbb{R}^d} \sum_{s=1}^{P} l(m^s(x)),$$
where the loss function $l : \mathbb{R} \to \mathbb{R}$ is decreasing and convex.
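To make this objective concrete, here is a minimal Python sketch; the logistic-type loss $l(m) = \log(1 + e^{-2m})$, which reappears in the simulations later, stands in for any decreasing convex loss, and the random data is purely illustrative.

```python
import numpy as np

# A minimal sketch of the classification objective above.

def margin(x, w_s, y_s):
    """m^s(x) = y^s * h(x, w^s), with predictor h(x, w) = x^T (w, 1)."""
    return y_s * np.dot(x, np.append(w_s, 1.0))

def loss(m):
    """Decreasing convex loss l(m) = log(1 + exp(-2m))."""
    return np.log1p(np.exp(-2.0 * m))

def f(x, W, y):
    """f(x) = sum_s l(m^s(x)) over the data set {(w^s, y^s)}."""
    return sum(loss(margin(x, W[s], y[s])) for s in range(len(y)))

# Illustrative data: d = 4, so features have dimension d - 1 = 3
rng = np.random.default_rng(0)
W, y = rng.normal(size=(10, 3)), rng.choice([-1, 1], size=10)
print(f(np.zeros(4), W, y))  # loss of the all-zeros predictor: 10 * log(2)
```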
Review of distributed convex optimization
In the diagnosis example
Health center $i \in \{1, \ldots, N\}$ manages a set of patients $P_i$:
$$f(x) = \sum_{i=1}^{N} \sum_{s \in P_i} l(y^s\, h(x, w^s)) = \sum_{i=1}^{N} f^i(x)$$

Goal: find the best predicting model $w \mapsto h(x, w)$,
$$\min_{x \in \mathbb{R}^d} \sum_{i=1}^{N} f^i(x),$$
using local information.
What do we mean by using local information?
- Agent $i$ maintains an estimate $x^i_t$ of $x^* \in \arg\min_{x \in \mathbb{R}^d} \sum_{i=1}^{N} f^i(x)$
- Agent $i$ has access to $f^i$
- Agent $i$ can share its estimate $x^i_t$ with neighboring agents

[Figure: communication digraph over agents $f^1, \ldots, f^5$, with adjacency matrix $A$ whose nonzero entries are $a_{13}, a_{14}, a_{21}, a_{25}, a_{31}, a_{32}, a_{41}, a_{52}$.]

Application to distributed estimation in wireless sensor networks and beyond... a "sensor" is any channel for the machine to learn.
How do agents agree on the optimizer?
- Spreading of information (gossip, time-varying topologies, B-joint connectivity)
- Relation between consensus & local minimization

A. Nedić and A. Ozdaglar, TAC '09:
$$z^i_{k+1} = \sum_{j=1}^{N} a_{ij,k}\, z^j_k - \eta_k\, g, \qquad x^i_{k+1} = \Pi_X(z^i_{k+1}),$$
where $A_k = (a_{ij,k})$ is doubly stochastic and $g \in \partial f^i(x^i_k)$

J. C. Duchi, A. Agarwal, and M. J. Wainwright, TAC '12
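As a point of reference, here is a sketch of one round of that consensus-based projected subgradient scheme; the ball constraint of radius R is an illustrative assumption, not part of the original method.

```python
import numpy as np

# One round of the Nedic-Ozdaglar scheme: mix with a doubly stochastic
# matrix A_k, step along local subgradients, project onto X.
# Assumption for illustration: X is the Euclidean ball of radius R.

def project_ball(z, R):
    n = np.linalg.norm(z)
    return z if n <= R else (R / n) * z

def nedic_ozdaglar_step(X_est, A_k, subgrads, eta_k, R=10.0):
    """X_est: N x d stacked estimates x^i_k; subgrads: N x d matrix whose
    row i lies in the subdifferential of f^i at x^i_k."""
    Z = A_k @ X_est - eta_k * subgrads      # consensus + subgradient step
    return np.vstack([project_ball(z, R) for z in Z])
```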
Saddle-point dynamics
The minimization problem can be regarded as
$$\min_{x \in \mathbb{R}^d} \sum_{i=1}^{N} f^i(x) = \min_{\substack{x^1, \ldots, x^N \in \mathbb{R}^d \\ x^1 = \cdots = x^N}} \sum_{i=1}^{N} f^i(x^i) = \min_{\substack{x \in (\mathbb{R}^d)^N \\ Lx = 0}} \sum_{i=1}^{N} f^i(x^i),$$
where $(Lx)^i = \sum_{j=1}^{N} a_{ij}(x^i - x^j)$.

The augmented Lagrangian when $L$ is symmetric is
$$F(x, z) := f(x) + \frac{a}{2}\, x^\top L\, x + z^\top L\, x,$$
which is convex-concave, and the saddle-point dynamics are
$$\dot{x} = -\frac{\partial F(x, z)}{\partial x} = -\nabla f(x) - a L x - L z, \qquad \dot{z} = \frac{\partial F(x, z)}{\partial z} = L x.$$
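As a sanity check, a forward-Euler discretization of these dynamics for scalar agent states on a symmetric Laplacian; the quadratic objectives, step size, and horizon below are illustrative assumptions.

```python
import numpy as np

# Forward-Euler integration of the saddle-point dynamics
#   xdot = -grad f(x) - a L x - L z,   zdot = L x,
# for N agents with scalar states and a symmetric Laplacian L.

def saddle_point_flow(x, z, L, grad_f, a=1.0, dt=0.01, steps=20000):
    for _ in range(steps):
        x = x + dt * (-grad_f(x) - a * (L @ x) - L @ z)
        z = z + dt * (L @ x)
    return x, z

# Illustrative objectives f^i(x) = (x - i)^2 on a 3-agent path graph;
# the network optimizer is mean(1, 2, 3) = 2.
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
grad_f = lambda x: 2.0 * (x - np.array([1.0, 2.0, 3.0]))
x, z = saddle_point_flow(np.zeros(3), np.zeros(3), L, grad_f)
print(x)  # all entries approach 2.0
```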
Weight-balanced digraphs
$$\dot{x} = -\nabla f(x) - a L x - L z, \qquad \dot{z} = L x$$
Lagrangian $F(x, z) = f(x) + \frac{a}{2}\, x^\top L\, x + z^\top L\, x$, with $f(x) = \sum_{i=1}^{N} f^i(x^i)$.

For a directed graph, the exact saddle-point dynamics
$$\dot{x} = -\frac{\partial F(x, z)}{\partial x} = -\nabla f(x) - a\, \tfrac{1}{2}(L + L^\top)\, x - L^\top z \quad \text{(not distributed!)}$$
$$\dot{z} = \frac{\partial F(x, z)}{\partial z} = L x$$
are changed to $\dot{x} = -\nabla f(x) - a L x - L z$.

J. Wang and N. Elia (with $L = L^\top$), Allerton
B. Gharesifard and J. Cortés, CDC '12

Note: $a > 0$; otherwise the linear part of the saddle-point dynamics is a Hamiltonian system.
Example: 4 agents in a directed cycle, with
$$f^1(x_1, x_2) = \tfrac{1}{2}\big((x_1 - 4)^2 + (x_2 - 3)^2\big), \qquad f^2(x_1, x_2) = x_1 + 3 x_2^2,$$
$$f^3(x_1, x_2) = \log\big(e^{x_1 + 3} + e^{x_2 + 1}\big), \qquad f^4(x_1, x_2) = (x_1 + 2 x_2 + 5)^2 + (x_1 - x_2 - 4)^2$$

[Figure: evolution of the agents' estimates $\{(x^i_1(t), x^i_2(t))\}$ over time $t$.]

- convergence to a neighborhood of the optimizer $(1.10, 2.74)$
- the size of the neighborhood depends on the size of the noise [DMN-JC, '13]
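The directed cycle is relevant because the modified dynamics only require a weight-balanced digraph. A quick numpy check, assuming unit edge weights:

```python
import numpy as np

# The directed 4-cycle with unit weights is weight-balanced: every node's
# in-degree equals its out-degree, so both L 1 = 0 and 1^T L = 0.

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)        # edge i -> j iff A[i, j] = 1
L = np.diag(A.sum(axis=1)) - A                   # out-degree Laplacian
print(np.allclose(A.sum(axis=1), A.sum(axis=0)))                        # True
print(np.allclose(L @ np.ones(4), 0), np.allclose(np.ones(4) @ L, 0))  # True True
```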
Review of online convex optimization
Different kind of optimization: sequential decision making
Resuming the diagnosis example, each round $t \in \{1, \ldots, T\}$ brings:
- question (features, medical findings): $w_t$
- decision (about using CT): $h(x_t, w_t)$
- outcome (by CT findings / follow-up of the patient): $y_t$
- loss: $l(y_t\, h(x_t, w_t))$

Choose $x_t$ & incur loss $f_t(x_t) := l(y_t\, h(x_t, w_t))$

Goal: sublinear regret
$$R(u, T) := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u) \in o(T),$$
using historical observations.
Why regret? If the regret is sublinear,
$$\sum_{t=1}^{T} f_t(x_t) \leq \sum_{t=1}^{T} f_t(u) + o(T),$$
then
$$\frac{1}{T} \sum_{t=1}^{T} f_t(x_t) \leq \frac{1}{T} \sum_{t=1}^{T} f_t(u) + \frac{o(T)}{T}.$$
In temporal average, the online decisions $\{x_t\}_{t=1}^{T}$ perform as well as the best fixed decision in hindsight.

No regrets, my friend.
What about generalization error?
- Sublinear regret does not imply that $x_{t+1}$ will do well against $f_{t+1}$
- No assumptions are made about the sequence $\{f_t\}$; it can follow an unknown stochastic or deterministic model, or be chosen adversarially
- In our example, $f_t := l(y_t\, h(\cdot, w_t))$. If some model $w \mapsto h(x, w)$ explains the data reasonably well in hindsight, then the online models $w \mapsto h(x_t, w)$ perform just as well on average

Other applications:
- portfolio selection
- online advertisement placement
- interactive learning
Some classical results
Projected gradient descent:
$$x_{t+1} = \Pi_S\big(x_t - \eta_t \nabla f_t(x_t)\big), \qquad (1)$$
where $\Pi_S$ is the projection onto a compact set $S \subseteq \mathbb{R}^d$, and $\|\nabla f_t\|_2 \leq H$.

Follow-the-Regularized-Leader:
$$x_{t+1} = \arg\min_{y \in S} \Big( \sum_{s=1}^{t} f_s(y) + \psi(y) \Big)$$

- Martin Zinkevich, '03: (1) achieves $O(\sqrt{T})$ regret under convexity with $\eta_t = \frac{1}{\sqrt{t}}$
- Elad Hazan, Amit Agarwal, and Satyen Kale, '07: (1) achieves $O(\log T)$ regret under $p$-strong convexity with $\eta_t = \frac{1}{pt}$
- Others: Online Newton Step, Follow the Regularized Leader, etc.
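A sketch of (1) on toy losses, comparing the two step-size schedules; the quadratic losses, ball constraint, and horizon are illustrative assumptions.

```python
import numpy as np

# Online projected gradient descent (1) on toy losses
# f_t(x) = (p/2) * ||x - c_t||^2, with projection onto a ball of radius R.
# Compares eta_t = 1/sqrt(t) (convex case) with eta_t = 1/(p t)
# (p-strongly convex case).

def ogd_total_loss(centers, eta, p=1.0, R=5.0):
    x, total = np.zeros(centers.shape[1]), 0.0
    for t, c in enumerate(centers, start=1):
        total += 0.5 * p * np.sum((x - c) ** 2)  # incur f_t(x_t)
        x = x - eta(t) * p * (x - c)             # gradient step
        n = np.linalg.norm(x)
        x = x if n <= R else (R / n) * x         # projection onto S
    return total

rng = np.random.default_rng(1)
centers = rng.normal(size=(1000, 2))
print(ogd_total_loss(centers, lambda t: 1 / np.sqrt(t)))  # O(sqrt(T)) schedule
print(ogd_total_loss(centers, lambda t: 1 / (1.0 * t)))   # O(log T) schedule, p = 1
```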
Our contribution: Combining both aspects
Back again to the diagnosis example
Health center $i \in \{1, \ldots, N\}$ takes care of a set of patients $P^i_t$ at time $t$:
$$f^i(x) = \sum_{t=1}^{T} \sum_{s \in P^i_t} l(y^s\, h(x, w^s)) = \sum_{t=1}^{T} f^i_t(x)$$

Goal: sublinear agent regret
$$R^j(u, T) := \sum_{t=1}^{T} \sum_{i=1}^{N} f^i_t(x^j_t) - \sum_{t=1}^{T} \sum_{i=1}^{N} f^i_t(u) \in o(T),$$
using local information & historical observations.
Challenge: coordinate hospitals
[Figure: communication digraph among hospitals with local objectives $f^1_t, \ldots, f^5_t$.]
Need to design distributed online algorithms.
Previous work on consensus-based online algorithms
- F. Yan, S. Sundaram, S. V. N. Vishwanathan, and Y. Qi, TAC: Projected Subgradient Descent; $\log(T)$ regret (local strong convexity & bounded subgradients); $\sqrt{T}$ regret (convexity & bounded subgradients). Both analyses require a projection onto a compact set.
- S. Hosseini, A. Chapman, and M. Mesbahi, CDC '13: Dual Averaging; $\sqrt{T}$ regret (convexity & bounded subgradients); general regularized projection onto a closed convex set.
- K. I. Tsianos and M. G. Rabbat, arXiv '12: Projected Subgradient Descent; empirical risk as opposed to regret analysis.

The communication digraph in all cases is fixed, strongly connected & weight-balanced.
Our contributions (informally)
- time-varying communication digraphs, weight-balanced and under B-joint connectivity
- unconstrained optimization (no projection step onto a bounded set)
- $\log T$ regret (local strong convexity & bounded subgradients)
- $\sqrt{T}$ regret (convexity & bounded subgradients)
Coordination algorithm
$$x^i_{t+1} = x^i_t - \eta_t\, g_{x^i_t} + \sigma \Big( a \sum_{j=1, j \neq i}^{N} a_{ij,t}\, (x^j_t - x^i_t) + \sum_{j=1, j \neq i}^{N} a_{ij,t}\, (z^j_t - z^i_t) \Big)$$
$$z^i_{t+1} = z^i_t - \sigma \sum_{j=1, j \neq i}^{N} a_{ij,t}\, (x^j_t - x^i_t)$$

The update combines:
- subgradient descent on previous local objectives, $g_{x^i_t} \in \partial f^i_t(x^i_t)$
- proportional (linear) feedback on the disagreement with neighbors
- integral (linear) feedback on the disagreement with neighbors

The union of the graphs over intervals of length B is strongly connected.
[Figure: time-varying digraph over the objectives $f^1_t, \ldots, f^5_t$; e.g., the edges $a_{41,t+1}$ and $a_{14,t+1}$ are only present at time $t+1$.]

Compact representation:
$$\begin{bmatrix} x_{t+1} \\ z_{t+1} \end{bmatrix} = \begin{bmatrix} x_t \\ z_t \end{bmatrix} - \sigma \begin{bmatrix} a L_t & L_t \\ -L_t & 0 \end{bmatrix} \begin{bmatrix} x_t \\ z_t \end{bmatrix} - \eta_t \begin{bmatrix} g_{x_t} \\ 0 \end{bmatrix}$$

Generalization:
$$v_{t+1} = (I - \sigma\, G \otimes L_t)\, v_t - \eta_t\, g_t$$
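A runnable sketch of the coordination algorithm for scalar agent states; the alternating graphs, the objectives, and the gains $\sigma$, $a$ are illustrative assumptions, chosen so that the union of the two graphs is connected (B = 2).

```python
import numpy as np

def coordination_step(x, z, L_t, g, eta_t, sigma=0.1, a=1.0):
    """One round of the proportional-integral coordination algorithm for
    scalar agent states: x, z are length-N vectors, L_t is the Laplacian
    of the communication graph at time t, g stacks the local
    subgradients g_{x^i_t}."""
    x_new = x - eta_t * g - sigma * (a * (L_t @ x) + L_t @ z)
    z_new = z + sigma * (L_t @ x)
    return x_new, z_new

# Toy run: 3 agents alternate between two graphs whose union is connected
# (B-joint connectivity with B = 2); f^i_t(x) = (x - c_i)^2, c = (0, 1, 2).
A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # edge {1,2}
A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)  # edge {2,3}
laplacians = [np.diag(A.sum(axis=1)) - A for A in (A1, A2)]
c = np.array([0.0, 1.0, 2.0])
x, z = np.zeros(3), np.zeros(3)
for t in range(1, 5001):
    g = 2.0 * (x - c)                  # subgradient of f^i_t at x^i_t
    x, z = coordination_step(x, z, laplacians[t % 2], g, eta_t=1.0 / (2.0 * t))
print(x)  # the agents approach the network optimizer mean(c) = 1.0
```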
Our contributions
Theorem. Assume that:
- $\{f^1_t, \ldots, f^N_t\}_{t=1}^{T}$ are convex functions in $\mathbb{R}^d$ with $H$-bounded subgradient sets, nonempty and uniformly bounded sets of minimizers, and $p$-strongly convex in a sufficiently large neighborhood of their minimizers;
- the sequence of weight-balanced communication digraphs is nondegenerate and B-jointly connected;
- $G \in \mathbb{R}^{K \times K}$ is diagonalizable with positive real eigenvalues.

Then, taking learning rates $\eta_t = \frac{1}{pt}$, for any $j \in \{1, \ldots, N\}$ and $u \in \mathbb{R}^d$,
$$R^j(u, T) \leq C\,(\|u\|_2^2 + \log T).$$

Relaxing strong convexity to convexity and using the Doubling Trick scheme (see S. Shalev-Shwartz) for the learning rates,
$$R^j(u, T) \leq C\, \|u\|_2^2\, \sqrt{T},$$
for any $j \in \{1, \ldots, N\}$ and $u \in \mathbb{R}^d$.
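For reference, a small sketch of the Doubling Trick schedule mentioned above (after Shalev-Shwartz's online-learning survey): run an $O(\sqrt{T'})$-regret step-size schedule on epochs of doubling length, so the horizon need not be known in advance; the concrete $1/\sqrt{k+1}$ rate inside each epoch is an illustrative choice.

```python
import numpy as np

def doubling_trick_rates(T):
    """Yield (t, eta_t): within epoch m (of length 2^m), eta_t = 1/sqrt(k+1).
    Summing the per-epoch sqrt regrets keeps the total O(sqrt(T))."""
    t, m = 1, 0
    while t <= T:
        for k in range(2 ** m):              # epoch m covers 2^m rounds
            if t > T:
                break
            yield t, 1.0 / np.sqrt(k + 1)    # the step size restarts each epoch
            t += 1
        m += 1

for t, eta in doubling_trick_rates(10):
    print(t, round(eta, 3))
```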
Idea of the proof
Network regret:
$$R(u, T) := \sum_{t=1}^{T} \sum_{i=1}^{N} f^i_t(x^i_t) - \sum_{t=1}^{T} \sum_{i=1}^{N} f^i_t(u)$$
- Disagreement dynamics under B-joint connectivity
- Bound on the trajectories, uniform in T
Simulations: acute brain finding revealed on Computerized Tomography
[Figure: agents' estimates $\{(x^i_t)_7\}_{i=1}^{N}$ over time $t$, and average regret $\max_j \frac{1}{T} R^j\big(x_T, \{\sum_{i=1}^{N} f^i_t\}_{t=1}^{T}\big)$ versus the time horizon $T$, comparing centralized, proportional, and proportional-integral disagreement feedback.]

$$f^i_t(x) = \sum_{s \in P^i_t} l(y^s\, h(x, w^s)) + \|x\|_2^2, \quad \text{where } l(m) = \log\big(1 + e^{-2m}\big)$$
Conclusions
- Distributed online unconstrained convex optimization with sublinear regret under B-joint connectivity
- Relevant for regression & classification, which play a crucial role in machine learning, computer vision, etc.

Future work
- Refine the guarantees under a model for the evolution of the objective functions
- Enable agents to cooperatively select features that strike the balance between sensitivity and specificity
- Effect of noise on the performance
Future horizons for distributed optimization in healthcare
Engage & detect disease before it happens
Thank you for listening!