On Computational Thinking, Inferential Thinking and Data Science
Michael I. Jordan, University of California, Berkeley. December 17, 2016.
A Job Description, circa 2015

Your Boss: I need a Big Data system that will replace our classic service with a personalized service.
- It should work reasonably well for anyone and everyone; I can tolerate a few errors, but not too many dumb ones that will embarrass us.
- It should run just as fast as our classic service.
- It should only improve as we collect more data; in particular, it shouldn't slow down.
- There are serious privacy concerns, of course, and they vary across the clients.
Some Challenges Driven by Big Data

Big Data analysis requires a thorough blending of computational thinking and inferential thinking.
- What I mean by computational thinking: abstraction, modularity, scalability, robustness, etc.
- Inferential thinking means (1) considering the real-world phenomenon behind the data, (2) considering the sampling pattern that gave rise to the data, and (3) developing procedures that go backwards from the data to the underlying phenomenon.
The Challenges are Daunting

The core theories in computer science and statistics developed separately, and there is an oil-and-water problem:
- Core statistical theory doesn't have a place for runtime and other computational resources.
- Core computational theory doesn't have a place for statistical risk.
Outline
- Inference under privacy constraints
- Inference under communication constraints
- Lower bounds, the variational perspective, and symplectic integration
Part I: Inference and Privacy (with John Duchi and Martin Wainwright)
Privacy and Data Analysis

Individuals are generally unwilling to allow their personal data to be used without control over how it will be used and how much privacy loss they will incur.
- Privacy loss can be quantified via differential privacy.
- We want to trade privacy loss against the value we obtain from data analysis.
- The question becomes that of quantifying such value and juxtaposing it with privacy loss.
Privacy

[Diagram: a query is answered from a database; a channel Q produces a privatized database, from which the same query is answered.]

Classical problem in differential privacy: show that the answer computed from the raw database and the answer computed from the privatized database are close, under constraints on the channel Q.
Inference

[Diagram: a population P is sampled via a sampling mechanism S to produce a database, and a query on the database yields an estimate of the population quantity.]

Classical problem in statistical theory: show that the population quantity and its estimate are close, under constraints on the sampling mechanism S.

Privacy and Inference

[Diagram: the two pipelines composed; the sampled database is additionally passed through a privatizing channel Q before the query is answered.]

The privacy-meets-inference problem: show that the population quantity and the private estimate are close, under constraints on both the channel Q and the sampling mechanism S.
Background on Inference

In the 1930s, Wald laid the foundations of statistical decision theory. Given a family of distributions \mathcal{P}, a parameter \theta(P) for each P \in \mathcal{P}, an estimator \hat\theta, and a loss \ell(\hat\theta, \theta(P)), define the risk:

    R_P(\hat\theta) := \mathbb{E}_P[\ell(\hat\theta, \theta(P))]

Minimax principle [Wald '39, '43]: choose the estimator minimizing the worst-case risk:

    \inf_{\hat\theta} \sup_{P \in \mathcal{P}} \mathbb{E}_P[\ell(\hat\theta, \theta(P))]
Local Privacy

[Diagram: individuals each pass their private data through a private channel; the estimator sees only the channel outputs.]
Differential Privacy

Definition [Dwork, McSherry, Nissim, Smith '06]: a channel Q is \varepsilon-differentially private if

    \sup_{z}\, \sup_{x, x'} \frac{Q(z \mid x)}{Q(z \mid x')} \le e^{\varepsilon}

Given the output Z, one cannot reliably discriminate between the inputs x and x'.
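Binary randomized response is the canonical locally private channel and makes the definition concrete. The following Python sketch is my own illustration (the function names and the choice ε = 1 are assumptions for the example, not from the talk): each bit is released through the channel, and the aggregate is debiased.

```python
import math
import random

def randomized_response(x: int, eps: float, rng: random.Random) -> int:
    """eps-locally-private release of a bit x in {0, 1}: report the true
    bit with probability e^eps / (1 + e^eps), else the flipped bit."""
    p_true = math.exp(eps) / (1.0 + math.exp(eps))
    return x if rng.random() < p_true else 1 - x

def debiased_mean(reports, eps: float) -> float:
    """Unbiased estimate of the true proportion from the private reports."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
eps = 1.0
truth = [1] * 450 + [0] * 550            # true proportion 0.45
reports = [randomized_response(x, eps, rng) for x in truth]
estimate = debiased_mean(reports, eps)
```

For any output z and inputs x, x', the ratio P(z | x) / P(z | x') is at most e^ε by construction, which is exactly the discrimination bound in the definition above.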
Private Minimax Risk

Ingredients: a parameter \theta(P) of a distribution P, a family of distributions \mathcal{P}, a loss \ell measuring error, and a family \mathcal{Q}_\varepsilon of \varepsilon-private channels. The private minimax risk optimizes over the best \varepsilon-private channel as well as the best estimator:

    \inf_{Q \in \mathcal{Q}_\varepsilon}\, \inf_{\hat\theta}\, \sup_{P \in \mathcal{P}} \mathbb{E}\big[\ell(\hat\theta, \theta(P))\big]

This is the minimax risk under a privacy constraint.
Vignette: Private Mean Estimation

Example: estimate reasons for hospital visits. Patients are admitted to the hospital for substance abuse; estimate the prevalence of different substances. Each patient contributes a binary vector (e.g., 1 Alcohol, 1 Cocaine, 0 Heroin, 0 Cannabis, 0 LSD, 0 Amphetamines), and the goal is the vector of proportions:

    Alcohol       0.45
    Cocaine       0.32
    Heroin        0.16
    Cannabis      0.20
    LSD           0.00
    Amphetamines  0.02
Vignette: Mean Estimation

Consider estimation of a d-dimensional mean, with errors measured in an \ell_p norm.

Proposition: the classical minimax rate is achieved by the sample mean.

Proposition: the private minimax rate is polynomially worse in the dimension d.

Note: the privacy constraint acts as a reduction in the effective sample size.
Optimal Mechanism?

Given a non-private observation, Idea 1 is to add independent noise (e.g., the Laplace mechanism) [Dwork et al. '06]. Problem: in high dimensions the noise magnitude is much too large, and this is unavoidable for this mechanism: it is provably sub-optimal.
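As a concrete sketch of Idea 1 (my own illustration, not from the talk; the bound ‖x‖₁ ≤ 1, the names, and the data are assumptions for the example), the Laplace mechanism adds per-coordinate noise scaled to the ℓ₁ sensitivity 2/n of the sample mean:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise via the inverse CDF."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_mean(data, eps: float, rng: random.Random):
    """Laplace mechanism for the mean of vectors with ||x||_1 <= 1.

    Swapping one record moves the sample mean by at most 2/n in l1,
    so per-coordinate Laplace(2/(n*eps)) noise makes this release
    eps-differentially private."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    scale = 2.0 / (n * eps)
    return [m + laplace_noise(scale, rng) for m in mean]

rng = random.Random(1)
data = [[1, 0]] * 450 + [[0, 1]] * 550   # n = 1000 one-hot records
est = private_mean(data, eps=1.0, rng=rng)
```

With n = 1000 and ε = 1 the noise scale is only 0.002 per coordinate, so the private estimate stays close to the true proportions (0.45, 0.55); the slide's point is that this approach degrades badly as the dimension grows.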
Empirical Evidence

[Plot: estimated proportion of emergency-room visits involving different substances versus sample size n. Data source: Drug Abuse Warning Network.]
Additional Examples

- Fixed-design regression
- Convex risk minimization
- Multinomial estimation
- Nonparametric density estimation

Almost always, the effective sample size reduction is n \mapsto n\varepsilon^2/d.
Part III: Inference and Compression (with Yuchen Zhang, John Duchi and Martin Wainwright)
Computation and Inference

How does inferential quality trade off against classical computational resources such as time and space? Hard!
Computation and Inference: Mechanisms and Bounds
- Tradeoffs via convex relaxations: linking runtime to convex geometry and risk to convex geometry
- Tradeoffs via concurrency control: optimistic concurrency control
- Bounds via optimization oracles: the number of accesses to a gradient as a surrogate for computation
- Bounds via communication complexity
- Tradeoffs via subsampling: bag of little bootstraps, variational consensus Monte Carlo
A Variational Framework for Accelerated Methods in Optimization (with Andre Wibisono and Ashia Wilson). July 12, 2016.
Accelerated Gradient Descent

Setting: unconstrained convex optimization

    \min_{x \in \mathbb{R}^d} f(x)

Classical gradient descent:

    x_{k+1} = x_k - \beta \nabla f(x_k)

obtains a convergence rate of O(1/k).

Accelerated gradient descent:

    y_{k+1} = x_k - \beta \nabla f(x_k)
    x_{k+1} = (1 - \lambda_k) y_{k+1} + \lambda_k y_k

obtains the (optimal) convergence rate of O(1/k^2).
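The two schemes can be compared numerically. This is my own sketch (the quadratic test function and its eigenvalue spread are illustrative choices, not from the talk), using β = 1/L and the standard momentum weight (k-1)/(k+2), which corresponds to λ_k = -(k-1)/(k+2) in the notation above:

```python
# Gradient descent vs. accelerated gradient descent on a convex
# quadratic f(x) = 0.5 * sum_i a_i * x_i^2 with log-spaced curvatures.
a = [10.0 ** (-j) for j in range(7)]   # eigenvalues; L = max = 1
beta = 1.0 / max(a)                    # step size 1/L

def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x):
    return [ai * xi for ai, xi in zip(a, x)]

def gd(x0, iters):
    """Classical gradient descent: x <- x - beta * grad f(x)."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - beta * gi for xi, gi in zip(x, grad(x))]
    return x

def agd(x0, iters):
    """Accelerated gradient descent with momentum (k-1)/(k+2)."""
    x, y_prev = list(x0), list(x0)
    for k in range(1, iters + 1):
        y = [xi - beta * gi for xi, gi in zip(x, grad(x))]
        m = (k - 1) / (k + 2)          # i.e. lambda_k = -m
        x = [yi + m * (yi - ypi) for yi, ypi in zip(y, y_prev)]
        y_prev = y
    return y_prev

x0 = [1.0] * len(a)
f_gd, f_agd = f(gd(x0, 200)), f(agd(x0, 200))
```

After 200 iterations the accelerated iterate is markedly better: the eigendirections with small curvature are exactly where the O(1/k²) envelope beats the O(1/k) behavior of plain gradient descent.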
The Acceleration Phenomenon

Two classes of algorithms:
- Gradient methods: gradient descent, mirror descent, cubic-regularized Newton's method (Nesterov and Polyak '06), etc. Greedy descent methods; relatively well understood.
- Accelerated methods: Nesterov's accelerated gradient descent, accelerated mirror descent, accelerated cubic-regularized Newton's method (Nesterov '08), etc. Important for both theory (the optimal rate for first-order methods) and practice (many extensions: FISTA, the stochastic setting, etc.). Not descent methods; faster than gradient methods; still mysterious.
Accelerated Methods

Analysis uses Nesterov's estimate-sequence technique; a common interpretation is as momentum methods (in the Euclidean case). Many explanations have been proposed:
- Chebyshev polynomials (Hardt '13)
- Linear coupling (Allen-Zhu, Orecchia '14)
- Optimized first-order method (Drori, Teboulle '14; Kim, Fessler '15)
- Geometric shrinking (Bubeck, Lee, Singh '15)
- Universal catalyst (Lin, Mairal, Harchaoui '15)
- ...

But these apply only to strongly convex functions, or only to first-order methods.

Question: What is the underlying mechanism that generates acceleration (including for higher-order methods)?
Accelerated Methods: Continuous-Time Perspective

Gradient descent is a discretization of the gradient flow

    \dot{X}_t = -\nabla f(X_t)

(and mirror descent is a discretization of the natural gradient flow).

Su, Boyd, Candès '14: the continuous-time limit of accelerated gradient descent is a second-order ODE:

    \ddot{X}_t + \frac{3}{t}\dot{X}_t + \nabla f(X_t) = 0

These ODEs are obtained by taking continuous-time limits. Is there a deeper generative mechanism?

Our work: a general variational approach to acceleration, and a systematic discretization methodology.
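The ODE can be sanity-checked numerically. A minimal forward-Euler sketch (my own, not part of the talk), for the scalar objective f(x) = ½x², where the trajectory should bring the objective well below its initial value:

```python
def simulate_agd_ode(x0: float, t0: float = 0.01,
                     t_end: float = 5.0, dt: float = 1e-3) -> float:
    """Forward-Euler integration of X'' + (3/t) X' + grad f(X) = 0
    for f(x) = 0.5 * x^2, started near t = 0 with zero velocity."""
    x, v, t = x0, 0.0, t0
    while t < t_end:
        acc = -(3.0 / t) * v - x       # grad f(x) = x
        x, v, t = x + dt * v, v + dt * acc, t + dt
    return x

x_end = simulate_agd_ode(1.0)
```

For this quadratic the exact solution is a decaying Bessel-type oscillation with envelope of order t^{-3/2}, so by t = 5 the iterate (and hence f) is far below its starting value.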
Bregman Lagrangian

Define the Bregman Lagrangian:

    \mathcal{L}(x, \dot{x}, t) = e^{\gamma_t + \alpha_t} \left( D_h(x + e^{-\alpha_t}\dot{x},\, x) - e^{\beta_t} f(x) \right)

- A function of position x, velocity \dot{x}, and time t.
- D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle is the Bregman divergence.
- h is the convex distance-generating function.
- f is the convex objective function.
- \alpha_t, \beta_t, \gamma_t \in \mathbb{R} are arbitrary smooth functions.

In the Euclidean setting (h(x) = \tfrac{1}{2}\|x\|^2), it simplifies to a damped Lagrangian:

    \mathcal{L}(x, \dot{x}, t) = e^{\gamma_t - \alpha_t} \left( \tfrac{1}{2}\|\dot{x}\|^2 - e^{2\alpha_t + \beta_t} f(x) \right)

Ideal scaling conditions:

    \dot{\beta}_t \le e^{\alpha_t}, \qquad \dot{\gamma}_t = e^{\alpha_t}
Bregman Lagrangian

Variational problem over curves:

    \min_X \int \mathcal{L}(X_t, \dot{X}_t, t)\, dt

The optimal curve is characterized by the Euler-Lagrange equation:

    \frac{d}{dt}\left\{ \frac{\partial \mathcal{L}}{\partial \dot{x}}(X_t, \dot{X}_t, t) \right\} = \frac{\partial \mathcal{L}}{\partial x}(X_t, \dot{X}_t, t)

The E-L equation for the Bregman Lagrangian under ideal scaling:

    \ddot{X}_t + (e^{\alpha_t} - \dot{\alpha}_t)\dot{X}_t + e^{2\alpha_t + \beta_t} \left[ \nabla^2 h(X_t + e^{-\alpha_t}\dot{X}_t) \right]^{-1} \nabla f(X_t) = 0
General Convergence Rate

Theorem. Under ideal scaling, the E-L equation has convergence rate

    f(X_t) - f(x^*) \le O(e^{-\beta_t})

Proof: exhibit a Lyapunov function for the dynamics:

    \mathcal{E}_t = D_h\big(x^*,\, X_t + e^{-\alpha_t}\dot{X}_t\big) + e^{\beta_t}\big( f(X_t) - f(x^*) \big)

    \dot{\mathcal{E}}_t = -e^{\alpha_t + \beta_t} D_f(x^*, X_t) + (\dot{\beta}_t - e^{\alpha_t})\, e^{\beta_t} \big(f(X_t) - f(x^*)\big) \le 0

Note: this only requires convexity and differentiability of f and h.
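Spelling out the last step of the proof (a routine addition for completeness): since the Lyapunov function is nonincreasing and its Bregman-divergence term is nonnegative,

```latex
e^{\beta_t}\bigl(f(X_t) - f(x^*)\bigr)
  \;\le\; \mathcal{E}_t
  \;\le\; \mathcal{E}_0,
\qquad\text{so}\qquad
f(X_t) - f(x^*) \;\le\; \mathcal{E}_0\, e^{-\beta_t} = O(e^{-\beta_t}).
```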
Polynomial Convergence Rate

For p > 0, choose parameters:

    \alpha_t = \log p - \log t, \qquad \beta_t = p \log t + \log C, \qquad \gamma_t = p \log t

The E-L equation then has an O(e^{-\beta_t}) = O(1/t^p) convergence rate:

    \ddot{X}_t + \frac{p+1}{t}\dot{X}_t + C p^2 t^{p-2} \left[ \nabla^2 h\big(X_t + \tfrac{t}{p}\dot{X}_t\big) \right]^{-1} \nabla f(X_t) = 0

For p = 2 this recovers the result of Krichene et al. with O(1/t^2) convergence rate, and in the Euclidean case recovers the ODE of Su et al.:

    \ddot{X}_t + \frac{3}{t}\dot{X}_t + \nabla f(X_t) = 0
Time Dilation Property (Reparameterizing Time)

p = 2 (accelerated gradient descent):

    O(1/t^2): \quad \ddot{X}_t + \frac{3}{t}\dot{X}_t + 4C \left[ \nabla^2 h\big(X_t + \tfrac{t}{2}\dot{X}_t\big) \right]^{-1} \nabla f(X_t) = 0

Speeding up time via Y_t = X_{t^{3/2}} yields p = 3 (accelerated cubic-regularized Newton's method):

    O(1/t^3): \quad \ddot{Y}_t + \frac{4}{t}\dot{Y}_t + 9Ct \left[ \nabla^2 h\big(Y_t + \tfrac{t}{3}\dot{Y}_t\big) \right]^{-1} \nabla f(Y_t) = 0

All accelerated methods are traveling the same curve in space-time at different speeds. Gradient methods don't have this property: speeding up time takes gradient flow to the rescaled gradient flow, replacing \nabla f by \nabla f / \|\nabla f\|^{(p-2)/(p-1)}.
Time Dilation for the General Bregman Lagrangian

O(e^{-\beta_t}): E-L for the Lagrangian \mathcal{L}_{\alpha,\beta,\gamma}. Speeding up time via Y_t = X_{\tau(t)} gives

O(e^{-\beta_{\tau(t)}}): E-L for the Lagrangian \mathcal{L}_{\tilde\alpha,\tilde\beta,\tilde\gamma}, where

    \tilde{\alpha}_t = \alpha_{\tau(t)} + \log \dot{\tau}(t), \qquad \tilde{\beta}_t = \beta_{\tau(t)}, \qquad \tilde{\gamma}_t = \gamma_{\tau(t)}

Question: How can we discretize the E-L equation while preserving the convergence rate?
Discretizing the Dynamics (Naive Approach)

Write the E-L equation as a system of first-order equations:

    Z_t = X_t + \frac{t}{p}\dot{X}_t

    \frac{d}{dt} \nabla h(Z_t) = -C p t^{p-1} \nabla f(X_t)

Euler discretization with time step \delta > 0 (i.e., set x_k = X_t, x_{k+1} = X_{t+\delta}):

    x_{k+1} = \frac{p}{k+p}\, z_k + \frac{k}{k+p}\, x_k

    z_k = \arg\min_z \left\{ C p\, k^{(p-1)} \langle \nabla f(x_k), z \rangle + \frac{1}{\epsilon} D_h(z, z_{k-1}) \right\}

with step size \epsilon = \delta^p, where k^{(p-1)} = k(k+1)\cdots(k+p-2) is the rising factorial.
Naive Discretization Doesn't Work

[Plot: f(x_k) versus k for the naive scheme, showing divergence.]

We cannot obtain a convergence guarantee for this scheme, and it is empirically unstable.
Modified Discretization

Introduce an auxiliary sequence y_k:

    x_{k+1} = \frac{p}{k+p}\, z_k + \frac{k}{k+p}\, y_k

    z_k = \arg\min_z \left\{ C p\, k^{(p-1)} \langle \nabla f(y_k), z \rangle + \frac{1}{\epsilon} D_h(z, z_{k-1}) \right\}

Sufficient condition on y_k:

    \langle \nabla f(y_k), x_k - y_k \rangle \ge M \epsilon^{\frac{1}{p-1}} \|\nabla f(y_k)\|^{\frac{p}{p-1}}

Assume h is uniformly convex: D_h(y, x) \ge \frac{1}{p}\|y - x\|^p.

Theorem.

    f(y_k) - f(x^*) \le O\!\left( \frac{1}{\epsilon k^p} \right)

Note the matching convergence rates: 1/(\epsilon k^p) = 1/(\delta k)^p = 1/t^p. The proof uses a generalization of Nesterov's estimate-sequence technique.
Accelerated Higher-Order Gradient Method

    x_{k+1} = \frac{p}{k+p}\, z_k + \frac{k}{k+p}\, y_k

    y_k = \arg\min_y \left\{ f_{p-1}(y; x_k) + \frac{2}{\epsilon p} \|y - x_k\|^p \right\}

    z_k = \arg\min_z \left\{ C p\, k^{(p-1)} \langle \nabla f(y_k), z \rangle + \frac{1}{\epsilon} D_h(z, z_{k-1}) \right\}

(here f_{p-1}(y; x) denotes the order-(p-1) Taylor approximation of f around x). If \nabla^{p-1} f is (1/\epsilon)-Lipschitz and h is uniformly convex of order p, then:

    f(y_k) - f(x^*) \le O\!\left( \frac{1}{\epsilon k^p} \right) \quad \text{(accelerated rate)}

- p = 2: accelerated gradient/mirror descent
- p = 3: accelerated cubic-regularized Newton's method (Nesterov '08)
- p ≥ 2: accelerated higher-order methods
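For p = 2 with Euclidean h(x) = ½‖x‖², the three updates specialize to a concrete first-order method: y_k is a gradient step of size ε/2 from x_k, and the z-update is a weighted gradient step from z_{k-1}. The following Python sketch is my own illustration (the constant C = 1/8 and the test problem are assumptions chosen for the example, not values from the talk):

```python
def accelerated_p2(grad, x0: float, eps: float, C: float = 0.125,
                   iters: int = 1000) -> float:
    """p = 2, Euclidean (h = 0.5*||.||^2) instance of the scheme:
       y_k     = x_k - (eps/2) * grad(x_k)          (gradient step)
       z_k     = z_{k-1} - 2*C*k*eps * grad(y_k)    (z/mirror step)
       x_{k+1} = (2/(k+2)) * z_k + (k/(k+2)) * y_k  (coupling)
    With h Euclidean, D_h(z, z') = 0.5*||z - z'||^2, so the argmin
    for z_k has the closed form used below."""
    x = z = x0
    y = x0
    for k in range(1, iters + 1):
        y = x - 0.5 * eps * grad(x)
        z = z - 2.0 * C * k * eps * grad(y)
        x = (2.0 / (k + 2)) * z + (k / (k + 2)) * y
    return y

# minimize f(x) = 0.5 * x^2; grad f is 1-Lipschitz, so take eps = 1
y_final = accelerated_p2(lambda x: x, x0=1.0, eps=1.0)
```

On this one-dimensional quadratic the iterates oscillate mildly while shrinking at the accelerated O(1/(εk²)) rate, so after 1000 iterations y_final is very close to the minimizer at 0.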
Recap: Gradient vs. Accelerated Methods

How do we design dynamics for minimizing a convex function f?

Gradient side:
- Rescaled gradient flow: \dot{X}_t = -\nabla f(X_t) / \|\nabla f(X_t)\|^{(p-2)/(p-1)}, with rate O(1/t^{p-1}).
- Higher-order gradient method: O(1/(\epsilon k^{p-1})) when \nabla^{p-1} f is (1/\epsilon)-Lipschitz; matching rate with \delta = \epsilon^{1/(p-1)}.

Accelerated side:
- Polynomial Euler-Lagrange equation: \ddot{X}_t + \frac{p+1}{t}\dot{X}_t + C p^2 t^{p-2} \big[ \nabla^2 h\big(X_t + \tfrac{t}{p}\dot{X}_t\big) \big]^{-1} \nabla f(X_t) = 0, with rate O(1/t^p).
- Accelerated higher-order method: O(1/(\epsilon k^p)) when \nabla^{p-1} f is (1/\epsilon)-Lipschitz; matching rate with \delta = \epsilon^{1/p}.
Summary: Bregman Lagrangian

A Bregman Lagrangian family with a general convergence guarantee:

    \mathcal{L}(x, \dot{x}, t) = e^{\gamma_t + \alpha_t} \left( D_h(x + e^{-\alpha_t}\dot{x},\, x) - e^{\beta_t} f(x) \right)

- The polynomial subfamily generates accelerated higher-order methods: O(1/t^p) convergence rates via higher-order smoothness.
- The exponential subfamily: O(e^{-ct}) rates via uniform convexity.
- Understand the structure and properties of the Bregman Lagrangian: gauge invariance, symmetry, gradient flows as limit points, etc.
- The Bregman Hamiltonian:

    \mathcal{H}(x, p, t) = e^{\alpha_t + \gamma_t} \left( D_{h^*}\big(\nabla h(x) + e^{-\gamma_t} p,\, \nabla h(x)\big) + e^{\beta_t} f(x) \right)
Discussion

Many conceptual and mathematical challenges arise in taking the problem of Big Data seriously. Facing these challenges will require a rapprochement between computational thinking and inferential thinking: bringing the computational and inferential fields together at the level of their foundations.
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationIFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent
IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):
More informationAn Optimal Affine Invariant Smooth Minimization Algorithm.
An Optimal Affine Invariant Smooth Minimization Algorithm. Alexandre d Aspremont, CNRS & École Polytechnique. Joint work with Martin Jaggi. Support from ERC SIPA. A. d Aspremont IWSL, Moscow, June 2013,
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationUnconstrained minimization
CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationRanking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization
Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationRandomized Smoothing for Stochastic Optimization
Randomized Smoothing for Stochastic Optimization John Duchi Peter Bartlett Martin Wainwright University of California, Berkeley NIPS Big Learn Workshop, December 2011 Duchi (UC Berkeley) Smoothing and
More informationSubgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725
Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationTrade-Offs in Distributed Learning and Optimization
Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationGRADIENT = STEEPEST DESCENT
GRADIENT METHODS GRADIENT = STEEPEST DESCENT Convex Function Iso-contours gradient 0.5 0.4 4 2 0 8 0.3 0.2 0. 0 0. negative gradient 6 0.2 4 0.3 2.5 0.5 0 0.5 0.5 0 0.5 0.4 0.5.5 0.5 0 0.5 GRADIENT DESCENT
More informationFast Algorithms for Segmented Regression
Fast Algorithms for Segmented Regression Jayadev Acharya 1 Ilias Diakonikolas 2 Jerry Li 1 Ludwig Schmidt 1 1 MIT 2 USC June 21, 2016 1 / 21 Statistical vs computational tradeoffs? General Motivating Question
More informationADMM and Accelerated ADMM as Continuous Dynamical Systems
Guilherme França 1 Daniel P. Robinson 1 René Vidal 1 Abstract Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms
More informationCS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent
CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi Gradient Descent This lecture introduces Gradient Descent a meta-algorithm for unconstrained minimization. Under convexity of the
More informationLinear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra
Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra Michael W. Mahoney ICSI and Dept of Statistics, UC Berkeley ( For more info,
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationAnswering Many Queries with Differential Privacy
6.889 New Developments in Cryptography May 6, 2011 Answering Many Queries with Differential Privacy Instructors: Shafi Goldwasser, Yael Kalai, Leo Reyzin, Boaz Barak, and Salil Vadhan Lecturer: Jonathan
More informationLecture 14: Newton s Method
10-725/36-725: Conve Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton s ethod Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationHow hard is this function to optimize?
How hard is this function to optimize? John Duchi Based on joint work with Sabyasachi Chatterjee, John Lafferty, Yuancheng Zhu Stanford University West Coast Optimization Rumble October 2016 Problem minimize
More informationOracle Complexity of Second-Order Methods for Smooth Convex Optimization
racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il
More informationGeneralized Conditional Gradient and Its Applications
Generalized Conditional Gradient and Its Applications Yaoliang Yu University of Alberta UBC Kelowna, 04/18/13 Y-L. Yu (UofA) GCG and Its Apps. UBC Kelowna, 04/18/13 1 / 25 1 Introduction 2 Generalized
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationarxiv: v1 [cs.lg] 27 May 2014
Private Empirical Risk Minimization, Revisited Raef Bassily Adam Smith Abhradeep Thakurta May 29, 2014 arxiv:1405.7085v1 [cs.lg] 27 May 2014 Abstract In this paper, we initiate a systematic investigation
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationAccelerated gradient methods
ELE 538B: Large-Scale Optimization for Data Science Accelerated gradient methods Yuxin Chen Princeton University, Spring 018 Outline Heavy-ball methods Nesterov s accelerated gradient methods Accelerated
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationTutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer
Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization Stochastic optimization, ERM,
More informationInformation-theoretic foundations of differential privacy
Information-theoretic foundations of differential privacy Darakhshan J. Mir Rutgers University, Piscataway NJ 08854, USA, mir@cs.rutgers.edu Abstract. We examine the information-theoretic foundations of
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationarxiv: v1 [math.oc] 20 Jul 2018
Continuous-Time Accelerated Methods via a Hybrid Control Lens ARMAN SHARIFI KOLARIJANI, PEYMAN MOHAJERIN ESFAHANI, TAMÁS KEVICZKY arxiv:1807.07805v1 [math.oc] 20 Jul 2018 Abstract. Treating optimization
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationConverse Lyapunov theorem and Input-to-State Stability
Converse Lyapunov theorem and Input-to-State Stability April 6, 2014 1 Converse Lyapunov theorem In the previous lecture, we have discussed few examples of nonlinear control systems and stability concepts
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationIntegration Methods and Accelerated Optimization Algorithms
Integration Methods and Accelerated Optimization Algorithms Damien Scieur Vincent Roulet Francis Bach Alexandre d Aspremont INRIA - Sierra Project-team Département d Informatique de l Ecole Normale Supérieure
More information