ELE 538B: Large-Scale Optimization for Data Science. Introduction. Yuxin Chen Princeton University, Spring 2018

Surge of data-intensive applications

Widespread applications in large-scale data science and learning
2.5 exabytes of data are generated every day (2012): exabyte → zettabyte → yottabyte → ...??
limited processing ability (computation, storage, ...)

Optimization has transformed algorithm design

(Convex) optimization is almost a tool

Solvability / tractability

"... the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity."
(R. Rockafellar, 1993)

Polynomial-time solvability ≠ scalability

Even polynomial-time algorithms might be useless in large-scale applications

Example: Newton's method

x^{t+1} = x^t - (∇²f(x^t))^{-1} ∇f(x^t)

[Figure: f(x^{(k)}) - f^opt versus iteration k for an example in R² (page 10-9), with backtracking parameters α = 0.1, β = 0.7; the error falls from about 10⁵ to about 10⁻¹⁵ in 5 steps]

f(x^{(k)}) - f^opt converges in only 5 steps: quadratic local convergence, hence ε-accuracy is attained within O(log log(1/ε)) iterations

but Newton's method typically requires Hessian information ∇²f(x) ∈ R^{n×n}: a single iteration may last forever, and the storage requirement is prohibitive
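
To make the trade-off concrete, here is a minimal sketch (ours, not from the slides) that runs plain gradient descent and Newton's method side by side on a small ridge-regularized logistic-type objective; the data matrix A, the regularizer lam, the step size, and the iteration count are all illustrative assumptions.

```python
# Gradient descent vs. Newton's method on f(x) = sum_i log(1 + exp(a_i'x)) + 0.5*lam*||x||^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))   # assumed data
lam = 0.1                         # assumed ridge weight

def f(x):
    return np.sum(np.logaddexp(0.0, A @ x)) + 0.5 * lam * (x @ x)

def grad(x):
    s = 1.0 / (1.0 + np.exp(-(A @ x)))          # sigmoid(a_i' x)
    return A.T @ s + lam * x

def hess(x):
    s = 1.0 / (1.0 + np.exp(-(A @ x)))
    return (A * (s * (1.0 - s))[:, None]).T @ A + lam * np.eye(d)

x_gd, x_nt = np.zeros(d), np.zeros(d)
for k in range(10):
    x_gd = x_gd - 0.02 * grad(x_gd)                          # O(nd) work per step
    x_nt = x_nt - np.linalg.solve(hess(x_nt), grad(x_nt))    # O(nd^2 + d^3) per step
    print(k, f(x_gd), f(x_nt))    # Newton settles at the optimum within ~5 steps
```

The point of the slide survives in the two inline comments: each Newton step is far more accurate, but also far more expensive, than a gradient step.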

Iteration complexity vs. per-iteration cost

computational cost = iteration complexity (#iterations needed) × cost per iteration

Large-scale problems call for methods with cheap iterations
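
A back-of-envelope illustration of this product, with made-up but representative numbers (the dimension, iteration counts, and flop estimates below are assumptions, not course figures):

```python
# computational cost = (#iterations needed) x (cost per iteration), in rough flop counts
n = 1e7                         # assumed problem dimension
newton_like = 6 * n**3          # ~6 iterations, each ~n^3 flops (a dense linear solve)
first_order = 1e4 * n           # ~10^4 iterations, each ~n flops (a gradient step)
print(f"Newton-like: {newton_like:.1e} flops")   # ~6.0e+21: hopeless at this scale
print(f"first-order: {first_order:.1e} flops")   # ~1.0e+11: routine on a laptop
```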

Methods of choice

[Figure: a first-order oracle; a query point x goes in, f(x) and ∇f(x) come out]

First-order methods: methods that exploit only information on function values and (sub)gradients (without using Hessian information)
cheap iterations
low memory requirement
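
A minimal sketch of the oracle abstraction just described, assuming a least-squares objective; the function names and problem data below are ours:

```python
# A first-order oracle returns only (f(x), grad f(x)); the method never sees more.
import numpy as np

def oracle(x, A, b):
    """f(x) = 0.5 * ||Ax - b||^2 and its gradient."""
    r = A @ x - b
    return 0.5 * (r @ r), A.T @ r

def gradient_method(oracle_fn, x0, step, iters):
    x = x0.copy()
    for _ in range(iters):
        _, g = oracle_fn(x)       # only first-order information is used
        x = x - step * g          # O(n) work and memory per iteration
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x = gradient_method(lambda x: oracle(x, A, b), np.zeros(2), step=0.2, iters=100)
print(x)   # approaches the least-squares solution [0.5, 1.0]
```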

What this course will NOT cover

second-order methods: check ORF 522 (by M. Wang)
convex analysis: check ORF 522 (by M. Wang) and ORF 523 (by A. Ahmadi)
sum-of-squares programming: check ORF 523 (by A. Ahmadi)
approximation algorithms for NP-hard problems: check ORF 523 (by A. Ahmadi)
computational hardness: check ORF 523 (by A. Ahmadi)
online optimization: check COS 511 (by E. Hazan)

What this course will cover: convex optimization algorithms

gradient methods
Frank-Wolfe and projected gradient methods
subgradient methods
proximal gradient methods (a small taste appears in the sketch below)
accelerated proximal gradient methods
mirror descent
stochastic gradient methods
ADMM
quasi-Newton methods (BFGS)
large-scale linear algebra (conjugate gradient, Lanczos method)
ODE interpretations
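
As a small taste of one item on this list, here is a minimal proximal-gradient (ISTA) sketch for the lasso, min 0.5||Ax - b||² + λ||x||₁; the problem data, regularization weight, and iteration count are illustrative assumptions:

```python
# Proximal gradient (ISTA): a gradient step on the smooth part, then a prox
# (soft-threshold) step on the l1 part.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))      # assumed sensing matrix
x_true = np.zeros(100)
x_true[:5] = 1.0                        # assumed 5-sparse signal
b = A @ x_true
lam = 0.1
L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth gradient

def soft_threshold(v, tau):             # prox of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

x = np.zeros(100)
for _ in range(2000):
    g = A.T @ (A @ x - b)               # gradient of the smooth part
    x = soft_threshold(x - g / L, lam / L)
print(np.round(x[:8], 2))               # approximately recovers the sparse support
```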

What this course will cover: nonconvex optimization?

geometry of matrix factorization (phase retrieval, matrix completion, matrix sensing)
escaping saddle points
gradient methods for matrix factorization (a toy instance is sketched below)
neural networks?
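
For concreteness, a toy sketch (ours; the sizes, step size, and initialization scale are assumptions) of gradient descent on the nonconvex matrix-factorization objective f(U, V) = 0.5 ||U Vᵀ - M||²_F:

```python
# Gradient descent on a rank-r matrix factorization: nonconvex, yet simple
# gradient steps from a small random initialization drive the error down.
import numpy as np

rng = np.random.default_rng(2)
n, r = 30, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r target
M = M / np.linalg.norm(M, 2)            # normalize spectral norm to 1

U = 0.1 * rng.standard_normal((n, r))   # small random initialization
V = 0.1 * rng.standard_normal((n, r))
eta = 0.1                               # assumed step size (small vs. ||M||)
for _ in range(3000):
    R = U @ V.T - M                     # residual
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)   # partial gradients of f
print(np.linalg.norm(U @ V.T - M))      # error decreases steadily despite nonconvexity
```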

Textbooks

We recommend these three books, but will not follow them closely ...

[Figure: covers of the three recommended books]

WARNING

There will be quite a few THEOREMS and PROOFS ...

May be somewhat disorganized: taught for the first time

Prerequisites

basic linear algebra
basic probability
a programming language (e.g., Matlab, Python, ...)
knowledge of basic convex optimization

Somewhat surprisingly, most proofs rely only on basic linear algebra and elementary recursive formulas

Grading

[Figure: difficulty vs. workload]

Grading

Homeworks: 3 problem sets
use Piazza as the main mode of electronic communication; please post (and answer) questions here!

Term project: either individually or in groups of two

grade = max{0.5H + 0.5P, 0.5E + 0.5P, P},  if exam (whether there will be an exam is "random")
grade = max{0.5H + 0.5P, P},               else

where H: homework; P: project; E: take-home exam
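
Read as code, the rule is just a max over weighted averages; a tiny transcription (the argument names are ours) for concreteness:

```python
# Grade rule from the slide; H, P, E are homework, project, and exam scores in [0, 1].
def grade(H, P, E=None, exam=False):
    if exam:                        # whether an exam happens is "random"
        return max(0.5 * H + 0.5 * P, 0.5 * E + 0.5 * P, P)
    return max(0.5 * H + 0.5 * P, P)

print(grade(H=0.8, P=0.9))                    # no exam: 0.9
print(grade(H=0.8, P=0.9, E=1.0, exam=True))  # with exam: 0.95
```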

Term project

Two forms: literature review, or original research
You are strongly encouraged to combine it with your own research

Three milestones
Proposal (March 16): up to 1 page
Presentation (either last week of class or reading period)
Report (May 13): up to ... pages with unlimited appendix