
LECTURE 24 LECTURE OUTLINE

• Gradient-proximal minimization method
• Nonquadratic proximal algorithms
• Entropy minimization algorithm
• Exponential augmented Lagrangian method
• Entropic descent algorithm

**************************************

References:

Beck, A., and Teboulle, M., 2010. "Gradient-Based Algorithms with Applications to Signal Recovery Problems," in Convex Optimization in Signal Processing and Communications (Y. Eldar and D. Palomar, eds.), Cambridge University Press, pp. 42-88.

Beck, A., and Teboulle, M., 2003. "Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization," Operations Research Letters, Vol. 31, pp. 167-175.

Bertsekas, D. P., 1999. Nonlinear Programming, Athena Scientific, Belmont, MA.

All figures are courtesy of Athena Scientific, and are used with permission.

PROXIMAL AND GRADIENT PROJECTION

• Proximal algorithm to minimize convex f over closed convex X:

  x_{k+1} \in \arg\min_{x \in X} \Big\{ f(x) + \frac{1}{2c_k} \|x - x_k\|^2 \Big\}

[Figure: geometric interpretation of the proximal iteration, showing f(x), the level \gamma_k, the quadratic \gamma_k - \frac{1}{2c_k}\|x - x_k\|^2, and the points x_k and x_{k+1}.]

• Let f be differentiable, and assume

  \|\nabla f(x) - \nabla f(y)\| \le L \, \|x - y\|, \qquad \forall\ x, y \in X

• Define the linear approximation function at x:

  \ell(y; x) = f(x) + \nabla f(x)'(y - x), \qquad \forall\ y \in X

• Connection of proximal with gradient projection:

  y = \arg\min_{z \in X} \Big\{ \ell(z; x) + \frac{1}{2\alpha} \|z - x\|^2 \Big\} = P_X\big( x - \alpha \nabla f(x) \big)
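As a quick numerical illustration of the last identity (not part of the lecture; the quadratic f, the nonnegative-orthant constraint set, and the stepsize below are arbitrary illustration choices), a minimal Python sketch compares the minimizer of the regularized linearization with the gradient projection step:

```python
# Sketch: check that  arg min_{z in X} { l(z;x) + ||z - x||^2 / (2*alpha) }
# equals P_X(x - alpha*grad f(x)), for X = nonnegative orthant and quadratic f.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n)); Q = Q @ Q.T + np.eye(n)   # positive definite Hessian
b = rng.standard_normal(n)

f      = lambda x: 0.5 * x @ Q @ x - b @ x
grad_f = lambda x: Q @ x - b

x, alpha = rng.standard_normal(n), 0.1

# Gradient projection step: projection onto X = {z >= 0} is a componentwise max with 0.
gp_step = np.maximum(0.0, x - alpha * grad_f(x))

# Proximal step applied to the linearization l(z; x) = f(x) + grad_f(x)'(z - x).
obj = lambda z: f(x) + grad_f(x) @ (z - x) + np.sum((z - x) ** 2) / (2 * alpha)
prox_lin = minimize(obj, x, bounds=[(0.0, None)] * n).x

print(np.max(np.abs(prox_lin - gp_step)))   # close to 0: the two steps coincide
```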

GRADIENT-PROXIMAL METHOD I

• Minimize f(x) + g(x) over x ∈ X, where X: closed convex, f, g: convex, f is differentiable.

• Gradient-proximal method:

  x_{k+1} \in \arg\min_{x \in X} \Big\{ \ell(x; x_k) + g(x) + \frac{1}{2\alpha} \|x - x_k\|^2 \Big\}

• Recall the key inequality: For all x, y ∈ X,

  f(y) \le \ell(y; x) + \frac{L}{2} \|y - x\|^2

• Cost reduction for α ≤ 1/L:

  f(x_{k+1}) + g(x_{k+1}) \le \ell(x_{k+1}; x_k) + \frac{L}{2} \|x_{k+1} - x_k\|^2 + g(x_{k+1})
                          \le \ell(x_{k+1}; x_k) + g(x_{k+1}) + \frac{1}{2\alpha} \|x_{k+1} - x_k\|^2
                          \le \ell(x_k; x_k) + g(x_k)
                          = f(x_k) + g(x_k)

• This is a key insight for the convergence analysis.
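A small numerical check of the key inequality (not from the lecture; the quadratic f below is an arbitrary test function, whose gradient has Lipschitz constant L equal to the largest eigenvalue of its Hessian):

```python
# Sketch: verify  f(y) <= l(y;x) + (L/2)||y - x||^2  on random pairs (x, y),
# for f(x) = 0.5 x'Qx - b'x with Q symmetric positive semidefinite.
import numpy as np

rng = np.random.default_rng(1)
n = 4
Q = rng.standard_normal((n, n)); Q = Q @ Q.T     # PSD Hessian
b = rng.standard_normal(n)
L = np.linalg.eigvalsh(Q).max()                  # Lipschitz constant of grad f

f      = lambda x: 0.5 * x @ Q @ x - b @ x
grad_f = lambda x: Q @ x - b
lin    = lambda y, x: f(x) + grad_f(x) @ (y - x) # l(y; x)

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert f(y) <= lin(y, x) + 0.5 * L * np.sum((y - x) ** 2) + 1e-9
print("key inequality holds on 1000 random pairs")
```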

GRADIENT-PROXIMAL METHOD II

• Equivalent definition of the gradient-proximal iteration:

  z_k = x_k - \alpha \nabla f(x_k)

  x_{k+1} \in \arg\min_{x \in X} \Big\{ g(x) + \frac{1}{2\alpha} \|x - z_k\|^2 \Big\}

• Simplifies the implementation of the proximal algorithm, by using a gradient iteration to deal with the case of an inconvenient component f.

• This is similar to the incremental subgradient-proximal method, but the gradient-proximal method does not extend to the case where the cost consists of the sum of multiple components.

• Allows a constant stepsize (under the restriction α ≤ 1/L). This does not extend to incremental methods.

• Like all gradient and subgradient methods, convergence can be slow. There are special cases where the method can be fruitfully applied (see the reference by Beck and Teboulle).
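One concrete case where this two-step form is convenient (an illustrative assumption, not an example from the slides): with f(x) = ½‖Ax − b‖², g(x) = λ‖x‖₁, and X = ℜⁿ, the proximal step has a closed-form soft-thresholding solution, and the iteration is the familiar iterative shrinkage-thresholding scheme:

```python
# Sketch: gradient step on f, then closed-form proximal (soft-thresholding) on g = lam*||.||_1.
# Problem data, lam, and the iteration count are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(2)
m, n, lam = 20, 10, 0.1
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

grad_f = lambda x: A.T @ (A @ x - b)
cost   = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
L      = np.linalg.eigvalsh(A.T @ A).max()       # Lipschitz constant of grad f
alpha  = 1.0 / L                                 # constant stepsize, alpha <= 1/L

x = np.zeros(n)
for k in range(200):
    z = x - alpha * grad_f(x)                                        # gradient step on f
    x_new = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)    # prox of alpha*g at z
    assert cost(x_new) <= cost(x) + 1e-12                            # cost reduction for alpha <= 1/L
    x = x_new
print("final cost:", cost(x))
```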

GENERALIZED PROXIMAL ALGORITHM

• Introduce a general regularization term D_k:

  x_{k+1} \in \arg\min_{x \in X} \big\{ f(x) + D_k(x, x_k) \big\}

• Example: Bregman distance function

  D_k(x, y) = \frac{1}{c_k} \big( \phi(x) - \phi(y) - \nabla\phi(y)'(x - y) \big),

where \phi : \Re^n \to (-\infty, \infty] is a convex function, differentiable within an open set containing dom(f), and c_k is a positive penalty parameter.

• All the ideas for applications and connections of the quadratic form of the proximal algorithm extend to the nonquadratic case (although the analysis may not be trivial). In particular we have:

• A dual proximal algorithm (based on Fenchel duality)
• Equivalence with the (nonquadratic) augmented Lagrangian method
• Combinations with polyhedral approximations (bundle-type methods)
• Incremental subgradient-proximal methods
• Nonlinear gradient projection algorithms
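As a sanity check (not on the original slide), the quadratic choice of \phi recovers the ordinary proximal term, while the entropy choice gives the distance used in the entropy minimization algorithm of the next slide:

  \phi(x) = \tfrac{1}{2}\|x\|^2 \ \Rightarrow\ D_k(x, y) = \frac{1}{2c_k}\|x - y\|^2,
  \qquad
  \phi(x) = \sum_{i=1}^n x^i(\ln x^i - 1) \ \Rightarrow\ D_k(x, y) = \frac{1}{c_k}\sum_{i=1}^n \Big( x^i \ln\frac{x^i}{y^i} - x^i + y^i \Big).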

ENTROPY MINIMIZATION ALGORITHM

• A special case involving entropy regularization:

  x_{k+1} \in \arg\min_{x \in X} \Big\{ f(x) + \frac{1}{c_k} \sum_{i=1}^n x^i \Big( \ln\frac{x^i}{x_k^i} - 1 \Big) \Big\},

where x_0 > 0 and all subsequent x_k have positive components.

• We use Fenchel duality to obtain a dual form of this minimization.

• Note: The logarithmic function

  p(x) = \begin{cases} x(\ln x - 1) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ \infty & \text{if } x < 0, \end{cases}

and the exponential function p^\star(y) = e^y are a conjugate pair.

• The dual problem is

  y_{k+1} \in \arg\min_{y \in \Re^n} \Big\{ f^\star(y) + \frac{1}{c_k} \sum_{i=1}^n x_k^i \, e^{c_k y^i} \Big\}
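For completeness, the conjugacy claim can be verified directly (a short derivation, not on the original slide): by definition,

  p^\star(y) = \sup_{x \ge 0} \big\{ xy - x(\ln x - 1) \big\},

and setting the derivative y - \ln x to zero gives x = e^y, so

  p^\star(y) = y e^y - e^y (y - 1) = e^y.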

EXPONENTIAL AUGMENTED LAGRANGIAN

• The dual proximal iteration is

  x_{k+1}^i = x_k^i \, e^{c_k y_{k+1}^i}, \qquad i = 1, \ldots, n,

where y_{k+1} is obtained from the dual proximal minimization:

  y_{k+1} \in \arg\min_{y \in \Re^n} \Big\{ f^\star(y) + \frac{1}{c_k} \sum_{i=1}^n x_k^i \, e^{c_k y^i} \Big\}

• A special case for the convex problem

  minimize f(x)
  subject to g_1(x) \le 0, \ldots, g_r(x) \le 0, \quad x \in X

is the exponential augmented Lagrangian method.

• It consists of unconstrained minimizations

  x_k \in \arg\min_{x \in X} \Big\{ f(x) + \frac{1}{c_k} \sum_{j=1}^r \mu_k^j \, e^{c_k g_j(x)} \Big\},

followed by the multiplier iterations

  \mu_{k+1}^j = \mu_k^j \, e^{c_k g_j(x_k)}, \qquad j = 1, \ldots, r
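A minimal sketch of these two steps on a toy problem (the objective, the single constraint, the penalty parameter c, and the use of a generic inner solver are all illustrative assumptions, not from the lecture):

```python
# Sketch: exponential augmented Lagrangian method for
#   minimize ||x - x_target||^2   subject to  g(x) = sum(x) - 1 <= 0,  X = R^n.
import numpy as np
from scipy.optimize import minimize

x_target = np.array([1.0, 2.0, 3.0])        # its unconstrained minimizer violates g(x) <= 0
f = lambda x: np.sum((x - x_target) ** 2)
g = lambda x: np.sum(x) - 1.0                # single inequality constraint

mu, c = 1.0, 2.0                             # initial multiplier and penalty parameter
x = np.zeros(3)
for k in range(30):
    aug = lambda x: f(x) + (mu / c) * np.exp(c * g(x))   # exponential augmented Lagrangian
    x = minimize(aug, x).x                               # unconstrained minimization
    mu = mu * np.exp(c * g(x))                           # exponential multiplier iteration

print("x =", x, " g(x) =", float(g(x)), " mu =", float(mu))
```

As the iterations proceed, g(x_k) approaches 0 and mu approaches the optimal Lagrange multiplier of the constraint.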

NONLINEAR PROJECTION ALGORITHM

• Subgradient projection with a general regularization term D_k:

  x_{k+1} \in \arg\min_{x \in X} \big\{ f(x_k) + \tilde{\nabla} f(x_k)'(x - x_k) + D_k(x, x_k) \big\},

where \tilde{\nabla} f(x_k) is a subgradient of f at x_k. Also called the mirror descent method.

• Linearization of f simplifies the minimization.

• The use of nonquadratic linearization is useful in problems with special structure.

• Entropic descent method: Minimize f(x) over the unit simplex X = \{ x \ge 0 \mid \sum_{i=1}^n x^i = 1 \}.

• Method:

  x_{k+1} \in \arg\min_{x \in X} \sum_{i=1}^n x^i \Big( g_k^i + \frac{1}{\alpha_k} \ln\frac{x^i}{x_k^i} \Big),

where g_k^i are the components of \tilde{\nabla} f(x_k).

• This minimization can be done in closed form:

  x_{k+1}^i = \frac{x_k^i \, e^{-\alpha_k g_k^i}}{\sum_{j=1}^n x_k^j \, e^{-\alpha_k g_k^j}}, \qquad i = 1, \ldots, n
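A minimal sketch of this closed-form update (the objective f and the diminishing stepsize α_k = 1/√(k+1) below are illustrative assumptions, not from the lecture):

```python
# Sketch: entropic descent (multiplicative / exponentiated-gradient update) on the unit simplex.
import numpy as np

rng = np.random.default_rng(3)
n = 5
Q = rng.standard_normal((n, n)); Q = Q @ Q.T + np.eye(n)
b = rng.standard_normal(n)

f      = lambda x: 0.5 * x @ Q @ x - b @ x
grad_f = lambda x: Q @ x - b                 # f is differentiable, so the subgradient is the gradient

x = np.ones(n) / n                           # start at the center of the simplex
for k in range(200):
    g = grad_f(x)                            # components g_k^i
    alpha = 1.0 / np.sqrt(k + 1)             # diminishing stepsize
    w = x * np.exp(-alpha * g)               # x_k^i * exp(-alpha_k * g_k^i)
    x = w / w.sum()                          # normalization keeps x in the unit simplex

print("x =", x, " sum(x) =", x.sum(), " f(x) =", f(x))
```

The iterates stay strictly inside the simplex, which is exactly why the entropy regularization is the natural choice for this constraint set.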

MIT OpenCourseWare
http://ocw.mit.edu

6.253 Convex Analysis and Optimization
Spring 2012

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.