MATH36061 Convex Optimization
MATH36061 Convex Optimization
Martin Lotz
School of Mathematics, The University of Manchester
Manchester, September 26, 2017
Outline

- General information
- What is optimization?
- Course overview
Organisation

- The course website can be found under:
- Tutorials start in week two; the tutorial class in week one will consist of a review of important concepts and Python.
- Problem sets consist of two parts: problems to be worked on at home, and problems to be discussed in class.
- All the material presented in the lectures will be made available online as the course progresses.
- Lecture notes will appear on the website each week.
- Contact: martin.lotz@manchester.ac.uk
Example classes

Example classes are there to deepen the understanding of the course material. Problem sheets should ideally be looked at before the example class. In the example classes:

- you will be given some time to work on the problems,
- you should ask questions if some parts are not clear,
- you will get feedback on your attempts at the problems,
- we will go through a selection of problems and their solutions.
Python

- Python is an indispensable tool for this course.
- Python is one of the most popular programming languages, used by Google, Dropbox, NASA, and many others.
- Many data science jobs require knowledge of Python.
- Availability for this course: on SageMathCloud, or by installing Anaconda Python on your computer.
- The course page contains links to Python documentation.
Python

Example Python code for printing the Fibonacci numbers below 25:

    def f(a, b):
        c = a + b
        return c

    a, b = 0, 1
    while b < 25:
        a, b = b, f(a, b)
        print(a)
CVXPY

- CVXPY is a convex optimization package for Python.
- CVXPY is freely available for download.
- It is easy to install, and the documentation provides some examples to get started. On SageMathCloud it is readily available.
- An overview of Python, Jupyter notebooks, and CVXPY is given during the first tutorial class on Thursday!
Mathematical prerequisites

The course will make extensive use of results from linear algebra (vectors and matrices in R^n) and multivariate calculus (gradients, Hessians). A comprehensive overview of the mathematics needed can be found on the course website.
Outline

- General information
- What is optimization?
- Course overview
Mathematical optimization

A mathematical optimization problem is a problem of the form

    minimize    f(x)
    subject to  x ∈ Ω,

where f is the objective function and Ω ⊆ R^n is the set of constraints. Often the set Ω is defined by those x ∈ R^n that satisfy some conditions

    g_1(x) ≤ 0, ..., g_m(x) ≤ 0,

where the g_i are real-valued functions. A vector x* such that f(x*) ≤ f(x) for all other x satisfying the constraints is called a solution or minimizer of the problem.
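As a toy illustration (not from the slides; the function and constraint set are made up), the problem of minimizing f(x) = (x − 1)² over Ω = [2, 5] can be attacked by a crude grid search. The unconstrained minimizer x = 1 is infeasible, so the solution sits on the boundary of Ω:

```python
# Toy example: minimize f(x) = (x - 1)^2 subject to x in Omega = [2, 5].
# Crude grid search, for illustration only; the course develops far
# better algorithms (gradient descent, interior-point methods, ...).

def f(x):
    return (x - 1) ** 2

# Discretize the constraint set Omega = [2, 5]
omega = [2 + 3 * k / 1000 for k in range(1001)]

x_star = min(omega, key=f)  # minimizer over the grid
# Since x = 1 is infeasible, the minimum is attained at the
# boundary point x* = 2, with optimal value f(x*) = 1.
```

The same pattern (objective, feasible set, minimizer) recurs in every problem below; only the structure of f and Ω changes.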
Example: least squares

Assume a quantity Y depends linearly on predictors X_1, ..., X_p:

    Y = β_0 + β_1 X_1 + ⋯ + β_p X_p + ε,    (3.1)

where ε is some random error. Example: Y could be sales, and X_1, ..., X_p advertisement budgets in different media (TV, internet, radio, billboards, newspapers). Goal: determine the model parameters β_0, ..., β_p. What we know is data from n ≥ p observations or experiments:

    y_i = β_0 + β_1 x_i1 + ⋯ + β_p x_ip + ε_i,    1 ≤ i ≤ n.
Example: least squares

What we know is data from n ≥ p observations or experiments:

    y_i = β_0 + β_1 x_i1 + ⋯ + β_p x_ip + ε_i,    1 ≤ i ≤ n.

Collect the data in matrices and vectors:

    y = (y_1, ..., y_n)ᵀ,   β = (β_0, β_1, ..., β_p)ᵀ,   ε = (ε_1, ..., ε_n)ᵀ,

and let X be the n × (p+1) matrix whose i-th row is (1, x_i1, ..., x_ip). The relationship can then be written as y = Xβ + ε.
Example: least squares

Based on the relationship y = Xβ + ε, choose β to minimize the squared 2-norm of the error:

    minimize ‖y − Xβ‖₂².

This is an optimization problem with a quadratic objective function and no constraints. The problem has the closed-form solution

    β = (XᵀX)⁻¹ Xᵀ y.

There are more efficient methods available than using the closed-form solution.
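A small sketch with made-up data makes the closed form concrete, and contrasts the normal equations with NumPy's lstsq, which uses a more numerically stable factorization and is one of the "more efficient methods" alluded to above:

```python
import numpy as np

# Made-up data: four observations of a single predictor (p = 1),
# with a column of ones for the intercept beta_0.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.9, 5.1, 7.0])

# Closed-form solution via the normal equations (X^T X) beta = X^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Preferable in practice: an SVD-based least-squares solver
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_ne)  # both solvers return the same least-squares fit
```

For well-conditioned problems the two agree to machine precision; for ill-conditioned X, forming XᵀX squares the condition number, which is why the factorization-based route is preferred.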
Application: linear regression

Observation: the log of the basal metabolic rate (energy expenditure) Y in mammals is related to the log of adult body mass X as Y = β_0 + β_1 X. Collect data from a database of 573 mammal species and plot the relationship. Assemble the vector y and matrix X, and solve the problem

    minimize ‖y − Xβ‖₂².
Application: linear regression

    # Create a (p+1)-vector of variables
    Beta = Variable(p + 1)

    # Create sum-of-squares objective function
    objective = Minimize(sum_entries(square(X*Beta - y)))

    # Create problem and solve it
    prob = Problem(objective)
    prob.solve()

    plt.plot(bmr[:, 0], bmr[:, 1], 'o')
    xx = np.linspace(0, 14)
    plt.plot(xx, Beta[0].value + Beta[1].value*xx, color='red', linewidth=2)
    plt.show()
Application: linear regression

(Figure: scatter plot of log basal metabolic rate against log adult body mass, with the fitted line Y = β_0 + β_1 X.)

(Figure: still from the episode "Size Matters" of the TV series Wonders of Life.)
Example: linear programming

Linear programming refers to problems of the form

    maximize    c_1 x_1 + ⋯ + c_n x_n
    subject to  a_11 x_1 + ⋯ + a_1n x_n ≤ b_1
                ⋮
                a_m1 x_1 + ⋯ + a_mn x_n ≤ b_m
                x_1 ≥ 0, ..., x_n ≥ 0.

This can be rewritten in the standard form shown at the beginning! Using matrices and vectors:

    maximize    ⟨c, x⟩  (= cᵀx)
    subject to  Ax ≤ b  (inequality componentwise)
                x ≥ 0.
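The geometry behind linear programming (an optimum is attained at a vertex of the feasible polyhedron) can be seen in a small sketch with invented data: enumerate candidate vertices by intersecting pairs of constraint boundaries, keep the feasible ones, and pick the best. Real solvers (simplex, interior-point) are vastly more efficient; this is illustration only.

```python
# Maximize c^T x subject to A x <= b, x >= 0, by brute-force
# vertex enumeration (illustration of the polyhedral geometry).
import itertools
import numpy as np

c = np.array([3.0, 5.0])
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 2.0]])
b = np.array([4.0, 12.0, 18.0])

# Append the nonnegativity constraints x >= 0 as -x <= 0
A_full = np.vstack([A, -np.eye(2)])
b_full = np.concatenate([b, np.zeros(2)])

best_val, best_x = -np.inf, None
for i, j in itertools.combinations(range(len(b_full)), 2):
    M = A_full[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue  # parallel constraints: no intersection point
    v = np.linalg.solve(M, b_full[[i, j]])  # candidate vertex
    if np.all(A_full @ v <= b_full + 1e-9):  # keep only feasible vertices
        val = c @ v
        if val > best_val:
            best_val, best_x = val, v

print(best_x, best_val)  # optimal vertex (2, 6) with value 36
```

Every linear objective is maximized at one of the finitely many vertices, which is exactly what the pictures on the geometry slides show.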
Linear programming: geometry

A single linear constraint defines a halfspace: {x : ⟨a, x⟩ ≤ b}.

The feasible set is an intersection of halfspaces, a polyhedron:

    {x : ⟨a_1, x⟩ ≤ b_1, ..., ⟨a_m, x⟩ ≤ b_m} = {x : Ax ≤ b}.

A linear program maximizes a linear function over a polyhedron:

    maximize ⟨c, x⟩ subject to Ax ≤ b.
Linear programming: application

A cargo plane has two compartments with capacities C_1 = 35 and C_2 = 40 tonnes and volumes V_1 = 250 and V_2 = 400 cubic metres. Three types of cargo:

             Volume (m³/tonne)   Weight (tonnes)   Profit (£/tonne)
    Cargo 1         8                  25                300
    Cargo 2        10                  32                350
    Cargo 3         7                  28                270

How to distribute the cargo to maximize profit?
Linear programming: application

x_ij: amount of cargo i in compartment j. Objective function:

    f(x) = 300(x_11 + x_12) + 350(x_21 + x_22) + 270(x_31 + x_32).

Constraints:

    x_11 + x_12 ≤ 25              (total amount of cargo 1)
    x_21 + x_22 ≤ 32              (total amount of cargo 2)
    x_31 + x_32 ≤ 28              (total amount of cargo 3)
    x_11 + x_21 + x_31 ≤ 35       (weight constraint on compartment 1)
    x_12 + x_22 + x_32 ≤ 40       (weight constraint on compartment 2)
    8x_11 + 10x_21 + 7x_31 ≤ 250  (volume constraint on compartment 1)
    8x_12 + 10x_22 + 7x_32 ≤ 400  (volume constraint on compartment 2)
    (x_11 + x_21 + x_31)/35 − (x_12 + x_22 + x_32)/40 = 0
                                  (maintain balance of weight ratio)
    x_ij ≥ 0                      (cargo can't have negative weight)
Linear programming: application

As complicated as it looks, it can be solved easily with CVXPY.

    # Define all the matrices and vectors involved
    c = np.array([300, 300, 350, 350, 270, 270])
    A = np.array([[1, 1, 0, 0, 0, 0],
                  [0, 0, 1, 1, 0, 0],
                  [0, 0, 0, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0, 1],
                  [8, 0, 10, 0, 7, 0],
                  [0, 8, 0, 10, 0, 7]])
    b = np.array([25, 32, 28, 35, 40, 250, 400])
    B = np.array([1/35, -1/40, 1/35, -1/40, 1/35, -1/40])
    d = np.zeros(1)

    # Create variables, objective and constraints
    x = Variable(6)
    constraints = [A*x <= b, B*x == d, x >= 0]
    objective = Maximize(c*x)

    # Create a problem using the objective and constraints and solve it
    prob = Problem(objective, constraints)
    prob.solve()

The solution is x_11 = 6.75, x_12 ≈ 7.71, x_21 = 0, x_22 = 32, x_31 = 28, x_32 = 0.
Example: total variation inpainting

- Optimization methods play an increasingly important role in image and signal processing.
- An image is an m × n matrix U, with each entry u_ij corresponding to a light intensity or colour.
- In the image inpainting problem, one aims to guess the true value of missing or corrupted entries of an image.
- A conceptually simple approach is to replace the image with the closest image among a set of images satisfying typical properties.
Example: total variation inpainting

What are typical properties of a typical image?

- Images tend to have large homogeneous areas in which the colour doesn't change much.
- Images have approximately low rank, when interpreted as matrices.

The total variation or TV-norm is the sum of the norms of the horizontal and vertical differences:

    ‖U‖_TV = Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √( (u_{i+1,j} − u_{i,j})² + (u_{i,j+1} − u_{i,j})² ).
Example: total variation inpainting

The TV-norm naturally increases with increased variation or sharp edges in an image: of two matrices of the same size, one with large jumps between neighbouring entries has a much larger TV-norm than a smooth one. Intuitively, we would expect a natural image with artifacts added to it to have a higher TV-norm.
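A small sketch with invented matrices makes this concrete. (Boundary differences are treated as zero here, one common convention; the slide's formula sums over interior differences, which gives the same values on these examples.)

```python
import math

def tv_norm(U):
    """Isotropic TV-norm of a matrix U (list of lists); differences
    reaching past the boundary are taken to be zero."""
    m, n = len(U), len(U[0])
    total = 0.0
    for i in range(m):
        for j in range(n):
            dv = U[i + 1][j] - U[i][j] if i + 1 < m else 0.0  # vertical
            dh = U[i][j + 1] - U[i][j] if j + 1 < n else 0.0  # horizontal
            total += math.sqrt(dv * dv + dh * dh)
    return total

smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # constant image: TV-norm 0
spiky  = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]   # one bright artifact
print(tv_norm(smooth))  # 0.0
print(tv_norm(spiky))   # strictly larger
```

The single bright pixel creates sharp jumps against all of its neighbours, so the spiky matrix has a strictly positive TV-norm while the constant one has TV-norm zero, which is exactly the property inpainting exploits.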
Example: total variation inpainting

Consider the following corrupted image. (Figure: corrupted image.)
Example: total variation inpainting

Applying the optimization problem

    minimize ‖X‖_TV subject to x_ij = u_ij, (i, j) ∈ Ω,

we recover an approximation to the true image. (Figures: inpainted image; difference image.)
Example: machine learning

In machine learning, the goal is to automatically learn a function f(x) = y from a training set of pairs (x_i, y_i). The problem can be posed as an optimization problem

    minimize over f ∈ F:  Σ_{i=1}^{m} (f(x_i) − y_i)²,

where F is a class of functions. Examples include regression and classification.
Application: letter recognition

By solving an optimization problem (training a multi-layer neural network), the computer learns a function that assigns to an image the closest handwritten letter.
Applications of machine learning

- Handwritten character recognition
- Classification of gene expression data
- Computer games
- Online recommender systems
- Sentiment analysis, text mining
- Medical diagnosis
- Computer vision
- ...

Developments driven by Google DeepMind, Facebook AI, OpenAI, etc.
Outline

- General information
- What is optimization?
- Course overview
Overview of the course

The course will try to balance three points of view:

- Theory. Optimality conditions, duality theory, convex analysis, geometry of linear, quadratic and semidefinite programming.
- Algorithms. Descent algorithms, interior-point methods, projected gradient methods, Lagrangian methods.
- Applications. Logistics and scheduling, finance, signal processing, convex regularization, semidefinite relaxations of hard combinatorial problems, machine learning, compressive sensing.
The course will be...

- Rigorous. We prove theorems and the occasional lemma.
- Computational. We implement examples and see optimization at work.
- Visual. Convex optimization is a geometric theory, so expect to see lots of pictures.
Overview of the course

Next class (Thursday 9-11, Univ. Place 6.212):

- Unconstrained optimization: gradient descent;
- Introduction to the computational tools of this course.
See you next time!

(Figure: Caged parrot / Free parrot, an inpainting example: minimize ‖X‖_TV subject to x_ij = u_ij, (i, j) ∈ Ω.)
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More informationMachine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives
Machine Learning Brett Bernstein Recitation 1: Gradients and Directional Derivatives Intro Question 1 We are given the data set (x 1, y 1 ),, (x n, y n ) where x i R d and y i R We want to fit a linear
More informationAlgebra 1 Bassam Raychouni Grade 8 Math. Greenwood International School Course Description. Course Title: Head of Department:
Course Title: Head of Department: Bassam Raychouni (bassam@greenwood.sh.ae) Teacher(s) + e-mail: Femi Antony (femi.a@greenwood.sch.ae) Cycle/Division: High School Grade Level: 9 or 10 Credit Unit: 1 Duration:
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 4: Curse of Dimensionality, High Dimensional Feature Spaces, Linear Classifiers, Linear Regression, Python, and Jupyter Notebooks Peter Belhumeur Computer Science
More informationCSC321 Lecture 4: Learning a Classifier
CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 28 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron
More informationOptimization for Machine Learning
Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationWe choose parameter values that will minimize the difference between the model outputs & the true function values.
CSE 4502/5717 Big Data Analytics Lecture #16, 4/2/2018 with Dr Sanguthevar Rajasekaran Notes from Yenhsiang Lai Machine learning is the task of inferring a function, eg, f : R " R This inference has to
More informationMIT Algebraic techniques and semidefinite optimization February 14, Lecture 3
MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationConvex Optimization / Homework 1, due September 19
Convex Optimization 1-725/36-725 Homework 1, due September 19 Instructions: You must complete Problems 1 3 and either Problem 4 or Problem 5 (your choice between the two). When you submit the homework,
More informationCSE 250a. Assignment Noisy-OR model. Out: Tue Oct 26 Due: Tue Nov 2
CSE 250a. Assignment 4 Out: Tue Oct 26 Due: Tue Nov 2 4.1 Noisy-OR model X 1 X 2 X 3... X d Y For the belief network of binary random variables shown above, consider the noisy-or conditional probability
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko
More informationA Quick Tour of Linear Algebra and Optimization for Machine Learning
A Quick Tour of Linear Algebra and Optimization for Machine Learning Masoud Farivar January 8, 2015 1 / 28 Outline of Part I: Review of Basic Linear Algebra Matrices and Vectors Matrix Multiplication Operators
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationg(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to
1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationLinear Algebra. Preliminary Lecture Notes
Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date April 29, 23 2 Contents Motivation for the course 5 2 Euclidean n dimensional Space 7 2. Definition of n Dimensional Euclidean Space...........
More informationIntroduction to Unconstrained Optimization: Part 2
Introduction to Unconstrained Optimization: Part 2 James Allison ME 555 January 29, 2007 Overview Recap Recap selected concepts from last time (with examples) Use of quadratic functions Tests for positive
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationCSC321 Lecture 6: Backpropagation
CSC321 Lecture 6: Backpropagation Roger Grosse Roger Grosse CSC321 Lecture 6: Backpropagation 1 / 21 Overview We ve seen that multilayer neural networks are powerful. But how can we actually learn them?
More informationMATHS WORKSHOPS Algebra, Linear Functions and Series. Business School
MATHS WORKSHOPS Algebra, Linear Functions and Series Business School Outline Algebra and Equations Linear Functions Sequences, Series and Limits Summary and Conclusion Outline Algebra and Equations Linear
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationSemidefinite Programming Basics and Applications
Semidefinite Programming Basics and Applications Ray Pörn, principal lecturer Åbo Akademi University Novia University of Applied Sciences Content What is semidefinite programming (SDP)? How to represent
More informationLecture 10: Support Vector Machine and Large Margin Classifier
Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu
More informationConvex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014
Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More information15. Conic optimization
L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone
More informationLecture 6 : Projected Gradient Descent
Lecture 6 : Projected Gradient Descent EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Consider the following update. x l+1 = Π C (x l α f(x l )) Theorem Say f : R d R is (m, M)-strongly
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationLecture 5 January 16, 2013
UBC CPSC 536N: Sparse Approximations Winter 2013 Prof. Nick Harvey Lecture 5 January 16, 2013 Scribe: Samira Samadi 1 Combinatorial IPs 1.1 Mathematical programs { min c Linear Program (LP): T x s.t. a
More informationMAT 419 Lecture Notes Transcribed by Eowyn Cenek 6/1/2012
(Homework 1: Chapter 1: Exercises 1-7, 9, 11, 19, due Monday June 11th See also the course website for lectures, assignments, etc) Note: today s lecture is primarily about definitions Lots of definitions
More information