Non-convex optimization. Issam Laradji

1 Non-convex optimization Issam Laradji

2 Strongly Convex. Objective function: [plot of f(x) against x].

3 Strongly Convex. Objective function: [plot of f(x) against x]. Assumptions: gradient Lipschitz continuous; strongly convex.

4 Strongly Convex. Objective function: [plot of f(x) against x]. Assumptions: gradient Lipschitz continuous; strongly convex. Method: randomized coordinate descent.
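
The slide names randomized coordinate descent without showing the update. A minimal sketch, assuming a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x as the objective (the quadratic, the function name `randomized_coordinate_descent`, and the step size 1/A[i, i] are illustrative choices, not taken from the slides):

```python
import numpy as np

def randomized_coordinate_descent(A, b, num_iters=5000, seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x for symmetric positive definite A."""
    rng = np.random.default_rng(seed)
    n = b.shape[0]
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.integers(n)          # pick one coordinate uniformly at random
        grad_i = A[i] @ x - b[i]     # partial derivative of f along coordinate i
        x[i] -= grad_i / A[i, i]     # step 1/L_i, where L_i = A[i, i] for this quadratic
    return x

# Hypothetical 2x2 example: A is symmetric positive definite, so f is strongly convex.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_hat = randomized_coordinate_descent(A, b)
x_star = np.linalg.solve(A, b)       # exact minimizer, for comparison
```

Each iteration touches a single coordinate, so its cost is one row of A rather than a full gradient.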

5 Non-strongly Convex optimization. Assumptions: gradient Lipschitz continuous. Convergence rate, compared to the strongly convex convergence rate.
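
The rate formulas from this slide did not survive in the transcript. As a point of reference only, the standard textbook bounds for plain gradient descent with step size 1/L are sketched here; the slide may have shown the coordinate-descent versions instead, so the symbols (L for the gradient Lipschitz constant, \mu for the strong convexity constant, x^* for a minimizer) are assumptions:

```latex
% Convex f with L-Lipschitz gradient (sublinear rate):
f(x_k) - f^* \le \frac{L \,\lVert x_0 - x^* \rVert^2}{2k}

% Additionally \mu-strongly convex (linear rate):
f(x_k) - f^* \le \left(1 - \frac{\mu}{L}\right)^{k} \bigl(f(x_0) - f^*\bigr)
```

Without strong convexity the error shrinks like 1/k; with it, the error shrinks geometrically.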

6 Non-strongly Convex optimization

7 Non-Strongly Convex. Objective function. Assumptions: Lipschitz continuous; restricted secant inequality. Method: randomized coordinate descent.

8 Invex functions (a generalization of convex functions). Objective function: invex function (one global minimum). Assumptions: Lipschitz continuous; Polyak [1963] inequality. This inequality simply requires that the squared gradient norm grows at least linearly in the suboptimality f(x) - f* as we move away from the optimal function value.

9 Invex functions (a generalization of convex functions). Objective function: invex function (one global minimum). Assumptions: Lipschitz continuous; Polyak [1963], for invex functions where this holds. Method: randomized coordinate descent.

10 Invex functions (a generalization of convex functions). Objective function. Assumptions: Lipschitz continuous; Polyak [1963], for invex functions where this holds. [Venn diagram with regions labeled Invex, Convex, Polyak.] Method: randomized coordinate descent.
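
To make the Polyak condition concrete, here is a small sketch assuming it is the inequality usually written as 0.5 * ||grad f(x)||^2 >= mu * (f(x) - f*). The test function f(x) = x^2 + 3 sin^2(x), the constant mu = 1/32, and the grid are illustrative assumptions rather than content from the slides; the code checks the inequality numerically on the grid instead of taking it on faith.

```python
import numpy as np

def f(x):
    return x**2 + 3.0 * np.sin(x)**2      # non-convex, global minimum f* = 0 at x = 0

def grad_f(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

# Numerical check of 0.5 * f'(x)^2 >= mu * (f(x) - f*) on a grid (assumed mu = 1/32).
mu = 1.0 / 32.0
grid = np.linspace(-10.0, 10.0, 20001)
pl_holds = bool(np.all(0.5 * grad_f(grid)**2 >= mu * f(grid)))

# The second derivative 2 + 6 cos(2x) changes sign, so f is not convex.
second_derivative = 2.0 + 6.0 * np.cos(2.0 * grid)
is_nonconvex = bool(np.any(second_derivative < 0.0))

# Gradient descent with step 1/L, where L = 8 bounds |f''|.
L = 8.0
x = 3.0
for _ in range(200):
    x -= grad_f(x) / L
```

Despite the non-convexity, gradient descent started at x = 3 reaches the global minimizer because the only stationary point is x = 0.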

11 Non-convex functions

12 Non-convex functions. [Plot labels: local maxima, global minimum, local minima.]

13 Non-convex functions. [Plot labels: global minimum, local minima.] Strategy 1: local optimization of the non-convex function.

14 Non-convex functions. [Plot labels: local maxima, global minimum, local minima.] Assumptions for local non-convex optimization: Lipschitz continuous; locally convex.

15 Non-convex functions. [Plot labels: local maxima, global minimum, local minima.] Strategy 1: local optimization of the non-convex function. Method: local randomized coordinate descent. All convergence rates for convex functions apply.

16 Non-convex functions. [Plot labels: local maxima, global minimum, local minima.] Strategy 1: local optimization of the non-convex function. Issue: dealing with saddle points.

17 Non-convex functions. [Plot labels: global minimum, local minima.] Strategy 2: global optimization of the non-convex function. Issue: exponential number of saddle points.

18 Local non-convex optimization. Gradient descent: difficult to define a proper step size. Newton's method: solves the slowness problem by rescaling the gradient in each direction by the inverse of the corresponding eigenvalue of the Hessian, but can move in the wrong direction (negative eigenvalues). Saddle-free Newton's method: rescales the gradient by the inverse of the absolute-value Hessian (eigenvalues replaced by their absolute values), approximated using the Hessian's Lanczos vectors.
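
The difference between the two Newton variants shows up on a simple saddle. The sketch below assumes the toy function f(x, y) = x^2 - y^2 and uses an exact eigendecomposition of the Hessian in place of the Lanczos approximation mentioned on the slide; the function names are illustrative.

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 - y**2                     # saddle point at the origin

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def hessian(p):
    return np.array([[2.0, 0.0], [0.0, -2.0]])

def newton_step(p):
    # Rescale the gradient by the inverse Hessian (eigenvalues 1/lambda_i).
    return p - np.linalg.solve(hessian(p), grad(p))

def saddle_free_newton_step(p):
    # Rescale by the inverse of |H|: same eigenvectors, eigenvalues 1/|lambda_i|.
    eigvals, eigvecs = np.linalg.eigh(hessian(p))
    abs_h_inv = eigvecs @ np.diag(1.0 / np.abs(eigvals)) @ eigvecs.T
    return p - abs_h_inv @ grad(p)

start = np.array([1.0, 0.1])
after_newton = newton_step(start)                   # lands on the saddle
after_saddle_free = saddle_free_newton_step(start)  # moves away along y
```

Newton's method jumps straight to the saddle because the negative eigenvalue flips the step along y, while the absolute-value rescaling keeps that step a descent direction.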

19 Local non-convex optimization. Random stochastic gradient descent: sample noise r uniformly from the unit sphere; escapes saddle points, but the step size is difficult to determine. Cubic regularization [Nesterov 2006]: assumes the gradient and the Hessian are both Lipschitz continuous.
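
The cubic regularization update itself is not preserved in the transcript. For orientation, the step is commonly written as below, where M is assumed to be an upper bound on the Lipschitz constant of the Hessian; the notation is an assumption and may differ from the slide:

```latex
x_{k+1} = \arg\min_{y} \Bigl\{
    f(x_k)
    + \langle \nabla f(x_k),\, y - x_k \rangle
    + \tfrac{1}{2} \langle \nabla^2 f(x_k)\,(y - x_k),\, y - x_k \rangle
    + \tfrac{M}{6} \lVert y - x_k \rVert^{3}
\Bigr\}
```

The cubic term penalizes long steps, so the model stays bounded below even where the Hessian has negative eigenvalues.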

20 Local non-convex optimization. Random stochastic gradient descent: sample noise r uniformly from the unit sphere; escapes saddle points, but the step size is difficult to determine. Momentum can help escape saddle points (rolling ball).
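
A minimal sketch of the noise-injection idea, assuming the toy objective f(x, y) = x^2 + y^4/4 - y^2/2 (saddle at the origin, minima at (0, 1) and (0, -1)). The objective, the helper `descend`, the step size, and the noise scale are all illustrative assumptions:

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([2.0 * x, y**3 - y])   # gradient of x^2 + y^4/4 - y^2/2

def descend(p0, step=0.05, noise_scale=0.0, num_iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(num_iters):
        r = rng.normal(size=p.shape)
        r /= np.linalg.norm(r)             # normalized Gaussian = uniform on the unit sphere
        p = p - step * (grad(p) + noise_scale * r)
    return p

stuck = descend([0.0, 0.0])                      # plain gradient descent never leaves the saddle
escaped = descend([0.0, 0.0], noise_scale=0.1)   # noisy version drifts to one of the minima
```

Started exactly at the saddle, the noiseless run has zero gradient and stays put; the noisy run ends near one of the two minima, within the jitter set by the noise scale.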

21 Global non-convex optimization. Matrix completion problem [De Sa et al. 2015]. Applications (usually for datasets with missing data): matrix completion, image reconstruction, recommendation systems.

22 Global non-convex optimization. Matrix completion problem [De Sa et al. 2015]. Reformulate it as an unconstrained non-convex problem and apply gradient descent.

23 Global non-convex optimization. Reformulate it as an unconstrained non-convex problem and apply gradient descent. Using the Riemannian manifold, we can derive the following. This widely-used algorithm converges globally, using only random initialization.
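
The reformulated objective is missing from the transcript. One common unconstrained form, assumed here, factors a symmetric positive semidefinite matrix as Y Y^T and minimizes the squared error on the observed entries. The sketch below is plain gradient descent from a random start on that assumed objective, not the Riemannian update from the slide; `complete_psd`, the step-size rule, and the synthetic data are hypothetical.

```python
import numpy as np

def complete_psd(M_obs, mask, rank, num_iters=3000, seed=0):
    """Gradient descent on 0.5 * || mask * (Y Y^T - M_obs) ||_F^2 from a random start."""
    rng = np.random.default_rng(seed)
    n = M_obs.shape[0]
    Y = 0.1 * rng.normal(size=(n, rank))               # small random initialization
    step = 0.05 / np.linalg.norm(mask * M_obs, 2)      # heuristic step size (assumption)
    for _ in range(num_iters):
        residual = mask * (Y @ Y.T - M_obs)            # error on observed entries only
        Y -= step * 2.0 * residual @ Y                 # gradient is 2 * residual @ Y for a symmetric mask
    return Y

# Synthetic rank-1 example with about 80% of the entries observed.
rng = np.random.default_rng(1)
n = 20
u = rng.normal(size=(n, 1))
M = u @ u.T
upper = np.triu(rng.random((n, n)) < 0.8)
mask = (upper | upper.T).astype(float)                 # symmetric observation pattern
Y = complete_psd(mask * M, mask, rank=1)
rel_error = np.linalg.norm(Y @ Y.T - M) / np.linalg.norm(M)
```

The error is measured on the full matrix, including entries the algorithm never saw, which is what makes this a completion problem rather than a fit.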

24 Convex relaxation of non-convex function optimization. Convex Neural Networks [Bengio et al. 2006]. Single-hidden-layer network: the original neural network is a non-convex problem.

25 Convex relaxation of non-convex function optimization. Convex Neural Networks [Bengio et al. 2006]. Single-hidden-layer network. Use alternating minimization. Potential issues with the activation function.

26 Bayesian optimization (global non-convex optimization). Typically used for finding optimal parameters, e.g. determining the step size or the number of hidden layers for neural networks. Are the parameter values bounded? Other methods include sampling the parameter values uniformly at random, and grid search.

27 Bayesian optimization (global non-convex optimization). Fit a Gaussian process on the observed data (purple shade): a probability distribution on the function values.

28 Bayesian optimization (global non-convex optimization). Fit a Gaussian process on the observed data (purple shade): a probability distribution on the function values. Acquisition function (green shade): a function of the predicted objective value under the Gaussian density (exploitation) and of the uncertainty in the prediction (exploration).
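
The loop described on these slides can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: the RBF kernel and its length scale, the upper-confidence-bound acquisition (mean + 2 * std), the 1-D toy objective with its maximum at x = 0.7, and the candidate grid. The slides do not say which kernel or acquisition function was used.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP with unit prior variance."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))   # clip tiny negative variances

def objective(x):
    return -(x - 0.7) ** 2                       # toy function to maximize (assumption)

candidates = np.linspace(0.0, 1.0, 201)          # bounded parameter range
x_obs = np.array([0.0, 1.0])                     # two initial evaluations
y_obs = objective(x_obs)

for _ in range(12):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + 2.0 * std                       # exploitation (mean) + exploration (std)
    x_next = candidates[np.argmax(ucb)]          # evaluate where the acquisition is largest
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
```

Early iterations are dominated by the uncertainty term and spread evaluations across the range; once the posterior is tight, the mean term concentrates them near the maximizer.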

29 Bayesian optimization. Slower than grid search with a low level of smoothness (illustrate). Faster than grid search with a high level of smoothness (illustrate).

30 Bayesian optimization. Slower than grid search with a low level of smoothness (illustrate). Faster than grid search with a high level of smoothness (illustrate). [Figure panels: grid search, Bayesian optimization.]

31 Bayesian optimization. Slower than grid search with a low level of smoothness (illustrate). Faster than grid search with a high level of smoothness (illustrate). A measure of smoothness. [Figure panels: grid search, Bayesian optimization.]

32 Summary: non-convex optimization. Strategy 1: local non-convex optimization; convexity convergence rates apply; escape saddle points using, for example, cubic regularization and the saddle-free Newton update. Strategy 2: relaxing the non-convex problem to a convex problem (convex neural networks). Strategy 3: global non-convex optimization (Bayesian optimization, matrix completion).
