An interior-point stochastic approximation method and an L1-regularized delta rule

Size: px

Start display at page:

Download "An interior-point stochastic approximation method and an L1-regularized delta rule"

Derick Terry
5 years ago
Views:

1 Photograph from National Geographic, Sept 2008 An interior-point stochastic approximation method and an L1-regularized delta rule Peter Carbonetto, Mark Schmidt and Nando de Freitas University of British Columbia Advances in Neural Information Processing Systems 2008

A motivating example Goal: p(spam email-features, ).

2 A motivating example Goal: p(spam -features, ). (Cormack and Lynam, 2005) Approach: find model that maximizes likelihood of training data. But: training data is observed over time (on-line learning). Moreover: we need to penalize complex models. 2 /17

3 What is stochastic approximation, briefly Original Problem: (Spall, 2003; Kushner and Yin, 2003; Bottou, 1998) 1. Minimize f(x), or find F(x) = f(x) = We only observe noisy, unbiased gk F(xk). Robbins & Monro algorithm: 1. xk+1 = xk - ak gk 2. {ak } is a sequence of decreasing step sizes. Problem in this talk: minimize f(x) subject to c(x) 0. 3/17

4 Motivating example (continued) nonsmooth, unconstrained objective: minimize log p(spam -features, ) + λ 1 change to x to obtain smooth, constrained objective: minimize subject to x 0. log p(spam -features, x) + λ u 1 4/17

5 One possible approach Projected gradient (Bertsekas, 1999; Poljak, 1978) Has convergence guarantees. Not always efficient to compute projection. Big steps may be biased slow progress. 5/17

6 The interior-point approach Projected gradient Has convergence guarantees. Primal-dual Interior-point method Also has convergence guarantees. Only feasible for simple constraints. Large steps may be biased. Works for many types of constraints. Steps are never biased. Blah blah blah blah 6/17

7 The interior-point approach Our problem: minimize f(x) subject to c(x) 0. The barrier function: (Fiacco and McCormick, 1968) log-barrier function /17

8 The interior-point approach Primal interior-point method: solve sequence F (x) = f (x) = 0 for decreasing > 0. feasible set But: we cannot assess convergence to each subproblem F (x) = 0! Adapted from Fiacco and McCormick (1968). 8/17

9 The primal-dual approach (M. H. Wright, 1992; S. J. Wright, 1996; many others ) Recall problem: minimize f(x) subject to c(x) 0. Introduce Lagrange multiplier-like variables z. Use Robbins-Monro to solve moving target: dual variables Take steps Replace with noisy estimate xk+1 = xk + âk Δxk zk+1 = zk + âk Δzk Perturbed KKT conditions Primal-dual search direction 9/17

10 Why does this work? 1. Central path numerically stable. 2. Primal-dual search direction keeps us on central path, even with noisy gradients. 10/17

11 A small experiment Problem: linear regression + L1 penalty (Lasso) minimize Question: how well does on-line estimate recover exact solution? Synthetic data. subject to x 0. Ax b 2 + λ Repeat for k = 1 to 100, step sizes ak = 1/(k0 + k). 11/17

12 A small experiment Compared these methods: Projected gradient Primal-dual interior-point Sub-gradient (Shalev-Shwartz et al, 2007; Hazan, 2007) Smoothed approximation Augmented Lagrangian (Wang and Spall, 2003) 12/17

13 Some empirical evidence Sensitivity to step size sequence distance to exact solution k0 (step sizes are ak = 1/(k0 + k).) 13/17

14 Some empirical evidence distance to exact solution Sensitivity to strength of L1 penalty penalty strength 14/17

15 Shrinkage effect exact solution online 1 pass of data online 10 passes regression coefficient penalty strength penalty strength penalty strength 15/17

16 In summary Robbins-Munro solved F(x) = 0 with updates xk+1 = xk - ak gk. We solve sequence F (x,z) = 0 with updates xk+1 = xk + âk Δxk zk+1 = zk + âk Δzk where (Δx,Δz) is the solution to the perturbed KKT conditions. 16/17

17 Thank you! 17/17

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize