
1 Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient - Avinash Atreya, Feb

2 Outline
Introduction: The Problem; Example; Background; Notation; Results
One Point Estimate
Main Theorem
Extensions and Related Work

3 The Problem
At time t we need to choose an input vector $x_t \in S \subseteq \mathbb{R}^d$, where $S$ is a convex set. Nature reveals only the cost $c_t(x_t)$, where $c_t : \mathbb{R}^d \to \mathbb{R}$ is convex (not necessarily differentiable).
Our goal: minimize the expected regret
$$E\Big[\sum_{t=1}^n c_t(x_t)\Big] - \min_{x \in S} \sum_{t=1}^n c_t(x).$$

4 Example
Online advertising: we choose a spend vector $x_t$ each day, where each component of $x_t$ is the spend on one search engine, in dollars. At the end of the day we learn the number of clicks.

5 Background
Online convex optimization: we learn the function $c_t$ after we pick $x_t$.
Bandit setting: we learn only the outcome of our action.
Online convex optimization in the bandit setting: we learn only the outcome $c_t(x_t)$.

6 Notation I
$D$: diameter, $\|x - y\|_2 \le D$ for all $x, y \in S$.
$G$: gradient upper bound, $\|\nabla c_t(x_t)\|_2 \le G$ for all $t$, $1 \le t \le n$.

7 Notation II
$C$: bound on the absolute value of the costs, $|c_t(x)| \le C$ for all $t, x$.
$L$: Lipschitz constant, $|c_t(x) - c_t(y)| \le L\, \|x - y\|_2$ for all $t$ and all $x, y \in S$.

8 Notation III
Unit ball $B$ and unit sphere $\mathbb{S}$ (written $\mathbb{S}$ here to avoid clashing with the feasible set $S$): $B = \{x \in \mathbb{R}^d : \|x\| \le 1\}$, $\mathbb{S} = \{x \in \mathbb{R}^d : \|x\| = 1\}$.
Projection onto the convex set $S$: $P_S(x) = \arg\min_{z \in S} \|x - z\|$.
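
For the ball geometry used throughout these slides ($rB \subseteq S \subseteq RB$), the projection has a closed form; a minimal Python sketch for the special case $S = R\,B$ (the function name and the ball assumption are mine, for illustration):

```python
import numpy as np

def project_ball(x: np.ndarray, radius: float) -> np.ndarray:
    """Euclidean projection P_S(x) = argmin_{z in S} ||x - z||_2 for the
    special case S = radius * B: points outside the ball are rescaled
    onto its boundary; points inside are left unchanged."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x
```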

9 Key Results
Online convex optimization (Zinkevich): $\sum_{t=1}^n c_t(x_t) - \min_{x \in S} \sum_{t=1}^n c_t(x) \le DG\sqrt{n}$.
Bandit setting: $E\big[\sum_{t=1}^n c_t(x_t)\big] - \min_{x \in S} \sum_{t=1}^n c_t(x) \le 6\, n^{5/6}\, dC$.

10 Outline
Introduction
One Point Estimate: Key Challenge; Projected Gradient Descent; Expected Gradient Descent; One Point Estimate
Main Theorem
Extensions and Related Work

11 Key Challenge
Approach: projected gradient descent, $x_{t+1} = P_S(x_t - \nu \nabla c_t(x_t))$.
Challenge: how do we estimate the gradient when we observe only $c_t(x_t)$?

12 Gradient Estimate
We need at least $d + 1$ points in $d$ dimensions.
1-d: $f'(x) \approx \frac{f(x+\delta) - f(x-\delta)}{2\delta}$ or $\frac{f(x+\delta) - f(x)}{\delta}$.
Prior work exists on using two-point estimates in $d$ dimensions. (A baseline sketch follows.)
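
To make the $d + 1$ evaluations concrete, here is the standard forward-difference baseline (not this paper's method; the function name and signature are illustrative):

```python
import numpy as np

def fd_gradient(f, x: np.ndarray, delta: float = 1e-4) -> np.ndarray:
    """Estimate grad f(x) from d + 1 evaluations: f(x) plus one forward
    difference f(x + delta * e_i) along each coordinate direction e_i."""
    fx = f(x)
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = delta
        grad[i] = (f(x + step) - fx) / delta
    return grad
```

In the bandit setting we get only one evaluation per round, so even this baseline is unavailable; that is the gap the one-point estimate closes.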

13 Projected Gradient Descent
Due to Zinkevich (seen in class): $x_1 = 0$; at time $t + 1$, $c_t$ is revealed (convex and differentiable) and we update $x_{t+1} = P_S(x_t - \eta \nabla c_t(x_t))$.
Regret bound: $\sum_{t=1}^n c_t(x_t) - \min_{x \in S} \sum_{t=1}^n c_t(x) \le RG\sqrt{n}$.
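
A compact rendering of the full-information update, reusing `project_ball` from above for $P_S$; the per-round gradient oracles and the fixed step size are assumptions (the slide leaves $\eta$ unspecified; $\eta \approx R/(G\sqrt{n})$ for a known horizon is the usual choice):

```python
import numpy as np

def projected_gradient_descent(grad_oracles, project, d: int, eta: float):
    """Zinkevich-style online projected gradient descent.

    grad_oracles: list of callables, grad_oracles[t](x) = grad of c_t at x,
    available because c_t is revealed in full after round t.
    project: the projection P_S onto the convex feasible set.
    """
    x = np.zeros(d)                          # x_1 = 0
    played = []
    for grad_t in grad_oracles:
        played.append(x)
        x = project(x - eta * grad_t(x))     # x_{t+1} = P_S(x_t - eta * grad)
    return played
```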

14 Expected Gradient Descent
$x_1 = 0$; at time $t + 1$: $x_{t+1} = P_S(x_t - \eta g_t)$, where $g_t$ is a random vector with $E[g_t \mid x_t] = \nabla c_t(x_t)$.
The same bound holds in expectation: $E\big[\sum_{t=1}^n c_t(x_t)\big] - \min_{x \in S} \sum_{t=1}^n c_t(x) \le RG\sqrt{n}$.
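
Why an unbiased $g_t$ inherits the same bound: convexity reduces each round's regret to an inner product that depends on $g_t$ only through its conditional mean. A one-line sketch, with $x^*$ the offline minimizer:

$$E\big[c_t(x_t) - c_t(x^*)\big] \;\le\; E\big[\nabla c_t(x_t) \cdot (x_t - x^*)\big] \;=\; E\big[E[g_t \mid x_t] \cdot (x_t - x^*)\big] \;=\; E\big[g_t \cdot (x_t - x^*)\big],$$

so Zinkevich's analysis can be run on the vectors $g_t$ themselves, with $G$ now bounding $\|g_t\|$.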

15 Key Challenge Revisited
Challenge: estimate the gradient $\nabla c_t(x_t)$ from the single observation $c_t(x_t)$.
Somewhat easier: come up with $\hat{c}_t$ and $g_t$ so that $E[g_t \mid x_t] = \nabla \hat{c}_t(x_t)$, i.e. a function $\hat{c}_t$ (close to $c_t$) whose gradient is easy to estimate (using $c_t$) in expectation.

16 One Point Estimate I
Fundamental theorem of calculus: $\int_{-\delta}^{+\delta} \frac{d}{dx} c_t(x + y)\, dy = c_t(x + \delta) - c_t(x - \delta)$.
Uniform random variable $v \sim U[-1, +1]$: $\frac{d}{dx} \int_{-1}^{1} \tfrac{1}{2}\, c_t(x + v\delta)\, dv = \frac{c_t(x + \delta) - c_t(x - \delta)}{2\delta}$.

17 One Point Estimate II
Random variable $u$ uniform on $\{-1, +1\}$: $\frac{d}{dx}\, E_{v \sim U[-1,1]}[c_t(x + \delta v)] = E_u\Big[\frac{c_t(x + \delta u)\, u}{\delta}\Big]$.
$\hat{c}_t(x) = E_v[c_t(x + \delta v)]$ (a smoothed version of $c_t$) is the function we are looking for! Take $g_t = \frac{1}{\delta}\, c_t(x + \delta u_t)\, u_t$.
$v$ is drawn from a line segment, $u$ from its endpoints.
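
A quick Monte Carlo sanity check of the 1-d identity; the test function $\cosh$ (smooth and convex) and all constants are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.cosh                            # smooth convex 1-d test function
x, delta, m = 0.3, 0.1, 1_000_000

# One-point estimates g = (1/delta) * c(x + delta*u) * u, u uniform on {-1, +1}.
u = rng.choice([-1.0, 1.0], size=m)
g = c(x + delta * u) * u / delta

# Gradient of the smoothed c_hat(x) = E_v[c(x + delta*v)], v ~ U[-1, 1]:
target = (c(x + delta) - c(x - delta)) / (2 * delta)

print(g.mean(), target)                # agree up to Monte Carlo error
```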

18 One Point Estimate III
$d$ dimensions: $v \sim B$ (the unit ball), $u \sim \mathbb{S}$ (the unit sphere):
$$\nabla_x\, E_{v \sim B}[c_t(x + \delta v)] = \frac{d}{\delta}\, E_{u \sim \mathbb{S}}[c_t(x + \delta u)\, u].$$
Follows from Stokes' theorem (the generalization of the fundamental theorem of calculus to $d$ dimensions).
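
The same check in $d$ dimensions for the quadratic $c(z) = \|z\|^2$, where the smoothed gradient is exactly $2x$ (the test function, dimension, and sample size are mine; note the estimator's variance scales like $dC/\delta$, so many samples are needed):

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta, m = 3, 0.1, 2_000_000
x = rng.normal(size=d)

def c(z):                              # convex test function c(z) = ||z||^2
    return np.sum(z * z, axis=-1)

u = rng.normal(size=(m, d))            # u ~ uniform on the unit sphere,
u /= np.linalg.norm(u, axis=1, keepdims=True)  # via normalized Gaussians

g = (d / delta) * c(x + delta * u)[:, None] * u  # one-point estimates

print(g.mean(axis=0))                  # approximates grad c_hat(x) = 2x here
print(2 * x)
```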

19 Putting Things Together
Expected gradient for $\hat{c}_t$ on $(1 - \alpha)S$ (the shrunken set keeps the played points $x_t + \delta u_t$ inside $S$):
$g_t = \frac{d}{\delta}\, c_t(x_t + \delta u_t)\, u_t$ with $u_t \sim \mathbb{S}$; $\; x_{t+1} = P_{(1-\alpha)S}(x_t - \eta g_t)$; $\; E[g_t \mid x_t] = \nabla \hat{c}_t(x_t)$.
Bound on regret: $E\big[\sum_{t=1}^n \hat{c}_t(x_t)\big] - \min_{x \in (1-\alpha)S} \sum_{t=1}^n \hat{c}_t(x) \le RG\sqrt{n}$.

20 Outline
Introduction
One Point Estimate
Main Theorem: Algorithm; Observations; Proof Sketch; Results
Extensions and Related Work

21 The Algorithm
Bandit-Gradient-Descent($\alpha, \delta, \nu$):
$x_1 = 0$. At time $t$:
  Select $u_t \sim \mathbb{S}$ uniformly at random.
  Play $x_t + \delta u_t$.
  Observe $c_t(x_t + \delta u_t)$.
  $x_{t+1} = P_{(1-\alpha)S}\big(x_t - \nu\, c_t(x_t + \delta u_t)\, u_t\big)$.
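
The whole algorithm is a few lines. A sketch assuming $S = R\,B$, so that $P_{(1-\alpha)S}$ is projection onto the ball of radius $(1 - \alpha)R$; the cost-oracle interface is illustrative:

```python
import numpy as np

def bandit_gradient_descent(cost_oracles, d, R, alpha, delta, nu, rng):
    """Bandit-Gradient-Descent(alpha, delta, nu) for the case S = R * B.

    cost_oracles[t](y) returns only the scalar cost c_t(y) of the point
    actually played -- the bandit feedback."""
    def project_shrunk(z):                       # P_{(1-alpha)S} for a ball
        rad = (1 - alpha) * R
        nz = np.linalg.norm(z)
        return z if nz <= rad else (rad / nz) * z

    x = np.zeros(d)                              # x_1 = 0
    total_cost = 0.0
    for c_t in cost_oracles:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                   # u_t ~ unit sphere
        y = x + delta * u                        # play x_t + delta * u_t
        cost = c_t(y)                            # observe c_t(x_t + delta*u_t)
        total_cost += cost
        x = project_shrunk(x - nu * cost * u)    # one-point gradient step
    return total_cost
```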

22 Terms in the Bound
The expected-gradient-descent regret for $\hat{c}_t$ over $(1 - \alpha)S$.
The difference between the minimum over $(1 - \alpha)S$ and the minimum over $S$.
The difference between $c_t(x)$, $x \in (1 - \alpha)S$, and $c_t(y)$, $y \in S$.

23 Observation I
If we take a step of size at most $\alpha r$ from $x \in (1 - \alpha)S$, we stay in $S$.
Bounds on $S$: $rB \subseteq S \subseteq RB$ ($S$ contains the origin, $r \le R$).
The ball $\alpha r B$ centered at $x \in (1 - \alpha)S$ stays in $S$: $(1 - \alpha)S + \alpha r B \subseteq (1 - \alpha)S + \alpha S = S$.

24 Observation II
From expected gradient descent (with $\eta = \nu\delta/d$): $E\big[\sum \hat{c}_t(x_t)\big] - \min_{x \in (1-\alpha)S} \sum \hat{c}_t(x) \le RG\sqrt{n}$.
Gradient bound $G$: $\|g_t\| = \big\|\frac{d}{\delta}\, c_t(x_t + \delta u_t)\, u_t\big\| \le \frac{dC}{\delta}$.
Regret bound: $\frac{RdC\sqrt{n}}{\delta}$.

25 Observation III
The optimum in $(1 - \alpha)S$ is near the optimum in $S$. From Jensen's inequality (convexity):
$c_t((1 - \alpha)x + \alpha \cdot 0) \le (1 - \alpha)\, c_t(x) + \alpha\, c_t(0)$, so
$c_t((1 - \alpha)x) - c_t(x) \le \alpha\, (c_t(0) - c_t(x)) \le 2\alpha C$.
Summing up: $\min_{x \in (1-\alpha)S} \sum_{t=1}^n c_t(x) \le \min_{x \in S} \sum_{t=1}^n c_t(x) + 2\alpha C n$.

26 Observation IV
Lipschitz continuity across $(1 - \alpha)S$ and $S$: for $x \in S$, $y \in (1 - \alpha)S$,
$$|c_t(x) - c_t(y)| \le \frac{2C}{\alpha r}\, \|x - y\|.$$
This is obvious when $\Delta = \|x - y\| > \alpha r$ (the left side is at most $2C$); otherwise we pick a point $z \in S$ in the direction of $\Delta$ and use Jensen's inequality, as sketched below.
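
For the non-obvious case, here is my reconstruction of the step the slide leaves implicit: with $\Delta = x - y$ and $z = y + \alpha r\, \Delta / \|\Delta\| \in S$ (a step of size $\alpha r$ from $y$, legal by Observation I), $x$ lies on the segment from $y$ to $z$:

$$x = (1 - \lambda)\, y + \lambda z \;\text{ with }\; \lambda = \frac{\|\Delta\|}{\alpha r} \le 1 \quad\Rightarrow\quad c_t(x) - c_t(y) \le \lambda\, \big(c_t(z) - c_t(y)\big) \le \frac{2C}{\alpha r}\, \|x - y\|,$$

and the bound on $c_t(y) - c_t(x)$ follows symmetrically.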

27 Proof Sketch I
Combining all the observations:
$$E\Big[\sum_{t=1}^n c_t(x_t)\Big] - \min_{x \in S} \sum_{t=1}^n c_t(x) \;\le\; \frac{RdC\sqrt{n}}{\delta} \;(\text{expected gradient}) \;+\; \frac{6\delta C n}{\alpha r} \;(\text{effective Lipschitz}) \;+\; 2\alpha C n \;(\text{difference in min}).$$

28 Proof Sketch II
The bound is of the form $\frac{a\sqrt{n}}{\delta} + \frac{b n \delta}{\alpha} + c n \alpha$.
Setting $\delta = \sqrt[3]{a^2/(bc)}\; n^{-1/3}$ and $\alpha = \sqrt[3]{ab/c^2}\; n^{-1/6}$ gives a bound of $3\sqrt[3]{abc}\; n^{5/6}$.
Note: $a = RdC$, $b = 6C/r$, $c = 2C$.
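
The stated choices come from setting both partial derivatives of $F(\delta, \alpha) = \frac{a\sqrt{n}}{\delta} + \frac{b n \delta}{\alpha} + c n \alpha$ to zero (keeping the $n$-dependence explicit, which the slide suppresses):

$$\frac{\partial F}{\partial \delta} = 0 \;\Rightarrow\; \delta^2 = \frac{a\,\alpha}{b\sqrt{n}}, \qquad \frac{\partial F}{\partial \alpha} = 0 \;\Rightarrow\; \alpha^2 = \frac{b\,\delta}{c}.$$

Solving the pair gives $\delta = \sqrt[3]{a^2/(bc)}\, n^{-1/3}$ and $\alpha = \sqrt[3]{ab/c^2}\, n^{-1/6}$; substituting back, each of the three terms equals $\sqrt[3]{abc}\, n^{5/6}$, hence the factor of $3$.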

29 Theorem
For $n \ge \big(\frac{3Rd}{2r}\big)^2$ (so that $\alpha \le 1$), setting
$$\delta = \sqrt[3]{\frac{rR^2d^2}{12}}\; n^{-1/3}, \qquad \nu = \frac{R}{C\sqrt{n}}, \qquad \alpha = \sqrt[3]{\frac{3Rd}{2r}}\; n^{-1/6},$$
we can show a bound of $3C\, n^{5/6}\, \sqrt[3]{12\, dR/r}$.

30 Outline
Introduction
One Point Estimate
Main Theorem
Extensions and Related Work: Bound with a Lipschitz Constant; Reshaping to Isotropic Position; Related Work

31 Bound with a Lipschitz Constant
When each $c_t$ is $L$-Lipschitz, for suitable values of $\alpha, \delta, \nu$ we can show a bound of $O\big(n^{3/4}\sqrt{RdC\,(L + C/r)}\big)$.
Intuition: use the actual Lipschitz constant instead of the effective one.

32 Reshaping
The dependence on $R/r$ is not ideal.
Transform $S$ into its isotropic position: an affine transformation chosen so that the covariance (of the uniform distribution on $S$) is the identity.
Then $r = 1$, $R = 1.01\, d$, $L' = LR$, $C' = C$.
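
One way to realize the reshaping, assuming we can draw (approximately) uniform samples from $S$; the whitening map below is a standard construction, not necessarily the paper's specific procedure:

```python
import numpy as np

def isotropic_transform(samples: np.ndarray):
    """Given samples from S (shape (m, d)), return the affine map
    T(x) = A (x - mean) that puts the sample cloud into approximately
    isotropic position: zero mean and identity covariance."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)             # cov is symmetric positive definite
    A = vecs @ np.diag(vals ** -0.5) @ vecs.T    # A = cov^{-1/2}
    return lambda x: A @ (x - mean)
```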

33 Related Work
Kleinberg (independently): an $O(n^{3/4})$ bound for the same problem, using phases of length $d + 1$ and random one-point gradient estimates; handles only oblivious adversaries.
Online linear optimization in the bandit setting: Kalai and Vempala show a bound of $O(\sqrt{n})$.
