Online learning with noisy side observations


Online learning with noisy side observations. Tomáš Kocák (Inria Lille - Nord Europe, France), Gergely Neu (DTIC, Universitat Pompeu Fabra, Barcelona, Spain), Michal Valko (Inria Lille - Nord Europe, France). SequeL, Inria Lille. AISTATS 2016, Cadiz, April 2016.

Motivation: fishing spots. Every day, choose a fishing spot (at the beginning of the day) so as to maximize the total number of fish caught. This example is motivated by an example of Wu et al. (2015). Online learning with noisy side observations, SequeL - 1/17

Previous problem representation, introduced by Mannor and Shamir (2011): actions are the nodes of a directed unweighted graph, and playing an action reveals the losses c_{t,j} = l_{t,j} of the neighbors j in N(I_t). Such perfect observations are in many cases not realistic.

Problem representation in our work: actions are the nodes of a directed weighted graph, and playing an action reveals noisy losses c_{t,j} of the neighbors j in N(I_t). A smaller weight s_{t,(I_t,j)} means bigger noise: c_{t,j} = s_{t,(I_t,j)} l_{t,j} + (1 - s_{t,(I_t,j)}) ξ_{t,j}.

Framework: description and goals. Learning process: N actions (nodes of a graph); in each of T rounds, the environment sets the losses of the actions and chooses a graph structure (not disclosed), the learner picks an action I_t to play and incurs the loss l_{t,I_t} of that action, then the learner observes the graph and the noisy losses c_{t,j} of the neighbors j in N(I_t). Goal of the learner: minimize the cumulative regret R_T = sum_{t=1}^T l_{t,I_t} (learner's loss) - min_{j in [N]} sum_{t=1}^T l_{t,j} (loss of the best fixed action).
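The regret definition above can be sketched numerically. In this illustrative snippet, the loss matrix and the uniform-random policy are made-up stand-ins, not the paper's algorithm:

```python
import numpy as np

# Illustrative sketch of the regret definition: cumulative loss of the
# learner minus that of the best fixed action in hindsight. The uniform
# random policy is a placeholder, not the paper's algorithm.
rng = np.random.default_rng(0)
T, N = 100, 4
losses = rng.uniform(size=(T, N))        # l_{t,j}, hidden from the learner

picks = rng.integers(N, size=T)          # I_t chosen uniformly at random
learner_loss = losses[np.arange(T), picks].sum()

best_fixed = losses.sum(axis=0).min()    # best single action in hindsight
regret = learner_loss - best_fixed
```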

Regret bounds, special cases (unweighted graphs). Edgeless graph (bandit problem): no side observations; regret bound of Õ(sqrt(NT)). Complete graph (full information): all side observations; regret bound of Õ(sqrt(T)). General unweighted graph (Mannor and Shamir 2011): some side observations; regret bound of Õ(sqrt(αT)), where α is the independence number, i.e., the size of the largest independent set (a set of pairwise non-adjacent nodes); in the example above, α = 5. Note that sqrt(NT) >= sqrt(αT) >= sqrt(T).

General template: Exp3-type algorithm. In every round: compute exponential weights from the loss estimates, w_{t,i} = exp(-η_t sum_{s=1}^{t-1} l̂_{s,i}); create a probability distribution with p_{t,i} proportional to w_{t,i}; play action I_t with P(I_t = i) = p_{t,i} = w_{t,i} / sum_{j=1}^N w_{t,j}; then create the loss estimates l̂_{t,i} using the observability graph. The loss estimates define the algorithm and compensate for the lack of side observations. Question: what are good loss estimates?
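A minimal sketch of the template's sampling step, assuming a fixed learning rate η and a made-up history of loss estimates:

```python
import numpy as np

# Sketch of the Exp3-type template: exponential weights over the sum of
# past loss estimates, normalized into a sampling distribution.
def exp3_distribution(past_estimates, eta):
    """past_estimates: array of shape (t-1, N) with the estimates l_hat."""
    cum = past_estimates.sum(axis=0)
    w = np.exp(-eta * (cum - cum.min()))   # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(1)
history = rng.uniform(size=(10, 5))        # hypothetical estimates so far
p = exp3_distribution(history, eta=0.1)
action = rng.choice(5, p=p)                # play I_t ~ p_t
```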

Loss estimates. Desired property: good loss approximation, i.e., unbiased estimates with E[l̂_{t,i}] = l_{t,i}. We can use the noisy side observations: E[c_{t,i}] = E[s_{t,(I_t,i)} l_{t,i}] + E[(1 - s_{t,(I_t,i)}) ξ_{t,i}] = (sum_{j=1}^N s_{t,(j,i)} p_{t,j}) l_{t,i} + 0. This suggests the loss estimate l̂_{t,i} = c_{t,i} / sum_{j=1}^N s_{t,(j,i)} p_{t,j}.
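The unbiasedness claim can be checked by Monte Carlo; the weights, distribution, and loss value below are illustrative numbers:

```python
import numpy as np

# Monte Carlo sanity check (illustrative numbers) that the estimate
# l_hat = c_{t,i} / sum_j s_{t,(j,i)} p_{t,j} is unbiased when the
# noise xi is zero-mean.
rng = np.random.default_rng(2)
p = np.array([0.1, 0.2, 0.3, 0.4])        # distribution p_t over actions
s = np.array([0.25, 0.5, 0.75, 1.0])      # weights s_{t,(j,i)} into node i
l = 0.7                                    # true loss l_{t,i}
M = 200000

j = rng.choice(4, size=M, p=p)             # played actions I_t
xi = rng.standard_normal(M)                # zero-mean noise
c = s[j] * l + (1 - s[j]) * xi             # noisy observations c_{t,i}
estimates = c / (s * p).sum()
mean_est = estimates.mean()                # should be close to l
```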

Loss estimates, first attempt: l̂_{t,i} = (s_{t,(I_t,i)} l_{t,i} + (1 - s_{t,(I_t,i)}) ξ_{t,i}) / sum_{j=1}^N s_{t,(j,i)} p_{t,j}. Pros: unbiased estimates (good approximation of the real losses). Cons: unreliable observations (small weights) are included, giving a large variance of the estimates; no theoretical guarantees!

Second attempt: thresholding (Exp3-IXt algorithm). Small weights are not reliable, so threshold them. Loss estimates for thresholded weights: l̂_{t,i} = c_{t,i} I{s_{t,(I_t,i)} >= ε} / sum_{j=1}^N s_{t,(j,i)} p_{t,j} I{s_{t,(j,i)} >= ε}.

Second attempt: Exp3-IXt algorithm, with l̂_{t,i} = c_{t,i} I{s_{t,(I_t,i)} >= ε} / sum_{j=1}^N s_{t,(j,i)} p_{t,j} I{s_{t,(j,i)} >= ε}. Theorem (regret bound of Exp3-IXt): when tuned properly, Exp3-IXt has a regret bound of E[R_T] = Õ(sqrt(α(ε) T) / ε), where α(ε) is the independence number of the thresholded graph: delete all edges with weight smaller than ε, then compute the independence number (ignoring the remaining weights).
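The thresholded independence number α(ε) can be sketched by brute force on a toy graph (the graph below is hypothetical, and the exhaustive search is only viable for tiny instances, since the problem is NP-hard in general):

```python
from itertools import combinations

# Brute-force alpha(eps) on a toy weighted graph: drop edges with weight
# below eps, then search for the largest independent set among all
# vertex subsets, largest first.
def alpha_eps(n, weighted_edges, eps):
    kept = {(u, v) for u, v, w in weighted_edges if w >= eps}
    kept |= {(v, u) for (u, v) in kept}     # treat edges as symmetric
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all((u, v) not in kept for u, v in combinations(nodes, 2)):
                return k
    return 0

# Hypothetical 4-node example (a weighted 4-cycle).
E = [(0, 1, 0.9), (1, 2, 0.3), (2, 3, 0.8), (0, 3, 0.1)]
a_all  = alpha_eps(4, E, 0.05)   # all edges kept: 4-cycle, alpha = 2
a_none = alpha_eps(4, E, 0.95)   # all edges dropped: edgeless, alpha = 4
```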

Effective independence number (setting ε to its optimal value): α* = min_{ε in [0,1]} α(ε)/ε². Regret bound of Exp3-IXt: for the optimal value of ε, Exp3-IXt has a regret bound of E[R_T] = Õ(sqrt(α* T)).
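A sketch of α* on a toy graph, minimizing α(ε)/ε² over a grid of thresholds (the graph and the grid are illustrative choices; exact computation is NP-hard in general):

```python
from itertools import combinations

# alpha* = min over eps of alpha(eps)/eps^2, evaluated on a grid.
def alpha_eps(n, weighted_edges, eps):
    kept = {(u, v) for u, v, w in weighted_edges if w >= eps}
    kept |= {(v, u) for (u, v) in kept}     # treat edges as symmetric
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all((u, v) not in kept for u, v in combinations(nodes, 2)):
                return k
    return 0

E = [(0, 1, 0.9), (1, 2, 0.3), (2, 3, 0.8), (0, 3, 0.1)]
grid = [0.1, 0.3, 0.8, 0.9, 1.0]            # candidate thresholds
alpha_star = min(alpha_eps(4, E, eps) / eps**2 for eps in grid)
```

Here the minimum is attained at ε = 0.8, where only the two strong edges survive, giving α(ε)/ε² = 2/0.64.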

α* can be much smaller than N. Example: for a complete graph with uniformly distributed U(0,1) weights, α* is of order log(N).

For a complete graph with U(c,1) weights, α* <= 1/c²: every weight is at least c, so thresholding at ε = c leaves the complete graph, hence α(c) = 1 and α* <= α(c)/c² = 1/c².

Question: how do we set ε? Answer: finding the optimal ε is hard. It is necessary to know the whole graph, and computing α(ε) is NP-hard.

Third attempt: Exp3-WIX algorithm, with l̂_{t,i} = s_{t,(I_t,i)} (s_{t,(I_t,i)} l_{t,i} + (1 - s_{t,(I_t,i)}) ξ_{t,i}) / sum_{j=1}^N s²_{t,(j,i)} p_{t,j} = s_{t,(I_t,i)} c_{t,i} / sum_{j=1}^N s²_{t,(j,i)} p_{t,j}. Properties of the estimates: unbiased, with smaller variance. Multiplying by s_{t,(I_t,i)} pulls the estimate towards zero when the weight is small, decreasing the variance of the noise.
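The Exp3-WIX estimate can also be checked by Monte Carlo; the weights, distribution, and loss value below are illustrative numbers:

```python
import numpy as np

# Monte Carlo check (illustrative numbers) that the Exp3-WIX estimate
# l_hat = s_{t,(I_t,i)} * c_{t,i} / sum_j s_{t,(j,i)}^2 p_{t,j}
# stays unbiased; the extra factor s_{t,(I_t,i)} shrinks the noise
# contributed by weak edges.
rng = np.random.default_rng(3)
p = np.full(4, 0.25)                       # uniform playing distribution
s = np.array([0.05, 0.3, 0.6, 1.0])        # weights s_{t,(j,i)} into node i
l = 0.4                                     # true loss l_{t,i}
M = 200000

j = rng.choice(4, size=M, p=p)              # played actions I_t
xi = rng.standard_normal(M)                 # zero-mean noise
c = s[j] * l + (1 - s[j]) * xi              # noisy observations c_{t,i}
estimates = s[j] * c / (s**2 * p).sum()
mean_est = estimates.mean()                 # should be close to l
```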

Third attempt: regret bound of Exp3-WIX. Theorem (regret bound of Exp3-WIX): when tuned properly, the Exp3-WIX algorithm has a regret bound of E[R_T] = Õ(sqrt(α* T)). Advantages over Exp3-IXt: no thresholding (all side observations are used); the algorithm does not need to know the best ε (which is NP-hard to find); regret bound of order sqrt(α* T).

Experiments: empirical performance. Exp3: basic algorithm that ignores all side observations. Exp3-IXt: thresholded algorithm (needs to set ε). Exp3-WIX: the proposed algorithm, with an adaptive learning rate. [Figure: cumulative regret as a function of the value of ε.]

Conclusion. A new setting with noisy side observations; introduction of the effective independence number α*; the Exp3-WIX algorithm for this setting: it does not need to threshold, does not need to know the whole graph, and has a regret bound of order sqrt(α* T). Open questions: Is the effective independence number the right quantity? Is there a matching lower bound for Exp3-WIX? The upper bound of Exp3-WIX matches the lower bound in some cases (e.g., bandits, full information, the setting of Mannor and Shamir 2011). A related lower bound (Wu et al. 2015) holds for a stochastic setting with Gaussian noise: R_T = Ω(sqrt(α(ε) T) / ε).

Thank you! Tomorrow's session: Poster 10. SequeL, Inria Lille, AISTATS 2016. Tomáš Kocák, tomas.kocak@inria.fr, sequel.lille.inria.fr


Near-Optimal Algorithms for Online Matrix Prediction

Near-Optimal Algorithms for Online Matrix Prediction JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

Machine Learning in the Data Revolution Era

Machine Learning in the Data Revolution Era Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

arxiv: v1 [cs.lg] 23 Jan 2019

arxiv: v1 [cs.lg] 23 Jan 2019 Cooperative Online Learning: Keeping your Neighbors Updated Nicolò Cesa-Bianchi, Tommaso R. Cesari, and Claire Monteleoni 2 Dipartimento di Informatica, Università degli Studi di Milano, Italy 2 Department

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

Laplacian Filters. Sobel Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters

Laplacian Filters. Sobel Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters Sobel Filters Note that smoothing the image before applying a Sobel filter typically gives better results. Even thresholding the Sobel filtered image cannot usually create precise, i.e., -pixel wide, edges.

More information

Ensembles. Léon Bottou COS 424 4/8/2010

Ensembles. Léon Bottou COS 424 4/8/2010 Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon

More information

The information complexity of best-arm identification

The information complexity of best-arm identification The information complexity of best-arm identification Emilie Kaufmann, joint work with Olivier Cappé and Aurélien Garivier MAB workshop, Lancaster, January th, 206 Context: the multi-armed bandit model

More information

Littlestone s Dimension and Online Learnability

Littlestone s Dimension and Online Learnability Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David

More information

Online Learning with Costly Features and Labels

Online Learning with Costly Features and Labels Online Learning with Costly Features and Labels Navid Zolghadr Department of Computing Science University of Alberta zolghadr@ualberta.ca Gábor Bartók Department of Computer Science TH Zürich bartok@inf.ethz.ch

More information

8 Basics of Hypothesis Testing

8 Basics of Hypothesis Testing 8 Basics of Hypothesis Testing 4 Problems Problem : The stochastic signal S is either 0 or E with equal probability, for a known value E > 0. Consider an observation X = x of the stochastic variable X

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

Online (and Distributed) Learning with Information Constraints. Ohad Shamir

Online (and Distributed) Learning with Information Constraints. Ohad Shamir Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information

More information

Robustesse des techniques de Monte Carlo dans l analyse d événements rares

Robustesse des techniques de Monte Carlo dans l analyse d événements rares Institut National de Recherche en Informatique et Automatique Institut de Recherche en Informatique et Systèmes Aléatoires Robustesse des techniques de Monte Carlo dans l analyse d événements rares H.

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sébastien Bubeck Theory Group Part 1: i.i.d., adversarial, and Bayesian bandit models i.i.d. multi-armed bandit, Robbins [1952]

More information

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :

More information

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu Jun 11, 2013 J. Steinhardt & P. Liang (Stanford)

More information

Key words. online learning, multi-armed bandits, learning from experts, learning with partial feedback, graph theory

Key words. online learning, multi-armed bandits, learning from experts, learning with partial feedback, graph theory SIAM J. COMPUT. Vol. 46, No. 6, pp. 785 86 c 07 the authors NONSTOCHASTIC MULTI-ARMED BANDITS WITH GRAPH-STRUCTURED FEEDBACK NOGA ALON, NICOLÒ CESA-BIANCHI, CLAUDIO GENTILE, SHIE MANNOR, YISHAY MANSOUR,

More information

Project in Computational Game Theory: Communities in Social Networks

Project in Computational Game Theory: Communities in Social Networks Project in Computational Game Theory: Communities in Social Networks Eldad Rubinstein November 11, 2012 1 Presentation of the Original Paper 1.1 Introduction In this section I present the article [1].

More information

Bandits on graphs and structures

Bandits on graphs and structures Bandits on graphs and structures Michal Valko To cite this version: Michal Valko. Bandits on graphs and structures. Machine Learning [stat.ml]. École normale supérieure de Cachan - ENS Cachan, 2016.

More information

Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit

Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit European Worshop on Reinforcement Learning 14 (2018 October 2018, Lille, France. Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit Réda Alami Orange Labs 2 Avenue Pierre Marzin 22300,

More information

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

6.207/14.15: Networks Lecture 12: Generalized Random Graphs 6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model

More information

Better Algorithms for Selective Sampling

Better Algorithms for Selective Sampling Francesco Orabona Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy francesco@orabonacom nicolocesa-bianchi@unimiit Abstract We study online algorithms for selective sampling that use regularized

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model

Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model Dinesh Garg IBM India Research Lab, Bangalore garg.dinesh@in.ibm.com (Joint Work with Michael Grabchak, Narayan Bhamidipati,

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision

More information

Pure Exploration in Finitely Armed and Continuous Armed Bandits

Pure Exploration in Finitely Armed and Continuous Armed Bandits Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,

More information

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits An Experimental Evaluation of High-Dimensional Multi-Armed Bandits Naoki Egami Romain Ferrali Kosuke Imai Princeton University Talk at Political Data Science Conference Washington University, St. Louis

More information

Predicting Graph Labels using Perceptron. Shuang Song

Predicting Graph Labels using Perceptron. Shuang Song Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction

More information

THE first formalization of the multi-armed bandit problem

THE first formalization of the multi-armed bandit problem EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

Combining Interval and Probabilistic Uncertainty in Engineering Applications

Combining Interval and Probabilistic Uncertainty in Engineering Applications Combining Interval and Probabilistic Uncertainty in Engineering Applications Andrew Pownuk Computational Science Program University of Texas at El Paso El Paso, Texas 79968, USA ampownuk@utep.edu Page

More information

Learning with Exploration

Learning with Exploration Learning with Exploration John Langford (Yahoo!) { With help from many } Austin, March 24, 2011 Yahoo! wants to interactively choose content and use the observed feedback to improve future content choices.

More information

Quantum query complexity of entropy estimation

Quantum query complexity of entropy estimation Quantum query complexity of entropy estimation Xiaodi Wu QuICS, University of Maryland MSR Redmond, July 19th, 2017 J o i n t C e n t e r f o r Quantum Information and Computer Science Outline Motivation

More information

Online Forest Density Estimation

Online Forest Density Estimation Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density

More information