Allocating Resources, in the Future

1 Allocating Resources, in the Future — Sid Banerjee, School of ORIE. May 3, 2018, Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making

2–6 online resource allocation: basic model

[Slide figure: arrival sequence θ^(1), θ^(2), θ^(3), ..., θ^(t), ..., θ^(T); remaining capacity B_t shown depleting (B_1 = 3, B_2 = 3, B_3 = 2, ..., B_t = 1)]

- single resource, initial capacity B; T agents arrive sequentially
- agent t has type θ^(t) = reward earned if agent is allocated
- principal makes irrevocable decisions; resource is non-replenishable

assumptions on agent types {θ^(t)}:
- finite set of values {v_1, ..., v_n} (e.g. θ^(t) = v_i with prob p_i, i.i.d.)
- in general: arrivals can be time-varying, correlated

online resource allocation problem: allocate resources to maximize the sum of rewards (a minimal simulation sketch follows)
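To make the setup concrete, here is a minimal Python simulation of the basic model; all parameters (T, B, the value set, and the probabilities) are illustrative choices of mine, not from the talk. It compares a naive greedy policy against the hindsight (prophet) optimum that keeps the B largest rewards.

```python
# A minimal sketch of the basic model (illustrative parameters):
# single resource with capacity B, T i.i.d. arrivals; greedy online policy
# vs. the hindsight optimum that allocates to the B highest-value agents.
import numpy as np

rng = np.random.default_rng(0)
T, B = 1000, 50
values, probs = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2])

arrivals = rng.choice(values, size=T, p=probs)

# greedy online: accept every arrival while capacity remains
greedy_reward = arrivals[:B].sum()

# offline (prophet) optimum: in hindsight, keep the B largest rewards
offline_reward = np.sort(arrivals)[-B:].sum()

print(f"greedy={greedy_reward:.0f}  offline={offline_reward:.0f}  "
      f"regret={offline_reward - greedy_reward:.0f}")
```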

7 online resource allocation: first generalization

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ_i = (A_i, v_i) with prob p_i
- A_i ∈ {0, 1}^d: resource requirement; v_i: value
- also known as: network revenue management; single-minded buyer

8 online resource allocation: second generalization

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (v_i1, v_i2) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ = (v_i1, v_i2, ..., v_id) and wants a single resource
- also known as: online weighted matching; unit-demand buyer

9 online allocation across fields

related problems are studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.

informational variants:
- distributional knowledge
- bandit settings
- adversarial inputs

10 the technological zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

11–12 axiomatizing the zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

axiom: we have access to black-box predictive algorithms

core question of this talk: how does having such an oracle affect online resource allocation?

TL;DR:
- new online allocation policies with strong regret bounds
- re-examining old questions leads to surprising new insights

13 bridging online allocation and predictive models

The Bayesian Prophet: A Low-Regret Framework for Online Decision Making — Alberto Vera & S.B. (2018)

14 focus of talk: allocation with single-minded agents

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ = (A, v): A = resource requirement, v = value
- agent has type θ_i with prob p_i, i.i.d.

online allocation problem: allocate resources to maximize the sum of rewards

15–16 performance measure

the optimal policy can be computed via dynamic programming, but this:
- requires exact distributional knowledge
- suffers the curse of dimensionality: state space = T × B_1 × ... × B_d
- does not quantify the cost of uncertainty

prophet benchmark V_off: the OFFLINE optimal policy, which has full knowledge of {θ^(1), θ^(2), ..., θ^(T)}

17 performance measure: regret

prophet benchmark V_off: OFFLINE knows the entire type sequence {θ^(t) : t = 1, ..., T}

for the network revenue management setting, V_off is given by

  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i[1:T]

where N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, ..., T}

regret: E[Regret] = E[V_off − V_alg] (an LP sketch follows)
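As a sketch of how the prophet benchmark can be computed once the arrival counts are realized, the following solves the offline LP above with scipy.optimize.linprog; the instance data (v, A, B, N) is made up for illustration.

```python
# A sketch of the OFFLINE benchmark LP for network revenue management.
import numpy as np
from scipy.optimize import linprog

v = np.array([5.0, 3.0, 2.0])          # value of each type
A = np.array([[1, 1, 0],               # resource requirements: A[j, i] = units of
              [0, 1, 1]])              # resource j used by one type-i agent
B = np.array([40.0, 30.0])             # initial capacities (B_1, B_2)
N = np.array([25, 30, 45])             # realized arrival counts N_i[1:T]

# max v'x  s.t.  A x <= B,  0 <= x <= N   (linprog minimizes, so negate v)
res = linprog(-v, A_ub=A, b_ub=B, bounds=list(zip(np.zeros_like(N), N)))
print("V_off =", -res.fun, " x* =", res.x)
```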

18–22 online allocation with prediction oracle

- given a black-box predictive oracle for the performance of OFFLINE (specifically, for any t, B, we have statistical information about V_off[t, T])
- let π_t = P[V_off[t, T] decreases if OFFLINE rejects the t-th arrival]

Bayes selector: accept the t-th arrival iff π_t > 0.5 (a Monte Carlo sketch of the selector follows)

theorem [Vera & B, 2018]: under mild tail bounds on N_i[t:T], the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d
- arrivals can be time-varying, correlated; discounted rewards
- works for general settings (single-minded, unit-demand, etc.)
- can use an approximate oracle (e.g., from samples)
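One plausible way to instantiate the Bayes selector from samples is Monte Carlo estimation of π_t against the offline LP. This is my own sketch, not code from the paper: the helper names (v_off, pi_t) are mine, and the LP relaxation stands in for the offline oracle.

```python
# A Monte Carlo sketch of the Bayes selector: estimate
# pi_t = P[V_off(t,T) drops if OFFLINE must reject the current arrival]
# by sampling future arrival counts and solving the offline LP with and
# without the current arrival available.
import numpy as np
from scipy.optimize import linprog

def v_off(v, A, B, counts):
    """Offline LP value given capacities B and arrival counts."""
    res = linprog(-v, A_ub=A, b_ub=B, bounds=[(0, c) for c in counts])
    return -res.fun

def pi_t(i, v, A, B, p, horizon, rng, samples=200):
    """Estimated prob. that rejecting one type-i arrival hurts OFFLINE."""
    hits = 0
    for _ in range(samples):
        future = rng.multinomial(horizon, p)       # sampled future counts
        with_cur = future.copy()
        with_cur[i] += 1                           # current arrival available
        # rejecting hurts iff the extra arrival strictly raises V_off
        # (small tolerance guards against LP solver noise)
        if v_off(v, A, B, with_cur) > v_off(v, A, B, future) + 1e-9:
            hits += 1
    return hits / samples

# Bayes selector rule: accept the current type-i arrival iff pi_t(...) > 0.5
```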

23–25 standard approach: randomized admission control (RAC)

offline optimum V_off:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i[1:T]

(upfront) fluid LP V_fl:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

E[V_off] ≤ V_fl (via Jensen's inequality and concavity of V_off w.r.t. N_i)

fluid RAC: accept type θ_i with prob x_i / (T p_i) (see the sketch below)

proposition [Gallego & van Ryzin '97], [Maglaras & Meissner '06]: fluid RAC has E[Regret] = Θ(√T)
N.B. this is a static policy!
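A minimal sketch of the static fluid RAC, under the same illustrative instance-data conventions as above: solve the fluid LP once with expected counts T·p_i, then admit a type-i arrival with probability x_i / (T·p_i).

```python
# A sketch of the static fluid RAC (illustrative instance data).
import numpy as np
from scipy.optimize import linprog

v = np.array([5.0, 3.0, 2.0])
A = np.array([[1, 1, 0], [0, 1, 1]])
B = np.array([40.0, 30.0])
p = np.array([0.3, 0.3, 0.4])
T = 100

expected = T * p                                 # E[N_i[1:T]]
res = linprog(-v, A_ub=A, b_ub=B, bounds=[(0, e) for e in expected])
accept_prob = res.x / expected                   # static admission probabilities
print("fluid value V_fl =", -res.fun, " accept probs =", accept_prob)
```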

26–28 RAC with re-solving

offline optimum V_off:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i

re-solved fluid LP V_fl(t):
  max   Σ_i v_i x_i[t]
  s.t.  Σ_i A_i x_i[t] ≤ B[t]
        0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)

- regret improves to o(√T) [Reiman & Wang '08]
- O(1) regret under (dual) non-degeneracy [Jasin & Kumar '12]
- most results use V_fl as the benchmark (including the "prophet inequality")

proposition [Vera & B '18]: for degenerate instances, V_fl − E[V_off] = Ω(√T) (a worked example follows)
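A standard single-type example (my illustration, not taken from the slides) shows where the Ω(√T) gap comes from in the degenerate case where capacity exactly matches expected demand:

```latex
% Degenerate instance: one resource, one type with v = 1, arrival
% probability p, and capacity tuned to expected demand, B = Tp.
% The fluid LP sees no randomness, while OFFLINE pays for fluctuations:
\[
  V_{\mathrm{fl}} = \min(Tp,\, B) = B,
  \qquad
  \mathbb{E}[V_{\mathrm{off}}] = \mathbb{E}[\min(N, B)],
  \quad N \sim \mathrm{Bin}(T, p),
\]
\[
  V_{\mathrm{fl}} - \mathbb{E}[V_{\mathrm{off}}]
  = \mathbb{E}\bigl[(B - N)^{+}\bigr]
  = \Theta(\sqrt{T}),
\]
% since N has mean B and standard deviation Theta(sqrt(T)) (by the CLT).
```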

29–33 Bayes selector for i.i.d. arrivals

Bayes selector:
- π_t = P[V_off[t, T] decreases if OFFLINE rejects the t-th arrival]
- accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:
  max   Σ_i v_i x_i[t]
  s.t.  A x[t] ≤ B[t],  0 ≤ x_i[t] ≤ E[N_i[t:T]]

the re-solved LP gives an approximate admission oracle

fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5 (see the sketch below)

proposition [Vera & B, 2018]: the fluid Bayes selector has E[Regret] ≤ 2 v_max Σ_{i=1}^{n} p_i^{-1}

- proposed for multi-secretary by [Gurvich & Arlotto, 2017]
- NRM via partial re-solving [Bumpensanti & Wang, 2018]
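A minimal sketch of the fluid Bayes selector's decision rule (the function and argument names are mine): re-solve the fluid LP at the current state and accept iff the LP plans to serve more than half of the expected remaining demand of this type.

```python
# A sketch of the fluid Bayes selector decision rule (illustrative names).
import numpy as np
from scipy.optimize import linprog

def fluid_bayes_accept(i, v, A, B_rem, p, remaining):
    """Accept a type-i arrival given remaining capacities B_rem and
    remaining horizon length `remaining`."""
    exp_counts = remaining * p                   # E[N_j[t:T]]
    res = linprog(-v, A_ub=A, b_ub=B_rem,
                  bounds=[(0, e) for e in exp_counts])
    # the Bayes selector rule: accept iff x_i[t] / E[N_i[t:T]] > 0.5
    return res.x[i] / exp_counts[i] > 0.5
```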

34 proof outline

the proof comprises two parts:
1. compensated coupling: a regret bound for the Bayes selector for a generic online decision problem
2. bound the compensation for online packing problems via LP sensitivity and measure concentration

35–38 the compensated coupling: make OFFLINE follow ONLINE

- for any time t and budget B[t], let V_off(t, B[t]) = value of OFFLINE starting from the current state
- for any action a, the disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])
- we can compensate OFFLINE to follow the same action a as ONLINE:

  V_off(t, B[t]) ≤ R^alg_t + v_max · 1{ω ∈ Q_t(a)} + V_off(t+1, B[t+1])

- iterating, we get E[V_off] ≤ E[V_alg] + v_max Σ_{t=1}^{T} P[Q_t(a_t)] (the telescoping step is spelled out below)
- note: the Bayes selector picks a_t = argmin_a P[Q_t(a)]
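Spelling out the "iterating" step: summing the compensation inequality over t = 1, ..., T (with V_off(T+1, ·) = 0) and taking expectations telescopes to the stated bound.

```latex
\[
  \mathbb{E}\bigl[V_{\mathrm{off}}(1, B)\bigr]
  \;\le\; \mathbb{E}\Bigl[\sum_{t=1}^{T} R^{\mathrm{alg}}_t\Bigr]
        + v_{\max} \sum_{t=1}^{T} \mathbb{P}\bigl[Q_t(a_t)\bigr]
  \;=\; \mathbb{E}[V_{\mathrm{alg}}]
        + v_{\max} \sum_{t=1}^{T} \mathbb{P}\bigl[Q_t(a_t)\bigr].
\]
```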

39–40 compensated coupling for single-resource allocation

for any time t and budget B[t]:
- if the Bayes selector rejects type θ_i, assume OFFLINE front-loads θ_i: an error occurs only if OFFLINE rejects all future θ_i
- if the Bayes selector accepts type θ_i, assume OFFLINE back-loads θ_i: an error occurs only if OFFLINE accepts all future θ_i

claim: the smaller of the two events has probability ≤ e^{−c(T−t)} (the resulting regret sum is worked out below)
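Combined with the compensated-coupling bound, the claim yields horizon-independent regret: the per-step disagreement probabilities form a geometric series (this one-line calculation is my expansion of the slide, assuming the claim's constant c > 0).

```latex
\[
  \mathbb{E}[\mathrm{Regret}]
  \;\le\; v_{\max} \sum_{t=1}^{T} e^{-c(T-t)}
  \;=\; v_{\max} \sum_{s=0}^{T-1} e^{-cs}
  \;\le\; \frac{v_{\max}}{1 - e^{-c}},
\]
% a constant independent of the horizon T and the budget B.
```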

41 summary: online allocation via the Bayes selector

- a new online allocation policy with horizon-independent regret
- a way to use black-box predictive algorithms
- generic regret bounds for any online decision problem

42 Thanks!
