The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks


1 The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
Yingfei Wang, Chu Wang and Warren B. Powell
Princeton University
Yingfei Wang, Optimal Learning Methods, June 22

2 Sequential Decision Problems
M discrete alternatives.
Unknown truth $\mu_x$.
At each time $n$, the learner chooses an alternative $x^n$ and receives a reward $W^n_{x^n}$.
Offline objective: $\max_{\pi} \mathbb{E}^{\pi}[\mu_{x^N}]$.
Online objective: $\max_{\pi} \mathbb{E}^{\pi} \sum_{n=0}^{N-1} \mu_{x^n}$.
Overview
Numerous communities: multi-armed bandits, ranking and selection, stochastic search, control theory, ...
Various applications: recommendations (ads, news), packet routing, revenue management, guidance of laboratory experiments, ...

3 Applications with binary outputs
Revenue management: whether or not a customer books a room.
Health analytics: success (the patient does not need to return for more treatment) or failure (the patient needs follow-up care).
Production of single- or double-walled nanotubes: controllable parameters include catalyst, laser power, hydrogen, pressure, temperature, Ar/CO2, ethylene, etc.

4 Applications with binary outputs (continued)
Figure panels: Single Wall, Double Wall.

5 Outline
1 Sequential Decision Problems with Binary Outputs
2 The Knowledge Gradient Policy
3 Experimental Results

6 Sequential Decision Problems with Binary Outputs
1 Sequential Decision Problems with Binary Outputs
  Model
  Bayesian linear classification and Laplace approximation
2 The Knowledge Gradient Policy
3 Experimental Results

7 Sequential Decision Problems with Binary Outputs: Model
A finite set of alternatives $x \in \mathcal{X} = \{x_1, \dots, x_M\}$.
Binary outcome $y \in \{-1, +1\}$ with unknown probability $p(y = +1 \mid x)$.
Goal: given a limited budget $N$, choose the measurement policy $(x^0, \dots, x^{N-1})$ and the implementation decision that maximizes $p(y = +1 \mid x)$.
Generalized linear model for the probability: $p(y = +1 \mid x, w) = \sigma(w^T x)$, where
  $\sigma(a) = \frac{1}{1 + \exp(-a)}$ (logistic), or
  $\sigma(a) = \Phi(a) = \int_{-\infty}^{a} \mathcal{N}(s \mid 0, 1^2)\, ds$ (probit).
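The two link functions on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `logistic`, `probit` and `prob_positive` are assumptions made for this example and do not come from the slides.

```python
import math

def logistic(a):
    """Logistic link sigma(a) = 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def probit(a):
    """Probit link Phi(a): the standard normal CDF, expressed through erf."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def prob_positive(w, x, link=logistic):
    """p(y = +1 | x, w) = sigma(w^T x) for a given weight vector w."""
    return link(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical weights and alternative, for illustration only.
w = [0.5, -1.0]
x = [2.0, 0.5]
print(prob_positive(w, x), prob_positive(w, x, link=probit))
```

Both links map $w^T x = 0$ to probability 0.5 and satisfy $\sigma(a) + \sigma(-a) = 1$, which is why the likelihood of a label $y \in \{-1, +1\}$ can be written compactly as $\sigma(y w^T x)$.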

8 Sequential Decision Problems with Binary Outputs: Bayesian linear classification and Laplace approximation
Logistic and probit regression
  Training set $D = \{(x_i, y_i)\}_{i=1}^{n}$.
  Likelihood: $p(D \mid w) = \prod_{i=1}^{n} \sigma(y_i w^T x_i)$.
  Maximum likelihood estimate: $\hat{w} = \arg\min_w -\sum_{i=1}^{n} \log \sigma(y_i w^T x_i)$.
Bayesian logistic and probit regression
  $p(w \mid D) = \frac{1}{Z} p(D \mid w)\, p(w) \propto p(w) \prod_{i=1}^{n} \sigma(y_i w^T x_i)$.
Extension to sequential model updates:
  $p(w \mid D^0) \xrightarrow{x^0,\, y^1} p(w \mid D^1) \xrightarrow{x^1,\, y^2} p(w \mid D^2) \cdots$
Exact Bayesian inference for a linear classifier is intractable.
The posterior is handled by Monte Carlo sampling or by analytic approximations such as the Laplace approximation.

9 Sequential Decision Problems with Binary Outputs: Bayesian linear classification and Laplace approximation
Laplace approximation
$\Psi(w) = \log p(D \mid w) + \log p(w)$.
Second-order Taylor expansion of $\Psi$ around its MAP (maximum a posteriori) solution $\hat{w} = \arg\max_w \Psi(w)$:
  $\Psi(w) \approx \Psi(\hat{w}) - \frac{1}{2} (w - \hat{w})^T H (w - \hat{w})$, with $H = -\nabla^2 \Psi(w) \big|_{w = \hat{w}}$.
Laplace approximation to the posterior: $p(w \mid D) \approx \mathcal{N}(w \mid \hat{w}, H^{-1})$.
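To make the Laplace approximation concrete, here is a minimal one-dimensional sketch in Python. It assumes a scalar weight, a Gaussian prior and the logistic link; the function name `laplace_1d`, the use of plain Newton steps to find the MAP, and the toy data are choices made for this illustration, not details given in the slides.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def laplace_1d(xs, ys, prior_mean=0.0, prior_prec=1.0, iters=50):
    """Return (w_hat, variance) of the Gaussian N(w_hat, 1/H) that approximates
    the posterior of a scalar weight w under a N(prior_mean, 1/prior_prec) prior."""
    w = prior_mean
    for _ in range(iters):
        grad = -prior_prec * (w - prior_mean)   # d/dw of log prior
        hess = -prior_prec                      # d^2/dw^2 of log prior
        for x, y in zip(xs, ys):
            s = sigmoid(y * w * x)
            grad += y * x * (1.0 - s)           # d/dw of log sigma(y w x)
            hess -= x * x * s * (1.0 - s)       # d^2/dw^2 of log sigma(y w x)
        w -= grad / hess                        # Newton step (Psi is concave, hess < 0)
    # H = -Psi''(w_hat): prior precision plus the curvature of each likelihood term.
    H = prior_prec
    for x, y in zip(xs, ys):
        s = sigmoid(y * w * x)
        H += x * x * s * (1.0 - s)
    return w, 1.0 / H

# Toy data (hypothetical): two positive outcomes and one negative outcome at x = 1.
w_hat, var = laplace_1d([1.0, 1.0, 1.0], [+1, +1, -1])
print(w_hat, var)
```

The returned variance is smaller than the prior variance because every observation adds non-negative curvature to $H$.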

10 Sequential Decision Problems with Binary Outputs: Bayesian linear classification and Laplace approximation
Online Bayesian linear classification based on Laplace approximation
Sequential model updates: the Laplace-approximated posterior serves as the prior for the next available data point.
Prior: $p(w_j) = \mathcal{N}(w_j \mid m_j^0, (q_j^0)^{-1})$.
Update: $(m_j^n, q_j^n) \xrightarrow{\{x^n,\, y^{n+1}\}} (m_j^{n+1}, q_j^{n+1})$, where
  $m^{n+1} = \arg\max_w \; -\frac{1}{2} \sum_{i=1}^{d} q_i^n (w_i - m_i^n)^2 + \log \sigma(y w^T x)$,
  $q_j^{n+1} = q_j^n - \hat{t}\, x_j^2$,
  $\hat{t} := \frac{\partial^2 \log \sigma(y f)}{\partial f^2} \Big|_{f = \hat{w}^T x}$, with $\hat{w} = m^{n+1}$.
Since $\log \sigma$ is concave for both the logistic and the probit link, $\hat{t} \le 0$ and the precision never decreases.

11 Sequential Decision Problems with Binary Outputs: Bayesian linear classification and Laplace approximation
Online Bayesian linear classification based on Laplace approximation
Problem: $\arg\max_w \; -\frac{1}{2} \sum_{i=1}^{d} q_i (w_i - m_i)^2 + \log \sigma(y w^T x)$.
1-dimensional bisection method:
  Set $\partial \Psi / \partial w_i = 0$.
  Define $p := \frac{\sigma'(y w^T x)}{\sigma(y w^T x)}$.
  Then $w_i = m_i + y p \frac{x_i}{q_i}$.
  Substituting back gives $p = \frac{\sigma'\left(p \sum_{i=1}^{d} x_i^2 / q_i + y m^T x\right)}{\sigma\left(p \sum_{i=1}^{d} x_i^2 / q_i + y m^T x\right)}$.
  The equation has a unique solution in the interval $[0, \sigma'(y m^T x) / \sigma(y m^T x)]$.
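The bisection update can be sketched as follows for the logistic link, where $\sigma'(a)/\sigma(a) = \sigma(-a)$ and $\partial^2 \log \sigma(yf) / \partial f^2 = -\sigma(yf)(1 - \sigma(yf))$. The function name `laplace_update`, the tolerance and the example numbers are assumptions made for this illustration; the slides allow other link functions as well.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def laplace_update(m, q, x, y, tol=1e-12):
    """One online update of a diagonal Gaussian belief N(m_j, 1/q_j)
    after observing label y in {-1, +1} at alternative x (logistic link)."""
    c = sum(xi * xi / qi for xi, qi in zip(x, q))    # sum_i x_i^2 / q_i
    b = y * sum(mi * xi for mi, xi in zip(m, x))     # y * m^T x
    # Solve p = sigma'(p*c + b) / sigma(p*c + b) = sigmoid(-(p*c + b)) on [0, sigmoid(-b)].
    lo, hi = 0.0, sigmoid(-b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigmoid(-(mid * c + b)) - mid > 0.0:      # fixed-point residual is decreasing in p
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    m_new = [mi + y * p * xi / qi for mi, xi, qi in zip(m, x, q)]
    # t_hat = d^2 log sigma(y f) / d f^2 at f = m_new^T x; it is <= 0 for the logistic link.
    s = sigmoid(y * sum(mi * xi for mi, xi in zip(m_new, x)))
    t_hat = -s * (1.0 - s)
    q_new = [qi - t_hat * xi * xi for qi, xi in zip(q, x)]
    return m_new, q_new

# Hypothetical prior and observation, for illustration only.
m1, q1 = laplace_update([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], +1)
print(m1, q1)
```

Only one scalar equation is solved per observation, regardless of the dimension $d$, because the diagonal prior makes every $w_i$ a function of the single unknown $p$.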

12 The Knowledge Gradient Policy
1 Sequential Decision Problems with Binary Outputs
2 The Knowledge Gradient Policy
  Knowledge Gradient Policy for Lookup Table Model
  Knowledge Gradient Policy for Linear Bayesian Classification
3 Experimental Results

13 The Knowledge Gradient Policy
Characteristics of our problems
Expensive experiments.
Small samples.
We must learn from our decisions as quickly as possible.

14 The Knowledge Gradient Policy: Knowledge Gradient Policy for Lookup Table Model
Knowledge gradient policy for lookup table model [3]
M discrete alternatives, unknown truth $\mu_x$, observations $W_x = \mu_x + \epsilon$.
Belief: $\mu_x \mid \mathcal{F}^n \sim \mathcal{N}(\theta_x^n, \sigma_x^n)$.
Knowledge state $S^n = (\theta^n, \sigma^n)$, with value $V(s) = \max_x \theta_x$.
$\nu_x^{KG}(S^n) = \mathbb{E}\left[ V(S^{n+1}(x)) - V(S^n) \mid S^n \right] = \mathbb{E}\left[ \max_{x'} \theta_{x'}^{n+1}(x) - \max_{x'} \theta_{x'}^{n} \mid S^n \right]$.
The Knowledge Gradient (KG) policy: $X^{KG}(S^n) = \arg\max_x \nu_x^{KG}(S^n)$.
Figure annotations: change in the estimate of the value of alternative 5 due to a measurement; change that produces a change in the decision.
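For independent normal beliefs with known measurement noise, the expectation in $\nu_x^{KG}$ has a well-known closed form, $\nu_x^{KG} = \tilde{\sigma}_x \left( \zeta_x \Phi(\zeta_x) + \varphi(\zeta_x) \right)$ with $\zeta_x = -|\theta_x - \max_{x' \ne x} \theta_{x'}| / \tilde{\sigma}_x$. The sketch below evaluates it in Python. It assumes that the belief spread is given as a standard deviation and that the noise standard deviation `noise_sd` is known; the slide does not spell out either detail, and the function name `kg_lookup` is hypothetical.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kg_lookup(theta, sd, noise_sd):
    """Knowledge gradient of each alternative under independent normal beliefs.

    theta[x]: posterior mean, sd[x]: posterior standard deviation (must be > 0),
    noise_sd: standard deviation of the measurement noise (assumed known).
    Requires at least two alternatives."""
    kg = []
    for x in range(len(theta)):
        # Standard deviation of the change in theta[x] caused by one measurement of x.
        sigma_tilde = sd[x] ** 2 / math.sqrt(sd[x] ** 2 + noise_sd ** 2)
        best_other = max(t for i, t in enumerate(theta) if i != x)
        zeta = -abs(theta[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * std_normal_cdf(zeta) + std_normal_pdf(zeta)))
    return kg

# Hypothetical beliefs: equal means, alternative 1 is more uncertain.
values = kg_lookup([0.0, 0.0], [1.0, 2.0], noise_sd=1.0)
best = max(range(len(values)), key=values.__getitem__)
print(values, best)
```

With equal means, the policy prefers the alternative whose belief is more uncertain, since a measurement there is more likely to change the final decision.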

15 The Knowledge Gradient Policy: Knowledge Gradient Policy for Linear Bayesian Classification
Knowledge gradient policy for linear Bayesian classification belief model
Belief model: $y_x \mid w \sim \mathrm{Bernoulli}(\sigma(w^T x))$, $w_j \mid \mathcal{F}^n \sim \mathcal{N}(m_j^n, (q_j^n)^{-1})$.
Knowledge state $S^n = (m^n, q^n)$, with value $V(s) = \max_x p(y_x = +1 \mid x, s)$.
$\nu_x^{KG}(S^n) = \mathbb{E}\left[ V(S^{n+1}(x, y)) - V(S^n) \mid S^n \right] = \mathbb{E}\left[ \max_{x'} p(y_{x'} = +1 \mid x', S^{n+1}(x, y)) - \max_{x'} p(y_{x'} = +1 \mid x', S^n) \mid S^n \right]$.
The Knowledge Gradient (KG) policy: $X^{KG}(S^n) = \arg\max_x \nu_x^{KG}(S^n)$.
The knowledge gradient policy can work with any choice of link function $\sigma(\cdot)$ and approximation procedure by adjusting the transition function $S^{n+1}(x, \cdot)$ accordingly.
Online learning [7]: $X^{OLKG}(S^n) = \arg\max_x \; p(y = +1 \mid x, S^n) + (N - n)\, \nu_x^{KG}(S^n)$.
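Because the outcome $y$ is binary, the expectation in $\nu_x^{KG}$ is a two-term sum over $y \in \{-1, +1\}$. The Python sketch below illustrates this under several assumptions that are not taken from the slides: the logistic link, the bisection-based Laplace update as the transition function, and the common approximation $p(y = +1 \mid x, s) \approx \sigma\left(\mu / \sqrt{1 + \pi s^2 / 8}\right)$ for the predictive probability, with $\mu = m^T x$ and $s^2 = \sum_i x_i^2 / q_i$. All function names are hypothetical.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def predict(m, q, x):
    """Approximate p(y = +1 | x, S) under the belief w_j ~ N(m_j, 1/q_j) (assumed formula)."""
    mu = sum(mi * xi for mi, xi in zip(m, x))
    s2 = sum(xi * xi / qi for xi, qi in zip(x, q))
    return sigmoid(mu / math.sqrt(1.0 + math.pi * s2 / 8.0))

def update(m, q, x, y, iters=60):
    """Transition function S^{n+1}(x, y): Laplace update solved by bisection."""
    c = sum(xi * xi / qi for xi, qi in zip(x, q))
    b = y * sum(mi * xi for mi, xi in zip(m, x))
    lo, hi = 0.0, sigmoid(-b)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sigmoid(-(mid * c + b)) > mid:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    m_new = [mi + y * p * xi / qi for mi, xi, qi in zip(m, x, q)]
    s = sigmoid(y * sum(mi * xi for mi, xi in zip(m_new, x)))
    q_new = [qi + s * (1.0 - s) * xi * xi for qi, xi in zip(q, x)]
    return m_new, q_new

def value(m, q, alternatives):
    """V(s) = max_x p(y = +1 | x, s)."""
    return max(predict(m, q, x) for x in alternatives)

def knowledge_gradient(m, q, alternatives):
    """KG value of measuring each alternative once."""
    v_now = value(m, q, alternatives)
    kg = []
    for x in alternatives:
        p_plus = predict(m, q, x)
        expected = 0.0
        for y, prob in ((+1, p_plus), (-1, 1.0 - p_plus)):
            m_next, q_next = update(m, q, x, y)
            expected += prob * value(m_next, q_next, alternatives)
        kg.append(expected - v_now)
    return kg

def online_kg_choice(m, q, alternatives, N, n):
    """Index chosen by the online variant: predicted reward plus (N - n) times KG."""
    kg = knowledge_gradient(m, q, alternatives)
    scores = [predict(m, q, x) + (N - n) * k for x, k in zip(alternatives, kg)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical belief: the first weight is far more uncertain than the second.
alts = [[1.0, 0.0], [0.0, 1.0]]
kg = knowledge_gradient([0.0, 0.0], [1.0, 100.0], alts)
print(kg, online_kg_choice([0.0, 0.0], [1.0, 100.0], alts, N=10, n=0))
```

In this toy setting the alternative that loads on the uncertain weight has the larger KG value, so both the offline and the online rule measure it first.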

16 Experimental Results
1 Sequential Decision Problems with Binary Outputs
2 The Knowledge Gradient Policy
3 Experimental Results
  Behavior of the KG policy
  Comparison with other Policies

17 Experimental Results: Behavior of the KG policy
Sampling behavior of the KG policy
Figure: panels with axes $x_1$ and $x_2$.

18 Experimental Results: Behavior of the KG policy
Absolute class distribution error
Figure: Absolute distribution error.

19 Experimental Results: Comparison with other Policies
Competing policies
  Random sampling (Random).
  A myopic method that selects the most uncertain instance at each step (MostUncertain).
  Discriminative batch-mode active learning (Disc) [4], with batch size set to 1.
  Expected improvement (EI) [8], with an initial fit of 5 examples.
  Thompson sampling (TS) [2].
  UCB on the latent function $w^T x$ (UCB) [6].
Metric: Opportunity Cost (OC)
  $OC := \max_{x \in \mathcal{X}} p(y = +1 \mid x, w^*) - p(y = +1 \mid x^{N+1}, w^*)$.
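The opportunity cost metric can be computed directly once a true weight vector is fixed. The sketch below assumes the logistic link and a known $w^*$, as in a simulation study; the function name `opportunity_cost` and the numbers are illustrative assumptions only.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def opportunity_cost(alternatives, w_star, chosen_index):
    """OC = max_x p(y=+1 | x, w*) - p(y=+1 | x_chosen, w*), using the logistic link."""
    probs = [sigmoid(sum(wi * xi for wi, xi in zip(w_star, x))) for x in alternatives]
    return max(probs) - probs[chosen_index]

# Hypothetical alternatives and true weights.
alts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
w_star = [2.0, 0.5]
print([opportunity_cost(alts, w_star, i) for i in range(len(alts))])
```

The opportunity cost is zero exactly when the implementation decision is one of the truly best alternatives, and positive otherwise.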

20 Experimental Results: Comparison with other Policies
Figure: Opportunity cost on UCI. Panels: (a) sonar, (b) glass identification, (c) blood transfusion, (d) survival, (e) breast cancer (wpbc), (f) planning relax. Axes: opportunity cost against number of iterations. Policies shown: EI, UCB, TS, Random, MostUncertain, Disc, KG.

21 Experimental Results: Comparison with other Policies
Thank you! Questions?

22 References
[1] Xavier Boyen and Daphne Koller. Tractable inference for complex stochastic processes. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.
[2] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems.
[3] Peter I. Frazier, Warren B. Powell, and Savas Dayanik. A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5).
[4] Yuhong Guo and Dale Schuurmans. Discriminative batch mode active learning. In Advances in Neural Information Processing Systems.
[5] Steffen L. Lauritzen. Propagation of probabilities, means, and variances in mixed graphical association models. Journal of the American Statistical Association, 87(420).
[6] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web. ACM.

23 References
[7] Ilya O. Ryzhov, Warren B. Powell, and Peter I. Frazier. The knowledge gradient algorithm for a general class of online learning problems. Operations Research, 60(1).
[8] Matthew Tesch, Jeff Schneider, and Howie Choset. Expensive function optimization with stochastic binary outcomes. In Proceedings of the 30th International Conference on Machine Learning.
