Planning and Model Selection in Data Driven Markov models

Planning and Model Selection in Data Driven Markov Models
Shie Mannor, Department of Electrical Engineering, Technion
Joint work with many people along the way: Dotan Di-Castro (Yahoo!), Assaf Hallak (Technion), Ofir Mebel (Apple), Aviv Tamar (Technion), John Tsitsiklis (MIT), Huan Xu (NUS), and many others.
ICML, June 2014

Classical planning problems
We typically want to maximize the expected average/discounted reward. In planning:
- The model is known.
- There is a single scalar reward.
- We believe the model to be true, even though it is not...

Flavors of uncertainty
- "Natural" uncertainty (random transitions/rewards): classical MDP planning. Not this talk.
- Deterministic uncertainty in the parameters: robust MDPs. Half of this talk.
- Probabilistic uncertainty in the parameters: Bayesian RL/MDPs. Not this talk (but a book is coming!).
- Model uncertainty (the model is not known): the second half of this talk.

Motivation I: Power grid
- Unit commitment scheduling problem (very dynamic).
- Objective: guarantee reliability (99.9%).
- The model is kind of known, but highly stochastic.
- Failures (outages) do happen!

Motivation II
- Large US retailer (Fortune 500 company).
- Marketing problem: send or not send a coupon/invitation/mail-order catalogue.
- Common wisdom: per customer, look at RFM (Recency, Frequency, Monetary value).
- Dynamics matter.
- How to discretize?

Common to the problems
- The real state space is huge, with lots of uncertainty and parameters.
- Batch data are available.
- Reasonable simulators exist, but no ground truth.
- Computational speed is less of an issue, but things still need to get done.
- Low-probability events are important.
- Uncertainty and (model) risk are THE concern.

This talk
- How to choose the "right" model based on the available data.
- How to optimize when the model is not (fully) known?

Part I: The Model Selection Problem
We focus on a simpler problem:
1. Ignore actions completely (MRP). We have: state → reward → next state.
2. We observe a sequence of T observations and rewards in some space O × R (O is complicated): D(T) = (o_1, r_1, o_2, r_2, ..., o_T, r_T).
3. We are given K mappings from O to state spaces S_1, ..., S_K, belonging to MRPs M_1, ..., M_K, respectively. Each mapping H_i : O → S_i describes a model, where S_i = {x^(i)_1, ..., x^(i)_|S_i|} is the state space of the MRP M_i.
4. We do not describe how the mappings {H_i}_{i=1}^K are constructed.
[Hallak and Mannor, KDD 2013]

The Identification Problem
A model selection criterion takes as input D_T and the models M_1, ..., M_K, and returns one of the K models as the proposed true model.
Definition: A model selection criterion is weakly consistent if P_i(M̂(D_T) ≠ i) → 0 as T → ∞, where P_i is the probability induced when model i is the correct model.

Penalized Likelihood Criteria
In abstraction: data samples y_1, y_2, ..., y_T. Let
L_i(T) = max_θ log P(y_1, ..., y_T | M_i(θ)).
We denote the dimension of θ by |M_i|. An MDL model estimator then has the structure
MDL(i) = |M_i| f(T) − L_i(T),
where f(T) is some sub-linear function. Many related criteria exist: AIC, BIC, and others.
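As a concrete illustration, here is a minimal Python sketch of such a penalized-likelihood score; the function names and the default choice f(T) = log T are assumptions made for illustration, not the specific criterion analyzed in the talk.

```python
import numpy as np

def penalized_likelihood_score(log_likelihood, dim, T, f=np.log):
    """MDL/BIC-style score: dimension * f(T) minus the maximized log-likelihood.

    log_likelihood: max_theta log P(y_1..y_T | M_i(theta)) for model i
    dim:            number of free parameters |M_i|
    f:              a sub-linear penalty function of the sample size T
    """
    return dim * f(T) - log_likelihood

def select_model(log_likelihoods, dims, T):
    """Return the index of the model with the smallest penalized score."""
    scores = [penalized_likelihood_score(ll, d, T)
              for ll, d in zip(log_likelihoods, dims)]
    return int(np.argmin(scores))
```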

Impossibility result
Theorem: There does not exist a consistent MDL-like criterion.

Identifying Markov Reward Processes
We look at the aggregate prediction error. Two types of aggregation:
1. Reward aggregation
2. Transition probability aggregation
We focus on refined models: M_1 ⊆ M_2 means that M_2 is a refinement of M_1.

Reward Aggregation
Define the reward mean squared error (RMSE) operator
L^i_RMSE(D_T) = (1/T) Σ_{x^i_j ∈ S_i} ε(x^i_j),
where ε(x^i_j) is the error of the reward estimate in state j of model i.
Observation: lim_{T→∞} L^i_RMSE(D_T) = Σ_{x ∈ S_i} π(x) Var(x).
Lemma: Suppose M_i contains M_k. Then, for a single trajectory D_T, we have L^i_RMSE(D_T) ≤ L^k_RMSE(D_T). Moreover, if the states aggregated in M_i have different mean rewards, then the inequality is strict.
Corollary: Consider a series of refined models M_1 ⊆ ... ⊆ M_K. Then L^1_RMSE(D_T) ≥ L^2_RMSE(D_T) ≥ ... ≥ L^K_RMSE(D_T).
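For intuition, a minimal sketch (with assumed helper names, not the authors' code) of how such an empirical reward-aggregation error can be computed for one candidate state mapping; it reduces to the occupancy-weighted within-state reward variance, which matches the observation above.

```python
import numpy as np

def reward_rmse_score(observations, rewards, state_map):
    """Empirical reward-aggregation error of one candidate model.

    Map each observation to its aggregate state, estimate the mean reward
    per state, and average the squared prediction errors over the trajectory.
    """
    states = np.array([state_map(o) for o in observations])
    rewards = np.asarray(rewards, dtype=float)
    total_sq_err = 0.0
    for s in np.unique(states):
        r_s = rewards[states == s]
        total_sq_err += np.sum((r_s - r_s.mean()) ** 2)
    return total_sq_err / len(rewards)
```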

Reward Aggregation Score
The (reward) score of the j-th model is
M̂(j) = |M_j| f(T)/T + L^j_RMSE(D_T),    (1)
where f(T) is a sub-linear increasing function with lim_{T→∞} f(T)/T = 0.
Based on the RMSE, we consider the model selector M̂_RMSE = arg min_j M̂(j).
Theorem: The model selector M̂_RMSE is weakly consistent.
A finite-time analysis gives rates.
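Putting the two pieces together, a hedged sketch of the resulting model selector; the helper name and the particular sub-linear penalty T^0.75 are illustrative assumptions, not the tuned choice from the experiments.

```python
import numpy as np

def select_mrp_model(rmse_scores, dims, T, penalty=lambda t: t ** 0.75):
    """Pick the model minimizing |M_j| * f(T) / T + L^j_RMSE(D_T), as in (1).

    rmse_scores: empirical reward-aggregation errors, one per candidate model
    dims:        number of aggregate states |M_j| of each candidate
    penalty:     a sub-linear increasing f(T); T**0.75 is only an example
    """
    scores = [d * penalty(T) / T + e for d, e in zip(dims, rmse_scores)]
    return int(np.argmin(scores))
```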

Experiments with artificial data
(Figure: the test statistic M̂(k) for different model dimensions k; the error bars are one standard deviation from the mean.)

Experiments with artificial data
(Figure: the test statistics AIC(k) and BIC(k) for different model dimensions k.)

Experiments with real data
- Large US apparel retailer.
- RFM measures: Recency, Frequency, and Monetary value.
- Problem: how to aggregate? Focus on recency:
  1. Randomly
  2. Most recent
  3. Least recent

Real data
(Figure, three panels: random, lowest, and highest recency aggregation. Each graph is for a different value of f(T)/T: blue = 1, green = 10, red = 50.)

Mini-conclusion
- A very special model selection problem.
- Standard approaches fail, but not all is lost.
- How to aggregate? Mismatched models?
- MRP → MDP: for optimization and for model discovery.

Part II: Robust MDPs
When in doubt, assume the worst.
Model = {S, P, R, A}. Assume:
- S and A are known and given.
- P and R are not known; they lie in uncertainty sets P ∈ 𝒫 and R ∈ ℛ.
Look for a policy with the best worst-case performance.

Robust MDPs
The problem becomes a game against nature:
(*)   max_policy min_disturbance E_{policy, disturbance} [ Σ_t γ^t r_t ]
In general, the problem is NP-hard.
A nice (non-robust) generative probabilistic interpretation: if Pr_gen(disturbance) ≥ 1 − δ, then (*) approximates percentile optimization.
One can also get generalization bounds: robustness = generalization.

Robust MDPs (cont.)
The problem is solvable when the disturbance is a product (rectangular) uncertainty:
disturbance = { (P, R) : P ∈ ⊗_s U_p(s), R ∈ ⊗_s U_r(s) }
Take the worst outcome for every state. Eventually solved by Nilim and El Ghaoui (OR, 2005) using DP (assuming regular uncertainty sets).
If really bad events are to be included, U_p and U_r must be made huge ⇒ conservativeness.
A complicated tradeoff to make: the rectangular structure cannot correlate bad events across states.
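To make the DP concrete, here is a minimal sketch of robust value iteration under state-action-rectangular uncertainty, with the uncertainty set approximated by a finite family of candidate transition kernels. This is a simplifying assumption, not the general convex-set machinery of Nilim and El Ghaoui, and the array names are hypothetical.

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.95, iters=500):
    """Robust value iteration with a scenario-based rectangular uncertainty set.

    P_candidates: array of shape (n_models, S, A, S); each slice is one
                  plausible transition kernel (a finite stand-in for U_p).
    R:            reward array of shape (S, A), assumed known here for brevity.
    Returns the robust value function and a greedy robust policy.
    """
    n_models, S, A, _ = P_candidates.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Worst-case expected next value over the candidate models, per (s, a).
        next_vals = np.einsum('msat,t->msa', P_candidates, V)  # (n_models, S, A)
        Q = R + gamma * next_vals.min(axis=0)                  # nature picks the worst model
        V = Q.max(axis=1)                                      # the agent picks the best action
    return V, Q.argmax(axis=1)
```

With s,a-rectangular sets the inner minimum decouples across states, which is exactly what makes the DP tractable, and also exactly what prevents correlating bad events across states.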

Uncertainty sets
Convex sets ⇒ large uncertainty.

Useful uncertainty sets I
At most K disasters:
disturbance_K = { (P, R) : P_s = P^nom_s, R_s = R^nom_s except at at most K states s_1, ..., s_K, where P_{s_i}, R_{s_i} ∈ (𝒫, ℛ) }
A generative model: every node gets disturbed with probability μ, so
Pr_generative(at most K disturbances) = 1 − tail of a binomial distribution = Σ_{k=0}^{K} C(|S|, k) μ^k (1 − μ)^{|S| − k}.

Useful uncertainty sets II
A continuous "uncertainty budget": suppose there is a probability Pr_dist((P_s, R_s); (P^nom_s, R^nom_s)) of each state's parameters deviating from the nominal ones.
A generative model: every node gets disturbed independently with the continuous probability Pr_dist, so
Pr_generative(disturbance) = Π_{s ∈ S} Pr_dist((P_s, R_s); (P^nom_s, R^nom_s))
disturbance_λ = { (P, R) : Pr_generative(disturbance) ≥ λ }
Not convex ⇒ cannot solve directly.

State Augmentation
- Original state space: S.
- New state space: S × {0, 1, ..., K} for discrete disasters.
- New state space: S × [0, λ] for the uncertainty budget.
Recipe: use your favorite algorithm on the augmented state space.
1. Structural properties (monotonicity) in the uncertainty can be exploited.
2. For continuous uncertainty, one can discretize efficiently.
Caveat: one must assume the uncertainty is observed.
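A minimal sketch of the recipe for the discrete-disaster case, assuming for simplicity a single "bad" kernel per state-action instead of a full disturbance set; all names here are illustrative, not the talk's implementation.

```python
import numpy as np

def k_disaster_value_iteration(P_nom, R_nom, P_bad, R_bad, K, gamma=0.95, iters=500):
    """Value iteration on the augmented space S x {0,...,K}.

    At each step nature either plays the nominal kernel or, if it still has
    budget, spends one disaster and plays the bad kernel.

    P_nom, P_bad: transition kernels of shape (S, A, S)
    R_nom, R_bad: reward arrays of shape (S, A)
    """
    S, A, _ = P_nom.shape
    V = np.zeros((S, K + 1))  # V[s, k] = robust value with k disasters remaining
    for _ in range(iters):
        V_new = np.empty_like(V)
        for k in range(K + 1):
            q_nom = R_nom + gamma * (P_nom @ V[:, k])          # nominal play
            if k > 0:
                q_bad = R_bad + gamma * (P_bad @ V[:, k - 1])  # a disaster consumes budget
                q = np.minimum(q_nom, q_bad)                   # nature picks the worse option
            else:
                q = q_nom
            V_new[:, k] = q.max(axis=1)                        # the agent picks the best action
        V = V_new
    return V
```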

Scaling-up robust MDPs
- Smallish MDPs are easy (robust or not).
- Non-robust MDPs can be handled with function approximation or policy search methods. What about robust MDPs?
- Difficulty 1: the state space in many problems is very large.
- Difficulty 2: the parameters of the dynamics are rarely known, so it is hard to do Monte Carlo.

A crash course on ADP
In ADP (Approximate Dynamic Programming) we want to solve the Bellman equation (policy evaluation).
Write the Bellman equation as J^π = T^π J^π, where T^π y = r + γ P y.
Assume J ≈ Φ w (Φ is known and given, and w is a vector in, say, R^n).
With approximation: Φ w = Π T^π Φ w, where Π is the projection onto {Φ w : w ∈ R^n}.
It is not too hard to show that Π T^π is a contraction, hence there is a unique fixed point.
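For reference, a minimal LSTD-style sketch of solving the projected (non-robust) fixed-point equation from a single trajectory; the function and argument names are assumptions for illustration, not the talk's robust variant.

```python
import numpy as np

def lstd_policy_evaluation(phi, rewards, gamma=0.95):
    """Solve the projected Bellman equation Phi w = Pi T^pi Phi w from samples.

    phi:     feature matrix of shape (T + 1, n), phi[t] = features of state s_t
    rewards: array of length T, rewards[t] = r(s_t, pi(s_t))
    """
    T = len(rewards)
    A = np.zeros((phi.shape[1], phi.shape[1]))
    b = np.zeros(phi.shape[1])
    for t in range(T):
        A += np.outer(phi[t], phi[t] - gamma * phi[t + 1])
        b += phi[t] * rewards[t]
    w = np.linalg.solve(A, b)
    return w  # the approximate value function is J(s) ~= phi(s) @ w
```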

Back to robust MDPs
In robust MDPs:
J^π(s) = r(s, π(s)) + γ inf_{P ∈ 𝒫} E^π_P [ J^π(s') | s, π(s) ]
Writing the pessimistic transition as σ_𝒫(v) = inf_{p ∈ 𝒫} p^T v, we can write T^π y = r + γ σ^π(y).
Not linear anymore!
One can show that T v = sup_π T^π v is a contraction [Iyengar, 2005].

Solving Robust MDPs
Fixed-point equation (writing J ≈ Φ w):
(*)   Φ w = Π T^π Φ w
In general there is no solution. But if for some β < 1 we have
P(s' | s, π(s)) ≤ (β / γ) P̂(s' | s, π(s)) for all P ∈ 𝒫
(where P̂ is the transition law according to which the data are observed), then:
1. Π T^π is a contraction, and (*) has a unique solution.
2. This solution is close to the optimal one.

Solving Robust MDPs (cont.)
Robust projected value iteration:
(**)   Φ w_{k+1} = Π T^π Φ w_k
Solution: w_{k+1} = (Φ^T D Φ)^{-1} ( Φ^T D r + γ Φ^T D σ^π(Φ w_k) )
Three terms to estimate from samples:
Φ^T D Φ ≈ (1/N) Σ_{t=1}^N φ(s_t) φ(s_t)^T,
Φ^T D r ≈ (1/N) Σ_{t=1}^N φ(s_t) r(s_t, π(s_t)),
Φ^T D σ^π(Φ w_k) ≈ (1/N) Σ_{t=1}^N φ(s_t) σ_{𝒫(s_t)}(Φ w_k).
But how do we compute the pessimistic σ_𝒫(Φ w_k)?
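A hedged sketch of that sampled iteration, with the pessimizing oracle passed in as a callable; all names are assumptions, and one case where the oracle itself is computable appears on the next slide.

```python
import numpy as np

def robust_projected_value_iteration(phi, rewards, sigma, gamma=0.95, iters=50):
    """Sampled robust projected value iteration, following the update above.

    phi:     feature matrix (N, n), phi[t] = features of the visited state s_t
    rewards: array of length N with r(s_t, pi(s_t))
    sigma:   pessimizing oracle; sigma(t, w) returns the worst-case expected
             next value inf_{p in P(s_t)} sum_{s'} p(s') * (phi(s') @ w)
    """
    N, n = phi.shape
    A = phi.T @ phi / N
    b = phi.T @ rewards / N
    w = np.zeros(n)
    for _ in range(iters):
        c = sum(phi[t] * sigma(t, w) for t in range(N)) / N
        w = np.linalg.solve(A, b + gamma * c)
    return w
```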

Solving the inner problem
Pessimizing oracle: inf_{p ∈ 𝒫} Σ_{s' reachable} p(s') φ(s')^T w_k.
This can be solved when 𝒫(s, a) = { p : dist(p, p̂) ≤ ε, p is a probability distribution } and in some other lucky cases.
So we know how to "find" the value function of a given policy.
Finding the optimal policy, SARSA-style:
1. Define φ(s, a) instead of just φ(s).
2. Let π_w(s) = arg max_a φ(s, a)^T w.
3. Do value iteration in the new state-action problem.
[Tamar, Mannor, and Xu, ICML 2014]
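For one concrete case, assuming dist is the L1 distance (only one of the "lucky cases"), the oracle has a simple greedy solution; the sketch and its names are illustrative.

```python
import numpy as np

def worst_case_expectation(p_hat, v, eps):
    """inf_p p.v over {p : ||p - p_hat||_1 <= eps, p a probability distribution}.

    Greedy solution: move as much probability mass as allowed (eps / 2) onto the
    state with the smallest value, taking it away from the highest-value states.
    """
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    s_min = int(np.argmin(v))
    add = min(eps / 2.0, 1.0 - p[s_min])   # mass we can push onto the worst state
    p[s_min] += add
    remaining = add
    for s in np.argsort(v)[::-1]:          # remove mass from the best states first
        if remaining <= 0:
            break
        if s == s_min:
            continue
        take = min(p[s], remaining)
        p[s] -= take
        remaining -= take
    return float(p @ v), p
```

In the sampled iteration sketched earlier, sigma(t, w) could then be wired as worst_case_expectation(p_hat[t], phi_next @ w, eps)[0], using an empirical next-state distribution p_hat[t] at s_t (again a hypothetical wiring).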

Example from option trading
(Figure: trading an option.)

Back to motivating examples I
Unit commitment problem (day-ahead / a few days ahead):
- The classical approach is N / N+1.
- Renewables (wind/solar) and demand response cause much stochasticity.
- Modeling is hard (complex internal state, dependencies, high dimensionality).
- Rare events are essential: need to robustify.

Back to motivating examples II
Marketing problem:
- Our model is quite bad to start with.
- Robust optimization helps you fight model mismatch.
- Uncertainty set construction is mysterious (but is it important?).
- Main issue: validation. Secondary issue: optimization.

Conclusions
Model misspecification → parameter uncertainty → robust approach.
Why does uncertainty matter? Models come from data, and data are a bitch. Rewards come from data or humans, which is even worse.
Outlook:
- Contextual parameters
- Latent models [Bandits: Maillard and Mannor, ICML 2014]
- Policy evaluation
- Model validation
