Planning and Model Selection in Data Driven Markov models

Planning and Model Selection in Data Driven Markov Models
Shie Mannor, Department of Electrical Engineering, Technion
Joint work with many people along the way: Dotan Di-Castro (Yahoo!), Assaf Hallak (Technion), Ofir Mebel (Apple), Aviv Tamar (Technion), John Tsitsiklis (MIT), Huan Xu (NUS), and many others.
ICML, June 2014

Classical planning problems
We typically want to maximize the expected average/discounted reward. In planning:
- The model is known.
- There is a single scalar reward.
- We believe the model to be true, even though it is not...

Flavors of uncertainty
- "Natural" uncertainty (random transitions/rewards): classical MDP planning. Not this talk.
- Deterministic uncertainty in the parameters: robust MDPs. Half of this talk.
- Probabilistic uncertainty in the parameters: Bayesian RL/MDPs. Not this talk (but a book is coming!).
- Model uncertainty (the model is not known): the second half of this talk.

Motivation I: Power grid
- Unit commitment scheduling problem (very dynamic).
- Objective: guarantee reliability (99.9%).
- The model is kind of known, but highly stochastic.
- Failures (outages) do happen!

Motivation II
- Large US retailer (Fortune 500 company).
- Marketing problem: send or not send a coupon/invitation/mail-order catalogue.
- Common wisdom: per customer, look at RFM (Recency, Frequency, Monetary value).
- Dynamics matter.
- How to discretize?

Common to the problems
- The real state space is huge, with lots of uncertainty and parameters.
- Batch data are available.
- Reasonable simulators exist, but no ground truth.
- Computational speed is less of an issue, but things still need to get done.
- Low-probability events are important.
- Uncertainty and (model) risk are THE concern.

This talk
- How to choose the "right" model based on the available data.
- How to optimize when the model is not (fully) known?

Part I: The Model Selection Problem
We focus on a simpler problem:
1. Ignore actions completely (MRP). We have: state → reward → next state.
2. We observe a sequence of T observations and rewards in some space O × R (O is complicated): D(T) = (o_1, r_1, o_2, r_2, ..., o_T, r_T).
3. We are given K mappings from O to state spaces S_1, ..., S_K, belonging to MRPs M_1, ..., M_K, respectively. Each mapping H_i : O → S_i describes a model, where S_i = {x^(i)_1, ..., x^(i)_|S_i|} is the state space of the MRP M_i.
4. We do not describe how the mappings {H_i}_{i=1}^K are constructed.
[Hallak and Mannor, KDD 2013]

The Identification Problem
A model selection criterion takes as input D_T and the models M_1, ..., M_K, and returns one of the K models as the proposed true model.
Definition: A model selection criterion is weakly consistent if P_i(M̂(D_T) ≠ i) → 0 as T → ∞, where P_i is the probability induced when model i is the correct model.

Penalized Likelihood Criteria
In abstraction: data samples y_1, y_2, ..., y_T. Let
L_i(T) = max_θ log P(y_1, ..., y_T | M_i(θ)).
We denote the dimension of θ by |M_i|. An MDL model estimator then has the structure
MDL(i) = |M_i| f(T) − L_i(T),
where f(T) is some sub-linear function. Many related criteria exist: AIC, BIC, and others.
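As a concrete illustration, here is a minimal Python sketch of such a penalized-likelihood score; the function names and the default choice f(T) = log T are assumptions made for illustration, not the specific criterion analyzed in the talk.

```python
import numpy as np

def penalized_likelihood_score(log_likelihood, dim, T, f=np.log):
    """MDL/BIC-style score: dimension * f(T) minus the maximized log-likelihood.

    log_likelihood: max_theta log P(y_1..y_T | M_i(theta)) for model i
    dim:            number of free parameters |M_i|
    f:              a sub-linear penalty function of the sample size T
    """
    return dim * f(T) - log_likelihood

def select_model(log_likelihoods, dims, T):
    """Return the index of the model with the smallest penalized score."""
    scores = [penalized_likelihood_score(ll, d, T)
              for ll, d in zip(log_likelihoods, dims)]
    return int(np.argmin(scores))
```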

Impossibility result
Theorem: There does not exist a consistent MDL-like criterion.

Identifying Markov Reward Processes
We look at the aggregate prediction error. Two types of aggregation:
1. Reward aggregation
2. Transition probability aggregation
We focus on refined models: M_1 ⊆ M_2 means that M_2 is a refinement of M_1.

Reward Aggregation
Define the reward mean squared error (RMSE) operator
L^i_RMSE(D_T) = (1/T) Σ_{x^i_j ∈ S_i} ε(x^i_j),
where ε(x^i_j) is the error of the reward estimate in state j of model i.
Observation: lim_{T→∞} L^i_RMSE(D_T) = Σ_{x ∈ S_i} π(x) Var(x).
Lemma: Suppose M_i contains M_k. Then, for a single trajectory D_T, we have L^i_RMSE(D_T) ≤ L^k_RMSE(D_T). Moreover, if the states aggregated in M_i have different mean rewards, then the inequality is strict.
Corollary: Consider a series of refined models M_1 ⊆ ... ⊆ M_K. Then L^1_RMSE(D_T) ≥ L^2_RMSE(D_T) ≥ ... ≥ L^K_RMSE(D_T).
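For intuition, a minimal sketch (with assumed helper names, not the authors' code) of how such an empirical reward-aggregation error can be computed for one candidate state mapping; it reduces to the occupancy-weighted within-state reward variance, which matches the observation above.

```python
import numpy as np

def reward_rmse_score(observations, rewards, state_map):
    """Empirical reward-aggregation error of one candidate model.

    Map each observation to its aggregate state, estimate the mean reward
    per state, and average the squared prediction errors over the trajectory.
    """
    states = np.array([state_map(o) for o in observations])
    rewards = np.asarray(rewards, dtype=float)
    total_sq_err = 0.0
    for s in np.unique(states):
        r_s = rewards[states == s]
        total_sq_err += np.sum((r_s - r_s.mean()) ** 2)
    return total_sq_err / len(rewards)
```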

Reward Aggregation Score
The (reward) score of the j-th model is
M̂(j) = |M_j| f(T)/T + L^j_RMSE(D_T),    (1)
where f(T) is a sub-linear increasing function with lim_{T→∞} f(T)/T = 0.
Based on the RMSE, we consider the model selector M̂_RMSE = arg min_j M̂(j).
Theorem: The model selector M̂_RMSE is weakly consistent.
A finite-time analysis gives rates.
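Putting the two pieces together, a hedged sketch of the resulting model selector; the helper name and the particular sub-linear penalty T^0.75 are illustrative assumptions, not the tuned choice from the experiments.

```python
import numpy as np

def select_mrp_model(rmse_scores, dims, T, penalty=lambda t: t ** 0.75):
    """Pick the model minimizing |M_j| * f(T) / T + L^j_RMSE(D_T), as in (1).

    rmse_scores: empirical reward-aggregation errors, one per candidate model
    dims:        number of aggregate states |M_j| of each candidate
    penalty:     a sub-linear increasing f(T); T**0.75 is only an example
    """
    scores = [d * penalty(T) / T + e for d, e in zip(dims, rmse_scores)]
    return int(np.argmin(scores))
```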

Experiments with artificial data
(Figure: the test statistic M̂(k) for different model dimensions k; the error bars are one standard deviation from the mean.)

Experiments with artificial data
(Figure: the test statistics AIC(k) and BIC(k) for different model dimensions k.)

Experiments with real data
- Large US apparel retailer.
- RFM measures: Recency, Frequency, and Monetary value.
- Problem: how to aggregate? Focus on recency:
  1. Randomly
  2. Most recent
  3. Least recent

Real data
(Figure, three panels: random, lowest, and highest recency aggregation. Each graph is for a different value of f(T)/T: blue = 1, green = 10, red = 50.)

Mini-conclusion
- A very special model selection problem.
- Standard approaches fail, but not all is lost.
- How to aggregate? Mismatched models?
- MRP → MDP: for optimization and for model discovery.

Part II: Robust MDPs
When in doubt, assume the worst.
Model = {S, P, R, A}. Assume:
- S and A are known and given.
- P and R are not known; they lie in uncertainty sets P ∈ 𝒫 and R ∈ ℛ.
Look for a policy with the best worst-case performance.

Robust MDPs
The problem becomes a game against nature:
(*)   max_policy min_disturbance E_{policy, disturbance} [ Σ_t γ^t r_t ]
In general, the problem is NP-hard.
A nice (non-robust) generative probabilistic interpretation: if Pr_gen(disturbance) ≥ 1 − δ, then (*) approximates percentile optimization.
One can also get generalization bounds: robustness = generalization.

Robust MDPs (cont.)
The problem is solvable when the disturbance is a product (rectangular) uncertainty:
disturbance = { (P, R) : P ∈ ⊗_s U_p(s), R ∈ ⊗_s U_r(s) }
Take the worst outcome for every state. Eventually solved by Nilim and El Ghaoui (OR, 2005) using DP (assuming regular uncertainty sets).
If really bad events are to be included, U_p and U_r must be made huge ⇒ conservativeness.
A complicated tradeoff to make: the rectangular structure cannot correlate bad events across states.
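To make the DP concrete, here is a minimal sketch of robust value iteration under state-action-rectangular uncertainty, with the uncertainty set approximated by a finite family of candidate transition kernels. This is a simplifying assumption, not the general convex-set machinery of Nilim and El Ghaoui, and the array names are hypothetical.

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.95, iters=500):
    """Robust value iteration with a scenario-based rectangular uncertainty set.

    P_candidates: array of shape (n_models, S, A, S); each slice is one
                  plausible transition kernel (a finite stand-in for U_p).
    R:            reward array of shape (S, A), assumed known here for brevity.
    Returns the robust value function and a greedy robust policy.
    """
    n_models, S, A, _ = P_candidates.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Worst-case expected next value over the candidate models, per (s, a).
        next_vals = np.einsum('msat,t->msa', P_candidates, V)  # (n_models, S, A)
        Q = R + gamma * next_vals.min(axis=0)                  # nature picks the worst model
        V = Q.max(axis=1)                                      # the agent picks the best action
    return V, Q.argmax(axis=1)
```

With s,a-rectangular sets the inner minimum decouples across states, which is exactly what makes the DP tractable, and also exactly what prevents correlating bad events across states.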

Uncertainty sets
Convex sets ⇒ large uncertainty.

Useful uncertainty sets I
At most K disasters:
disturbance_K = { (P, R) : P_s = P^nom_s, R_s = R^nom_s except at at most K states s_1, ..., s_K, where P_{s_i}, R_{s_i} ∈ (𝒫, ℛ) }
A generative model: every node gets disturbed with probability μ, so
Pr_generative(at most K disturbances) = 1 − tail of a binomial distribution = Σ_{k=0}^{K} C(|S|, k) μ^k (1 − μ)^{|S| − k}.

Useful uncertainty sets II
A continuous "uncertainty budget": suppose there is a probability Pr_dist((P_s, R_s); (P^nom_s, R^nom_s)) of each state's parameters deviating from the nominal ones.
A generative model: every node gets disturbed independently with the continuous probability Pr_dist, so
Pr_generative(disturbance) = Π_{s ∈ S} Pr_dist((P_s, R_s); (P^nom_s, R^nom_s))
disturbance_λ = { (P, R) : Pr_generative(disturbance) ≥ λ }
Not convex ⇒ cannot solve directly.

State Augmentation
- Original state space: S.
- New state space: S × {0, 1, ..., K} for discrete disasters.
- New state space: S × [0, λ] for the uncertainty budget.
Recipe: use your favorite algorithm on the augmented state space.
1. Structural properties (monotonicity) in the uncertainty can be exploited.
2. For continuous uncertainty, one can discretize efficiently.
Caveat: one must assume the uncertainty is observed.
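A minimal sketch of the recipe for the discrete-disaster case, assuming for simplicity a single "bad" kernel per state-action instead of a full disturbance set; all names here are illustrative, not the talk's implementation.

```python
import numpy as np

def k_disaster_value_iteration(P_nom, R_nom, P_bad, R_bad, K, gamma=0.95, iters=500):
    """Value iteration on the augmented space S x {0,...,K}.

    At each step nature either plays the nominal kernel or, if it still has
    budget, spends one disaster and plays the bad kernel.

    P_nom, P_bad: transition kernels of shape (S, A, S)
    R_nom, R_bad: reward arrays of shape (S, A)
    """
    S, A, _ = P_nom.shape
    V = np.zeros((S, K + 1))  # V[s, k] = robust value with k disasters remaining
    for _ in range(iters):
        V_new = np.empty_like(V)
        for k in range(K + 1):
            q_nom = R_nom + gamma * (P_nom @ V[:, k])          # nominal play
            if k > 0:
                q_bad = R_bad + gamma * (P_bad @ V[:, k - 1])  # a disaster consumes budget
                q = np.minimum(q_nom, q_bad)                   # nature picks the worse option
            else:
                q = q_nom
            V_new[:, k] = q.max(axis=1)                        # the agent picks the best action
        V = V_new
    return V
```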

Scaling-up robust MDPs
- Smallish MDPs are easy (robust or not).
- Non-robust MDPs can be handled with function approximation or policy search methods. What about robust MDPs?
- Difficulty 1: the state space in many problems is very large.
- Difficulty 2: the parameters of the dynamics are rarely known, so it is hard to do Monte Carlo.

A crash course on ADP
In ADP (Approximate Dynamic Programming) we want to solve the Bellman equation (policy evaluation).
Write the Bellman equation as J^π = T^π J^π, where T^π y = r + γ P y.
Assume J ≈ Φ w (Φ is known and given, and w is a vector in, say, R^n).
With approximation: Φ w = Π T^π Φ w, where Π is the projection onto {Φ w : w ∈ R^n}.
It is not too hard to show that Π T^π is a contraction, hence there is a unique fixed point.
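For reference, a minimal LSTD-style sketch of solving the projected (non-robust) fixed-point equation from a single trajectory; the function and argument names are assumptions for illustration, not the talk's robust variant.

```python
import numpy as np

def lstd_policy_evaluation(phi, rewards, gamma=0.95):
    """Solve the projected Bellman equation Phi w = Pi T^pi Phi w from samples.

    phi:     feature matrix of shape (T + 1, n), phi[t] = features of state s_t
    rewards: array of length T, rewards[t] = r(s_t, pi(s_t))
    """
    T = len(rewards)
    A = np.zeros((phi.shape[1], phi.shape[1]))
    b = np.zeros(phi.shape[1])
    for t in range(T):
        A += np.outer(phi[t], phi[t] - gamma * phi[t + 1])
        b += phi[t] * rewards[t]
    w = np.linalg.solve(A, b)
    return w  # the approximate value function is J(s) ~= phi(s) @ w
```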

Back to robust MDPs
In robust MDPs:
J^π(s) = r(s, π(s)) + γ inf_{P ∈ 𝒫} E^π_P [ J^π(s') | s, π(s) ]
Writing the pessimistic transition as σ_𝒫(v) = inf_{p ∈ 𝒫} p^T v, we can write T^π y = r + γ σ^π(y).
Not linear anymore!
One can show that T v = sup_π T^π v is a contraction [Iyengar, 2005].

Solving Robust MDPs
Fixed-point equation (writing J ≈ Φ w):
(*)   Φ w = Π T^π Φ w
In general there is no solution. But if for some β < 1 we have
P(s' | s, π(s)) ≤ (β / γ) P̂(s' | s, π(s)) for all P ∈ 𝒫
(where P̂ is the transition law according to which the data are observed), then:
1. Π T^π is a contraction, and (*) has a unique solution.
2. This solution is close to the optimal one.

Solving Robust MDPs (cont.)
Robust projected value iteration:
(**)   Φ w_{k+1} = Π T^π Φ w_k
Solution: w_{k+1} = (Φ^T D Φ)^{-1} ( Φ^T D r + γ Φ^T D σ^π(Φ w_k) )
Three terms to estimate from samples:
Φ^T D Φ ≈ (1/N) Σ_{t=1}^N φ(s_t) φ(s_t)^T,
Φ^T D r ≈ (1/N) Σ_{t=1}^N φ(s_t) r(s_t, π(s_t)),
Φ^T D σ^π(Φ w_k) ≈ (1/N) Σ_{t=1}^N φ(s_t) σ_{𝒫(s_t)}(Φ w_k).
But how do we compute the pessimistic σ_𝒫(Φ w_k)?
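A hedged sketch of that sampled iteration, with the pessimizing oracle passed in as a callable; all names are assumptions, and one case where the oracle itself is computable appears on the next slide.

```python
import numpy as np

def robust_projected_value_iteration(phi, rewards, sigma, gamma=0.95, iters=50):
    """Sampled robust projected value iteration, following the update above.

    phi:     feature matrix (N, n), phi[t] = features of the visited state s_t
    rewards: array of length N with r(s_t, pi(s_t))
    sigma:   pessimizing oracle; sigma(t, w) returns the worst-case expected
             next value inf_{p in P(s_t)} sum_{s'} p(s') * (phi(s') @ w)
    """
    N, n = phi.shape
    A = phi.T @ phi / N
    b = phi.T @ rewards / N
    w = np.zeros(n)
    for _ in range(iters):
        c = sum(phi[t] * sigma(t, w) for t in range(N)) / N
        w = np.linalg.solve(A, b + gamma * c)
    return w
```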

Solving the inner problem
Pessimizing oracle: inf_{p ∈ 𝒫} Σ_{s' reachable} p(s') φ(s')^T w_k.
This can be solved when 𝒫(s, a) = { p : dist(p, p̂) ≤ ε, p is a probability distribution } and in some other lucky cases.
So we know how to "find" the value function of a given policy.
Finding the optimal policy, SARSA-style:
1. Define φ(s, a) instead of just φ(s).
2. Let π_w(s) = arg max_a φ(s, a)^T w.
3. Do value iteration in the new state-action problem.
[Tamar, Mannor, and Xu, ICML 2014]
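For one concrete case, assuming dist is the L1 distance (only one of the "lucky cases"), the oracle has a simple greedy solution; the sketch and its names are illustrative.

```python
import numpy as np

def worst_case_expectation(p_hat, v, eps):
    """inf_p p.v over {p : ||p - p_hat||_1 <= eps, p a probability distribution}.

    Greedy solution: move as much probability mass as allowed (eps / 2) onto the
    state with the smallest value, taking it away from the highest-value states.
    """
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    s_min = int(np.argmin(v))
    add = min(eps / 2.0, 1.0 - p[s_min])   # mass we can push onto the worst state
    p[s_min] += add
    remaining = add
    for s in np.argsort(v)[::-1]:          # remove mass from the best states first
        if remaining <= 0:
            break
        if s == s_min:
            continue
        take = min(p[s], remaining)
        p[s] -= take
        remaining -= take
    return float(p @ v), p
```

In the sampled iteration sketched earlier, sigma(t, w) could then be wired as worst_case_expectation(p_hat[t], phi_next @ w, eps)[0], using an empirical next-state distribution p_hat[t] at s_t (again a hypothetical wiring).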

Example from option trading
(Figure: trading an option.)

Back to motivating examples I
Unit commitment problem (day-ahead / a few days ahead):
- The classical approach is N / N+1.
- Renewables (wind/solar) and demand response cause much stochasticity.
- Modeling is hard (complex internal state, dependencies, high dimensionality).
- Rare events are essential: need to robustify.

Back to motivating examples II
Marketing problem:
- Our model is quite bad to start with.
- Robust optimization helps you fight model mismatch.
- Uncertainty set construction is mysterious (but is it important?).
- Main issue: validation. Secondary issue: optimization.

Conclusions
Model misspecification → parameter uncertainty → robust approach.
Why does uncertainty matter? Models come from data, and data are a bitch. Rewards come from data or humans, which is even worse.
Outlook:
- Contextual parameters
- Latent models [Bandits: Maillard and Mannor, ICML 2014]
- Policy evaluation
- Model validation
