Artificial Intelligence & Sequential Decision Problems

Size: px

Start display at page:

Download "Artificial Intelligence & Sequential Decision Problems"

Dennis Cunningham
5 years ago
Views:

1 Artificial Intelligence & Sequential Decision Problems (CIV Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet (2018). Probabilistic Machine Learning for Civil Engineers. Chapter 17 Russell, S. and Norvig, P. (1995). Artificial Intelligence, A modern approach. Prentice-Hall. Bertsekas, D.P. (1995). Dynamic programming and optimal control, volume 1&2. Athena Scientific Belmont, MA. Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 1 / 33

2 Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary Context Artificial intelligence - AI Combine the cognitive human reasoning with the analysis capacity of computers Level 1 - Limited AI Beyond human capacity in a specific taks Behind limited AI: Machine learning Behind machine learning: Probability theory / Decision Theory / Linear Algebra / Optimization [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers Polytechnique Montre al 2 / 33

expected utility [Google images] Artificial Intelligence &

3 AI AI in every day life Behind AI applications, there is an algorithm that mimics a rational agent that maximizes its expected utility [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 3 / 33

4 AI AI in every day life Behind AI applications, there is an algorithm that mimics a rational agent that maximizes its expected utility [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 3 / 33

5 AI AI in every day life Behind AI applications, there is an algorithm that mimics a rational agent that maximizes its expected utility [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 3 / 33

6 AI AI in every day life Behind AI applications, there is an algorithm that mimics a rational agent that maximizes its expected utility [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 3 / 33

AI Potential applications of AI in civil engineering Anomaly detection Input data: Real-time monitoring of a structure behavior AI role: Decides in

7 AI Potential applications of AI in civil engineering Anomaly detection Input data: Real-time monitoring of a structure behavior AI role: Decides in real-time whether or not to alert an engineer [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 4 / 33

8 AI Potential applications of AI in civil engineering Planning of infrastructure maintenance and replacement Input data: Bi-annual inspection reports AI role: Identifies in the optimal maintenance/replacement policies in order to maximizes the benefits of investments [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 4 / 33

9 AI Potential applications of AI in civil engineering Post-earthquake rapid safety assessment d=0 d=1 d=2 d=3 d=4 d=5 Input data: Post-earthquake inspection reports & real-time monitoring data f X D (x d) P(D S) x S={4,5} S={2,3} S={0,1} AI role: Identifies in real-time buildings that are safe v.s. those that must be evacuated [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 4 / 33

10 AI Potential applications of AI in civil engineering In-situ observation Site to be decontaminated Contaminated regions Characterization of contaminated sites Input data: In-situ soil contaminant concentration AI role: Identifies in real-time in the optimal site investigation path in order to minimize characterization costs [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 4 / 33

Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary AI Potential applications of AI in civil engineering Infested trees removal Input data: Time-history of

11 Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary AI Potential applications of AI in civil engineering Infested trees removal Input data: Time-history of trees attacks by the Emerald ash borer AI role: Identifies in real-time the optimal tree removal policy given the infestation dynamics [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers Polytechnique Montre al 4 / 33

Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary Module #10 Outline Topics organization µ σ 0.

10 MCMC sampling & Newton Regression θ2 0 3 4 5 6 7 µ µ+σ f X(x) 2 Probability distributions 0.05 0 0.

8 1 State-space model for time-series 2 8 Model Calibration y2 Data-driven methods 1 Revision probability & linear algebra Patogens [ppm] Machine Learning Basics ( fx D(x d)

12 Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary Module #10 Outline Topics organization µ σ 0.4 Model driven methods n Decision Making ( FX(x) x 2 µ σ µ µ+σ x 2 4 Introduction Bayesian Estimation p(b A)p(A) p(b) p(a B) = s = MCMC sampling & Newton Regression θ µ µ+σ f X(x) 2 Probability distributions θ1 5 θ θ1 [ ] cl+ d=0 Classification d=1 d=2 d=3 d=4 d= S={4,5} 1 x 0.6 S={2,3} S={0,1} State-space model for time-series 2 8 Model Calibration y2 Data-driven methods 1 Revision probability & linear algebra Patogens [ppm] Machine Learning Basics ( fx D(x d) Probability theory P(D S) Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning 2 0 y y y1 2 E[U] = 3.7 E[U] = Decision Theory 10 AI & Sequential decision problems Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers Polytechnique Montre al 5 / 33

13 Section Outline Nomenclature 2.1 Variables 2.2 Example #1 - Infrastructure Maintenance Planning Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 6 / 33

14 Variables AI sequential decision problem - Nomenclature States: s } {{ R } or s j S = {s 1, s 2,, s S } }{{} continuous discrete Actions: a i A = {a 1, a 2,, a A } Transition model: Reward: p(s s j, a i ) = Pr(s = s k s j, a i ), k = 1 : S R(s j ) R Policy: π = {a 1, a 2,, a S } Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 7 / 33

15 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Definitions Bridges are inspected every two years: 100% 80% 60% s S = 40% 20% 0% s = 100% s = 80% s = 60% s = 40% s = 20% s = 0% Time t Data {s 1, s 2,, s T } 1 {s 1, s 2,, s T } 2 D =. {s 1, s 2,, s T } N N T [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 8 / 33

16 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Definitions Bridges are inspected every two years: 100% 80% 60% s S = 40% 20% 0% s = 100% s = 80% s = 60% s = 40% s = 20% s = 0% Time t Data {s 1, s 2,, s T } 1 {s 1, s 2,, s T } 2 D =. {s 1, s 2,, s T } N N T [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 8 / 33

Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Definitions Bridges are inspected every two years: 100% 80% 60% s S = 40% 20% 0% s = 100% s = 80% s = 60% s = 40% s = 20%

17 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Definitions Bridges are inspected every two years: 100% 80% 60% s S = 40% 20% 0% s = 100% s = 80% s = 60% s = 40% s = 20% s = 0% Time t Data {s 1, s 2,, s T } 1 {s 1, s 2,, s T } 2 D =. {s 1, s 2,, s T } N N T [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 8 / 33

18 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Actions a A = {do nothing, maintain, replace} Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 9 / 33

19 S Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Model Employ the dataset D to build a Markovian model p(s t+1 s t, a) p(s t+1 s 1:t, a) Pr(s t+1 = k s t = i, a = do nothing) #{s t = i, s t+1 = k} #{s t = i} p(s t+1 s t = 1, a = do nothing) = Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 10 / 33

20 S Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Model Employ the dataset D to build a Markovian model p(s t+1 s t, a) p(s t+1 s 1:t, a) Pr(s t+1 = k s t = i, a = do nothing) #{s t = i, s t+1 = k} #{s t = i} p(s t+1 s t = 2, a = do nothing) = Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 10 / 33

21 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Model Employ the dataset D to build a Markovian model p(s t+1 s t, a) p(s t+1 s 1:t, a) Pr(s t+1 = k s t = i, a = do nothing) #{s t = i, s t+1 = k} #{s t = i} p(s t+1 s t, a = do nothing) = S S Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 10 / 33

22 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Model (cont.) p(s t+1 s t, a = maintain) = p(s t+1 s t, a = replace) = S S S S Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 11 / 33

23 Example #1 - Infrastructure Maintenance Planning Infrastructure Maintenance - Rewards Rewards for being in a state s or for taking an action a { AATF }} { R(S) = AADTF10 5 users 365 days 3$/user capacity(s) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ capacity(s) = { R(A) = { s=100% 80% 60% 40% 20% 0% {}}{{}}{{}}{{}}{{}}{{}}{ 1, 1, 1, 0.90, 0.75, 0 } do nothing {}}{ 0, repair {}}{ 5, replace {}}{ 20 }M$ Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 12 / 33

24 Example #1 - Infrastructure Maintenance Planning Illustrative example - Optimal actions Given that the transition model is Markovian, the goal of the Markovian Decision Process is to identify the optimal actions a to be taken for each state s so that a policy π (S) = {a 1, a 2,, a S } Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 13 / 33

25 Section Outline Markov Decision Process, MDP 3.1 Utility over time 3.2 Value Iteration 3.3 Policy Iteration Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 14 / 33

26 Utility over time Planning horizon Finite planning horizon: The rewards and utilities are considered over a fixed period of time. The optimal policy is non-stationary; π t (s) depends on t. Infinite planning horizon: The rewards and utilities are considered over an infinite period. The optimal policy is stationary; π (s) does not depend on t. We will only study problems with an infinite planning horizon Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 15 / 33

27 Utility over time Utility for an infinite planning horizon For a discount factor γ (0, 1[, the discounted sum of rewards over an infinite planning horizon is finite (note: interest rate = 1 γ 1) U(s t=0, s t=1,, s t= ) = R(s 0 ) + γr(s 1 ) + γ 2 R(s 2 ) + = γ t R(s t ) t=0 max R(s) s S 1 γ! The issue is that we do not know s 1, s 2, we only know p(s t+1 s t, a) p(s s, a) }{{} transition model Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 16 / 33

28 Utility over time Expected utility for an infinite planning horizon Given p(s s, a) and R(s), we can compute the expected utility Expected utility [ ] U(s, π) E[U(s, π)] = R(s) + E γ t R(S t ) Optimal policy π (s) = arg max a A t=1 p(s s, a) U(s, π )! π appears on both sides of the equality iterative resolution Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 17 / 33

29 Value Iteration Utility for an infinite planning horizon - Bellman Bellman equation: The utility of a state is the immediate reward for that state plus the expected discounted utility of the next state given that the optimal action is taken U(s) = R(s) + max a A p(s s, a) (R(a) + γu(s )) s S Bellman update & Value iteration U (i+1) (s) R(s) + max a A p(s s, a) (R(a) + γu (i) (s )) s S Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 18 / 33

30 Value Iteration Algorithm 1: Value iteration 1 define: states s S, actions a A; 2 define: transition model p(s s, a); 3 define: rewards R(s), R(a), discount γ, convergence tol. ɛ; 4 initialize: U (s) = U(s) = 0; 5 while U(s) = 0 or U (s) U(s) ɛ do 6 U(s) U (s); 7 for s S do 8 for a A do 9 U(s, a) = s S p(s s, a) (R(a) + γu(s )); 10 π (s) a = arg max U(s, a); a A 11 U (s) R(s) + U(s, a ); 12 return: U(s), π (s); Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 19 / 33

31 Value Iteration Infrastructure Maintenance - Summary s S = 100% 80% 60% 40% 20% 0% a A = {Do nothing, Maintain, Replace} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ p(s t+1 s t, a = Do nothing) = p(s t+1 s t, a = Maintain) = p(s t+1 s t, a = Replace) = S S S S S S Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 20 / 33

32 Value Iteration Infrastructure Maintenance - Value iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing, Maintain, Replace} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 1, s S U 0 (S) = {0, 0, 0, 0, 0, 0} M$ U 1 (s) = R(s) + max a A p(s s, a 1 ) (R(a 1 ) + γu 0 (s )) s S p(s s, a 2 ) (R(a 2 ) + γu 0 (s )) s S p(s s, a 3 ) (R(a 3 ) + γu 0 (s )) s S U (1) (1) = M$ U (2) (1) = 223 M$ Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 21 / 33

33 Value Iteration Infrastructure Maintenance - Value iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing, Maintain, Replace} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 2, s S U 1 (S) = {110, 211, 309, 393, 459, 440} M$ U 2 (s) = R(s) + max a A p(s s, a 1 ) (R(a 1 ) + γu 1 (s )) s S p(s s, a 2 ) (R(a 2 ) + γu 1 (s )) s S p(s s, a 3 ) (R(a 3 ) + γu 1 (s )) s S U (1) (1) = M$ U (2) (1) = 223 M$ Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 21 / 33

34 Value Iteration Infrastructure Maintenance - Value iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing, Maintain, Replace} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 3, s S U 2 (S) = {223, 329, 430, 511, 572, 550} M$ U 3 (s) = R(s) + max a A p(s s, a 1 ) (R(a 1 ) + γu 2 (s )) s S p(s s, a 2 ) (R(a 2 ) + γu 2 (s )) s S p(s s, a 3 ) (R(a 3 ) + γu 2 (s )) s S U (1) (1) = M$ U (2) (1) = 223 M$ Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 21 / 33

35 Value Iteration Value iteration - U (1) (1) = M$ U (2) (1) = 223 M$ s=100% 80% U (0) {}}{{}}{ (S) = { 0, 0, 60% {}}{ 0, 40% {}}{ 0, 20% {}}{ 0, 0% {}}{ 0 } M$ U (1) (S) = {110, 211, 309, 393, 459, 440} M$ U (2) (S) = {223, 329, 430, 511, 572, 550} M$. U (536) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ U (537) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ p(s s, do nothing) = ( U (2) (1) R(1) + max a A s S p(s s, a) (R(a) + γu (1) (s ))) R(s=1) {}}{ 0.95 ( M$) ( M$) ( M$) = M$ 109.5M$ + max 1 ( M$) = M$ a A 1 ( M$) = 86.7 M$ = 109.5$ M$ = 223 M$ (π (s = 1) = a = do nothing) Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 22 / 33

36 Value Iteration Value iteration - U (1) (1) = M$ U (2) (1) = 223 M$ s=100% 80% U (0) {}}{{}}{ (S) = { 0, 0, 60% {}}{ 0, 40% {}}{ 0, 20% {}}{ 0, 0% {}}{ 0 } M$ U (1) (S) = {110, 211, 309, 393, 459, 440} M$ U (2) (S) = {223, 329, 430, 511, 572, 550} M$. U (536) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ U (537) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ p(s s, repair) = ( U (2) (1) R(1) + max a A s S p(s s, a) (R(a) + γu (1) (s ))) R(s=1) {}}{ 0.95 ( M$) ( M$) ( M$) = M$ 109.5M$ + max 1 ( M$) = M$ a A 1 ( M$) = 86.7 M$ = 109.5$ M$ = 223 M$ (π (s = 1) = a = do nothing) Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 22 / 33

37 Value Iteration Value iteration - U (1) (1) = M$ U (2) (1) = 223 M$ s=100% 80% U (0) {}}{{}}{ (S) = { 0, 0, 60% {}}{ 0, 40% {}}{ 0, 20% {}}{ 0, 0% {}}{ 0 } M$ U (1) (S) = {110, 211, 309, 393, 459, 440} M$ U (2) (S) = {223, 329, 430, 511, 572, 550} M$. U (536) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ U (537) (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ p(s s, replace) = ( U (2) (1) R(1) + max a A s S p(s s, a) (R(a) + γu (1) (s ))) R(s=1) {}}{ 0.95 ( M$) ( M$) ( M$) = M$ 109.5M$ + max 1 ( M$) = M$ a A 1 ( M$) = 86.7 M$ = 109.5$ M$ = 223 M$ (π (s = 1) = a = do nothing) Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 22 / 33

38 Value Iteration Infrastructure Maintenance - Value iteration [ ] U(s), [M$] 4,000 3,000 2,000 s = 100% s = 80% s = 60% 1,000 s = 40% s = 20% s = 0% itetations # U(S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ π (S) = {Do nothing, Maintain }{{}}{{}, Maintain }{{}, Maintain }{{}, Replace, Replace} }{{}}{{} s=100% s=80% s=60% s=40% s=20% s=0% [CIV ML/Value iteration bridge.m] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 23 / 33

39 Value Iteration Bellman update & Value iteration Value iteration U (i+1) (s) R(s) + max a A p(s s, a) (R(a) + γu (i) (s )) s S The max operation is computationally expensive if S is large solution: policy iteration A policy π = {a 1, a 2,, a S } defines an action to be taken for each state s S Policy iteration U (i+1) (s) R(s) + s S p(s s, π i (s)) (R(π i (s)) + γu (i) (s )) Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 24 / 33

40 Policy Iteration Algorithm 2: Policy iteration 1 define: states s S, actions a A; 2 define: transition model p(s s, a); 3 define: rewards R(s), R(a), discount γ; 4 initialize: U(s) = 0, π (s) = π(s) A; 5 while π (s) π(s) do 6 π(s) = π (s); 7 U(s) = Value iteration(u(s), π(s), p(s s, a), R(s), R(a), γ); 8 for s S do 9 π (s) arg max a A 10 return: U(s), π (s) = π(s); R(a) + γ s S p(s s, a) U(s ); Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 25 / 33

41 Policy Iteration Infrastructure Maintenance - Policy iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing (1), Maintain (2), Replace (3)} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 1, s S π 0 (S) = {1, 1, 1, 1, 1, 1} U 0 (S) = {2063, 1290, , 197, 0} M$ R(a 1 ) + γ p(s s, a 1 ) U 1 (s ) π 1 (s) = arg max a A s S R(a 2 ) + γ p(s s, a 2 ) U 1 (s ) s S R(a 3 ) + γ p(s s, a 3 ) U 1 (s ) s S π 5 (S) = π 6 (S) = π (S) = {1, 2, 2, 2, 3, 3} Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 26 / 33

42 Policy Iteration Infrastructure Maintenance - Policy iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing (1), Maintain (2), Replace (3)} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 2, s S π 1 (S) = {2, 2, 3, 3, 3, 3} U 1 (S) = {3483, 3483, 3468, 3457, 3441, 3359} M$ R(a 1 ) + γ p(s s, a 1 ) U 2 (s ) π 2 (s) = arg max a A s S R(a 2 ) + γ p(s s, a 2 ) U 2 (s ) s S R(a 3 ) + γ p(s s, a 3 ) U 2 (s ) s S π 5 (S) = π 6 (S) = π (S) = {1, 2, 2, 2, 3, 3} Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 26 / 33

43 Policy Iteration Infrastructure Maintenance - Policy iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing (1), Maintain (2), Replace (3)} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 3, s S π 2 (S) = {1, 1, 2, 2, 3, 3} U 2 (S) = {3622, 3606, 3603, 3588, 3576, 3494} M$ R(a 1 ) + γ p(s s, a 1 ) U 3 (s ) π 3 (s) = arg max a A s S R(a 2 ) + γ p(s s, a 2 ) U 3 (s ) s S R(a 3 ) + γ p(s s, a 3 ) U 3 (s ) s S π 5 (S) = π 6 (S) = π (S) = {1, 2, 2, 2, 3, 3} Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 26 / 33

44 Policy Iteration Infrastructure Maintenance - Policy iteration s S = {100%, 80%, 60%, 40%, 20%, 0%} a A = {Do nothing (1), Maintain (2), Replace (3)} R(S) = {109.5, 109.5, 109.5, 98.6, 82.1, 0}M$ R(A) = {0, 5, 20}M$ γ = 0.97 p(s s, a) = Iteration: 6, s S π 5 (S) = {1, 2, 2, 2, 3, 3} U 2 (S) = {3640, 3635, 3630, 3615, 3592, 3510} M$ R(a 1 ) + γ p(s s, a 1 ) U 5 (s ) π 6 (s) = arg max a A s S R(a 2 ) + γ p(s s, a 2 ) U 5 (s ) s S R(a 3 ) + γ p(s s, a 3 ) U 5 (s ) s S π 5 (S) = π 6 (S) = π (S) = {1, 2, 2, 2, 3, 3} Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 26 / 33

45 Policy Iteration Value iteration - U (1) (1) = 2063 M$ U (2) (1) = 3483 M$ U (1) (S) = {2063, 1290, , 197, 0} M$, π (1) (S) = {2, 2, 3, 3, 3, 3} Policy Iteration loop Policy Iteration loop 1 Value Iteration loop State π (1) (1)=2, maintain {}}{ U (2)(1) (1) = 109.5M$ + 1 ( M$) = 2106 M$ U (2)(2) (1) = 109.5M$ + 1 ( M$) = 2147 M$. U (2)(350) (1) = 109.5M$ + 1 ( M$) = 3483 M$. U (2) (1) = U (2)(351) (1) = 109.5M$ + 1 ( M$) = 3483 M$. Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 27 / 33

46 Policy Iteration Policy iteration - π (1) (1) = 2 π (2) (1) = 1 U (2) (S) = {3483, 3483, 3468, 3457, 3441, 3359} M$ π (2) (1)=arg max a A =arg max a A = 1. R(a 1 ) + γ p(s s, a 1 ) U 2 (s ) s S R(a 2 ) + γ p(s s, a 2 ) U 2 (s ) s S R(a 3 ) + γ p(s s, a 3 ) U 2 (s ) s S 0 M$ ( M$ M$ M$) = 3378 M$ 5 M$ (1 3483M$) = 3374 M$ 20 M$ (1 3483M$) = 3359 M$ Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 28 / 33

47 Policy Iteration Infrastructure Maintenance - Policy iteration [ ] % 80% 60% 40% 20% 0% Replace Maintain s Do nothing a π (S) = {Do nothing, Maintain }{{}}{{}, Maintain }{{}, Maintain }{{}, Replace, Replace} }{{}}{{} s=100% s=80% s=60% s=40% s=20% s=0% [CIV ML/Policy iteration bridge.m] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 29 / 33

48 Policy Iteration Infrastructure Maintenance - Policy iteration [ ] % 80% 60% 40% 20% 0% Replace Maintain s Do nothing a π (S) = {Do nothing, Maintain }{{}}{{}, Maintain }{{}, Maintain }{{}, Replace, Replace} }{{}}{{} s=100% s=80% s=60% s=40% s=20% s=0% [CIV ML/Policy iteration bridge.m] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 29 / 33

49 Policy Iteration Policy v.s. Value iteration π: Policy iteration π: Value iteration Time: Policy iteration Time: Value iteration Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 30 / 33

50 Partially Observable MDP (POMDP) One limitation of MDP is that at a time t, you must know exactly in what state (s) you are. When this is not the case POMDP (Note: computationally more demanding than MDP) Chapter 17 Russell, S. and Norvig, P. (1995). Artificial Intelligence, A modern approach. Prentice-Hall. Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 31 / 33

Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary Advanced reinforcement learning Q-learning A = { } S = { } R(S) = {fct( )} p(st+1 st, a) = unknown.

51 Intro Nomenclature Markov Decision Process, MDP Partially Observable MDP Q-Learning Summary Advanced reinforcement learning Q-learning A = { } S = { } R(S) = {fct( )} p(st+1 st, a) = unknown... U(s, a) Q(s, a) (Neural Network) Chapter 21 Russell, S. and Norvig, P. (1995). Artificial Intelligence, A modern approach. Prentice-Hall. [Google images] Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers Polytechnique Montre al 32 / 33

52 Summary Level 1 - Limited AI: Beyond human capacity in a specific task Sequential decision problem: 1. Identify possible actions 2. Quantify the utility for possible outcomes 3. Build a predictive model of the problem studied 4. Formulate the AI as a sequential decision problem States: s R or s }{{} j S = {s 1, s 2,, s S} }{{} continuous discrete Actions: a i A = {a 1, a 2,, a A} Model: Reward: p(s s j, a i ) = Pr(s = s k s j, a i ), k = 1 : S R(s j ) R Policy: π = {a 1, a 1,, a S} Infinite planning horizon: The rewards and utilities are considered over an infinite period of time. The optimal policy is stationary; π (s) does not depend on t. Bellman equation: The utility of a state is the immediate reward for that state plus the expected discounted utility of the next state given that the optimal action is taken U(s) = R(s) + max a A p(s s, a) (R(a) + γu(s )) s S Bellman update & Value iteration U (i+1) (s) R(s)+ max a A Policy iteration p(s s, a) (R(a)+γU (i) (s )) s S U (i+1) (s) R(s)+ p(s s, π i (s)) (R(π i (s))+γu (i) (s )) s S π: Policy iteration π: Value iteration Time: Policy iteration Time: Value iteration Artificial Intelligence & Sequential Decision Problems V1.2 Machine Learning for Civil Engineers 33 / 33

Intro Utility theory Utility & Loss Functions Value of Information Summary. Decision Theory. (CIV Machine Learning for Civil Engineers)

Intro Utility theory Utility & Loss Functions Value of Information Summary. Decision Theory. (CIV Machine Learning for Civil Engineers) Decision Theory (CIV654 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 14 Goulet (219). Probabilistic Machine Learning