Tutorial: A Unified Framework for Optimization under Uncertainty

1 Tutorial: A Unified Framework for Optimization under Uncertainty. INFORMS Annual Meeting, Nashville, November 13, 2016. Warren B. Powell, Princeton University, Department of Operations Research and Financial Engineering. © 2016 Warren B. Powell, Princeton University

2 Learning problems
Health sciences
» Sequential design of experiments for drug discovery
» Drug delivery: optimizing the design of protective membranes to control drug release
» Medical decision making: optimal learning for medical treatments

3 Meeting variability with portfolios of generation with mixtures of dispatchability

5 Real-time logistics
Uber
» Provides real-time, on-demand transportation.
» Drivers are encouraged to enter or leave the system using pricing signals and informational guidance.
Decisions:
» How to price to get the right balance of drivers relative to customers.
» Assigning and routing drivers to manage Uber-created congestion.
» Real-time management of drivers.
» Pricing (trips, new services, ...)
» Policies (rules for managing drivers, customers, ...)

6 Planning for a risky world
Disaster response: Robust design of emergency response networks. Design of sensor networks and communication systems to manage responses to hurricanes, tsunamis, nuclear disasters and terrorist attacks.
Disease: Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
Robust design of supply chains to mitigate the disruption of transportation systems.

7 Designing robust power grids
The power grid
» Loss of power creates cascading failures (lack of fuel, inability to pump water)
» How to plan?
» How to react?
Hurricane Sandy
» Once in 100 years?
» Rare convergence of events
» But meteorologists did an amazing job of forecasting the storm.

8 Modeling
Before we can solve complex problems, we have to know how to think about them.
[Figure: Mathematician: $\min \mathbb{E}\{cx\}$, $Ax = b$, $x \ge 0$. Software: organize class libraries, and set up communications and databases.]
The biggest challenge when making decisions under uncertainty is modeling.

9 Modeling
For deterministic problems, we speak the language of mathematical programming
» Linear programming: $\min_x cx$ subject to $Ax = b$, $x \ge 0$
» For time-staged problems: $\min_{x_0,\ldots,x_T} \sum_{t=0}^T c_t x_t$ subject to $A_t x_t - B_{t-1} x_{t-1} = b_t$, $D_t x_t \le u_t$, $x_t \ge 0$
Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.

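10 Modeling
For deterministic problems, we speak the language of mathematical programming
» Linear programming: $\min_x cx$ subject to $Ax = b$, $x \ge 0$
» For time-staged problems: $\min_{x_0,\ldots,x_T} \sum_{t=0}^T c_t x_t$ subject to $A_t x_t - B_{t-1} x_{t-1} = b_t$, $D_t x_t \le u_t$, $x_t \ge 0$
» Optimal control: $\min_{u_0,\ldots,u_T} \sum_{t=0}^T L(x_t, u_t) + J(x_T)$ subject to $x_{t+1} = f(x_t, u_t)$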

11 [Figure: the communities working on decisions under uncertainty, including stochastic programming, approximate dynamic programming, robust optimization, decision analysis, simulation optimization, optimal learning, model predictive control, stochastic search, optimal control, stochastic control, bandit problems, online computation, reinforcement learning and Markov decision processes]

13 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

15 Canonical problems: Decision trees

16 Canonical problems: Stochastic search (derivative based)
» Basic problem: $\max_x \mathbb{E} F(x, W)$
» Stochastic gradient: $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$
» Convergence: $\lim_{n \to \infty} \mathbb{E} F(x^n, W) = \mathbb{E} F(x^*, W)$
Examples: manufacturing network (x = design); unit commitment problem (x = day-ahead decisions); transformers (x = replacement policy); inventory system (x = design, replenishment policy); battery system (x = choice of material); patient treatment cost (x = drug, treatments); trucking company (x = fleet size and mix).

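17 Canonical problems: Ranking and selection (derivative free)
» Basic problem: $\max_{x \in \{x_1,\ldots,x_M\}} \mathbb{E} F(x, W)$
» We need to design a policy $X^\pi(S^n)$ that produces a final design $x^{\pi,N}$ solving $\max_\pi \mathbb{E} F(x^{\pi,N}, W)$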

18 Canonical problems: Multi-armed bandit problems
» We do not know the expected winnings from each slot machine ("arm").
» Collect information by playing a machine.
» We need to find a policy $X^\pi(S^n)$ for playing machine $x$ that maximizes $\max_\pi \mathbb{E} \sum_{n=0}^{N-1} W^{n+1}_{x^n}$, where $W^{n+1}_{x^n}$ = "winnings" (new information), $S^n$ = state of knowledge (what we know about each slot machine), and $x^n = X^\pi(S^n)$ chooses the next arm to play.

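19 Canonical problems: Two-stage stochastic programming
» Make initial decision $x_0$: how many Christmas trees to plant
» See information $W_1$: see the orders for Christmas trees from retailers
» Make final decision $x_1$: shipping Christmas trees to retailers
Optimization model: $\min_{x_0} \left( c_0 x_0 + \mathbb{E} Q(x_0, W_1) \right)$, where $Q(x_0, W_1(\omega)) = \min_{x_1(\omega) \in X_1(\omega)} c_1(\omega) x_1(\omega)$.
This is often solved using $\min_{x_0, (x_1(\omega))_{\omega \in \hat\Omega}} \left( c_0 x_0 + \sum_{\omega \in \hat\Omega} p(\omega) c_1(\omega) x_1(\omega) \right)$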

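20 Canonical problems: Multi-stage stochastic programming
» The stochastic programming community likes to write: [equation missing in source]
» This is the same as: [equation missing in source]
» which is the same as $\min_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t)) \mid S_0 \right\}$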

21 Canonical problems: (Discrete) Markov decision processes
» Bellman's optimality equation: $V_t(S_t) = \min_{a_t \in A} \left( C(S_t, a_t) + \mathbb{E}\{ V_{t+1}(S_{t+1}) \mid S_t \} \right) = \min_{a_t \in A} \left( C(S_t, a_t) + \sum_{s'} p(S_{t+1} = s' \mid S_t, a_t) V_{t+1}(s') \right)$
» where $S_t$ = discrete state (node in network, items in inventory), $a_t$ = action (transition to node, purchases), $W_{t+1}$ = random information (demand, prices, wind, deposits), and $S_{t+1} = S^M(S_t, a_t, W_{t+1})$
» Solve starting at $t = T$ with $V_T(S_T) = 0$ and step backward in time.
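A minimal sketch of the backward recursion for Bellman's equation on a finite horizon. The inventory example, its demand distribution and all cost numbers are hypothetical, chosen only to make the code runnable; they do not come from the slides.

```python
def backward_induction(states, actions, T, cost, transition_probs):
    """Solve V_t(s) = min_a [C(s,a) + sum_s' p(s'|s,a) V_{t+1}(s')], with V_T = 0."""
    V = {s: 0.0 for s in states}              # terminal condition V_T(S_T) = 0
    policy = []                               # policy[t][s] = best action at time t
    for _ in range(T):                        # step backward in time
        V_new, best = {}, {}
        for s in states:
            q = {a: cost(s, a) + sum(p * V[s2] for s2, p in transition_probs(s, a).items())
                 for a in actions(s)}
            best[s] = min(q, key=q.get)
            V_new[s] = q[best[s]]
        V = V_new
        policy.insert(0, best)
    return V, policy

# Hypothetical inventory problem: state = units on hand, action = units ordered.
MAX_INV = 3
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}             # assumed demand distribution

def actions(s):
    return range(MAX_INV - s + 1)             # cannot exceed storage capacity

def cost(s, a):
    # ordering cost + expected shortage penalty + holding cost (illustrative numbers)
    shortage = sum(p * max(d - (s + a), 0) for d, p in DEMAND.items())
    return 2.0 * a + 5.0 * shortage + 0.5 * s

def transition_probs(s, a):
    probs = {}
    for d, p in DEMAND.items():
        s2 = max(s + a - d, 0)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

V0, policy = backward_induction(range(MAX_INV + 1), actions, 5, cost, transition_probs)
```

Here `V0[s]` is the expected cost-to-go from state `s` at time 0 and `policy[t][s]` is the action chosen at time `t`. Enumerating every state and action is only practical for small discrete problems.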

22 Canonical problems: Linear quadratic regulation (LQR)
» A popular optimal control problem in engineering involves solving: $\min_{u_0,\ldots,u_T} \mathbb{E} \sum_{t=0}^T \left( (x_t)^T Q_t x_t + (u_t)^T R_t u_t \right)$
» where: $x_t$ = state at time $t$; $u_t$ = control at time $t$ (must be $F_t$-measurable); $x_{t+1} = f(x_t, u_t) + w_t$ ($w_t$ is random at time $t$)
» Possible to show that the optimal policy looks like $U_t^*(x_t) = K_t x_t$, where $K_t$ is a complicated function of $Q_t$ and $R_t$.
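To make the "complicated function" concrete, here is a sketch of the standard finite-horizon Riccati recursion for a scalar problem. It assumes linear dynamics $x_{t+1} = a x_t + b u_t + w_t$ with zero-mean additive noise and constant $Q = q$, $R = r$; the slide only states a general $f$, so the linear form and the names `a`, `b`, `q`, `r` are assumptions made for illustration.

```python
def lqr_gains(a, b, q, r, T):
    """Scalar finite-horizon LQR gains K_0..K_{T-1}, with u_t = K_t * x_t.

    Assumes dynamics x_{t+1} = a*x_t + b*u_t + w_t and stage cost q*x^2 + r*u^2.
    """
    p = q                                   # cost-to-go coefficient at the final time
    gains = []
    for _ in range(T):                      # step backward from T-1 to 0
        k = -(b * p * a) / (r + b * p * b)  # gain for this period
        p = q + a * p * a + a * p * b * k   # Riccati update
        gains.append(k)
    gains.reverse()                         # gains[t] is K_t
    return gains

gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, T=50)
```

With additive zero-mean noise the gains do not depend on the noise distribution, which is why the recursion never references $w_t$.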

23 Canonical problems: Optimal stopping - find the best time to stop and sell an asset
» Model: Exogenous process: $\omega = (p_1, p_2, \ldots, p_T)$, a sequence of stock prices. Decision: $X_t(\omega) = 1$ if we stop and sell at time $t$, 0 otherwise. Reward: $f(p_t)$ = reward received if we stop at time $t$ (e.g. $f(p_t) = p_t$)
» Optimization problem: $\max_\tau \mathbb{E} X_\tau f(p_\tau)$, where $\tau$ is a stopping time (or "$F_t$-measurable function")
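One simple way to obtain a rule that only uses information available at time $t$ is a threshold policy: sell the first time the price reaches a level `theta`. The sketch below is illustrative only; the threshold form, the forced sale at the final time and the function names are assumptions, not something stated on the slide.

```python
def stop_when_above(prices, theta):
    """Sell at the first price >= theta; if that never happens, sell at the last price."""
    for p in prices:
        if p >= theta:       # the decision at time t uses only p_1..p_t
            return p
    return prices[-1]

def tune_threshold(paths, candidates):
    """Pick the threshold with the best average reward over sampled price paths."""
    def average_reward(theta):
        return sum(stop_when_above(path, theta) for path in paths) / len(paths)
    return max(candidates, key=average_reward)

paths = [[1, 3, 2], [2, 2, 4]]
best_theta = tune_threshold(paths, candidates=[2, 3, 4])
```

Tuning `theta` over sample paths is a small instance of searching over a parameterized family of policies.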

24 Canonical problems: Stochastic control (from text by Rene Carmona)
» This is MCCM - mathematically correct, computationally meaningless.

25 Canonical problems
Engineers like to write $\max_{x_0,\ldots,x_T} \mathbb{E} \sum_{t=0}^T C(S_t, x_t)$
» This way of modeling is astonishingly common in the engineering literature, but it is simply incorrect: $x_t$ is a random variable. This does not model the flow of information.
Mathematicians like to write $\max_{x_0,\ldots,x_T} \mathbb{E} \sum_{t=0}^T C(S_t, x_t)$, where $x_t$ is $F_t$-measurable.
» This is mathematically correct, but with no path to computation.
[Cartoon caption: "I am not smart enough to do stochastic optimization."]

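26 Canonical problems
Better:
» Maximize over policies: $\max_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t)) \mid S_0 \right\}$, where $X_t^\pi(S_t)$ is a function of the state $S_t$.
» Now we just have to show how to search over policies. This is likely to look like $\max_{\pi = (f \in F, \theta^f \in \Theta^f)} \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t \mid \theta^f)) \mid S_0 \right\}$, where $f \in F$ indexes the function class and $\theta^f \in \Theta^f$ are the tunable parameters.
» In this tutorial, we are going to show that all of these canonical problems can be modeled this way.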

27 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

28 Problem classes
Staging of information and decisions
» Static stochastic optimization: decision, information.
» Two-stage stochastic programming (vector x): decision, information, decision.
» Multistage stochastic programming (vector x): decision, information, decision, information, ..., decision.
» Finite horizon Markov decision process (finite actions): decision, information, decision, information, ..., decision.
» Asymptotic stochastic search: decision, information, decision, information, ...
» Infinite horizon Markov decision process (finite actions): decision, information, decision, information, decision, ...
Contextual information
» Each problem above starts with initial information from an exogenous source (the "context").

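29 Problem classes: Learning problems (state independent)
» Arises when we are trying to optimize an unknown function (black box simulation, lab experiment, newsvendor with unknown distribution): $\max_x \mathbb{E} F(x, W) = \mathbb{E}\{ p \min(x, W) - cx \}$
» The state variable is $S_t = K_t$ = our state of knowledge about $\mathbb{E} F(x, W)$
» Transition: $x_t = X^\pi(S_t)$, $\hat F_{t+1} = F(x_t, W_{t+1})$, and $K_t$ is updated to $K_{t+1}$. This generates the sequence $(S_0 = K_0, x_0 = X^\pi(S_0), W_1, \hat F_1 = F(x_0, W_1), S_1 = K_1, x_1 = X^\pi(S_1), \ldots)$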

30 Problem classes: Parametric belief model
» We have a nonlinear model $f(x \mid \theta)$ with uncertain $\theta$
» Sampled belief model: knowledge is $K^n = \left( (p_k^n)_{k=1}^K, (\theta_k)_{k=1}^K \right)$ with $p_k^n = \text{Prob}[\theta = \theta_k]$

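31 Problem classes: State-dependent problems
» Dynamic information process
» Imagine price is revealed before making a decision: $\max_x \mathbb{E}\{ F(x, W) \mid p_t \} = \mathbb{E}\{ p_t \min(x, W) - cx \}$
» State variable is now $S_t = (p_t, K_t)$
» Transitions: $x_t = X^\pi(S_t)$, $\hat F_{t+1} = F(x_t, W_{t+1})$, $K_t$ is updated to $K_{t+1}$, and $p_{t+1} = p_t + \hat p_{t+1}$. This generates the sequence $(S_0 = (p_0, K_0), x_0 = X^\pi(S_0), W_1 = (\hat p_1, \hat F_1), S_1 = (p_1, K_1), x_1 = X^\pi(S_1), \ldots)$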

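32 Problem classes: State-dependent problems
» Dynamic resource process: excess inventory is held over, with $R_t$ = inventory available at time $t$
» Optimization problem is now $\max_{0 \le x \le R_t} \mathbb{E} F(x, W) = \mathbb{E}\{ p_t \min(x, \hat R_{t+1}) - cx \}$
» State variable is now $S_t = (R_t, p_t, K_t)$
» Transitions: $x_t = X^\pi(S_t)$, $\hat F_{t+1} = F(x_t, W_{t+1})$, $K_t$ is updated to $K_{t+1}$, $R_{t+1} = \max\{0, R_t + x_t - \hat R_{t+1}\}$, and $p_{t+1} = p_t + \hat p_{t+1}$. This generates the sequence $(S_0 = (R_0, p_0, K_0), x_0 = X^\pi(S_0), W_1 = (\hat R_1, \hat p_1, \hat F_1), S_1 = (R_1, p_1, K_1), x_1 = X^\pi(S_1), \ldots)$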

33 Problem classes
Offline (final reward)
» We can iteratively search for the best solution.
» We only care about the final solution.
» Asymptotic formulation: $\max_x \mathbb{E} F(x, W)$
» Finite horizon formulation: $\max_\pi \mathbb{E} F(x^{\pi,N}, W)$
Online (cumulative reward)
» We have to learn as we go: $\max_\pi \mathbb{E} \sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1})$

34 Problem classes
Our most general formulation that covers all of these problems is $\max_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t), W_{t+1}) \mid S_0 \right\}$
» where $S_{t+1} = S^M(S_t, X_t^\pi(S_t), W_{t+1})$
So, how do we design policies?

35 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

36 Solution strategies
Special structure
» Where expectations can be computed, turning the problem into a deterministic problem
» Implies we can solve $\max_x \mathbb{E} F(x, W)$ exactly as a deterministic problem (this is what we are doing when we use Bellman's equation).
Sampled problems (SAA, scenario trees)
» Uncontrolled sampling
» Controlled sampling
Adaptive learning algorithms
» Works with full probability space
» This is our focus.

37 Solution strategies: Sampled problems
» Sample average approximation (stochastic optimization): $\min_x \frac{1}{N} \sum_{n=1}^N F(x, W^n)$
» Probabilistic learning of a sampled model: $\min_x \sum_{k=1}^K p_k^n F(x \mid \theta_k)$
» Statistical learning: $\min_\theta \frac{1}{N} \sum_{n=1}^N \left( y^n - f(x^n \mid \theta) \right)^2$
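A sketch of sample average approximation for the newsvendor function $F(x, W) = p \min(x, W) - cx$ used elsewhere in the deck. Since that function is a profit, the code maximizes the sample average, which is the same as minimizing its negative. The uniform demand distribution, the prices and the candidate grid are assumptions made for illustration.

```python
import random

def saa_newsvendor(demands, price, cost, candidates):
    """Pick x maximizing the sample average (1/N) * sum_n [p*min(x, W^n) - c*x]."""
    def average_profit(x):
        return sum(price * min(x, w) - cost * x for w in demands) / len(demands)
    return max(candidates, key=average_profit)

rng = random.Random(0)
sample = [rng.uniform(0, 100) for _ in range(2000)]   # assumed demand distribution
x_saa = saa_newsvendor(sample, price=10.0, cost=4.0, candidates=range(0, 101))
```

Once the sample is fixed the problem is deterministic. For this demand distribution the true optimum is the 0.6 quantile of demand, so `x_saa` should land near 60.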

38 Solution strategies: Sampling
» Uncontrolled: this is what is implied when we are given the data. Batch dataset ("big data"); no control over the arrival process, e.g. patients arriving to a hospital.
» Direct control: creating samples that accurately represent the underlying stochastic process. Quantization/epi-splines; Voronoi quantization; K-L divergence (more generally phi-divergence).
» Indirect control: decision influences distribution (e.g. selling stock influences price).

39 Solution strategies: Adaptive learning algorithms
» We seek methods that are trying to solve the original problem (not the sampled approximation)
» We are interested in: asymptotic optimality; rapid finite-time convergence
Strategies:
» Derivative-based stochastic search: asymptotic analysis (Robbins-Monro etc.); finite-time analysis
» Derivative-free stochastic search: requires iterative learning

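40 Solution strategies: Derivative-based stochastic search, infinite horizon
» Stochastic gradient algorithm for $\max_x \mathbb{E} F(x, W) = \mathbb{E}\{ p \min(x, W) - cx \}$: $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$, where $\nabla_x F(x^n, W^{n+1}) = p - c$ if $x^n < W^{n+1}$ and $-c$ if $x^n > W^{n+1}$. Then $\lim_{n \to \infty} \mathbb{E} F(x^n, W) = \mathbb{E} F(x^*, W)$.
» Asymptotic analysis produces a deterministic solution $x^*$. This is deterministic thinking on a stochastic problem.
» What happens when we want the best solution in $N$ iterations?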
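The stochastic gradient update for the newsvendor problem can be sketched in a few lines. The harmonic stepsize $\alpha_n = \theta / (\theta + n - 1)$, the value `theta=50`, the uniform demand distribution and the prices are assumptions chosen for illustration; they are not prescribed by the slide.

```python
import random

def newsvendor_gradient(x, w, price, cost):
    """Sample gradient of p*min(x, w) - c*x with respect to x."""
    return price - cost if x < w else -cost

def stochastic_gradient(x0, n_iters, price, cost, sample_demand, theta=50.0):
    x = x0
    for n in range(1, n_iters + 1):
        w = sample_demand()                 # W^{n+1}, observed after choosing x^n
        alpha = theta / (theta + n - 1)     # harmonic stepsize (an assumed rule)
        x = x + alpha * newsvendor_gradient(x, w, price, cost)
    return x

rng = random.Random(1)
x_final = stochastic_gradient(0.0, 20000, 10.0, 4.0, lambda: rng.uniform(0, 100))
```

For uniform demand on [0, 100] with $p = 10$ and $c = 4$, the optimal quantity is 60, so `x_final` should settle close to that value. How fast it gets there depends heavily on the stepsize rule, which is the point of the finite-horizon view.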

41 Solution strategies: Derivative-based stochastic search, finite horizon
» We want a method (an algorithm) that produces the best solution by time $N$: $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$
» Assume that our stepsize rule is $\alpha_n = \theta / (\theta + N^n)$, where $N^n$ = number of times the solution has not improved.
» After $n$ iterations, our state is $S^n = (x^n, N^n)$, with $S^{n+1} = S^M(S^n, \alpha_n, W^{n+1})$
» Given the state $S^n$ and the parameter $\theta$, we can determine (after sampling $W^{n+1}$) the next state $S^{n+1}$.
Callouts on the slide: the state; the decision; the exogenous information; the transition function; the policy.

42 Solution strategies: Testing different stepsize rules ("policies")
[Figure: percentage error from optimal for the stepsize rules OSA, 1/n and Kalman]
» We want to optimize the rate of convergence: different stepsize rules; different ways of computing the gradient

43 Solution strategies: Derivative-based stochastic search, finite horizon
» If $X^\pi(S^n)$ is our algorithm (policy), we follow a sample path $W^1, \ldots, W^N$ to obtain a final solution $x^{\pi,N}$, which is a random variable.
» Our optimization problem is to find the best policy (algorithm) $X^\pi(S^n)$, which requires taking an expectation over the samples $W^1, \ldots, W^N$: $\max_\pi \mathbb{E}_{W^1,\ldots,W^N} \mathbb{E}_{W} F(x^{\pi,N}, W)$

44 Solution strategies: Derivative-free stochastic search
» Start by assuming that our set of possible decisions is finite: $x \in X = \{x_1, x_2, \ldots, x_M\}$
» Assume we have some belief about our function (say, lookup table). Using a Bayesian model, we assume we have a distribution of belief about $f(x) = \mathbb{E} F(x, W)$ given by $\mathbb{E} F(x, W) = \mu_x \sim N(\bar\mu_x^0, \beta_x^0)$, where $\beta_x^0 = 1 / \sigma_x^{2,0}$ is the precision.
» We refer to $S^0 = K^0 = N(\bar\mu^0, \beta^0)$ as our prior state of knowledge.

45 Derivative-free, finite horizon: Belief state for ranking and selection
» $S^n$ is our state of knowledge: $S_x^n = N(\bar\mu_x^n, \sigma_x^{2,n})$ and $S^n = (S_1^n, \ldots, S_M^n)$

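46 Derivative-free, finite horizon: Updating beliefs
» After $n$ experiments, our belief is $\mu_x \sim N(\bar\mu_x^n, \beta_x^n)$
» Assume that based on this belief, we choose $x^n = X^\pi(S^n)$ to run for our next experiment (experiment $n+1$), which returns $W_{x^n}^{n+1}$
» We update our beliefs for $x = x^n$ using $\bar\mu_x^{n+1} = \frac{\beta_x^n \bar\mu_x^n + \beta^W W_x^{n+1}}{\beta_x^n + \beta^W}$ and $\beta_x^{n+1} = \beta_x^n + \beta^W$
Transition function: $S^{n+1} = S^M(S^n, x^n, W^{n+1})$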
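The precision-weighted update for the tested alternative is short enough to sketch directly. This assumes independent normal beliefs and a known observation precision `beta_w`; the function and argument names are illustrative.

```python
def update_belief(mu_bar, beta, x, w, beta_w):
    """Return updated (means, precisions) after observing w from alternative x.

    Only the belief about the tested alternative changes; beliefs are assumed independent.
    """
    mu_new, beta_new = list(mu_bar), list(beta)
    mu_new[x] = (beta[x] * mu_bar[x] + beta_w * w) / (beta[x] + beta_w)
    beta_new[x] = beta[x] + beta_w
    return mu_new, beta_new

means, precisions = update_belief([0.0, 5.0], [1.0, 1.0], x=0, w=10.0, beta_w=1.0)
```

In this example the prior and the observation have equal precision, so the new mean for alternative 0 is halfway between them and its precision doubles.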

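47 Derivative-free, finite horizon: Designing a policy
» We need a rule for picking which decision to try next. We call this rule our policy. Some examples are:
Interval estimation: $X^{IE}(S^n \mid \theta^{IE}) = \arg\max_x \left( \bar\mu_x^n + \theta^{IE} \bar\sigma_x^n \right)$, where $\bar\sigma_x^n$ is the standard deviation of $\bar\mu_x^n$
Upper confidence bounding: $X^{UCB}(S^n \mid \theta^{UCB}) = \arg\max_x \left( \bar\mu_x^n + \theta^{UCB} \sqrt{\frac{\log n}{N_x^n}} \right)$, where $N_x^n$ = number of times $x$ is tested
Thompson sampling: $X^{TS}(S^n) = \arg\max_x \hat\mu_x^n$, with $\hat\mu_x^n \sim N(\bar\mu_x^n, \bar\sigma_x^{2,n})$
Knowledge gradient (expected value of information): $\nu^{KG,n}(x) = \mathbb{E}\{ \max_y F(y, K^{n+1}(x)) \mid S^n \} - \max_y F(y, K^n)$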
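Three of these rules fit in a few lines each. The sketch below assumes beliefs are stored as plain lists of means, standard deviations and test counts; trying untested alternatives first in UCB is a common convention added here as an assumption, since the formula is undefined when a count is zero.

```python
import math
import random

def interval_estimation(mu_bar, sigma_bar, theta_ie):
    """argmax_x (mean + theta * std. dev. of the belief about x)."""
    return max(range(len(mu_bar)), key=lambda x: mu_bar[x] + theta_ie * sigma_bar[x])

def ucb(mu_bar, counts, n, theta_ucb):
    """argmax_x (mean + theta * sqrt(log n / N_x)); untested alternatives go first."""
    for x, c in enumerate(counts):
        if c == 0:
            return x
    return max(range(len(mu_bar)),
               key=lambda x: mu_bar[x] + theta_ucb * math.sqrt(math.log(n) / counts[x]))

def thompson_sampling(mu_bar, sigma_bar, rng):
    """Sample one value per alternative from its belief and pick the largest."""
    draws = [rng.gauss(m, s) for m, s in zip(mu_bar, sigma_bar)]
    return max(range(len(draws)), key=draws.__getitem__)

choice = interval_estimation([1.0, 0.9], [0.1, 0.5], theta_ie=2.0)
```

With `theta_ie=2.0` the second alternative wins because its uncertainty bonus outweighs its lower mean; with `theta_ie=0.0` the rule is purely greedy. The tunable parameter is what a policy search would optimize.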

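48 Derivative-free, finite horizon: Testing policies
» We have three sources of randomness: the true function $\mu_x \sim N(\bar\mu_x^0, \beta_x^0)$ (Bayesian belief model); the samples $W^1, \ldots, W^N$ generated from the truth, $W_{x^n}^{n+1} = \mu_{x^n} + \varepsilon^{n+1}$, while following policy $\pi$; finally, the uncertainty $W$ needed to evaluate the final design $x^{\pi,N}$
» We choose our policy by solving $\max_\pi \mathbb{E}_\mu \mathbb{E}_{W^1,\ldots,W^N \mid \mu} \mathbb{E}_{W \mid \mu} F(x^{\pi,N}, W)$, written more compactly as $\max_\pi \mathbb{E} F(x^{\pi,N}, W)$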

49 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

50 Modeling
We lack a standard language for modeling sequential, stochastic decision problems.
» In the slides that follow, we propose to model problems along five fundamental dimensions: state variables; decision variables; exogenous information; transition function; objective function
» This framework draws heavily from Markov decision processes and the control theory communities, but it is not the standard form used anywhere.

51 Modeling dynamic problems: The system state
Controls community: $x_t$ = "information state"
Operations research/MDP/computer science: $S_t = (R_t, I_t, K_t)$ = system state, where:
» $R_t$ = resource state (physical state): location/status of truck/train/plane; energy in storage
» $I_t$ = information state: prices; weather
» $K_t$ = knowledge state ("belief state"): belief about traffic delays; belief about the status of equipment

52 The state variable: Classes of state variables
[Figure: $R_t$ = resource/physical state; $I_t$ = information state; $K_t$ = knowledge/belief state]

53 The state variables: What is a state variable?
» Bellman's classic text on dynamic programming (1957) describes the state variable with: "... we have a physical system characterized at any stage by a small set of parameters, the state variables."
» The most popular book on dynamic programming (Puterman, 2005, p.18) defines a state variable with the following sentence: "At each decision epoch, the system occupies a state."
» Wikipedia: "State commonly refers to either the present condition of a system or entity" or "A state variable is one of the set of variables that are used to describe the mathematical state of a dynamical system"

54 The state variable
A proposed definition of a state variable: The state $S_t$ is the minimally dimensioned function of history that, combined with the exogenous information, is necessary and sufficient to calculate the costs/rewards, constraints, and transitions, from time $t$ onward.
» The first depends on a policy. The second depends only on the problem.
» Using either definition, all properly modeled problems are Markovian!

55 Modeling dynamic problems: Decisions
» Markov decision processes/computer science: $a_t$ = discrete action
» Control theory: $u_t$ = low-dimensional continuous vector
» Operations research: $x_t$ = usually a discrete or continuous but high-dimensional vector of decisions
At this point, we do not specify how to make a decision. Instead, we define the function $X^\pi(s)$ (or $A^\pi(s)$ or $U^\pi(s)$), where $\pi$ specifies the type of policy. "$\pi$" carries information about the type of function $f$, and any tunable parameters $\theta \in \Theta^f$.

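56 The decision variables: Styles of decisions
» Binary: $x \in X = \{0, 1\}$
» Finite: $x \in X = \{1, 2, \ldots, M\}$
» Continuous scalar: $x \in X = [a, b]$
» Continuous vector: $x = (x_1, \ldots, x_K)$, $x_k \in \mathbb{R}$
» Discrete vector: $x = (x_1, \ldots, x_K)$, $x_k \in \mathbb{Z}$
» Categorical: $x = (a_1, \ldots, a_I)$, where $a_i$ is a category (e.g. red/green/blue)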

57 Modeling dynamic problems: Exogenous information
$W_t$ = new information that first became known at time $t$ = $(\hat R_t, \hat D_t, \hat p_t, \hat E_t)$
» $\hat R_t$ = equipment failures, delays, new arrivals; new drivers being hired to the network
» $\hat D_t$ = new customer demands
» $\hat p_t$ = changes in prices
» $\hat E_t$ = information about the environment (temperature, ...)
Note: Any variable indexed by $t$ is known at time $t$. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.
Below, we let $\omega$ represent a sequence of actual observations $W_1, W_2, \ldots$; $W_t(\omega)$ refers to a sample realization of the random variable $W_t$.

58 Modeling dynamic problems: The transition function
$S_{t+1} = S^M(S_t, x_t, W_{t+1})$, for example:
» $R_{t+1} = R_t + x_t + \hat R_{t+1}$ (inventories)
» $p_{t+1} = p_t + \hat p_{t+1}$ (spot prices)
» $D_{t+1} = D_t + \hat D_{t+1}$ (market demands)
Also known as the: system model; state transition model; plant model; plant equation; transition law; transfer function; transformation function; law of motion; model.
For many applications, these equations are unknown. This is known as model-free dynamic programming.

59 Modeling dynamic problems: The objective function
Dimensions of objective functions
» Performance metrics
» Properties (convexity, monotonicity, continuity, unimodularity, ...)
» Final reward vs. cumulative reward
» Time to compute (fractions of seconds to minutes, to hours, to days or months)
» Expectation or risk measures

60 Objective functions: Performance metrics
» Costs, profits, revenues, contributions (business)
» Gains, losses (engineering)
» Strength, conductivity, diffusivity (materials science)
» Tolerance, toxicity, effectiveness (health)
» Stability, robustness (engineering)
» Risk, volatility (finance)
» Utility (economics)

61 Objective functions
» Deterministic costs: $c^T x$ = deterministic linear costs
» State-independent: $F(x^n, W^{n+1}) = p \min(x^n, W^{n+1}) - c x^n$
» State-dependent: $C(S_t, x_t) = p_t x_t$ ($p_t$ is in the state variable); $C(S_t, x_t, W_{t+1}) = p_t \min(x_t, W_{t+1}) - c x_t$; $C(S_t, x_t, S_{t+1}) = S_t^T Q S_t + S_{t+1}^T R S_{t+1}$

62 Objective functions: Characteristics of the objective function
» Analytical behavior: concave/convex, unimodal, monotone, smooth, ...
» Computational cost: fractions of a second (analytical functions); minutes (computer simulations); hours (laboratory experiments/computer simulations); days or longer (laboratory/field experiments); weeks to months (field experiments)
» Startup/switching costs: What is involved to observe the function for different inputs? Is there a cost to switch to different inputs?
» Risk operators: expectations; risk measures; robust/worst case

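63 Modeling stochastic, dynamic problems: Objective functions
» Offline (asymptotic) stochastic search: $\max_x \mathbb{E} F(x, W)$
» Two-stage stochastic programming: $\max_{x_0} \left( c_0 x_0 + \mathbb{E} Q(x_0, W_1) \right)$
» Offline (finite time) stochastic search: $\max_\pi \mathbb{E} F(x^{\pi,N}, W)$
» Multiarmed bandit problem: $\max_\pi \mathbb{E} \sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1})$
» Contextual bandit problem: $\max_\pi \mathbb{E} \left\{ \sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1}) \mid S_0 \right\}$
» Full dynamic programming: $\max_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X^\pi(S_t)) \mid S_0 \right\}$
» Offline dynamic programming: $\max_{\pi^{learn}} \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X^{\pi^{imp}}(S_t)) \mid S_0 \right\}$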

64 Problem classes: The circle of stochastic optimization
[Figure: the objective functions for offline and online stochastic search, two-stage stochastic programming, bandit problems and dynamic programming arranged in a circle]

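65 Modeling stochastic, dynamic problems: The universal objective function
$\max_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t), W_{t+1}) \mid S_0 \right\}$
Annotations: finding the best policy ($\max_\pi$); expectation over all random outcomes ($\mathbb{E}$); contribution function ($C$); state variable ($S_t$); decision function or policy ($X_t^\pi$); new information ($W_{t+1}$); initial state variable ($S_0$).
Given a system model (transition function) $S_{t+1} = S^M(S_t, x_t, W_{t+1}(\omega))$
Now we just have to find the best policy.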

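66 Problem classes: Major problem classes
» State independent, offline (terminal reward): $\max_x \mathbb{E} F(x, W)$ or $\max_\pi \mathbb{E} F(x^{\pi,N}, W)$ (stochastic search)
» State independent, online (cumulative reward): $\max_\pi \mathbb{E} \left\{ \sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1}) \mid S_0 \right\}$ (multiarmed bandit problems)
» State dependent, offline (terminal reward): $\max_{\pi^{learn}} \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X^{\pi^{imp}}(S_t)) \mid S_0 \right\}$ with $S_{t+1} = S^M(S_t, X^{\pi^{imp}}(S_t), W_{t+1})$
» State dependent, online (cumulative reward): $\max_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X^\pi(S_t)) \mid S_0 \right\}$ with $S_{t+1} = S^M(S_t, X^\pi(S_t), W_{t+1})$ (dynamic programming)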

67 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

68 An energy storage problem
Consider a basic energy storage problem:
» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

69 An energy storage problem: A model of our problem
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function

70 An energy storage problem: State variables
[Figure: network with nodes E (wind energy), B (battery), L (load) and G (grid)]
» We will present the full model, accumulating the information we need in the state variable.
» We will highlight information we need as we proceed. This information will make up our state variable.

71 An energy storage problem: Decision variables
[Figure: network with nodes E, B, L, G]
$x_t = (x_t^{EL}, x_t^{EB}, x_t^{GL}, x_t^{GB}, x_t^{BL})$
» Constraints

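72 An energy storage problem: Exogenous information
[Figure: network with nodes E, B, L, G]
$W_t$ consists of:
» $\hat E_t$ = change in energy from wind between $t-1$ and $t$
» $\varepsilon_t^p$ = noise in the price process between $t-1$ and $t$
» $\hat D_t$ = change in load between $t-1$ and $t$
» $f_{tt'}^L$ = forecast of load $D_{t'}$ provided by vendor at time $t$; $f_t^L = (f_{tt'}^L)_{t' \ge t}$ is provided exogenously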

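73 An energy storage problem: Transition function
[Figure: network with nodes E, B, L, G]
» $E_{t+1} = E_t + \hat E_{t+1}$
» $p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon_{t+1}^p$
» $D_{t+1} = D_t + \hat D_{t+1}$
» $R_{t+1}^{battery} = R_t^{battery} + x_t$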

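74 An energy storage problem: Objective function
[Figure: network with nodes E, B, L, G]
$C(S_t, x_t) = p_t \left( x_t^{GB} + x_t^{GL} \right)$
$\min_\pi \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t)) \mid S_0 \right\}$
Expectation depends on forecasts $f$.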

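75 An energy storage problem: State variables
» Cost function: $p_t$ = price of electricity
» Decision function: constraints
» Transition function: $p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon_{t+1}^p$
» Forecast $f_t^L$: needed to compute the probability model
$S_t = \left( R_t, E_t, L_t, (p_t, p_{t-1}, p_{t-2}), f_t^L \right)$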
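The five elements of the energy storage model (state, decision, exogenous information, transition, objective) can be wired together in a small simulator. This is a sketch under several assumptions that are not in the slides: the battery capacity `R_MAX`, the price coefficients `THETA`, standard normal noise, the net-flow battery update, and a simple rule-based policy used only as a placeholder. The forecast component of the state is omitted to keep the example short.

```python
import random
from dataclasses import dataclass

@dataclass
class State:
    R: float      # energy in the battery
    E: float      # wind energy available
    D: float      # load
    p: tuple      # (p_t, p_{t-1}, p_{t-2})

R_MAX = 10.0                 # assumed battery capacity
THETA = (0.7, 0.2, 0.1)      # assumed price-model coefficients

def policy(S):
    """Placeholder rule: serve load from wind, then battery, then grid; store spare wind."""
    x_EL = min(S.E, S.D)
    x_BL = min(S.R, S.D - x_EL)
    x_GL = S.D - x_EL - x_BL
    x_EB = min(S.E - x_EL, R_MAX - S.R + x_BL)
    return {"EL": x_EL, "EB": x_EB, "GL": x_GL, "GB": 0.0, "BL": x_BL}

def contribution(S, x):
    return S.p[0] * (x["GB"] + x["GL"])          # cost of grid purchases

def transition(S, x, W):
    E_hat, eps_p, D_hat = W
    p_next = THETA[0] * S.p[0] + THETA[1] * S.p[1] + THETA[2] * S.p[2] + eps_p
    return State(R=S.R + x["EB"] + x["GB"] - x["BL"],   # net flow into the battery
                 E=max(0.0, S.E + E_hat),
                 D=max(0.0, S.D + D_hat),
                 p=(p_next, S.p[0], S.p[1]))

def simulate(S0, T, rng):
    S, total, path = S0, 0.0, []
    for _ in range(T):
        x = policy(S)                             # decision from the current state only
        total += contribution(S, x)
        path.append((S, x))
        W = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))   # exogenous information
        S = transition(S, x, W)
    return total, path

total_cost, path = simulate(State(5.0, 3.0, 4.0, (30.0, 30.0, 30.0)), 50, random.Random(0))
```

Averaging `total_cost` over many seeds estimates the objective for this policy. Carrying the two lagged prices in the state is what keeps the model Markovian.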

76 Modeling: Notes
» There is a common misunderstanding that state variables have to be simple (they don't).
» There is also a tendency to refer to problems that depend on prior information (such as $p_{t-1}$ and $p_{t-2}$) as "history dependent". But this is information known at time $t$ (who cares when it first became known).
» All properly modeled problems are Markovian!
» Understanding state variables is very important in dynamic systems, because it forces you to understand what you know at time $t$, and what you don't.

77 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

78 Modeling uncertainty
There are two information processes that drive the system:
» Decisions $x_t$: this is the endogenously controllable information process.
» Exogenous information: this comes from the initial state $S_0$ and the exogenous information process $W_t$.
To figure out how to make good decisions, you need:
» The system model $S_{t+1} = S^M(S_t, x_t, W_{t+1})$
» The initial state $S_0$ and the exogenous information process $W_t$.
» The contribution function $C(S_t, x_t, W_{t+1})$

79 Modeling uncertainty
$S_0$: the initial state. This contains:
» All deterministic parameters needed by the system. This is static data, so it is not modeled as part of the dynamic state $S_t$, $t > 0$.
» State of knowledge: probabilistic information about uncertain parameters. This information is always represented as a probability distribution of some form.

80 Modeling uncertainty
The exogenous information process $W_t$, which might include:
» Passive information: this is information that arrives regardless of any actions we may take. Purely exogenous information is not influenced by the state of the system or any actions we take. Examples: rainfall, stock prices (if we are a small player). Exogenous distributions may be influenced by states and/or actions (stock prices if we are a large player).
» Active information: this is information we choose to collect. Examples: running a laboratory experiment; purchasing a report.

81 Modeling uncertainty: Types of uncertainty
» Observational uncertainty: errors in our observations of the state of the system. What is the CO2 content of the atmosphere? What is the inventory of oil in the U.S.?
» Prognostic uncertainty: uncertainty in the forecast of a future event. Forecasting demands; forecasting the weather.

82 Modeling uncertainty: Types of uncertainty
» Experimental noise: this is the variability that arises when running repeated experiments (either in a lab or in the field). Testing the impact of a new flu drug; testing the effect of a new material on battery lifetimes.
» Transitional uncertainty: we have a model of how a (presumably) deterministic system evolves, but there is still noise: $S_{t+1} = S^M(S_t, x_t) + \varepsilon_{t+1}$. Modeling the location of an aircraft moving at a certain speed from a known location; predicting the time of arrival of a car at a downstream node.

83 Modeling uncertainty: Types of uncertainty
» Inferential uncertainty: uncertainty in parameters estimated from observational data. Sometimes known as diagnostic uncertainty, which might arise in the context of estimating a condition such as disease or the reason for a malfunction (in an engine). Such an assessment would be an inference based on indirect observations.
» Model uncertainty: this is uncertainty about the model itself, which comes in two forms. Uncertainty about the structure of the model: linear approximation of a nonlinear model; different sets of equations describing the climate. Uncertainty about the parameters characterizing the model.

84 Modeling uncertainty: Types of uncertainty
» Systematic exogenous uncertainty: errors in the model of exogenous information that occur on long time scales. Modeling the effect of long-term drops in oil consumption due to conservation; modeling the effect of increased cloud cover due to climate change.
[Figure: $W_t$ decomposed into a base signal (forecasted), low frequency noise ("scenarios") and high frequency noise]

85 Modeling uncertainty: Types of uncertainty
» Control uncertainty: you ask for one value of $x_t$ but you get another. Wiley sets a wholesale price of $80, but Amazon sells at some random price above that (limits Wiley's ability to set prices).
» Algorithmic uncertainty: run the same algorithm twice, and you may get different answers (depends on the algorithm and the nature of the compute environment).

86 Modeling uncertainty: Bayesian vs. frequentist uncertainty
» Bayesian uncertainty is captured by a distribution of belief derived from prior information: expert judgment; information collected from different settings; past experience. Bayesian uncertainty is always communicated through $S_0$.
» Frequentist uncertainty: this is uncertainty derived from statistical analysis of the variability inherent in the exogenous information $W_t$.

87 Modeling uncertainty: Types of distributions
» Probability distributions come in different forms:
Classical thin-tailed distributions: exponential family (normal, exponential, gamma); uniform; discrete variants
Heavy-tailed distributions: Cauchy distribution (may have infinite variance); jump diffusion (sum of a low-variance normally distributed error, plus a high-variance error that occurs with low probability)
Spikes
Bursts
Rare events
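The jump-style noise described above (a low-variance normal error plus a rare high-variance error) is easy to sample. The standard deviations and jump probability below are illustrative assumptions.

```python
import random

def sample_spiky_noise(rng, sigma_small=1.0, sigma_large=20.0, p_jump=0.02):
    """Low-variance normal noise, plus a high-variance shock with small probability."""
    noise = rng.gauss(0.0, sigma_small)
    if rng.random() < p_jump:
        noise += rng.gauss(0.0, sigma_large)
    return noise

rng = random.Random(0)
samples = [sample_spiky_noise(rng) for _ in range(20000)]
```

With these values the overall variance is about $1 + 0.02 \times 400 = 9$, far larger than the variance of the typical observation, which is why a single fitted normal misrepresents such a process.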

88 Modeling uncertainty: Notes
» How can you claim you have an optimal policy if you have not modeled the problem properly?
» Uncertainty is easily the most subtle and overlooked aspect of modeling.
» It is not enough to include uncertainty: you have to capture uncertainty in a way that represents reality. This issue universally pervades applications of stochastic optimization to real problems.

89 Outline » Canonical problems » Problem classes » Solution strategies for learning problems » Elements of a dynamic model » An energy storage illustration » Modeling uncertainty » Designing policies » The four classes of policies » From deterministic to stochastic optimization

90 Designing policies
We have to start by describing what we mean by a policy.
» Definition: A policy is a mapping from a state to an action ... any mapping.
How do we search over an arbitrary space of policies?

91 Designing policies: Policies and the English language
Behavior, belief, bias, commandment, conduct, convention, culture, customs, dogma, etiquette, fashion, formula, habit, laws/bylaws, manner, method, mode, mores, patterns, plans, policies, practice, prejudice, principle, procedure, process, protocols, recipe, ritual, rule, style, technique, tenet, tradition, way of life

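92 Designing policies
Two fundamental strategies:
1) Policy search: search over a class of functions for making decisions to optimize some metric.
$\max_{\pi = (f \in F, \theta^f \in \Theta^f)} \mathbb{E} \left\{ \sum_{t=0}^T C(S_t, X_t^\pi(S_t \mid \theta^f)) \mid S_0 \right\}$
2) Lookahead approximations: approximate the impact of a decision now on the future.
$X_t^*(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E} \left\{ \max_\pi \mathbb{E} \left\{ \sum_{t'=t+1}^T C(S_{t'}, X_{t'}^\pi(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$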

93 Designing policies: Policy search
1a) Analytical functions that directly map states to actions ("policy function approximations" or PFAs): $x_t = X^{PFA}(S_t \mid \theta)$
» Lookup tables: "when in this state, take this action"
» Parametric functions: order-up-to policies (if inventory is less than s, order up to S); affine policies, $X^{PFA}(S_t \mid \theta) = \sum_{f \in F} \theta_f \phi_f(S_t)$; neural networks
» Locally/semi/non parametric: requires optimizing over local regions
1b) Maximizing analytical approximations of costs and/or constraints ("cost function approximations" or CFAs)
» Optimizing a deterministic model modified to handle uncertainty (buffer stocks, schedule slack): $X^{CFA}(S_t \mid \theta) = \arg\max_{x_t \in X_t^\pi(\theta)} \bar C^\pi(S_t, x_t \mid \theta)$
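A sketch of policy search over the order-up-to family: the policy is the PFA with $\theta = (s, S)$, and the search simply simulates each candidate on sampled demand paths and keeps the cheapest. All cost numbers, the demand distribution and the candidate grid are assumptions made for illustration.

```python
import random

def order_up_to(inventory, theta):
    """PFA: if inventory is below s, order up to S, where theta = (s, S)."""
    s, S = theta
    return S - inventory if inventory < s else 0

def simulate_cost(theta, demands, order_cost=2.0, hold_cost=1.0, short_cost=8.0, fixed=5.0):
    """Average per-period cost of following the policy along one demand path."""
    inv, total = 0, 0.0
    for d in demands:
        x = order_up_to(inv, theta)
        total += (fixed + order_cost * x) if x > 0 else 0.0
        inv += x
        total += short_cost * max(d - inv, 0)     # unmet demand is lost
        inv = max(inv - d, 0)
        total += hold_cost * inv
    return total / len(demands)

def policy_search(candidates, sample_paths):
    """Return the theta with the lowest average simulated cost."""
    def average_cost(theta):
        return sum(simulate_cost(theta, path) for path in sample_paths) / len(sample_paths)
    return min(candidates, key=average_cost)

rng = random.Random(0)
paths = [[rng.randint(0, 10) for _ in range(50)] for _ in range(20)]
candidates = [(s, S) for S in range(0, 16) for s in range(0, S + 1)]
best_theta = policy_search(candidates, paths)
```

Enumerating a grid works here because $\theta$ is two-dimensional. Higher-dimensional parameter vectors would call for stochastic search methods instead.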

94 Designing policies: Policy search
» Typically involves searching within a parameterized family
» but may involve comparisons across classes of functional approximations.
» May be done offline (terminal reward) or online (cumulative reward).
Styles
» Active learning: experiment with new policies with the hope of finding improvements, but risks spending time using less effective policies.
» Passive learning: using the policy you believe is best, do updating based on samples that work well.

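95 Designing policies: Lookahead approximations
Approximate the impact of a decision now on the future:
» An optimal policy (based on looking ahead): $X_t^*(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E} \left\{ \max_\pi \mathbb{E} \left\{ \sum_{t'=t+1}^T C(S_{t'}, X_{t'}^\pi(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$
2a) Approximating the value of being in a downstream state using machine learning ("value function approximations"):
$X_t^*(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\{ V_{t+1}(S_{t+1}) \mid S_t, x_t \} \right)$
$X_t^{VFA}(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\{ \bar V_{t+1}(S_{t+1}) \mid S_t, x_t \} \right) = \arg\max_{x_t} \left( C(S_t, x_t) + \bar V_t^x(S_t^x) \right)$
2b) Approximate lookahead models: optimize over an approximate model of the future:
$X_t^{LA}(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \tilde{\mathbb{E}} \left\{ \max_{\tilde\pi} \tilde{\mathbb{E}} \left\{ \sum_{t'=t+1}^T C(\tilde S_{tt'}, \tilde X^{\tilde\pi}(\tilde S_{tt'})) \mid \tilde S_{t,t+1} \right\} \mid S_t, x_t \right\} \right)$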

96 Designing policies
Policies based on value function approximations
» This is the foundation of all solution strategies that depend on Bellman (or Hamilton-Jacobi) optimality equations.
» Exact value functions are rare:
  Discrete states and actions, with a computable one-step transition matrix.
  Analytical solutions for special functions (e.g. LQR)
» Approximate value functions are generally based on:
  Approximate value iteration
  Approximate policy iteration

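97 Designing policies
The ultimate lookahead policy is optimal:
\[ X^*_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \mathbb{E}\left\{ \max_{\pi} \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C\big(S_{t'}, X^\pi_{t'}(S_{t'})\big) \,\Big|\, S_{t+1} \right\} \Big|\, S_t, x_t \right\} \right) \]
Annotations on the equation: the inner \(\max_\pi\) is a maximization that we cannot compute; the two \(\mathbb{E}\) operators are expectations that we cannot compute.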

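98 Designing policies
The ultimate lookahead policy is optimal:
\[ X^*_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \mathbb{E}\left\{ \max_{\pi} \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C\big(S_{t'}, X^\pi_{t'}(S_{t'})\big) \,\Big|\, S_{t+1} \right\} \Big|\, S_t, x_t \right\} \right) \]
Instead, we have to solve an approximation called the lookahead model:
\[ X^*_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \tilde{\mathbb{E}}\left\{ \max_{\tilde{\pi}} \tilde{\mathbb{E}}\left\{ \sum_{t'=t+1}^{t+H} C\big(\tilde{S}_{tt'}, \tilde{X}^{\tilde{\pi}}(\tilde{S}_{tt'})\big) \,\Big|\, \tilde{S}_{t,t+1} \right\} \Big|\, S_t, x_t \right\} \right) \]
» A lookahead policy works by approximating the lookahead model.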

99 Designing policies
Types of lookahead approximations
» One-step lookahead
  Widely used in pure learning policies:
    Bayes greedy/naïve Bayes
    Thompson sampling
    Value of information (knowledge gradient)
» Multi-step lookahead
  Deterministic lookahead, also known as model predictive control or rolling horizon procedure
  Stochastic lookahead:
    Two-stage (widely used in stochastic linear programming)
    Multistage
      » Monte Carlo tree search (MCTS) for discrete action spaces
      » Multistage scenario trees (stochastic linear programming), typically not tractable.

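100 Four (meta)classes of policies
Policy search:
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
» \( X^{CFA}(S_t\mid\theta) = \arg\max_{x_t\in\mathcal{X}^\pi_t(\theta)} \bar{C}^\pi(S_t, x_t\mid\theta) \)
Lookahead approximations:
3) Policies based on value function approximations (VFAs)
» \( X^{VFA}_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \bar{V}^x_t\big(S^x_t(S_t,x_t)\big) \right) \)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon procedure/model predictive control
\[ X^{LA\text{-}D}_t(S_t) = \arg\max_{\tilde{x}_{tt},\ldots,\tilde{x}_{t,t+H}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{t'=t+1}^{T} C(\tilde{S}_{tt'},\tilde{x}_{tt'}) \right) \]
» Chance constrained programming
\[ P\left[ A_t x_t \le f(W) \right] \ge 1-\delta \]
» Stochastic lookahead/stochastic programming/Monte Carlo tree search
\[ X^{LA\text{-}S}_t(S_t) = \arg\max_{\tilde{x}_{tt},\tilde{x}_{t,t+1},\ldots,\tilde{x}_{t,t+T}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{\tilde{\omega}\in\tilde{\Omega}_t} p(\tilde{\omega}) \sum_{t'=t+1}^{T} C\big(\tilde{S}_{tt'}(\tilde{\omega}),\tilde{x}_{tt'}(\tilde{\omega})\big) \right) \]
» Robust optimization
\[ X^{LA\text{-}RO}_t(S_t) = \arg\max_{\tilde{x}_{tt},\ldots,\tilde{x}_{t,t+H}} \min_{w\in\mathcal{W}_t(\theta)} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{t'=t+1}^{T} C\big(\tilde{S}_{tt'}(w),\tilde{x}_{tt'}(w)\big) \right) \]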

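101 Four (meta)classes of policies
Function approximations:
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
» \( X^{CFA}(S_t\mid\theta) = \arg\max_{x_t\in\mathcal{X}^\pi_t(\theta)} \bar{C}^\pi(S_t, x_t\mid\theta) \)
3) Policies based on value function approximations (VFAs)
» \( X^{VFA}_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \bar{V}^x_t\big(S^x_t(S_t,x_t)\big) \right) \)
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon procedure/model predictive control
\[ X^{LA\text{-}D}_t(S_t) = \arg\max_{\tilde{x}_{tt},\ldots,\tilde{x}_{t,t+H}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{t'=t+1}^{T} C(\tilde{S}_{tt'},\tilde{x}_{tt'}) \right) \]
» Chance constrained programming
\[ P\left[ A_t x_t \le f(W) \right] \ge 1-\delta \]
» Stochastic lookahead/stochastic programming/Monte Carlo tree search
\[ X^{LA\text{-}S}_t(S_t) = \arg\max_{\tilde{x}_{tt},\tilde{x}_{t,t+1},\ldots,\tilde{x}_{t,t+T}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{\tilde{\omega}\in\tilde{\Omega}_t} p(\tilde{\omega}) \sum_{t'=t+1}^{T} C\big(\tilde{S}_{tt'}(\tilde{\omega}),\tilde{x}_{tt'}(\tilde{\omega})\big) \right) \]
» Robust optimization
\[ X^{LA\text{-}RO}_t(S_t) = \arg\max_{\tilde{x}_{tt},\ldots,\tilde{x}_{t,t+H}} \min_{w\in\mathcal{W}_t(\theta)} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{t'=t+1}^{T} C\big(\tilde{S}_{tt'}(w),\tilde{x}_{tt'}(w)\big) \right) \]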

102 Approximation strategies
Approximation strategies
» Lookup tables
  Independent beliefs
  Correlated beliefs
» Linear parametric models
  Linear models
  Sparse-linear
  Tree regression
» Nonlinear parametric models
  Logistic regression
  Neural networks
» Nonparametric models
  Gaussian process regression
  Hierarchical aggregation

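103 Designing policies
Finding the best policy
» We have to first articulate our classes of policies:
  \( f\in\mathcal{F} \): PFAs, CFAs, VFAs, LAs
  \( \theta\in\Theta^f \): parameters that characterize each family.
» So optimizing over \(\pi\) means searching over \( \pi = (f, \theta) \) with \( f\in\mathcal{F} \) and \( \theta\in\Theta^f \).
» We then have to pick an objective such as
\[ \max_{\pi}\ \mathbb{E}\left\{ \sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t)\big) \right\} \]
or
\[ \max_{\pi}\ \mathbb{E}\, F\big(X^{\pi}_T, W\big) \]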

104 Designing policies
Notes:
» Three of the four classes of policies involve some form of function approximation (PFA, CFA, VFA).
» Lookahead policies require approximating the lookahead model, which requires (among other approximations) replacing the full probability space with a sampled approximation (can be hard to solve).
» Searching for the best parameterized policy is just like solving a stochastic search problem:
  May be derivative-based or derivative-free.
  May be solved offline or online.
» VFAs have to be estimated using biased observations.

105 Outline
Canonical problems
Problem classes
Solution strategies for learning problems
Elements of a dynamic model
An energy storage illustration
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization

106 Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)

108 Policy function approximations
Battery arbitrage: when to charge, when to discharge, given volatile LMPs

109 Policy function approximations
Grid operators require that batteries bid charge and discharge prices, an hour in advance.
[Figure: price series with the Charge and Discharge thresholds marked]
We have to search for the best values for the policy parameters \( \theta^{Charge} \) and \( \theta^{Discharge} \).

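110 Policy function approximations
Our policy function might be the parametric model (this is nonlinear in the parameters):
\[ X^\pi(S_t\mid\theta) = \begin{cases} +1 & \text{if } p_t < \theta^{charge} \\ 0 & \text{if } \theta^{charge} < p_t < \theta^{discharge} \\ -1 & \text{if } p_t > \theta^{discharge} \end{cases} \]
The state \( S_t \) consists of the energy in storage and the price of electricity \( p_t \).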

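111 Policy function approximations
Finding the best policy
» We need to maximize
\[ \max_{\theta} F(\theta) = \mathbb{E} \sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t\mid\theta)\big) \]
» We cannot compute the expectation, so we run simulations.
[Figure: simulated price path with the Charge and Discharge thresholds]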
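A rough sketch of what "run simulations" can look like for the charge/discharge policy follows. Everything specific here is an assumption made for illustration and does not come from the slides: the i.i.d. truncated Gaussian price model, the one-unit charge step, the battery capacity, and the grid search over the two thresholds. The same sampled price paths are reused for every candidate so that candidates are compared on equal terms.

```python
import random


def pfa_decision(price, energy, capacity, theta_charge, theta_discharge):
    """Return +1 (charge), -1 (discharge) or 0 (hold) for one unit of energy."""
    if price < theta_charge and energy < capacity:
        return 1
    if price > theta_discharge and energy > 0:
        return -1
    return 0


def simulate(theta_charge, theta_discharge, prices, capacity=5):
    """Total profit of the policy along one price path (battery starts empty)."""
    energy, profit = 0, 0.0
    for p in prices:
        x = pfa_decision(p, energy, capacity, theta_charge, theta_discharge)
        energy += x
        profit -= x * p  # charging costs p, discharging earns p
    return profit


def sample_prices(rng, T=200, mean=30.0, sd=10.0):
    """Hypothetical price model: i.i.d. Gaussian prices truncated at zero."""
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(T)]


def estimate_F(theta, paths):
    """Sample average of the simulated profit, standing in for F(theta)."""
    return sum(simulate(theta[0], theta[1], p) for p in paths) / len(paths)


def grid_search(paths, grid):
    """Derivative-free search over (theta_charge, theta_discharge) pairs."""
    candidates = [(lo, hi) for lo in grid for hi in grid if lo < hi]
    return max(candidates, key=lambda th: estimate_F(th, paths))


rng = random.Random(0)
paths = [sample_prices(rng) for _ in range(20)]
best = grid_search(paths, grid=range(10, 51, 5))
print(best, round(estimate_F(best, paths), 2))
```

Grid search is only practical because the policy has two parameters; with more, a stochastic search method would take its place.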

112 Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)

113 Robust cost function approximation
Inventory management
» How much product should I order to anticipate future demands?
» Need to accommodate different sources of uncertainty:
  Market behavior
  Transit times
  Supplier uncertainty
  Product quality

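114 Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let \( x_{tp} \) be the amount of product we purchase at time t from supplier p to meet forecasted demand \( D_t \). We would solve
\[ X^\pi_t(S_t) = \arg\min_{x_t} \sum_{p\in\mathcal{P}} c_p x_{tp} \]
subject to
\[ \sum_{p\in\mathcal{P}} x_{tp} \ge D_t, \qquad x_{tp} \le u_p, \qquad x_{tp} \ge 0. \]
» This assumes our demand forecast \( D_t \) is accurate.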

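115 Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let \( x_{tp} \) be the amount of product we purchase at time t from supplier p to meet forecasted demand \( D_t \). We would solve
\[ X^\pi_t(S_t\mid\theta) = \arg\min_{x_t} \sum_{p\in\mathcal{P}} c_p x_{tp} \]
subject to
\[ \sum_{p\in\mathcal{P}} x_{tp} \ge D_t + \theta, \qquad x_{tp} \le u_p, \qquad x_{tp} \ge 0, \]
where \( \theta \) is the reserve buffer (buffer stock).
» This is a parametric cost function approximation.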
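The buffered purchasing problem can be sketched without an LP solver. The assumption made here, which is not stated on the slides, is that unit costs are nonnegative: in that case filling the cheapest suppliers first (merit order) solves this particular linear program exactly. The function name `cfa_purchase` and the numbers are hypothetical.

```python
def cfa_purchase(costs, caps, forecast, theta):
    """Cheapest purchase plan covering forecast + theta (the reserve buffer).

    Solves  min sum_p c_p x_p  s.t.  sum_p x_p >= forecast + theta,
    0 <= x_p <= u_p, assuming nonnegative unit costs so that buying from
    the cheapest suppliers first is optimal.
    """
    target = forecast + theta
    if target > sum(caps):
        raise ValueError("suppliers cannot cover forecast plus buffer")
    plan = [0.0] * len(costs)
    remaining = max(0.0, target)
    # Visit suppliers from cheapest to most expensive.
    for p in sorted(range(len(costs)), key=lambda i: costs[i]):
        plan[p] = min(caps[p], remaining)
        remaining -= plan[p]
    return plan


costs, caps = [4.0, 2.0, 3.0], [50.0, 40.0, 30.0]
print(cfa_purchase(costs, caps, forecast=60.0, theta=0.0))   # no buffer
print(cfa_purchase(costs, caps, forecast=60.0, theta=15.0))  # with buffer
```

The buffer is the only tunable parameter; its value would be chosen by simulating the policy against sampled demands, not by solving this model.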

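116 Cost function approximations
A general way of creating CFAs:
» Define our policy:
\[ X^\pi(S_t\mid\theta) = \arg\min_{x_t\in\mathcal{X}_t} \left( C(S_t,x_t) + \sum_{f\in\mathcal{F}} \theta_f \phi_f(S_t,x_t) \right) \]
  The sum is a cost correction term.
» This has been confused with approximate dynamic programming, but the correction term is not a value function.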

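117 Cost function approximations
An even more general CFA model:
» Define our policy:
\[ X^\pi(S_t\mid\theta) = \arg\min_{x_t} \bar{C}^\pi(S_t, x_t\mid\theta) \quad \text{(parametrically modified costs)} \]
subject to
\[ A x_t = \bar{b}^\pi(\theta) \quad \text{(parametrically modified constraints)} \]
» We tune \( \theta \) by optimizing:
\[ \min_{\theta} F^\pi(\theta) = \mathbb{E} \sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t\mid\theta)\big) \]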

118 Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)

119 Schneider National

120 Value function approximation
Pre-decision state: we see the demands.
[Figure: map with demands of $350, $300, $150 and $450 out of TX]
\( S_t = (TX, \hat{D}_t) \)

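121 Value function approximation
We use initial value function approximations
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 0 \), \( \bar{V}^0(CA) = 0 \)
[Figure: map with demands of $350, $300, $150 and $450 out of TX]
\( S_t = (TX, \hat{D}_t) \)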

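122 Value function approximation
and make our first choice \( x^1 \):
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 0 \), \( \bar{V}^0(CA) = 0 \)
[Figure: map with demands of $350, $300, $150 and $450 out of TX]
Post-decision state: \( S^x_t = (NY) \)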

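123 Value function approximation
Update the value of being in Texas.
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 0 \), \( \bar{V}^0(CA) = 0 \), \( \bar{V}^1(TX) = 450 \)
[Figure: map with demands of $350, $300, $150 and $450 out of TX]
Post-decision state: \( S^x_t = (NY) \)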

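124 Value function approximation
Now move to the next state, sample new demands and make a new decision.
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 0 \), \( \bar{V}^0(CA) = 0 \), \( \bar{V}^1(TX) = 450 \)
[Figure: map with demands of $400, $180, $600 and $125 out of NY]
\( S_{t+1} = (NY, \hat{D}_{t+1}) \)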

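125 Value function approximation
Update value of being in NY.
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 600 \), \( \bar{V}^0(CA) = 0 \), \( \bar{V}^1(TX) = 450 \)
[Figure: map with demands of $400, $180, $600 and $125 out of NY]
Post-decision state: \( S^x_{t+1} = (CA) \)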

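126 Value function approximation
Move to California.
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CA) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^1(TX) = 450 \), \( \bar{V}^0(NY) = 600 \)
[Figure: map with demands of $200, $350, $400 and $150 out of CA]
\( S_{t+2} = (CA, \hat{D}_{t+2}) \)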

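127 Value function approximation
Make decision to return to TX and update value of being in CA.
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CA) = 800 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^1(TX) = 450 \), \( \bar{V}^0(NY) = 600 \)
[Figure: map with demands of $200, $350, $400 and $150 out of CA]
\( S_{t+2} = (CA, \hat{D}_{t+2}) \)
The new value of being in CA is the $350 load to TX plus \( \bar{V}^1(TX) = 450 \), which gives 800.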

128 Value function approximation
Updating the value function:
Old value: \( \bar{V}^1(TX) = \$450 \)
New estimate: \( \hat{v}^2(TX) = \$800 \)
How do we merge old with new?
\[ \bar{V}^2(TX) = (1-\alpha)\,\bar{V}^1(TX) + \alpha\,\hat{v}^2(TX) = (0.90)\,\$450 + (0.10)\,\$800 = \$485 \]
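The merge of the old value with the new estimate is plain exponential smoothing, which a short sketch can check numerically. The function name `smooth` is an assumption made for illustration; the numbers are the ones on the slide.

```python
def smooth(old_value: float, new_estimate: float, alpha: float) -> float:
    """Exponential smoothing: (1 - alpha) * old + alpha * new."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("stepsize alpha must lie in [0, 1]")
    return (1.0 - alpha) * old_value + alpha * new_estimate


# Old value of being in TX is $450, the new sampled estimate is $800.
v_tx = smooth(450.0, 800.0, alpha=0.10)
print(round(v_tx, 2))  # 485.0
```

A small stepsize keeps the estimate stable against noisy samples, at the price of slower learning.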

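129 Value function approximation
An updated value of being in TX
\( \bar{V}^0(MN) = 0 \), \( \bar{V}^0(CO) = 0 \), \( \bar{V}^0(NY) = 600 \), \( \bar{V}^0(CA) = 800 \), \( \bar{V}^1(TX) = 485 \)
[Figure: map with demands of $385, $275, $800 and $125 out of TX]
\( S_{t+3} = (TX, \hat{D}_{t+3}) \)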

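130 Approximate value iteration
Step 1: Start with a pre-decision state \( S^n_t \).
Step 2: Solve the deterministic optimization using an approximate value function:
\[ \hat{v}^n_t = \min_{x_t} \left( C_t(S^n_t, x_t) + \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \right) \]
to obtain \( x^n_t \).
Step 3: Update the value function approximation:
\[ \bar{V}^n_{t-1}(S^{x,n}_{t-1}) = (1-\alpha_{n-1})\,\bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1}\,\hat{v}^n_t \]
Step 4: Obtain a Monte Carlo sample of \( W_t(\omega^n) \) and compute the next pre-decision state:
\[ S^n_{t+1} = S^M\big(S^n_t, x^n_t, W_{t+1}(\omega^n)\big) \]
Step 5: Return to step 1.
Annotations: this is "on policy learning"; Step 2 is deterministic optimization, Step 3 is recursive statistics, Step 4 is simulation.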
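The five steps can be sketched on a toy version of the trucker example. The load model (uniform revenues between $100 and $600), the city list, the finite horizon and the constant stepsize are all assumptions made for illustration. Because the trucker maximizes revenue, max replaces the min written in Step 2. The post-decision state is simply the city the truck is heading to.

```python
import random

CITIES = ["TX", "NY", "CA", "CO", "MN"]


def sample_loads(rng, origin):
    """Hypothetical exogenous information: one load offer per destination."""
    return {c: rng.uniform(100, 600) for c in CITIES if c != origin}


def approximate_value_iteration(n_iters=200, horizon=5, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # V[t][city]: estimated value of the post-decision state "heading to
    # city" after the decision at time t. V[horizon - 1] is never updated,
    # so the last decision sees no downstream value.
    V = [{c: 0.0 for c in CITIES} for _ in range(horizon)]
    for _ in range(n_iters):
        city = "TX"                        # Step 1: pre-decision state
        loads = sample_loads(rng, city)
        for t in range(horizon):
            # Step 2: deterministic optimization with the current VFA.
            x = max(loads, key=lambda dest: loads[dest] + V[t][dest])
            v_hat = loads[x] + V[t][x]
            # Step 3: smooth v_hat into the previous post-decision state,
            # which is the city the truck is currently in.
            if t > 0:
                V[t - 1][city] = (1 - alpha) * V[t - 1][city] + alpha * v_hat
            # Step 4: follow the decision and sample new information.
            city = x
            loads = sample_loads(rng, city)
    return V


V = approximate_value_iteration()
print({c: round(v, 1) for c, v in V[0].items()})
```

Since the loop always follows the decision it currently believes is best, cities that look poor early on may rarely be revisited, which is the exploration issue that pure on-policy learning raises.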

132 Approximate dynamic programming: a typical performance graph
[Figure: objective function versus iterations]

133 Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)

134 Lookahead policies
Planning your next chess move:
» You put your finger on the piece while you think about moves into the future. This is a lookahead policy, illustrated for a problem with discrete actions.

136 Lookahead policies
Decision trees:

137 Lookahead policies
Modeling lookahead policies
» Lookahead policies solve a lookahead model, which is an approximation of the future.
» It is important to understand the difference between:
  The base model: this is the model we are trying to solve by finding the best policy. This is usually some form of simulator.
  The lookahead model, which is our approximation of the future to help us make better decisions now.
» The base model is typically a simulator, or it might be the real world.

138 Lookahead policies
Lookahead models use five classes of approximations:
» Horizon truncation: Replacing a longer horizon problem with a shorter horizon
» Stage aggregation: Replacing multistage problems with a two-stage approximation
» Outcome aggregation/sampling: Simplifying the exogenous information process
» Discretization: Of time, states and decisions
» Dimensionality reduction: We may ignore some variables (such as forecasts) in the lookahead model that we capture in the base model (these become latent variables in the lookahead model).

139 Lookahead policies
Notes:
» The academic literature at the moment does not distinguish between lookahead models and base models.
» When uncertainty is involved, base models are almost always simulators that simulate different policies (which might include a lookahead policy).
» When we use a deterministic lookahead policy to solve a stochastic problem, we understand that the model being solved by the lookahead policy is just an approximation.
» When we use a stochastic lookahead model, then things tend to get confusing.

140 Lookahead policies
Lookahead policies are the trickiest to model:
» We create tilde variables for the lookahead model:
  \( \tilde{S}_{tt'} \): Approximated state variable (e.g. coarse discretization)
  \( \tilde{x}_{tt'} \): Decision we plan on implementing at time t' when we are planning at time t, \( t' = t, t+1, \ldots, t+H \), with \( \tilde{x}_t = (\tilde{x}_{tt}, \tilde{x}_{t,t+1}, \ldots, \tilde{x}_{t,t+H}) \)
  \( \tilde{W}_{tt'} \): Approximation of the information process
  \( \tilde{c}_{tt'} \): Forecast of costs at time t' made at time t
  \( \tilde{b}_{tt'} \): Forecast of right hand sides for time t' made at time t
» All variables are indexed by t (when the lookahead model is being generated) and t' (the time within the lookahead model).

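141 Lookahead policies
We can use this notation to create a policy based on our lookahead model:
\[ X^*_t(S_t) = \arg\max_{x_t}\left( C(S_t,x_t) + \tilde{\mathbb{E}}\left\{ \max_{\tilde{\pi}} \tilde{\mathbb{E}}\left\{ \sum_{t'=t+1}^{t+H} C\big(\tilde{S}_{tt'}, \tilde{X}^{\tilde{\pi}}(\tilde{S}_{tt'})\big) \,\Big|\, \tilde{S}_{t,t+1} \right\} \Big|\, S_t, x_t \right\} \right) \]
Annotations on the equation:
  Limited horizon (the upper limit t+H)
  Restricted/simplified set of policies (the inner max)
  Sampled set of realizations (or deterministic); aggregated staging of decisions and information (the approximate expectations)
  Simplified/discretized set of state variables
» Simplest lookahead is deterministic.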

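142 Lookahead policies
Deterministic lookahead
\[ X^{LA\text{-}D}_t(S_t) = \arg\min_{\tilde{x}_{tt},\tilde{x}_{t,t+1},\ldots,\tilde{x}_{t,t+T}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{t'=t+1}^{T} C(\tilde{S}_{tt'},\tilde{x}_{tt'}) \right) \]
Stochastic lookahead (with two-stage approximation)
\[ X^{LA\text{-}S}_t(S_t) = \arg\min_{\tilde{x}_{tt},\tilde{x}_{t,t+1},\ldots,\tilde{x}_{t,t+T}} \left( C(\tilde{S}_{tt},\tilde{x}_{tt}) + \sum_{\tilde{\omega}\in\tilde{\Omega}_t} p(\tilde{\omega}) \sum_{t'=t+1}^{T} C\big(\tilde{S}_{tt'}(\tilde{\omega}),\tilde{x}_{tt'}(\tilde{\omega})\big) \right) \]
Scenario trees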
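A deterministic lookahead can be illustrated on a toy storage problem. This is a sketch under assumptions that are not in the slides: one unit bought or sold per period, integer storage levels, leftover energy worth nothing at the end of the horizon, and profit maximization in place of the cost minimization written above. The lookahead model is solved exactly by backward recursion over the forecast window, and only the first decision is returned, as a rolling horizon procedure would implement it.

```python
def deterministic_lookahead(energy, price_now, forecasts, capacity=3):
    """Return the first decision of a deterministic lookahead model.

    Decisions are -1 (sell one unit), 0 (hold), +1 (buy one unit). The
    lookahead model treats the point forecasts as if they were certain.
    """
    if not 0 <= energy <= capacity:
        raise ValueError("energy must lie between 0 and capacity")
    prices = [price_now] + list(forecasts)    # horizon H = len(forecasts)
    # value[r]: best profit from the next lookahead period onward when r
    # units are in storage (zero beyond the end of the horizon).
    value = [0.0] * (capacity + 1)
    best_first = 0
    for k in range(len(prices) - 1, -1, -1):  # backward over the window
        new_value = [0.0] * (capacity + 1)
        for r in range(capacity + 1):
            options = [(-x * prices[k] + value[r + x], x)
                       for x in (-1, 0, 1) if 0 <= r + x <= capacity]
            new_value[r], choice = max(options)
            if k == 0 and r == energy:
                best_first = choice           # decision implemented now
        value = new_value
    return best_first


print(deterministic_lookahead(0, 10.0, [50.0]))   # cheap now, dear later
print(deterministic_lookahead(1, 50.0, [10.0]))   # dear now, cheap later
```

At the next time period the function would be called again with the realized price and refreshed forecasts, so the plan beyond the first decision is never actually executed.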

143 Lookahead policies
Lookahead policies peek into the future
» Optimize over deterministic lookahead model
[Figure: the lookahead model shown above the real process]

147 Lookahead policies
There are two strategies for formulating and solving stochastic lookahead models:
» Solve a sampled model of the future over a set of scenarios. We have two options:
  A full multistage tree: usually impossible to solve
  A two-stage approximation. We break the problem into:
    Initial decision
    See all future information
    Make all remaining decisions
  These may still be very hard to solve.
» Use value function approximations:
  Benders cuts (SDDP)
  Other strategies for approximating value functions.

148 Multistage lookahead models
Stochastic lookahead
» Here, we approximate the information model by using a Monte Carlo sample to create a scenario tree:
[Figure: scenario tree over 1am, 2am, 3am, 4am, 5am, ...; branches labeled "Change in wind speed"]

149 Multistage lookahead models
We can then simulate this lookahead policy over time:
[Figure: the lookahead model shown above the base model]

153 Lookahead policies
Notes:
» Solving stochastic lookahead policies can be hard!
» But this is still just a lookahead policy, which is a class of rolling horizon heuristic.
» Even if solving the lookahead model is hard, an optimal solution of a lookahead model (even a stochastic one) is (with rare exceptions) not an optimal policy.

154 Lookahead policies
There are two ways of evaluating a policy
» Offline learning using a simulator:
\[ F^\pi(\omega) = \sum_{t=0}^{T} C\big(S_t(\omega), X^\pi_t(S_t(\omega))\big) \]
where
\[ S_{t+1}(\omega) = S^M\big(S_t(\omega), X^\pi_t(S_t(\omega)), W_{t+1}(\omega)\big) \]
» Online learning (in the field): Implement the policy and observe how well it works. Make adjustments as necessary.
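Offline evaluation amounts to stepping the transition function along sampled paths and adding up contributions. The sketch below is generic in the policy, transition function and contribution function; the storage example used to exercise it (state = (energy, price), thresholds of 20 and 40, capacity of one unit) is a hypothetical stand-in, and averaging over several paths is added here as the usual way to estimate the expectation.

```python
def simulate_policy(policy, transition, contribution, s0, sample_path):
    """F(omega): sum of C(S_t, X(S_t)) along one sample path omega."""
    state, total = s0, 0.0
    for w_next in sample_path:
        x = policy(state)
        total += contribution(state, x)
        state = transition(state, x, w_next)  # S_{t+1} = S^M(S_t, x_t, W_{t+1})
    return total


def evaluate_offline(policy, transition, contribution, s0, paths):
    """Average F(omega) over the sampled paths."""
    return sum(simulate_policy(policy, transition, contribution, s0, p)
               for p in paths) / len(paths)


# Hypothetical storage example: the state is (energy, price).
def policy(state):
    energy, price = state
    if price < 20 and energy < 1:
        return 1          # buy one unit when cheap and empty
    if price > 40 and energy > 0:
        return -1         # sell one unit when expensive and holding energy
    return 0


def transition(state, x, w_next):
    return (state[0] + x, w_next)   # next price is the exogenous information


def contribution(state, x):
    return -x * state[1]            # pay the price to buy, earn it to sell


paths = [[50, 30], [15, 45]]        # each path lists W_{t+1} for t = 0, 1
print(evaluate_offline(policy, transition, contribution, (0, 10), paths))
```

Swapping in a different policy function is all it takes to compare policies on the same sample paths.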

155 Lookahead policies
Offline learning
» Can test policies fairly quickly
» Can test conditions that have not actually happened.
» Requires making assumptions about dynamics and random events
Online learning
» Takes a day to observe a day's performance
» Have to live with real world events
» No assumptions required

156 Lookahead policies
Notes:
» Lookahead policies are the only class that does not require using any form of statistical function approximation.
» Lookahead policies do not require tuning, but you may want to test parameter settings (horizon, number of scenarios).
» The price of these features tends to be policies that are much harder to compute.
» Approximation strategies can only be evaluated in the controlled setting of a simulator.

157 An energy storage problem
Consider a basic energy storage problem:
» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

158 An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1: Best for PFAs
  Highly stochastic (heavy tailed) electricity prices
  Stationary data
» Problem class 2: Best for CFAs
  Stochastic prices and wind (but not heavy tailed)
  Stationary data
» Problem class 3: Best for VFAs
  Stochastic wind and prices (but not too random)
  Time varying loads, but inaccurate wind forecasts
» Problem class 4: Best for deterministic lookaheads
  Relatively low noise problem with accurate forecasts
» Problem class 5: A hybrid policy worked best here
  Stochastic prices and wind, nonstationary data, noisy forecasts

159 An energy storage problem
The policies
» The PFA: Charge battery when price is below p1; discharge when price is above p2.
» The CFA: Optimize over a horizon H; maintain upper and lower bounds (u, l) for every time period except the first (note that this is a hybrid with a lookahead).
» The VFA: Piecewise linear, concave value function in terms of energy, indexed by time.
» The lookahead (deterministic): Optimize over a horizon H (only tunable parameter) using forecasts of demand, prices and wind energy.
» The lookahead CFA: Use a lookahead policy (deterministic), but with a tunable parameter that improves robustness.

160 An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution
» ... any policy might be best depending on the data.
Joint research with Prof. Stephan Meisel, University of Muenster, Germany.

161 Outline
Canonical problems
Problem classes
Solution strategies for learning problems
Elements of a dynamic model
An energy storage illustration
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization

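162 From deterministic to stochastic
Imagine that you would like to solve the time-dependent linear program:
\[ \min_{x_0,\ldots,x_T} \sum_{t=0}^{T} c_t x_t \]
» subject to
\[ A_0 x_0 = b_0, \qquad A_t x_t - B_{t-1} x_{t-1} = b_t, \quad t \ge 1. \]
We can convert this to a proper stochastic model by replacing \( x_t \) with \( X^\pi_t(S_t) \):
\[ \min_{\pi}\ \mathbb{E}\left\{ \sum_{t=0}^{T} c_t X^\pi_t(S_t) \,\Big|\, S_0 \right\} \]
The policy has to satisfy \( A_t x_t = R_t \) with transition function:
\[ S_{t+1} = S^M(S_t, x_t, W_{t+1}) \]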

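163 Modeling
Deterministic
» Objective function: \( \min_{x_0,\ldots,x_T} \sum_{t=0}^{T} c_t x_t \)
» Decision variables: \( x_0, \ldots, x_T \)
» Constraints at time t: \( A_t x_t = R_t \), \( x_t \ge 0 \)
» Transition function: \( R_{t+1} = b_{t+1} + B_t x_t \)
Stochastic
» Objective function: \( \max_{\pi} \mathbb{E}\left\{ \sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t), W_{t+1}\big) \,\Big|\, S_0 \right\} \)
» Policy: \( X^\pi : \mathcal{S} \to \mathcal{X} \)
» Constraints at time t: \( x_t = X^\pi_t(S_t) \in \mathcal{X}_t \)
» Transition function: \( S_{t+1} = S^M(S_t, x_t, W_{t+1}) \)
» Exogenous information: \( (W_1, W_2, \ldots, W_T) \)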

164 From deterministic to stochastic
Stochastic problems
» Modeling is the most important, and hardest, aspect of stochastic optimization.
» Searching for policies is important, but less critical.
» Modeling uncertainty is often overlooked, but is of central importance.
» Evaluating a policy is important, and difficult.
Deterministic problems
» Modeling is important, but not central.
» Algorithms are the most important, and hardest, part.
» (Modeling uncertainty) Huh?
» (Evaluating a solution) Just add up the costs!!

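165 Modeling stochastic, dynamic problems
The universal objective function
\[ \max_{\pi}\ \mathbb{E}\left\{ \sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t), W_{t+1}\big) \,\Big|\, S_0 \right\} \]
with
\[ S_{t+1} = S^M\big(S_t, x_t, W_{t+1}(\omega)\big) \]
Now search for policies:
» Policy search: PFAs, CFAs
» Lookahead policies: VFAs, DLAs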

166 Theory Computation Modeling Applications
