Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty


1 Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty. Informs Optimization Society Meeting, March 23, 2018. Warren B. Powell, Princeton University, Department of Operations Research and Financial Engineering. © 2016 Warren B. Powell, Princeton University

2 Drug discovery Designing molecules » X and Y are sites where we can hang substituents to change the behavior of the molecule. We approximate the performance using a linear belief model: $Y = \theta_0 + \sum_{\text{sites } i} \sum_{\text{substituents } j} \theta_{ij} X_{ij}$ » How do we sequence experiments to learn the best molecule as quickly as possible?
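As a minimal sketch of how the linear belief model scores a candidate molecule, assume each site holds exactly one substituent and encode $X_{ij}$ as 1 when substituent j sits at site i (0 otherwise). The coefficient values and the helper names `predict_performance` and `best_molecule` are hypothetical; in practice the $\theta$'s are the unknowns that the experiments are meant to learn.

```python
from itertools import product

import numpy as np


def predict_performance(theta0, theta, choice):
    """Linear belief model Y = theta0 + sum_i sum_j theta[i, j] * X[i, j].

    theta:  array (num_sites, num_substituents); theta[i, j] is the
            (hypothetical) effect of substituent j at site i.
    choice: substituent index placed at each site, so X[i, j] = 1 if
            choice[i] == j and 0 otherwise (assumed encoding).
    """
    X = np.zeros_like(theta, dtype=float)
    X[np.arange(len(choice)), choice] = 1.0   # one-hot indicator per site
    return theta0 + float(np.sum(theta * X))


def best_molecule(theta0, theta):
    """Enumerate every molecule (one substituent per site) under the current belief."""
    num_sites, num_subs = theta.shape
    return max(product(range(num_subs), repeat=num_sites),
               key=lambda c: predict_performance(theta0, theta, list(c)))


theta = np.array([[0.0, 1.5, -0.5],     # site X: three candidate substituents
                  [0.25, 0.0, 0.75]])   # site Y
print(predict_performance(2.0, theta, [1, 2]))  # 2.0 + 1.5 + 0.75 = 4.25
```

Enumeration is only viable for a handful of sites; the point of the slide is that the $\theta$'s are uncertain, so choosing which molecule to test next is itself a decision.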

3 Real-time logistics Uber/Lyft » Provides real-time, on-demand transportation. » Drivers are encouraged to enter or leave the system using pricing signals and informational guidance. Decisions: » How to price to get the right balance of drivers relative to customers. » Real-time management of drivers. » Policies (rules for managing drivers, customers, ...)

4 Ride sharing [figure: cars and riders]

5-7 Ride sharing [figure sequence: cars and riders, with labels +1 and +2]

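8 Matching buyers with sellers Now we have a logistic curve for each origin-destination pair (i,j): $P^Y(p,a) = \dfrac{e^{\theta^0_{ij} + \theta^p_{ij} p + \theta^a_{ij} a}}{1 + e^{\theta^0_{ij} + \theta^p_{ij} p + \theta^a_{ij} a}}$, where p is the offered price. [figure: buyer, seller, acceptance probability versus offered price] The number of offers for each (i,j) pair is relatively small, so we need to generalize the learning across hundreds to thousands of markets.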
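The logistic curve can be evaluated directly for one (i,j) pair. The sketch below assumes the parameters are known and, as an illustrative objective that the slide does not state, picks the offered price maximizing expected revenue $p \cdot P^Y(p,a)$ over a grid of prices. All names and parameter values are hypothetical.

```python
import math


def accept_prob(price, a, theta0, theta_p, theta_a):
    """Logistic acceptance curve P^Y(p, a) for one origin-destination pair."""
    z = theta0 + theta_p * price + theta_a * a
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)


def best_price(prices, a, theta0, theta_p, theta_a):
    """Offered price maximizing expected revenue p * P^Y(p, a) (assumed objective)."""
    return max(prices, key=lambda p: p * accept_prob(p, a, theta0, theta_p, theta_a))


# With theta_p < 0, acceptance falls as the offered price rises.
print(best_price(range(1, 13), a=0.0, theta0=5.0, theta_p=-0.5, theta_a=0.0))  # 8
```

With few offers per pair, the $\theta$'s would in practice be estimated jointly across markets, which is the learning problem the slide points to.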

9 Meeting variability with portfolios of generation with mixtures of dispatchability

10 Storage applications How much energy to store in a battery to handle the volatility of wind and spot prices to meet demands?

11 Outline » Canonical problems » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies


13 Canonical problems Decision trees

14 Canonical problems Stochastic search (derivative based) » Basic problem: $\max_x \mathbb{E}\,F(x,W)$ » Stochastic gradient: $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$ » Asymptotic convergence: $\lim_{n\to\infty} \mathbb{E}\,F(x^n, W) = \mathbb{E}\,F(x^*, W)$ » Finite time performance: $\max_\pi \mathbb{E}\,F(x^{\pi,N}, W)$, where π is an algorithm (or policy). Examples: manufacturing network (x = design); unit commitment problem (x = day-ahead decisions); inventory system (x = design, replenishment policy); battery system (x = choice of material); patient treatment cost (x = drug, treatments); trucking company (x = fleet size and mix).
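The stochastic gradient update can be illustrated on the newsvendor objective $F(x,W) = p\min(x,W) - cx$, whose sample gradient is $p\,\mathbb{1}\{x < W\} - c$. The sketch assumes exponentially distributed demand with mean 100, a harmonic stepsize $\alpha_n = 50/(50+n)$, and projection onto $x \ge 0$; these are illustrative choices. For exponential demand the optimum satisfies $P(W > x^*) = c/p$, so $x^* = 100\ln(p/c)$, which gives a check on the result.

```python
import numpy as np


def newsvendor_sgd(p=10.0, c=4.0, mean_demand=100.0, iters=20000, step=50.0, seed=0):
    """Stochastic gradient ascent x^{n+1} = x^n + alpha_n * grad F(x^n, W^{n+1})."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for n in range(iters):
        w = rng.exponential(mean_demand)      # sample W^{n+1} (assumed distribution)
        grad = (p if x < w else 0.0) - c      # gradient of p*min(x, W) - c*x in x
        alpha = step / (step + n)             # harmonic stepsize (assumed rule)
        x = max(0.0, x + alpha * grad)        # ascent step, projected onto x >= 0
    return x


print(newsvendor_sgd())  # should land near 100 * ln(10 / 4), about 91.6
```

The stepsize rule is itself a tunable part of the algorithm, which is why the slide frames finite-time performance as an optimization over policies π.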

15 Canonical problems Ranking and selection (derivative free) » Basic problem: $\max_{x \in \{x_1,\dots,x_M\}} \mathbb{E}\,F(x,W)$ » We need to design a policy $X^\pi(S^n)$ that finds a design given by $x^{\pi,N}$: $\max_\pi \mathbb{E}\,F(x^{\pi,N}, W)$ » We refer to this objective as maximizing the final reward.

16 Canonical problems Multi-armed bandit problems » We learn the reward from playing each arm. » We need to find a policy $X^\pi(S^n)$ for playing machine x that maximizes $\max_\pi \mathbb{E}\sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1})$, where $W^{n+1}$ is the new information ("winnings"), $S^n$ is the state of knowledge (what we know about each slot machine), and $x^n = X^\pi(S^n)$ chooses the next arm to play. » We refer to this problem as maximizing cumulative reward.

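17 Canonical problems (Discrete) Markov decision processes » Bellman's optimality equation: $V_t(S_t) = \min_{a \in \mathcal{A}} \left( C(S_t,a) + \mathbb{E}\{V_{t+1}(S_{t+1}) \mid S_t\} \right) = \min_{a \in \mathcal{A}} \left( C(S_t,a) + \sum_{s'} p(s' \mid S_t, a)\, V_{t+1}(s') \right)$ » This is also the same as solving $\min_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t)) \mid S_0 \right\}$, where the optimal policy has the form $X^*_t(S_t) = \arg\min_x \left( C(S_t,x) + \mathbb{E}\{V_{t+1}(S_{t+1}) \mid S_t, x\} \right)$.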
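For a small discrete problem, Bellman's equation can be solved exactly by stepping backward in time. This sketch assumes a finite horizon with $V_{T+1} \equiv 0$, one-period costs stored in an array `C[s, a]`, and transition probabilities `P[a, s, s']`; the function name and array layout are our own choices.

```python
import numpy as np


def backward_dp(C, P, T):
    """Solve V_t(s) = min_a [ C[s, a] + sum_{s'} P[a, s, s'] V_{t+1}(s') ], V_{T+1} = 0.

    C: (num_states, num_actions) one-period costs.
    P: (num_actions, num_states, num_states) transition probabilities p(s' | s, a).
    Returns V with shape (T + 2, S) and the optimal policy with shape (T + 1, S).
    """
    S, A = C.shape
    V = np.zeros((T + 2, S))
    policy = np.zeros((T + 1, S), dtype=int)
    for t in range(T, -1, -1):
        # Q[s, a] = C(s, a) + E[V_{t+1}(S_{t+1}) | s, a]
        Q = C + np.einsum("asn,n->sa", P, V[t + 1])
        policy[t] = np.argmin(Q, axis=1)   # X*_t(s) = argmin_a Q[s, a]
        V[t] = Q.min(axis=1)
    return V, policy


# Two states; action 0 = stay, action 1 = switch (both deterministic).
C = np.array([[2.0, 1.0], [0.0, 5.0]])
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
V, policy = backward_dp(C, P, T=1)
```

This exact approach requires enumerating states and computing expectations, which is what breaks down on the larger problems discussed later.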

18 Canonical problems Optimal stopping I » Model: Exogenous process: $p_1, p_2, \dots, p_T$, a sequence of stock prices. Decision: $X_t(\omega) = 1$ if we stop and sell at time t, 0 otherwise. Reward: $p_t$, the price received if we stop at time t. » Optimization problem: $\max_\tau \mathbb{E}\,p_\tau$, where τ is a stopping time (or an "$\mathcal{F}_t$-measurable function").

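19 Canonical problems Optimal stopping II » Model: Exogenous process: $p_1, p_2, \dots, p_T$, a sequence of stock prices. State: $S_t = (R_t, p_t, \bar p_t)$, where $R_t = 1$ if we are holding the asset, 0 otherwise, and $\bar p_t = (1-\alpha)\bar p_{t-1} + \alpha p_t$. Policy: $X^\pi(S_t \mid \theta) = 1$ if $p_t \ge \bar p_t + \theta$, 0 otherwise. » Optimization problem: $\max_\pi \mathbb{E}\sum_{t=0}^T p_t X^\pi(S_t) = \max_\theta \mathbb{E}\sum_{t=0}^T p_t X^\pi(S_t \mid \theta)$.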
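The policy $X^\pi(S_t \mid \theta)$ is cheap to simulate, which is what makes tuning $\theta$ practical. A minimal sketch, assuming a random-walk price model starting at 30, $\alpha = 0.2$, and (hypothetically) selling at the final price if the rule never fires; `estimate_value` averages over sample paths to approximate the expectation for a given $\theta$.

```python
import numpy as np


def sell_time_reward(prices, theta, alpha=0.2):
    """Simulate X(S_t | theta): sell when p_t >= pbar_t + theta.

    pbar_t = (1 - alpha) * pbar_{t-1} + alpha * p_t is the smoothed price.
    Returns (time sold, price received). If the rule never fires we assume
    the asset is sold at the last price.
    """
    pbar = prices[0]
    for t, p in enumerate(prices):
        pbar = (1 - alpha) * pbar + alpha * p
        if p >= pbar + theta:
            return t, p
    return len(prices) - 1, prices[-1]


def estimate_value(theta, num_paths=500, T=50, seed=0):
    """Monte Carlo estimate of E[p_tau] under the policy with parameter theta."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_paths):
        prices = 30.0 + np.cumsum(rng.normal(0.0, 1.0, T))  # assumed random walk
        total += sell_time_reward(prices, theta)[1]
    return total / num_paths
```

Searching over a grid such as `[estimate_value(th) for th in (0.5, 1.0, 2.0)]` is the simplest way to approximate $\max_\theta$.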

20 Canonical problems Linear quadratic regulation (LQR) » A popular optimal control problem in engineering involves solving $\min_{u_0,\dots,u_T} \mathbb{E}\sum_{t=0}^T \left( (x_t)^T Q x_t + (u_t)^T R u_t \right)$ » where: $x_t$ = state at time t; $u_t$ = control at time t (must be $\mathcal{F}_t$-measurable); $x_{t+1} = f(x_t, u_t) + w_t$ ($w_t$ is random at time t). » It is possible to show that the optimal policy looks like $U^*_t(x_t) = K_t x_t$, where $K_t$ is a complicated function of Q and R.
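In the scalar case the "complicated function" reduces to a short backward recursion (the discrete-time Riccati equation). This sketch assumes linear scalar dynamics $x_{t+1} = a x_t + b u_t + w_t$ with zero-mean noise and cost $\sum_{t=0}^T (q x_t^2 + r u_t^2)$; the gain is written so that $U^*_t(x_t) = K_t x_t$, matching the slide, so $K_t$ carries the sign.

```python
def lqr_gains(a, b, q, r, T):
    """Finite-horizon scalar LQR gains K_0, ..., K_T with U*_t(x) = K_t * x.

    Dynamics x_{t+1} = a x_t + b u_t + w_t; cost sum_{t=0}^T (q x_t^2 + r u_t^2).
    Zero-mean additive noise only adds a constant to the cost, so the gains
    do not depend on it.
    """
    P = q           # V_T(x) = q x^2, since u_T = 0 is optimal at the last stage
    gains = [0.0]   # K_T
    for _ in range(T):
        K = -a * b * P / (r + b * b * P)       # minimizer of r u^2 + P (a x + b u)^2
        P = q + r * K * K + P * (a + b * K) ** 2
        gains.append(K)
    gains.reverse()  # gains[t] = K_t for t = 0..T
    return gains


print(lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, T=2))  # [-0.6, -0.5, 0.0]
```

In the vector case the same recursion uses matrices, which is where the dependence on Q and R becomes less transparent.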

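21 Canonical problems Stochastic programming » A (two-stage) stochastic programming problem: $\min_{x_0 \in \mathcal{X}_0} \left( c_0 x_0 + \mathbb{E}\,Q(x_0, \xi) \right)$, where $Q(x_0, \xi(\omega)) = \min_{x_1(\omega) \in \mathcal{X}_1(\omega)} c_1(\omega) x_1(\omega)$ » This is the canonical form of stochastic programming, which might also be written over multiple periods: $\min\; c_0 x_0 + \sum_{\omega \in \Omega} p(\omega) \sum_{t=1}^T c_t(\omega) x_t(\omega)$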

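22 Canonical problems Stochastic programming » A (two-stage) stochastic programming policy: $\min_{x_t \in \mathcal{X}_t} \left( c_t x_t + \mathbb{E}\,Q_{t+1}(x_t, \xi_{t+1}) \right)$, where $Q_{t+1}(x_t, \xi_{t+1}(\omega)) = \min_{x_{t+1}(\omega) \in \mathcal{X}_{t+1}(\omega)} c_{t+1}(\omega) x_{t+1}(\omega)$ » This is the canonical form of stochastic programming, which might also be written over multiple periods: $\min\; c_t x_t + \sum_{\omega \in \Omega_t} p(\omega) \sum_{t'=t+1}^{t+H} c_{t'}(\omega) x_{t'}(\omega)$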

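23 Canonical problems A robust optimization problem would be written $\min_{x \in \mathcal{X}} \max_{w \in \mathcal{W}(\theta)} F(x,w)$ » This means finding the best design x for the worst outcome w in an uncertainty set $\mathcal{W}(\theta)$. » This has been adapted to multiperiod problems: $\min_{x_0,\dots,x_T} \max_{w = (w_0,\dots,w_T) \in \mathcal{W}(\theta)} \sum_{t=0}^T c_t(w)\, x_t$, or over a rolling horizon: $\min_{x_t,\dots,x_{t+H}} \max_{w = (w_t,\dots,w_{t+H}) \in \mathcal{W}_t(\theta)} \sum_{t'=t}^{t+H} c_{t'}(w)\, x_{t'}$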

24 Canonical problems Why do we need a unified framework? » The classical frameworks and algorithms are fragile. » Small changes to problems invalidate optimality conditions, or make algorithmic approaches intractable. » Practitioners need robust approaches that will provide high quality solutions for all problems.

25 Outline » Canonical problems » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies


27 Modeling For deterministic problems, we speak the language of mathematical programming » Linear programming: $\min_x cx$ subject to $Ax = b$, $x \ge 0$ » For time-staged problems: $\min_{x_0,\dots,x_T} \sum_{t=0}^T c_t x_t$ subject to $A_t x_t - B_{t-1} x_{t-1} = b_t$, $D_t x_t \le u_t$, $x_t \ge 0$. Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.

28 [Figure: stochastic programming, approximate dynamic programming, robust optimization, decision analysis, simulation optimization, optimal learning, dynamic programming and control, model predictive control, stochastic control, optimal control, bandit problems, stochastic search, online computation, reinforcement learning, Markov decision processes]

30 Modeling Before we can solve complex problems, we have to know how to think about them. [Figure: the mathematician writes $\min \mathbb{E}\{cx\}$, $Ax = b$, $x \ge 0$; the software developer has to "organize class libraries, and set up communications and databases."] The biggest challenge when making decisions under uncertainty is modeling.

31 Modeling We lack a standard language for modeling sequential, stochastic decision problems. » In the slides that follow, we propose to model problems along five fundamental dimensions: state variables, decision variables, exogenous information, transition function, objective function. » This framework draws heavily from Markov decision processes and the control theory communities, but it is not the standard form used anywhere.

32 Modeling dynamic problems The state variable: » Controls community: $x_t$ = "information state" » Operations research/MDP/computer science: $S_t = (R_t, I_t, B_t)$ = system state, where: $R_t$ = resource state (physical state), e.g. location/status of truck/train/plane, energy in storage; $I_t$ = information state, e.g. prices, weather; $B_t$ = belief state ("state of knowledge"), e.g. belief about traffic delays, belief about the status of equipment.

33 The state variable Illustrating state variables » A deterministic graph: $S_t$ = ? Answer: $S_t = (N_t) = 6$

34 The state variable Illustrating state variables » A stochastic graph: $S_t$ = ?

35 The state variable Illustrating state variables » A stochastic graph: $S_t = (R_t, I_t) = \left(N_t, (\hat c_{N_t,j})_j\right) = \left(6, (12.7, 8.9, 13.5)\right)$

36 The state variable Illustrating state variables » A stochastic graph with left turn penalties (.7): $S_t = \left(N_t, (\hat c_{N_t,j})_j, N_{t-1}\right) = \left(6, (12.7, 8.9, 13.5), 3\right)$

37 The state variable A variant of a problem in Puterman (2005): » Find the best path from 1 to 11 that minimizes the second highest arc cost along the path. » If the traveler is at node 9, what is her state? Answer: $S_t$ = (node, highest, second highest) = (9, 15, 12)

38 The state variable What is a state variable? » Bellman's classic text on dynamic programming (1957) describes the state variable with: "we have a physical system characterized at any stage by a small set of parameters, the state variables." » The most popular book on dynamic programming (Puterman, 2005, p.18) defines a state variable with the following sentence: "At each decision epoch, the system occupies a state." » Wikipedia: "A state variable is one of the set of variables that are used to describe the mathematical state of a dynamical system."

39 The state variable My definition of a state variable: » The first depends on a policy. The second depends only on the problem (and includes the constraints). » Using either definition, all properly modeled problems are Markovian!

40 Modeling dynamic problems Decisions: » Markov decision processes/computer science: $a_t$, a discrete action » Control theory: $u_t$, a low-dimensional continuous vector » Operations research: $x_t$, usually a discrete or continuous but high-dimensional vector of decisions. At this point, we do not specify how to make a decision. Instead, we define the function $X^\pi(s)$ (or $A^\pi(s)$ or $U^\pi(s)$), where π specifies the type of policy: "π" carries information about the type of function f, and any tunable parameters $\theta \in \Theta^f$.

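41 The decision variables Styles of decisions » Binary: $x \in \mathcal{X} = \{0,1\}$ » Finite: $x \in \mathcal{X} = \{1,2,\dots,M\}$ » Continuous scalar: $x \in \mathcal{X} = [a,b]$ » Continuous vector: $x = (x_1,\dots,x_K)$, $x_k \in \mathbb{R}$ » Discrete vector: $x = (x_1,\dots,x_K)$, $x_k \in \mathbb{Z}$ » Categorical: $x = (a_1,\dots,a_I)$, $a_i$ is a category (e.g. red/green/blue)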

42 Modeling dynamic problems Exogenous information: $W_t$ = new information that first became known at time t $= (\hat R_t, \hat D_t, \hat p_t, \hat E_t)$, where $\hat R_t$ = equipment failures, delays, new arrivals, new drivers being hired to the network; $\hat D_t$ = new customer demands; $\hat p_t$ = changes in prices; $\hat E_t$ = information about the environment (temperature, ...). Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information. Below, we let ω represent a sequence of actual observations $W_1(\omega), W_2(\omega), \dots$; $W_t(\omega)$ refers to a sample realization of the random variable $W_t$.


43 Modeling dynamic problems The transition function $S_{t+1} = S^M(S_t, x_t, W_{t+1})$: $R_{t+1} = R_t + x_t + \hat R_{t+1}$ (inventories); $p_{t+1} = p_t + \hat p_{t+1}$ (spot prices); $D_{t+1} = D_t + \hat D_{t+1}$ (market demands). Also known as the: system model, state transition model, plant model, plant equation, state equation, transfer function, transformation function, law of motion, model, transition function. For many applications, these equations are unknown. This is known as model-free dynamic programming.
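In code, the transition function is just a mapping from $(S_t, x_t, W_{t+1})$ to $S_{t+1}$. A minimal sketch of the three equations above, plus a loop that applies a policy along one sample path $W_1, \dots, W_T$ and accumulates contributions; the `State` class, the `simulate` helper, and the policy and contribution used in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    R: float  # inventory
    p: float  # spot price
    D: float  # market demand


def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}), with W = (R_hat, p_hat, D_hat)."""
    R_hat, p_hat, D_hat = W
    return State(R=S.R + x + R_hat, p=S.p + p_hat, D=S.D + D_hat)


def simulate(S0, policy, W_path, contribution):
    """Accumulate sum_t C(S_t, X(S_t)) along one sample path W_1, ..., W_T."""
    S, total = S0, 0.0
    for W in W_path:
        x = policy(S)                 # x_t = X^pi(S_t)
        total += contribution(S, x)   # C(S_t, x_t)
        S = transition(S, x, W)       # S_{t+1}
    return total, S


# Example: always buy one unit at the current price (a cost, so negative).
total, S_T = simulate(State(10.0, 30.0, 5.0), lambda S: 1.0,
                      [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)], lambda S, x: -S.p * x)
```

Averaging `simulate` over many sample paths gives an estimate of the objective for a fixed policy, which is how policies are compared throughout the rest of the tutorial.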

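44 Modeling stochastic, dynamic problems The universal objective function: $\max_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}$. Here the expectation is over all random outcomes, C is the contribution function, $X^\pi_t$ is the decision function (policy), $S_t$ is the state variable, $S_0$ is the initial state variable, $W_{t+1}$ is the new information, and the maximization over π means finding the best policy. Given a system model (transition function) $S_{t+1} = S^M(S_t, x_t, W_{t+1})$ and a stochastic process $(S_0, W_1, W_2, \dots, W_T)$, now we just have to find the best policy.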


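46 Modeling Deterministic » Objective function: $\min_{x_0,\dots,x_T} \sum_{t=0}^T c_t x_t$ » Decision variables: $x_0, \dots, x_T$ » Constraints at time t: $A_t x_t = R_t$, $x_t \ge 0$ » Transition function: $R_{t+1} = b_{t+1} + B_t x_t$. Stochastic » Objective function: $\max_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}$ » Policy: $X^\pi : \mathcal{S} \to \mathcal{X}$ » Constraints at time t: $x_t = X^\pi_t(S_t) \in \mathcal{X}_t$ » Transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$ » Exogenous information: $(S_0, W_1, W_2, \dots, W_T)$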

47 Outline » Canonical problems » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies

48 An energy storage problem Consider a basic energy storage problem: » We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

49 An energy storage problem A model of our problem » State variables » Decision variables » Exogenous information » Transition function » Objective function

50 An energy storage problem State variables [figure: wind (E), battery (B), load (L), grid (G)] » We will present the full model, accumulating the information we need in the state variable. » We will highlight information we need as we proceed. This information will make up our state variable.

51 An energy storage problem [figure: E, B, L, G] Decision variables: $x_t = (x^{EL}_t, x^{EB}_t, x^{GL}_t, x^{GB}_t, x^{BL}_t)$ » Constraints:

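52 An energy storage problem [figure: E, B, L, G] Exogenous information: $W_{t+1} = \left(\hat E_{t+1}, \varepsilon^p_{t+1}, (f^D_{t+1,t'})_{t' > t+1}, \varepsilon^D_{t+1}\right)$, where $\hat E_{t+1}$ = change in energy from wind between t and t+1; $\varepsilon^p_{t+1}$ = noise in the price process between t and t+1; $f^D_{tt'}$ = forecast of demand $D_{t'}$ provided by vendor at time t (provided exogenously); $\varepsilon^D_{t+1}$ = difference between actual demand and forecast.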

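53 An energy storage problem Transition function: $E_{t+1} = E_t + \hat E_{t+1}$; $p_{t+1} = \bar\theta_{t0} p_t + \bar\theta_{t1} p_{t-1} + \bar\theta_{t2} p_{t-2} + \varepsilon^p_{t+1}$; $D_{t+1} = f^D_{t,t+1} + \varepsilon^D_{t+1}$; $R^{battery}_{t+1} = R^{battery}_t + x_t$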

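54 Learning in stochastic optimization Updating the demand parameter » Let $p_{t+1}$ be the new price and let $\bar F^{price}_t(p_t \mid \bar\theta_t) = (\bar\theta_t)^T \phi(p_t) = \bar\theta_{t0} p_t + \bar\theta_{t1} p_{t-1} + \bar\theta_{t2} p_{t-2}$ » We update our estimate $\bar\theta_t$ using our recursive least squares equations: $\bar\theta_{t+1} = \bar\theta_t - \frac{1}{\gamma_{t+1}} B_t \phi(p_t)\, \hat\varepsilon_{t+1}$, $\hat\varepsilon_{t+1} = \bar F^{price}_t(p_t \mid \bar\theta_t) - p_{t+1}$, $B_{t+1} = B_t - \frac{1}{\gamma_{t+1}} \left( B_t \phi(p_t) \phi(p_t)^T B_t \right)$, $\gamma_{t+1} = 1 + \phi(p_t)^T B_t \phi(p_t)$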
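The recursive least squares equations translate line for line into code. The sketch below assumes the features $\phi(p_t) = (p_t, p_{t-1}, p_{t-2})$, an initial matrix $B_0 = 100\,I$, and a hypothetical zero-mean AR(3) price process with coefficients (0.5, 0.3, 0.1), used only to generate test data; `rls_update` is an illustrative name.

```python
import numpy as np


def rls_update(theta, B, phi, y):
    """One recursive least squares step.

    theta: current estimate theta_bar_t; B: current matrix B_t;
    phi: feature vector phi(p_t); y: observed p_{t+1}.
    """
    eps = theta @ phi - y                  # eps_hat_{t+1} = F(p_t | theta_t) - p_{t+1}
    gamma = 1.0 + phi @ B @ phi            # gamma_{t+1}
    Bphi = B @ phi
    theta_new = theta - Bphi * eps / gamma
    B_new = B - np.outer(Bphi, phi @ B) / gamma
    return theta_new, B_new


true_theta = np.array([0.5, 0.3, 0.1])     # hypothetical AR(3) coefficients
rng = np.random.default_rng(0)
prices = [0.0, 0.0, 0.0]
theta, B = np.zeros(3), 100.0 * np.eye(3)  # B_0 = 100 I (assumed prior scale)
for _ in range(5000):
    phi = np.array(prices[-1:-4:-1])       # (p_t, p_{t-1}, p_{t-2})
    p_next = true_theta @ phi + rng.normal()
    theta, B = rls_update(theta, B, phi, p_next)
    prices.append(p_next)
```

Because $(\bar\theta_t, B_t)$ change over time and affect future decisions, both have to be carried in the state variable, which is why they appear in the state on the following slides.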

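55 An energy storage problem [figure: E, B, L, G] Objective function: $C(S_t, x_t) = p_t \left( x^{GB}_t + x^{GL}_t \right)$; $\min_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}$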

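56 An energy storage problem State variables » Cost function: $p_t$ = price of electricity » Decision function: constraints » Transition function: $p_{t+1} = (\bar\theta_t)^T \phi(p_t) + \varepsilon^p_{t+1}$; $D_{t+1} = f^D_{t,t+1} + \varepsilon^D_{t+1}$ » State: $S_t = \left(E_t, L_t, R_t, (p_t, p_{t-1}, p_{t-2}), f^D_t, (\bar\theta_t, B_t)\right)$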

57 Outline » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies

58 Solution strategies and problem classes Special structure » There are special cases where we can solve $\max_x \mathbb{E}\,F(x,W)$ exactly. But not very many. Sampled problems (SAA, scenario trees) » If the only problem is that we cannot compute the expectation, we might solve a sampled approximation $\max_x \hat{\mathbb{E}}\,F(x,W)$. Adaptive learning algorithms » This is what we have to turn to for most problems, and is the focus of this tutorial.
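A sampled approximation replaces $\mathbb{E}$ with an average over N sampled outcomes. As an illustration, assume the newsvendor objective $F(x,W) = p\min(x,W) - cx$ with $p > c$: the sampled objective is concave and piecewise linear with kinks at the samples, so a maximizer lies at one of the sample values and we can simply evaluate them all. The function name is hypothetical.

```python
import numpy as np


def saa_newsvendor(samples, p, c):
    """Maximize (1/N) sum_n [p * min(x, w^n) - c * x] over x (assumes p > c).

    The sampled objective is piecewise linear in x with breakpoints at the
    samples, so we evaluate it at every distinct sample value.
    """
    w = np.asarray(samples, dtype=float)
    candidates = np.unique(w)
    values = [np.mean(p * np.minimum(x, w) - c * x) for x in candidates]
    return candidates[int(np.argmax(values))]


print(saa_newsvendor([1, 2, 3, 4], p=10.0, c=4.0))  # 3.0
```

The SAA solution is an estimate of $x^*$ whose quality depends on N; with exponential demand (mean 100) it should approach $100\ln(p/c)$ as the sample grows.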

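59 Solution strategies and problem classes State independent problems » The problem does not depend on the state of the system: $\max_x \mathbb{E}\,F(x,W) = \mathbb{E}\{p\min(x,W) - cx\}$ » The only state variable is what we know (or believe) about the unknown function $\mathbb{E}\,F(x,W)$, called the belief state $B^n$, so $S^n = B^n$. State dependent problems » Now the problem may depend on what we know at time t: $\max_{0 \le x \le R_t} \mathbb{E}\,C(S_t, x, W) = \mathbb{E}\{p_t \min(x,W) - cx\}$ » Now the state includes the resource level $R_t$ and the price $p_t$.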

60 Solution strategies and problem classes Offline (final reward) » We can iteratively search for the best solution, but only care about the final answer. » Asymptotic formulation: $\max_x \mathbb{E}\,F(x,W)$ » Finite horizon formulation: $\max_\pi \mathbb{E}\,F(x^{\pi,N}, W)$ Online (cumulative reward) » We have to learn as we go: $\max_\pi \mathbb{E}\sum_{n=0}^{N-1} F(X^\pi(S^n), W^{n+1})$ [annotation: ranking and selection or stochastic search]

61 Solution strategies and problem classes


63 Solution strategies and problem classes Learning policies: approximate dynamic programming, Q-learning, SDDP

64 Solution strategies and problem classes Online (cumulative reward) dynamic programming is recognized as the dynamic programming problem, but the entire literature on solving dynamic programs describes class (4) problems. This appears to be an open problem.

65 Outline » Canonical problems » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies

66 Modeling uncertainty » Observational uncertainty » Prognostic uncertainty (forecasting) » Experimental noise/variability » Transitional uncertainty » Inferential uncertainty » Model uncertainty » Systematic exogenous uncertainty » Control/implementation uncertainty » Algorithmic noise » Goal uncertainty. Modeling uncertainty in the context of stochastic optimization is a relatively untapped area of research.

67 Outline » Canonical problems » Elements of a dynamic model » An energy storage illustration » Solution strategies and problem classes » Modeling uncertainty » Designing policies

68 Designing policies We have to start by describing what we mean by a policy. » Definition: A policy is a mapping from a state to an action ... any mapping. How do we search over an arbitrary space of policies?

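69 Designing policies Two fundamental strategies: 1) Policy search: Search over a class of functions for making decisions to optimize some metric: $\max_{\pi = (f \in \mathcal{F},\, \theta^f \in \Theta^f)} \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t \mid \theta^f)) \mid S_0 \right\}$ 2) Lookahead approximations: Approximate the impact of a decision now on the future: $X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^T C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$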

70 Designing policies Policy search: 1a) Policy function approximations (PFAs) » Lookup tables: "when in this state, take this action" » Parametric functions: order-up-to policies (if inventory is less than s, order up to S); affine policies $x_t = X^{PFA}(S_t \mid \theta) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_t)$; neural networks » Locally/semi/non-parametric: requires optimizing over local regions 1b) Cost function approximations (CFAs) » Optimizing a deterministic model modified to handle uncertainty (buffer stocks, schedule slack): $X^{CFA}(S_t \mid \theta) = \arg\max_{x_t \in \mathcal{X}^\pi_t(\theta)} \bar C^\pi(S_t, x_t \mid \theta)$

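71 Designing policies Lookahead approximations Approximate the impact of a decision now on the future: » An optimal policy (based on looking ahead): $X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^T C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$ 2a) Approximating the value of being in a downstream state using machine learning ("value function approximations"): $X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\{V_{t+1}(S_{t+1}) \mid S_t, x_t\} \right)$; $X^{VFA}_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\{\bar V_{t+1}(S_{t+1}) \mid S_t, x_t\} \right) = \arg\max_{x_t} \left( C(S_t,x_t) + \bar V^x_t(S^x_t) \right)$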


74-75 Designing policies The ultimate lookahead policy is optimal: $X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^T C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$ [annotations: "Maximization that we cannot compute" on the max over π; "Expectations that we cannot compute" on the two expectations]

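76 Designing policies The ultimate lookahead policy is optimal: $X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^T C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$ » 2b) Instead, we have to solve an approximation called the lookahead model: $\tilde X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \tilde{\mathbb{E}}\left\{ \max_{\tilde\pi} \tilde{\mathbb{E}}\left\{ \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}, \tilde X^{\tilde\pi}(\tilde S_{tt'})) \mid \tilde S_{t,t+1} \right\} \mid S_t, x_t \right\} \right)$ » A lookahead policy works by approximating the lookahead model.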

77 Designing policies Types of lookahead approximations » One-step lookahead, widely used in pure learning policies: Bayes greedy/naïve Bayes, expected improvement, value of information (knowledge gradient) » Multi-step lookahead: deterministic lookahead, also known as model predictive control or a rolling horizon procedure; stochastic lookahead: two-stage (widely used in stochastic linear programming) and multistage » Monte Carlo tree search (MCTS) for discrete action spaces » Multistage scenario trees (stochastic linear programming), typically not tractable.


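78 Four (meta)classes of policies Policy search: 1) Policy function approximations (PFAs) » Lookup tables, rules, parametric/nonparametric functions 2) Cost function approximations (CFAs) » $X^{CFA}(S_t \mid \theta) = \arg\max_{x_t \in \mathcal{X}^\pi_t(\theta)} \bar C^\pi(S_t, x_t \mid \theta)$ Lookahead approximations: 3) Policies based on value function approximations (VFAs) » $X^{VFA}_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \bar V^x_t(S^x_t(S_t,x_t)) \right)$ 4) Direct lookahead policies (DLAs) » Deterministic lookahead/rolling horizon proc./model predictive control: $X^{LA-D}_t(S_t) = \arg\max_{\tilde x_{tt},\dots,\tilde x_{t,t+H}} \left( C(S_{tt}, \tilde x_{tt}) + \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}, \tilde x_{tt'}) \right)$ » Chance constrained programming: $P[A_t x_t \le f(W)] \ge 1 - \delta$ » Stochastic lookahead/stochastic prog/Monte Carlo tree search: $X^{LA-S}_t(S_t) = \arg\max_{\tilde x_{tt},\tilde x_{t,t+1},\dots,\tilde x_{t,t+H}} \left( C(S_{tt}, \tilde x_{tt}) + \sum_{\tilde\omega \in \tilde\Omega_t} p(\tilde\omega) \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}(\tilde\omega), \tilde x_{tt'}(\tilde\omega)) \right)$ » Robust optimization: $X^{LA-RO}_t(S_t) = \arg\max_{\tilde x_{tt},\dots,\tilde x_{t,t+H}} \min_{w \in \mathcal{W}_t(\theta)} \left( C(S_{tt}, \tilde x_{tt}) + \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}(w), \tilde x_{tt'}(w)) \right)$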

79-80 Four (meta)classes of policies [the same four classes repeated, grouped first under "Function approx." and then under "Imbedded optimization"]

81 Learning problems Functions we have to learn: » Approximating the objective. » Designing a policy. » A value function approximation. » Designing a cost function approximation: the objective function; the constraints. » Approximating the transition function.


82 Approximation strategies » Lookup tables: independent beliefs, correlated beliefs » Linear parametric models: linear models, sparse-linear, tree regression » Nonlinear parametric models: logistic regression, neural networks » Nonparametric models: Gaussian process regression, kernel regression, support vector machines, deep neural networks

83 Designing policies Finding the best policy » We have to first articulate our classes of policies: $f \in \mathcal{F}$ = {PFAs, CFAs, VFAs, DLAs}, with parameters $\theta^f \in \Theta^f$ that characterize each family. » So optimizing over π means searching over $\pi = (f, \theta^f)$. » We then have to pick an objective such as $\max_\pi \mathbb{E}\sum_{t=0}^{T-1} C(S_t, X^\pi(S_t))$ or $\max_\pi \mathbb{E}\,F(X^\pi_T(S_T), W)$.

84 Outline The four classes of policies » Policy function approximations (PFAs) » Cost function approximations (CFAs) » Value function approximations (VFAs) » Direct lookahead policies (DLAs) » A hybrid lookahead/CFA


86 Policy function approximations Battery arbitrage: when to charge, when to discharge, given volatile LMPs

87 Policy function approximations Grid operators require that batteries bid charge and discharge prices an hour in advance. [figure: price series with charge and discharge thresholds] We have to search for the best values for the policy parameters $\theta^{Charge}$ and $\theta^{Discharge}$.

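88 Policy function approximations Our policy function might be the parametric model (this is nonlinear in the parameters): $X^\pi(S_t \mid \theta) = \begin{cases} 1 & \text{if } p_t < \theta^{charge} \\ 0 & \text{if } \theta^{charge} \le p_t \le \theta^{discharge} \\ -1 & \text{if } p_t > \theta^{discharge} \end{cases}$ [figure: energy in storage; price of electricity]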

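89 Policy function approximations Finding the best policy » We need to maximize $\max_\theta F(\theta) = \mathbb{E}\sum_{t=0}^T C(S_t, X^\pi_t(S_t \mid \theta))$ » We cannot compute the expectation, so we run simulations: [figure: simulated performance over $\theta^{Charge}$ and $\theta^{Discharge}$]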
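A minimal sketch of "run simulations and search": simulate the charge/discharge rule on sample price paths and grid-search the two thresholds. It assumes one unit of energy per period, a hypothetical capacity of 10 units, no conversion losses, and price paths supplied by the caller; `simulate_pfa` and `tune_pfa` are illustrative names.

```python
import numpy as np


def simulate_pfa(prices, theta_charge, theta_discharge, capacity=10.0):
    """Buy-low, sell-high PFA along one price path; returns total profit.

    Charge one unit if p_t < theta_charge, discharge one unit if
    p_t > theta_discharge, otherwise hold (unit rate, no losses: assumptions).
    """
    R, profit = 0.0, 0.0
    for p in prices:
        if p < theta_charge and R < capacity:
            R += 1.0
            profit -= p
        elif p > theta_discharge and R > 0.0:
            R -= 1.0
            profit += p
    return profit


def tune_pfa(price_paths, grid):
    """Grid search for (theta_charge, theta_discharge) with the best average profit."""
    best, best_value = None, -np.inf
    for tc in grid:
        for td in grid:
            if td <= tc:
                continue  # require theta_charge < theta_discharge
            value = np.mean([simulate_pfa(path, tc, td) for path in price_paths])
            if value > best_value:
                best, best_value = (tc, td), value
    return best, best_value
```

A grid search is only practical because θ has two dimensions; with more parameters one would switch to a derivative-free stochastic search method.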

90 Outline The four classes of policies » Policy function approximations (PFAs) » Cost function approximations (CFAs) » Value function approximations (VFAs) » Direct lookahead policies (DLAs) » A hybrid lookahead/CFA

91 Cost function approximations Lookup table » We can organize potential catalysts into groups. » Scientists using domain knowledge can estimate correlations in experiments between similar catalysts.

92 Cost function approximations Correlated beliefs: testing one material teaches us about other materials

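93 Cost function approximations Designing a policy » Examples of policies for making decisions: Interval estimation: $X^{IE}(S^n \mid \theta^{IE}) = \arg\max_x \left( \bar\mu^n_x + \theta^{IE} \bar\sigma^n_x \right)$, where $\bar\sigma^n_x$ is the standard deviation of $\bar\mu^n_x$. Upper confidence bounding: $X^{UCB}(S^n \mid \theta^{UCB}) = \arg\max_x \left( \bar\mu^n_x + \theta^{UCB} \sqrt{\frac{\log n}{N^n_x}} \right)$, where $N^n_x$ is the number of times x has been tested. Thompson sampling: $X^{TS}(S^n \mid \theta^{TS}) = \arg\max_x \hat\mu^n_x$, where $\hat\mu^n_x \sim N\!\left(\bar\mu^n_x, (\theta^{TS}\bar\sigma^n_x)^2\right)$. Knowledge gradient (expected value of information): $\nu^{KG,n}(x) = \mathbb{E}\left\{ \max_y F(y, B^{n+1}(x)) \mid B^n, x \right\} - \max_y F(y, B^n)$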
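Once a belief state is maintained, the IE, UCB and Thompson sampling rules above are one-liners. This sketch assumes independent normal beliefs with known measurement noise (the standard conjugate Bayesian update) and a small set of hypothetical true means; it records the cumulative reward and the final recommendation $\arg\max_x \bar\mu^N_x$. The knowledge gradient is omitted for brevity, and all function names are our own.

```python
import numpy as np


# Each policy maps the belief state (means, std devs, counts) to an alternative x.
def interval_estimation(mu, sigma, counts, n, theta, rng):
    return int(np.argmax(mu + theta * sigma))


def ucb(mu, sigma, counts, n, theta, rng):
    # Untested alternatives get an infinite bonus, so each is tried once first.
    bonus = np.where(counts > 0,
                     theta * np.sqrt(np.log(max(n, 1)) / np.maximum(counts, 1)),
                     np.inf)
    return int(np.argmax(mu + bonus))


def thompson(mu, sigma, counts, n, theta, rng):
    # Sample mu_hat_x ~ N(mu_x, (theta * sigma_x)^2) and pick the largest sample.
    return int(np.argmax(rng.normal(mu, theta * sigma)))


def run_policy(policy, truth, theta, N=200, noise_sd=1.0, prior_sd=10.0, seed=0):
    """Simulate a learning policy with independent normal beliefs (assumed model).

    Returns the cumulative reward and the final recommendation argmax_x mu_x.
    """
    rng = np.random.default_rng(seed)
    M = len(truth)
    mu = np.zeros(M)                              # prior means
    beta = np.full(M, 1.0 / prior_sd**2)          # prior precisions
    beta_w = 1.0 / noise_sd**2                    # measurement precision
    counts = np.zeros(M)
    total = 0.0
    for n in range(N):
        x = policy(mu, 1.0 / np.sqrt(beta), counts, n, theta, rng)
        W = truth[x] + rng.normal(0.0, noise_sd)  # noisy observation W^{n+1}
        total += W                                # cumulative (online) reward
        mu[x] = (beta[x] * mu[x] + beta_w * W) / (beta[x] + beta_w)
        beta[x] += beta_w
        counts[x] += 1
    return total, int(np.argmax(mu))
```

Running `run_policy` for several values of θ and averaging over seeds is a direct way to carry out the tuning described on the next slides.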

94 Cost function approximations Picking $\theta^{IE} = 0$ means we are evaluating each choice at the mean.

95 Cost function approximations Picking $\theta^{IE} = \dots$ means we are evaluating each choice at the 95th percentile.

96 Cost function approximations Optimizing the policy » We optimize $\theta^{IE}$ to maximize $\max_{\theta^{IE}} F^{IE}(\theta^{IE}) = \mathbb{E}\,F(x^{\pi,N}, W)$, where $x^n = X^{IE}(S^n \mid \theta^{IE}) = \arg\max_x \left( \bar\mu^n_x + \theta^{IE} \bar\sigma^n_x \right)$. Notes: » This can handle any belief model, including correlated beliefs and nonlinear belief models. » All we require is that we be able to simulate a policy.

97 Cost function approximations Inventory management » How much product should I order to anticipate future demands? » We need to accommodate different sources of uncertainty: market behavior, transit times, supplier uncertainty, product quality.

98 Cost function approximations Imagine that we want to purchase parts from different suppliers. Let $x_{tp}$ be the amount of product we purchase at time t from supplier p to meet forecasted demand $D_t$. We would solve $X_t(S_t) = \arg\max_{x_t} \sum_{p \in \mathcal{P}} c_p x_{tp}$ subject to $\sum_{p \in \mathcal{P}} x_{tp} \ge D_t$, $x_{tp} \le u_p$, $x_{tp} \ge 0$. » This assumes our demand forecast $D_t$ is accurate.

99 Cost function approximations Imagine that we want to purchase parts from different suppliers. Let $x_{tp}$ be the amount of product we purchase at time t from supplier p to meet forecasted demand $D_t$. We would solve $X_t(S_t \mid \theta) = \arg\max_{x_t} \sum_{p \in \mathcal{P}} c_p x_{tp}$ subject to $\sum_{p \in \mathcal{P}} x_{tp} \ge D_t + \theta^{reserve}$ (reserve), $x_{tp} \le u_p - \theta^{buffer}$ (buffer stock), $x_{tp} \ge 0$. » This is a parametric cost function approximation.

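100 Cost function approximations A general way of creating CFAs: » Define our policy: $X^\pi(S_t \mid \theta) = \arg\max_x \bar C^\pi(S_t, x \mid \theta)$ (parametrically modified costs) subject to $Ax \le b(\theta)$ (parametrically modified constraints) » We tune θ by optimizing: $\max_\theta F^\pi(\theta) = \mathbb{E}\sum_{t=0}^T C(S_t, X^\pi(S_t \mid \theta))$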

101 Outline The four classes of policies » Policy function approximations (PFAs) » Cost function approximations (CFAs) » Value function approximations (VFAs) » Direct lookahead policies (DLAs) » A hybrid lookahead/CFA

102 Locational marginal prices (LMPs) on the grid [figure: prices spiking to $977/MW]

103-104 Grid level storage [figures: optimizing battery storage]

105-108 Value function approximations [figures: value function approximations for Monday at times :05, :10, :15, :20]

109 Exploiting concavity Derivatives are used to estimate a piecewise linear approximation $\bar V_t(R_t)$ [figure: piecewise linear function of $R_t$]

110 Approximate value iteration Step 1: Start with a pre-decision state $S^n_t$. Step 2: Solve the deterministic optimization using an approximate value function, $\hat v^n_t = \min_x \left( C(S^n_t, x) + \bar V^{n-1}_t(S^{M,x}(S^n_t, x)) \right)$, to obtain $x^n_t$. Step 3: Update the value function approximation: $\bar V^n_{t-1}(S^{x,n}_{t-1}) = (1 - \alpha_{n-1}) \bar V^{n-1}_{t-1}(S^{x,n}_{t-1}) + \alpha_{n-1} \hat v^n_t$. Step 4: Obtain a Monte Carlo sample of $W_{t+1}(\omega^n)$ and compute the next pre-decision state: $S^n_{t+1} = S^M(S^n_t, x^n_t, W_{t+1}(\omega^n))$. Step 5: Return to step 1. [annotations: on-policy learning; deterministic optimization (step 2); recursive statistics (step 3); simulation (step 4)]
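The five steps can be traced on a toy storage problem. This sketch is illustrative only: it assumes i.i.d. prices drawn from three values, unit decisions $x_t \in \{-1, 0, +1\}$ (discharge/hold/charge) with cost $p_t x_t$, a lookup-table $\bar V_t$ indexed by the post-decision storage level, a constant stepsize $\alpha = 0.1$, and pure exploitation with no exploration strategy. Because holding is always feasible and $\bar V$ starts at zero, every $\hat v$ and every table entry stays $\le 0$.

```python
import numpy as np


def avi_storage(T=24, R_max=5, prices=(20.0, 30.0, 60.0), iters=2000, alpha=0.1, seed=0):
    """Approximate value iteration for a toy storage problem (all details assumed).

    Cost C(S_t, x_t) = p_t * x_t with x_t in {-1, 0, +1}. Post-decision state
    R^x_t = R_t + x_t; prices are i.i.d., so Vbar[t, r] approximates the expected
    cost-to-go from post-decision storage level r at time t.
    """
    rng = np.random.default_rng(seed)
    Vbar = np.zeros((T, R_max + 1))          # Vbar[T - 1] stays 0 (terminal)
    for _ in range(iters):
        R, p = 0, rng.choice(prices)         # Step 1: initial pre-decision state
        prev_post = None
        for t in range(T):
            feasible = [x for x in (-1, 0, 1) if 0 <= R + x <= R_max]
            values = [p * x + Vbar[t, R + x] for x in feasible]
            k = int(np.argmin(values))       # Step 2: deterministic optimization
            v_hat, x = values[k], feasible[k]
            if prev_post is not None:        # Step 3: smooth v_hat into Vbar_{t-1}
                Vbar[t - 1, prev_post] = (1 - alpha) * Vbar[t - 1, prev_post] + alpha * v_hat
            prev_post = R + x
            R, p = R + x, rng.choice(prices) # Step 4: sample W_{t+1}, transition
    return Vbar                              # Step 5 is the outer loop


Vbar = avi_storage(T=6, R_max=2, iters=500)
```

Pure exploitation like this can lock onto poor estimates, which is one reason the tutorial stresses that ADP needs structure (such as concavity) and careful tuning.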


112 Approximate dynamic programming: a typical performance graph [figure: objective function versus iterations]

113 The value of grid level storage [figure: without storage]

114 The value of grid level storage [figure: with storage]

115 Approximate value functions can work very well, but you need structure to guide the learning process. "ADP needs benchmarks and careful tuning. I think you give a too rosy a picture of ADP." Andy Barto, in comments on a paper (2009). "Is the RL glass half full, or half empty?" Rich Sutton, NIPS workshop (2014).

116 Outline The four classes of policies » Policy function approximations (PFAs) » Cost function approximations (CFAs) » Value function approximations (VFAs) » Direct lookahead policies (DLAs) » A hybrid lookahead/CFA

117 Lookahead policies Planning your next chess move: » You put your finger on the piece while you think about moves into the future. This is a lookahead policy, illustrated for a problem with discrete actions.


119 Lookahead policies Decision trees:

120 Lookahead policies Modeling lookahead policies » Lookahead policies solve a lookahead model, which is an approximation of the future. » It is important to understand the difference between the base model, which is the model we are trying to solve by finding the best policy (this is usually some form of simulator), and the lookahead model, which is our approximation of the future to help us make better decisions now. » The base model is typically a simulator, or it might be the real world.

121 Lookahead policies Lookahead models use five classes of approximations: » Horizon truncation: replacing a longer horizon problem with a shorter horizon » Stage aggregation: replacing multistage problems with a two-stage approximation » Outcome aggregation/sampling: simplifying the exogenous information process » Discretization: of time, states and decisions » Dimensionality reduction: we may ignore some variables (such as forecasts) in the lookahead model that we capture in the base model (these become latent variables in the lookahead model).

122 Lookahead policies Lookahead policies are the trickiest to model: » We create tilde variables for the lookahead model: $\tilde S_{tt'}$ = approximated state variable (e.g. coarse discretization); $\tilde x_{tt'}$ = decision we plan on implementing at time t' when we are planning at time t, for $t' = t, t+1, \dots, t+H$; $\tilde x_t = (\tilde x_{tt}, \tilde x_{t,t+1}, \dots, \tilde x_{t,t+H})$; $\tilde W_{tt'}$ = approximation of the information process; $\tilde c_{tt'}$ = forecast of costs at time t' made at time t; $\tilde b_{tt'}$ = forecast of right hand sides for time t' made at time t. » All variables are indexed by t (when the lookahead model is being generated) and t' (the time within the lookahead model).

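123 Lookahead policies We can use this notation to create a policy based on our lookahead model: $\tilde X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t,x_t) + \tilde{\mathbb{E}}\left\{ \max_{\tilde\pi \in \tilde\Pi} \tilde{\mathbb{E}}\left\{ \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}, \tilde X^{\tilde\pi}(\tilde S_{tt'})) \mid \tilde S_{t,t+1} \right\} \mid S_t, x_t \right\} \right)$, using: a limited horizon; a restricted/simplified set of policies; a sampled set of realizations (or deterministic); aggregated staging of decisions and information; a simplified/discretized set of state variables; a simplified/discretized set of decision variables. » The simplest lookahead is deterministic.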

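124 Lookahead policies Deterministic lookahead:
$$X^{LA\text{-}D}_t(S_t) = \arg\max_{\tilde x_{tt},\tilde x_{t,t+1},\ldots,\tilde x_{t,t+H}}\Big(C(\tilde S_{tt},\tilde x_{tt}) + \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'},\tilde x_{tt'})\Big)$$
Stochastic lookahead (with two-stage approximation):
$$X^{LA\text{-}S}_t(S_t) = \arg\max_{\tilde x_{tt},\tilde x_{t,t+1}(\tilde\omega),\ldots,\tilde x_{t,t+H}(\tilde\omega)}\Big(C(\tilde S_{tt},\tilde x_{tt}) + \sum_{\tilde\omega\in\tilde\Omega_t} p(\tilde\omega)\sum_{t'=t+1}^{t+H} C\big(\tilde S_{tt'}(\tilde\omega),\tilde x_{tt'}(\tilde\omega)\big)\Big)$$
(figure: scenario trees)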
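
To make the deterministic lookahead concrete, here is a minimal sketch on a hypothetical toy storage problem: the storage level is an integer between 0 and `R_max`, the only decisions are to discharge, hold or charge one unit, and the only forecast is a price forecast for $t' = t, \ldots, t+H$. Because this lookahead model is deterministic and tiny, it can be solved exactly by backward recursion; the policy returns only the first decision. The names `deterministic_lookahead` and `ACTIONS` are illustrative assumptions, not part of the tutorial.

```python
from functools import lru_cache

# Hypothetical toy action set: discharge one unit, hold, or charge one unit.
ACTIONS = (-1, 0, 1)


def deterministic_lookahead(R0, forecast, R_max):
    """Solve a toy deterministic lookahead model exactly; return (first decision, planned value).

    R0: current storage level (integer units).
    forecast: price forecasts for t' = t, ..., t+H (the only forecast in this toy model).
    R_max: storage capacity.
    """
    H = len(forecast)

    @lru_cache(maxsize=None)
    def value(k, R):
        # Best planned contribution from lookahead period k to the end, starting with storage R.
        if k == H:
            return 0.0
        best = float("-inf")
        for x in ACTIONS:
            if 0 <= R + x <= R_max:
                # Charging (x = +1) buys at the forecast price; discharging (x = -1) sells.
                best = max(best, -forecast[k] * x + value(k + 1, R + x))
        return best

    feasible = [x for x in ACTIONS if 0 <= R0 + x <= R_max]
    # The policy implements only the first decision of the optimal plan.
    x_first = max(feasible, key=lambda x: -forecast[0] * x + value(1, R0 + x))
    return x_first, value(0, R0)
```

For example, with forecast prices of 10 and then 30 and room for one unit, the lookahead plans to charge now and discharge next period, so `deterministic_lookahead(0, (10.0, 30.0), 1)` returns `(1, 20.0)`.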

125 Lookahead policies Lookahead policies peek into the future» Optimize over deterministic lookahead model (figure: the lookahead model vs. the real process)

129 Lookahead policies Stochastic lookahead» Here, we approximate the information model by using a Monte Carlo sample to create a scenario tree (figure: tree over 1am, 2am, 3am, 4am, 5am, ..., branching on the change in wind speed at each hour).
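
A sketch of sampling such a tree, assuming (hypothetically) that hourly changes in wind speed are Gaussian with standard deviation `sigma`, that every node has the same number of equally likely branches, and that wind speed is clipped at zero. It only shows the structure: the number of scenarios grows as `branching ** hours`, which is one reason stochastic lookaheads can be hard to solve.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    hour: int
    wind: float
    prob: float  # probability of reaching this node
    children: list = field(default_factory=list)


def build_scenario_tree(wind0, hours, branching, sigma, seed=0):
    """Monte Carlo scenario tree; the Gaussian change model is a hypothetical assumption."""
    rng = random.Random(seed)
    root = Node(hour=0, wind=wind0, prob=1.0)
    frontier = [root]
    for h in range(1, hours + 1):
        next_frontier = []
        for parent in frontier:
            for _ in range(branching):
                change = rng.gauss(0.0, sigma)  # sampled change in wind speed
                child = Node(h, max(0.0, parent.wind + change), parent.prob / branching)
                parent.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root, frontier  # the final frontier holds the leaves (one per scenario)
```

With `branching=3` and `hours=4` the tree already has 81 scenarios.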

130 Lookahead policies We can then simulate this lookahead policy over time (figure: the lookahead model vs. the base model).
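
The simulation loop can be sketched as follows on the same kind of hypothetical one-unit storage toy: at each time $t$ the lookahead model is rebuilt from the forecast made at time $t$, solved, and only its first decision is implemented; the base model then charges the *actual* price and updates the storage level. The layout `forecasts[t][k]` (forecast made at time $t$ of the price at $t+k$) and all function names are assumptions for illustration.

```python
from functools import lru_cache

ACTIONS = (-1, 0, 1)  # hypothetical: discharge, hold, or charge one unit


def lookahead_decision(R, forecast, R_max):
    # Solve the deterministic lookahead model by backward recursion; return its first decision.
    @lru_cache(maxsize=None)
    def value(k, r):
        if k == len(forecast):
            return 0.0
        return max(-forecast[k] * x + value(k + 1, r + x)
                   for x in ACTIONS if 0 <= r + x <= R_max)

    return max((x for x in ACTIONS if 0 <= R + x <= R_max),
               key=lambda x: -forecast[0] * x + value(1, R + x))


def simulate_rolling(actual_prices, forecasts, H, R_max, R0=0):
    """Simulate the lookahead policy in the base model (hypothetical forecast layout)."""
    R, total = R0, 0.0
    for t, price in enumerate(actual_prices):
        # Rebuild and solve the lookahead model with the forecast made at time t.
        x = lookahead_decision(R, tuple(forecasts[t][:H + 1]), R_max)
        total += -price * x  # contribution is earned at the actual price
        R += x               # base-model transition of the storage level
    return total
```

When forecasts are perfect, the policy earns the full 40 on the price path 10, 30, 10, 30; when the forecast made at time 0 predicts a price of 30 that turns out to be 5, the same policy loses 5. That gap is exactly the difference between the lookahead model and the base model.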

134 Learning damaged networks Which way should the truck go? (figure: network map with locations marked "call" and "X")

135 (figure: tree alternating decision and outcome nodes: decision, outcome, decision, outcome, decision)

136 Lookahead policies Monte Carlo tree search: C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1-49, March 2012.
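
The survey covers many variants; one widely used member of the family is UCT, which applies the UCB1 rule at every node of the tree. The sketch below runs UCT on a hypothetical toy problem (choose a sequence of `depth` actions; the caller supplies a reward assumed to lie in [0, 1]) using the four standard phases: selection, expansion, random rollout and backpropagation. It illustrates the method only; it is not the algorithm used in the truck example above.

```python
import math
import random


class Node:
    def __init__(self, state):
        self.state = state    # tuple of actions taken so far
        self.children = {}    # action -> Node
        self.visits = 0
        self.total = 0.0      # sum of rollout rewards through this node


def uct_search(actions, depth, reward, iterations=2000, c=1.4, seed=0):
    """Return the most-visited first action found by UCT on a hypothetical toy problem."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        # 1. Selection: descend with UCB1 while the node is fully expanded and not terminal.
        while len(node.state) < depth and len(node.children) == len(actions):
            node = max(node.children.values(),
                       key=lambda ch: ch.total / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
            path.append(node)
        # 2. Expansion: add one untried action unless the node is terminal.
        if len(node.state) < depth:
            a = rng.choice([a for a in actions if a not in node.children])
            child = Node(node.state + (a,))
            node.children[a] = child
            node = child
            path.append(node)
        # 3. Rollout: complete the sequence with uniformly random actions.
        state = node.state
        while len(state) < depth:
            state += (rng.choice(actions),)
        g = reward(state)
        # 4. Backpropagation: update visit counts and reward sums along the path.
        for n in path:
            n.visits += 1
            n.total += g
    return max(root.children, key=lambda a: root.children[a].visits)
```

For instance, `uct_search((0, 1, 2), 3, lambda s: s.count(2) / 3)` returns 2, the first action of the best sequence.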

138 Lookahead policies Notes:» Solving stochastic lookahead policies can be hard!» But this is still just a lookahead policy, which is a class of rolling horizon heuristic.» Even if solving the lookahead model is hard, an optimal solution of a lookahead model (even a stochastic one) is (with rare exceptions) not an optimal policy.

139 Outline The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

140 Parametric cost function approximation An energy storage problem:

141 Parametric cost function approximation Forecasts evolve over time as new information arrives (figure: rolling forecasts, updated each hour; the forecast made at midnight vs. the actual).

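142 Parametric cost function approximation Benchmark policy: deterministic lookahead, with the following constraints for each $t'$ in the lookahead horizon ($\tilde x^{ij}_{tt'}$ is the planned flow from $i$ to $j$, with $w$ = wind, $r$ = storage, $g$ = grid, $d$ = demand):
$$\begin{aligned}
\tilde x^{wd}_{tt'} + \beta\,\tilde x^{rd}_{tt'} + \tilde x^{gd}_{tt'} &= f^D_{tt'}\\
\tilde x^{gd}_{tt'} + \tilde x^{gr}_{tt'} &\le f^G_{tt'}\\
\tilde x^{rd}_{tt'} + \tilde x^{rg}_{tt'} &\le \tilde R_{tt'}\\
\tilde x^{wr}_{tt'} + \tilde x^{gr}_{tt'} &\le R^{\max} - \tilde R_{tt'}\\
\tilde x^{wr}_{tt'} + \tilde x^{wd}_{tt'} &\le f^E_{tt'}\\
\tilde x^{wr}_{tt'} + \tilde x^{gr}_{tt'} &\le \gamma^{charge}\\
\tilde x^{rd}_{tt'} + \tilde x^{rg}_{tt'} &\le \gamma^{discharge}
\end{aligned}$$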

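143 Parametric cost function approximation Parametric cost function approximations» Replace the constraint $\tilde x^{wr}_{tt'} + \tilde x^{wd}_{tt'} \le f^E_{tt'}$ with:» Lookup table modified forecasts (one adjustment term for each time $t'$ in the future): $\tilde x^{wr}_{tt'} + \tilde x^{wd}_{tt'} \le \theta_{t'-t}\, f^E_{tt'}$» Exponential function for adjustments (just two parameters): $\tilde x^{wr}_{tt'} + \tilde x^{wd}_{tt'} \le \theta_1 e^{\theta_2 (t'-t)} f^E_{tt'}$» Constant adjustment (one parameter): $\tilde x^{wr}_{tt'} + \tilde x^{wd}_{tt'} \le \theta\, f^E_{tt'}$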
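
The three parameterizations differ only in how the right-hand side $f^E_{tt'}$ of the wind constraint is scaled. A minimal sketch, assuming the forecast is passed as a list indexed by $k = t' - t$ (the function names are hypothetical): a multiplier below 1 makes the lookahead plan as if less wind will arrive than forecast, while the rest of the lookahead model is unchanged.

```python
import math


def rhs_lookup(forecast, theta):
    # Lookup table: one multiplier theta[k] per lookahead offset k = t' - t.
    return [theta[k] * f for k, f in enumerate(forecast)]


def rhs_exponential(forecast, theta1, theta2):
    # Two parameters: theta1 * exp(theta2 * (t' - t)) * f^E_{tt'}.
    return [theta1 * math.exp(theta2 * k) * f for k, f in enumerate(forecast)]


def rhs_constant(forecast, theta):
    # One parameter shared by every t' in the lookahead horizon.
    return [theta * f for f in forecast]
```

The lookup table has one parameter per period of the horizon, so it is the most flexible but also has the most parameters to tune.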

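144 Parametric cost function approximation Optimizing the CFA:» Let $F(\theta,\omega)$ be a simulation of our policy, given by $F(\theta,\omega) = \sum_{t=0}^{T} C\big(S_t(\omega), X^\pi_t(S_t(\omega)\mid\theta)\big)$.» We then compute the gradient with respect to $\theta$, $\nabla_\theta F(\theta,\omega)$.» The parameter $\theta$ is found using a classical stochastic gradient algorithm: $\theta^{n+1} = \theta^n + \alpha_n \nabla_\theta F(\theta^n,\omega^{n+1})$. We tested several stepsize formulas and found that ADAGRAD worked best: $\alpha_n = \eta / \sqrt{G^n + \epsilon}$, where $G^n = \sum_{n'=0}^{n}\big(\nabla_\theta F(\theta^{n'},\omega^{n'+1})\big)^2$ is computed componentwise.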
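
A sketch of the stochastic gradient update with ADAGRAD stepsizes, applied componentwise. In the CFA setting, `grad` would return $\nabla_\theta F(\theta^n, \omega^{n+1})$ from one simulation of the policy; here `noisy_grad` is a hypothetical stand-in, the noisy gradient of a toy concave function with maximizer $(3, -1)$. The defaults for `eta`, `eps` and the number of iterations are arbitrary choices, not values from the tutorial.

```python
import math
import random


def adagrad_ascent(grad, theta0, eta=0.5, eps=1e-8, iters=2000, seed=0):
    """Stochastic gradient ascent theta^{n+1} = theta^n + alpha_n * g^n with ADAGRAD stepsizes.

    grad(theta, rng) must return one sampled gradient (a list, one entry per component).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    G = [0.0] * len(theta)  # running sum of squared sampled gradients, per component
    for _ in range(iters):
        g = grad(theta, rng)  # sampled gradient at the current theta^n
        for i, gi in enumerate(g):
            G[i] += gi * gi
            alpha = eta / math.sqrt(G[i] + eps)  # alpha_n = eta / sqrt(G^n + eps)
            theta[i] += alpha * gi               # ascent step, since we maximize F
    return theta


def noisy_grad(theta, rng):
    # Hypothetical stand-in for a simulated gradient of F = -(theta_1 - 3)^2 - (theta_2 + 1)^2.
    return [-2.0 * (theta[0] - 3.0) + rng.gauss(0.0, 0.1),
            -2.0 * (theta[1] + 1.0) + rng.gauss(0.0, 0.1)]
```

Because $G^n$ only grows, each component's stepsize shrinks in proportion to the gradients that component has already seen, so no hand-tuned decay schedule is needed.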

145 Parametric cost function approximation Optimizing the CFA:» We compute the gradient by applying the chain rule, where the interaction from one time period to the next is captured through the derivative of the next state with respect to $\theta$.» Assuming there are no integer variables, these equations are quite easy to compute.
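
One way to write out that chain rule, assuming (this notation is an assumption, not taken from the slide) a transition function $S_{t+1} = S^M(S_t, x_t, W_{t+1})$, decisions $x_t = X^\pi_t(S_t\mid\theta)$ and contributions $C(S_t, x_t)$, all differentiable. The derivatives are propagated forward in time along one simulated path $\omega$; the second line is the period-to-period recursion.

$$
\begin{aligned}
\frac{d x_t}{d\theta} &= \frac{\partial X^\pi_t}{\partial S_t}\,\frac{d S_t}{d\theta} + \frac{\partial X^\pi_t}{\partial \theta},\\
\frac{d S_{t+1}}{d\theta} &= \frac{\partial S^M}{\partial S_t}\,\frac{d S_t}{d\theta} + \frac{\partial S^M}{\partial x_t}\,\frac{d x_t}{d\theta}, \qquad \frac{d S_0}{d\theta} = 0,\\
\nabla_\theta F(\theta,\omega) &= \sum_{t=0}^{T}\left(\frac{\partial C}{\partial S_t}\,\frac{d S_t}{d\theta} + \frac{\partial C}{\partial x_t}\,\frac{d x_t}{d\theta}\right).
\end{aligned}
$$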

146 (figure panels: $\sigma_f = 0$, $\sigma_f = 10$, $\sigma_f = 20$, $\sigma_f = 30$; curves: lookup table, exponential function, constant parameter)

147 Parametric cost function approximation Improvement over deterministic benchmark (figure: lookup table, exponential and constant CFAs)

148 Parametric cost function approximation The parametric CFA represents a fundamental rethinking of the modeling of stochastic programming problems:» From thinking of the lookahead model as the objective function: $\max_{x_0}\Big(c_0 x_0 + \sum_{\omega\in\Omega} p(\omega)\sum_{t=1}^{T} c_t(\omega)\,x_t(\omega)\Big)$» To acknowledging that the lookahead model is a policy for solving the base model: $\max_\theta \mathbb{E}\Big\{\sum_{t=0}^{T} C\big(S_t, X^\pi_t(S_t\mid\theta)\big)\,\Big|\,S_0\Big\}$, which is a simulator where we do not have to make any of the standard approximations required in stochastic programming.

149 An energy storage problem Consider a basic energy storage problem:» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

150 An energy storage problem We can create distinct flavors of this problem:» Problem class 1: best for PFAs. Highly stochastic (heavy-tailed) electricity prices; stationary data.» Problem class 2: best for CFAs. Stochastic prices and wind (but not heavy-tailed); stationary data.» Problem class 3: best for VFAs. Stochastic wind and prices (but not too random); time-varying loads, but inaccurate wind forecasts.» Problem class 4: best for deterministic lookaheads. Relatively low-noise problem with accurate forecasts.» Problem class 5: a hybrid policy worked best here. Stochastic prices and wind, nonstationary data, noisy forecasts.

151 An energy storage problem The policies» The PFA: charge the battery when the price is below $p_1$; discharge when the price is above $p_2$.» The CFA: optimize over a horizon $H$; maintain upper and lower bounds $(u, l)$ for every time period except the first (note that this is a hybrid with a lookahead).» The VFA: piecewise linear, concave value function in terms of energy, indexed by time.» The lookahead (deterministic): optimize over a horizon $H$ (the only tunable parameter) using forecasts of demand, prices and wind energy.» The lookahead CFA: use a lookahead policy (deterministic), but with a tunable parameter that improves robustness.
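
To illustrate how the PFA is used and tuned, here is a minimal sketch with hypothetical one-unit charge/discharge decisions and a storage capacity `R_max`. The tutorial does not say how $p_1$ and $p_2$ were tuned; the exhaustive grid search in `tune_pfa` is an assumed stand-in for that policy search, which is possible because each candidate pair is evaluated by simply simulating the policy.

```python
def pfa_decision(R, price, p1, p2, R_max):
    # Charge one unit (hypothetical step size) when price < p1, discharge one unit when price > p2.
    if price < p1 and R < R_max:
        return 1
    if price > p2 and R > 0:
        return -1
    return 0


def simulate_pfa(prices, p1, p2, R_max=1, R0=0):
    # Total contribution of the PFA on one price path: pay to charge, earn to discharge.
    R, total = R0, 0.0
    for price in prices:
        x = pfa_decision(R, price, p1, p2, R_max)
        total += -price * x
        R += x
    return total


def tune_pfa(prices, grid, R_max=1):
    # Hypothetical tuning: try every threshold pair with p1 <= p2 and keep the best.
    return max(((p1, p2) for p1 in grid for p2 in grid if p1 <= p2),
               key=lambda th: simulate_pfa(prices, th[0], th[1], R_max))
```

On the price path 10, 50, 10, 50, the thresholds $p_1 = 20$, $p_2 = 40$ buy twice at 10 and sell twice at 50, earning 80.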

152 An energy storage problem Each policy is best on certain problems» Results are percent of the posterior optimal solution.» Any policy might be best depending on the data. Joint research with Prof. Stephan Meisel, University of Muenster, Germany.

153 (figure: the communities of stochastic optimization, including stochastic search, approximate dynamic programming, robust optimization, decision analysis, simulation optimization, stochastic programming, optimal learning, dynamic programming, model predictive control, optimal control, stochastic control, bandit problems, online computation, reinforcement learning and Markov decision processes. Warren Powell 2017)

154 (figure relating communities to policy classes. Communities: stochastic search, stochastic programming, stochastic gradients, ranking and selection, simulation optimization, reinforcement learning/ADP, multi-armed bandits/optimal learning, Markov decision processes, optimal control. Policy classes: policy search (PFAs), cost function approx. (CFAs), value function approx. (VFAs), lookahead policies (DLAs). Warren Powell 2017)

155 Theory Computation Modeling Applications

156 Thank you! http:// A tutorial on this topic is available at the top of http://
