From Multiarmed Bandits to Stochastic Optimization


1 From Multiarmed Bandits to Stochastic Optimization
Multiarmed Bandits Workshop, Rotterdam, NL, May 24, 2018
Warren B. Powell, Princeton University, Department of Operations Research and Financial Engineering
2018 Warren B. Powell, Princeton University

2 The multiarmed bandit story

3 Materials science
» Optimizing payloads: reactive species, biomolecules, fluorescent markers, ...
» Controllers for robotic scientist for materials science experiments
» Optimizing nanoparticles to maximize photoconductivity

4 Learning problems
Health sciences
» Sequential design of experiments for drug discovery
» Drug delivery: optimizing the design of protective membranes to control drug release
» Medical decision making: optimal learning for medical treatments.

5 Drug discovery
Optimizing the configuration of molecules.
Design of effective policies can accelerate the search process for new drugs.

6 Optimal learning in diabetes
How do we find the best treatment for diabetes?
» The standard treatment is a medication called metformin, which works for about 70 percent of patients.
» What do we do when metformin does not work for a patient?
» There are about 20 other treatments, and it is a process of trial and error. Doctors need to get through this process as quickly as possible.

7 Truckload brokerages
Now we have a logistic curve for each origin-destination pair (i,j):
$P^Y(p, a \mid \theta) = \dfrac{e^{\theta^0_{ij} + \theta^p_{ij} p + \theta^a_{ij} a}}{1 + e^{\theta^0_{ij} + \theta^p_{ij} p + \theta^a_{ij} a}}$
Number of offers for each (i,j) pair is relatively small. Need to generalize the learning across traffic lanes.
Slides that follow are from the senior thesis of Connor Werth, 2017.
[Figure labels: Shipper, Carrier; axis: Offered price]

8 Ad-click optimization
Optimizing bids for internet ads
» In partnership with Roomsage.com
» Developed Princeton ad-click game
» Teams compete to find best policy
[Figure: Clicks and Profits plotted against Bid ($/click)]

9 Emergency storm response
The power grid
» Loss of power creates cascading failures (lack of fuel, inability to pump water)
» How to plan?
» How to react?
Hurricane Sandy
» Once in 100 years?
» Rare convergence of events
» But, meteorologists did an amazing job of forecasting the storm.

10 Emergency storm response [Figure: grid diagram with "call" and "X" markers and numeric labels 0.76, 0.03, 0.67]

11 Emergency storm response [Figure: same diagram, updated; labels 0.76 and 0]

12 Emergency storm response [Figure: same diagram, updated; labels 0.76 and 0]

13 The bandit vocabulary

14 The bandit vocabulary

15 Arms

16 and bandits

17 Multiarmed bandit problems
What is a bandit problem?
» The literature seems to characterize a bandit problem as any problem where a policy has to balance exploration vs. exploitation.
» But this means that a bandit problem is defined by how it is solved. E.g., if you use a pure exploration policy, is it a bandit problem?
My definition:
» Any sequential decision problem which involves learning, and where we have direct or indirect control over the information that is collected.

18 Multiarmed bandit problems
Dimensions of a bandit problem:
» The arms (decisions) may be: binary (A/B testing, stopping problems); discrete alternatives (drug, catalyst, ...); continuous choices (price); vector-valued (basketball team, products, movies, ...); multiattribute (attributes of a movie, song, person); static vs. dynamic choice sets; sequential vs. batch
» Information (what we observe): success-failure/discrete outcome; exponential family (e.g. Gaussian, exponential, ...); heavy-tailed (e.g. Cauchy); data-driven (distribution unknown); stationary vs. nonstationary processes; lagged responses? Adversarial?

19 Multiarmed bandit problems
Dimensions of a bandit problem:
» Belief models: lookup tables (these are most common), with independent or correlated beliefs; parametric models, linear or nonlinear in the parameters; nonparametric models (locally linear, deep neural networks/SVM); Bayesian vs. frequentist
» Objective function: expected performance (e.g. regret); offline (final reward) vs. online (cumulative reward), i.e. just interested in final design, or optimizing while learning?; risk metrics

20 Outline
» Elements of a sequential decision model
» Mixed state problems
» Designing policies
» Searching for the best policy

21 Outline
» Elements of a sequential decision model
» Mixed state problems
» Designing policies
» Searching for the best policy

22 Modeling
Any sequential decision problem consists of five core elements:
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function

23 Modeling dynamic problems
The state variable:
» Controls community: $x_t$, the "information state"
» Operations research/MDP/computer science: $S_t = (R_t, I_t, B_t)$, the system state, where:
$R_t$ = Resource state (physical state): location/status of truck/train/plane; energy in storage
$I_t$ = Information state: prices; weather
$B_t$ = Belief state ("state of knowledge"): belief about traffic delays; belief about the status of equipment

24 Modeling dynamic problems
The state variable:
» The initial state $S_0$ contains: all deterministic parameters; initial values of dynamic parameters; prior distribution of belief about unknown parameters
» The dynamic state $S_t$ contains all information that changes over time: physical state; information state; belief state (Bayesian updating):
$\bar\mu^{n+1}_x = \dfrac{\beta^n_x \bar\mu^n_x + \beta^W W^{n+1}_x}{\beta^n_x + \beta^W}$
$\beta^{n+1}_x = \beta^n_x + \beta^W$
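The Bayesian updating step on this slide can be sketched in a few lines of Python. This is only an illustration, assuming a normal belief with mean and precision (inverse variance) and a known observation precision; the function name `update_belief` and all numbers are hypothetical and not from the talk.

```python
def update_belief(mu, beta, w, beta_w):
    """Precision-weighted update of a normal belief after observing w.

    mu, beta : current mean and precision (1/variance) of the belief
    w        : new observation W^{n+1}_x
    beta_w   : precision of the observation noise (assumed known)
    """
    new_beta = beta + beta_w
    new_mu = (beta * mu + beta_w * w) / new_beta
    return new_mu, new_beta


# Illustrative use: prior mean 10 with variance 4, three made-up observations.
mu, beta = 10.0, 1.0 / 4.0
for w in [12.0, 11.0, 13.0]:
    mu, beta = update_belief(mu, beta, w, beta_w=1.0)
```

The precision only grows, so each additional observation moves the mean less than the one before.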

25 Modeling dynamic problems
Decisions:
» Markov decision processes/computer science: $a_t$, a discrete action
» Control theory: $u_t$, a low-dimensional continuous vector
» Operations research: $x_t$, usually a discrete or continuous but high-dimensional vector of decisions
At this point, we do not specify how to make a decision. Instead, we define the function $X^\pi(s)$ (or $A^\pi(s)$ or $U^\pi(s)$), where $\pi$ specifies the type of policy. "$\pi$" carries information about the type of function $f$, and any tunable parameters $\theta \in \Theta^f$.

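26 The decision variables
Styles of decisions
» Binary: $x \in X = \{0, 1\}$
» Finite: $x \in X = \{1, 2, \ldots, M\}$ (classic bandit model)
» Continuous scalar: $x \in X = [a, b]$
» Continuous vector: $x = (x_1, \ldots, x_K)$, $x_k \in \mathbb{R}$
» Discrete vector: $x = (x_1, \ldots, x_K)$, $x_k \in \mathbb{Z}$
» Categorical: $x = (a_1, \ldots, a_I)$, where $a_i$ is a category (e.g. patient attributes)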

27 Modeling dynamic problems
Exogenous information:
$W_t$ = New information that first became known at time $t$ = $(\hat R_t, \hat D_t, \hat p_t, \hat E_t)$
$\hat R_t$ = Equipment failures, delays, new arrivals; new drivers being hired to the network
$\hat D_t$ = New customer demands
$\hat p_t$ = Changes in prices
$\hat E_t$ = Information about the environment (temperature, ...)
Note: Any variable indexed by $t$ is known at time $t$. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.
Below, we let $\omega$ represent a sequence of actual observations $W_1, W_2, \ldots$; $W_t(\omega)$ refers to a sample realization of the random variable $W_t$.

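28 Modeling dynamic problems
The transition function: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$
$R_{t+1} = R_t + x_t + \hat R_{t+1}$ (inventories)
$p_{t+1} = p_t + \hat p_{t+1}$ (spot prices)
$D_{t+1} = D_t + \hat D_{t+1}$ (market demands)
Bayesian updating of belief:
$\bar\mu^{n+1}_x = \dfrac{\beta^n_x \bar\mu^n_x + \beta^W W^{n+1}_x}{\beta^n_x + \beta^W}$, $\beta^{n+1}_x = \beta^n_x + \beta^W$
Also known as the: system model; state transition model; plant model; plant equation; transition law; transfer function; transformation function; law of motion; model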

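29 Modeling stochastic, dynamic problems
The universal objective function
» Cumulative reward (classical bandit objective):
$\max_\pi \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}$
» Final reward ("best arm" bandit objective):
$\max_\pi \mathbb{E}\left\{ F(x^{\pi,N}, \hat W) \mid S_0 \right\}$
Given a system model (transition function) $S_{t+1} = S^M(S_t, x_t, W_{t+1})$ and a stochastic process $(S_0, W_1, W_2, \ldots, W_T)$.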
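To make the cumulative-reward objective concrete, here is a minimal Python sketch that estimates it by Monte Carlo simulation for a fixed policy. Everything specific in it is an assumption made for illustration: a toy inventory problem, uniform integer demand, the price and cost values, and the order-up-to policy with parameter `theta`.

```python
import random


def transition(R, x, W):
    """S^M: inventory left after ordering x and then serving demand W."""
    return max(0, R + x - W)


def contribution(R, x, W, price=5.0, cost=3.0):
    """C(S_t, x_t, W_{t+1}): sales revenue minus ordering cost."""
    return price * min(R + x, W) - cost * x


def order_up_to(R, theta):
    """A simple parametric policy X^pi(S_t | theta): raise inventory to theta."""
    return max(0, theta - R)


def simulate(theta, T=50, n_paths=200, seed=0):
    """Monte Carlo estimate of E{ sum_t C(S_t, X^pi(S_t), W_{t+1}) | S_0 = 0 }."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        R = 0
        for _ in range(T):
            x = order_up_to(R, theta)      # decision uses only the current state
            W = rng.randint(0, 10)         # exogenous demand, revealed after x
            total += contribution(R, x, W)
            R = transition(R, x, W)
    return total / n_paths
```

Calling `simulate(theta)` for several values of `theta` gives a crude way to compare policies within this one family.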

30 [Figure: the fields that study sequential decisions under uncertainty] Stochastic programming; approximate dynamic programming; robust optimization; decision analysis; simulation optimization; optimal learning; dynamic programming and optimal control; model predictive control; stochastic search; optimal control; bandit problems; online computation; stochastic control; reinforcement learning; Markov decision processes

31 Outline
» Elements of a sequential decision model
» Mixed state problems
» Designing policies
» Searching for the best policy

32 Modeling dynamic problems
Some major problem classes
» Pure physical state: inventory problems; stochastic shortest path problems
» Physical plus information: inventory with exogenous prices, weather, ...
» Pure belief states: these are classical bandit problems; different types of belief models
» Belief plus information: patient arriving to doctor's office who then prescribes a drug; contextual bandit problems
» Everything: revenue management; clinical trials

33 Modeling dynamic problems
Mixed state problems (physical and belief state)
» Clinical trials: learning the performance of a new drug (belief state); tracking the number of patients signed up (physical state)
» Revenue management for hotels: learning market response to price (belief state); tracking how many rooms have been reserved (physical state)
» An energy storage problem

34 An energy storage problem
Consider a basic energy storage problem:
» We have to manage the flows of energy (blue lines) while managing different sources of uncertainty.

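35 An energy storage problem
Transition function without learning [diagram nodes: E, B, L, G]
$E_{t+1} = E_t + \hat E_{t+1}$
$p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon^p_{t+1}$
$D_{t+1} = f^D_{t,t+1} + \varepsilon^D_{t+1}$
$R^{battery}_{t+1} = R^{battery}_t + x_t$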

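36 An energy storage problem
Transition function with passive learning [diagram nodes: E, B, L]
$E_{t+1} = E_t + \hat E_{t+1}$
$p_{t+1} = \bar\theta_{t0} p_t + \bar\theta_{t1} p_{t-1} + \bar\theta_{t2} p_{t-2} + \varepsilon^p_{t+1}$
$D_{t+1} = f^D_{t,t+1} + \varepsilon^D_{t+1}$
$R^{battery}_{t+1} = R^{battery}_t + x_t$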

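37 Learning in stochastic optimization
Updating the demand parameter
» Let $p_{t+1}$ be the new price and let $\bar F_t(x_t \mid \bar\theta_t) = \bar\theta_{t0} p_t + \bar\theta_{t1} p_{t-1} + \bar\theta_{t2} p_{t-2}$
» We update our estimate $\bar\theta_t$ using our recursive least squares equations:
$\bar\theta_{t+1} = \bar\theta_t - \dfrac{1}{\gamma_{t+1}} B_t \phi_t \varepsilon_{t+1}$
$\varepsilon_{t+1} = \bar F_t(x_t \mid \bar\theta_t) - p_{t+1}$
$B_{t+1} = B_t - \dfrac{1}{\gamma_{t+1}} \left( B_t \phi_t (\phi_t)^T B_t \right)$
$\gamma_{t+1} = 1 + (\phi_t)^T B_t \phi_t$
$\phi_t = (p_t, p_{t-1}, p_{t-2})^T$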
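A minimal Python sketch of one recursive least squares step, written with plain lists so it needs no external libraries. It assumes the model is linear in the features `phi` and that the matrix `B` is symmetric; the function name `rls_update` and the initialization in the usage note are illustrative choices, not taken from the slides.

```python
def rls_update(theta, B, phi, p_next):
    """One recursive least squares step for the model p_{t+1} ~ theta . phi.

    theta  : current parameter estimates (list of floats)
    B      : current symmetric matrix B_t (list of lists)
    phi    : feature vector phi_t, e.g. [p_t, p_{t-1}, p_{t-2}]
    p_next : newly observed value p_{t+1}
    """
    n = len(theta)
    B_phi = [sum(B[i][j] * phi[j] for j in range(n)) for i in range(n)]
    gamma = 1.0 + sum(phi[i] * B_phi[i] for i in range(n))
    eps = sum(theta[i] * phi[i] for i in range(n)) - p_next  # prediction error
    new_theta = [theta[i] - B_phi[i] * eps / gamma for i in range(n)]
    # B is symmetric, so B phi phi^T B equals the outer product (B phi)(B phi)^T.
    new_B = [[B[i][j] - B_phi[i] * B_phi[j] / gamma for j in range(n)]
             for i in range(n)]
    return new_theta, new_B
```

A common way to start is `theta = [0.0] * 3` with `B` set to a large multiple of the identity, which behaves like a weak prior on the parameters.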

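38 An energy storage problem
Transition function with active learning [diagram nodes: E, B, L]
$E_{t+1} = E_t + \hat E_{t+1}$
$p_{t+1} = \bar\theta_{t0} p_t + \bar\theta_{t1} p_{t-1} + \bar\theta_{t2} p_{t-2} + \bar\theta_{t3} x^{GB}_t + \varepsilon^p_{t+1}$
$D_{t+1} = f^D_{t,t+1} + \varepsilon^D_{t+1}$
$R^{battery}_{t+1} = R^{battery}_t + x_t$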

39 Outline
» Elements of a sequential decision model
» Mixed state problems
» Designing policies
» Searching for the best policy

40 Designing policies
We have to start by describing what we mean by a policy.
» Definition: A policy is a mapping from a state to an action ... any mapping.
How do we search over an arbitrary space of policies?

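41 Designing policies
Two fundamental strategies:
1) Policy search: search over a class of functions for making decisions to optimize some metric.
$\max_{\pi = (f \in F,\, \theta^f \in \Theta^f)} \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^\pi_t(S_t \mid \theta)) \mid S_0 \right\}$
2) Lookahead approximations: approximate the impact of a decision now on the future.
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$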

42 Designing policies
Policy search:
1a) Policy function approximations (PFAs)
» Lookup tables: "when in this state, take this action"
» Parametric functions: order-up-to policies (if inventory is less than s, order up to S); affine policies $X^{PFA}(S_t \mid \theta) = \sum_{f \in F} \theta_f \phi_f(S_t)$; neural networks
» Locally/semi/non parametric: requires optimizing over local regions
1b) Cost function approximations (CFAs)
» Optimizing a deterministic model modified to handle uncertainty (buffer stocks, schedule slack):
$X^{CFA}(S_t \mid \theta) = \arg\max_{x_t \in X^\pi_t(\theta)} \bar C^\pi(S_t, x_t \mid \theta)$

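43 Designing policies
Lookahead policies
2a) Value function approximations
» We approximate the impact of a decision on the future:
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$
» Approximating the value of being in a downstream state using machine learning ("value function approximations"):
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\{ V_{t+1}(S_{t+1}) \mid S_t, x_t \} \right)$
$X^{VFA}_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\{ \bar V_{t+1}(S_{t+1}) \mid S_t, x_t \} \right)$
$\phantom{X^{VFA}_t(S_t)} = \arg\max_{x_t} \left( C(S_t, x_t) + \bar V^x_t(S^x_t) \right)$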


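46 Designing policies
2b) Direct lookahead policies
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$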

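47 Designing policies
2b) Direct lookahead policies
» We replace the exact lookahead
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \mathbb{E}\left\{ \max_\pi \mathbb{E}\left\{ \sum_{t'=t+1}^{T} C(S_{t'}, X^\pi_{t'}(S_{t'})) \mid S_{t+1} \right\} \mid S_t, x_t \right\} \right)$
with an approximation called the lookahead model:
$X^*_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \tilde{\mathbb{E}}\left\{ \max_{\tilde\pi} \tilde{\mathbb{E}}\left\{ \sum_{t'=t+1}^{t+H} C(\tilde S_{tt'}, \tilde X^{\tilde\pi}(\tilde S_{tt'})) \mid \tilde S_{t,t+1} \right\} \mid S_t, x_t \right\} \right)$
» A lookahead policy works by approximating the lookahead model.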

48 Designing policies
Types of lookahead approximations
» One-step lookahead, widely used in pure learning policies: Bayes greedy/naïve Bayes; Thompson sampling; value of information (knowledge gradient)
» Multi-step lookahead: deterministic lookahead, also known as model predictive control or rolling horizon procedure; stochastic lookahead, either two-stage (widely used in stochastic linear programming) or multistage
» Monte Carlo tree search (MCTS) for discrete action spaces
» Multistage scenario trees (stochastic linear programming), typically not tractable.

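49 Four (meta)classes of policies
Policy search:
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximations (CFAs)
» $X^{CFA}(S_t \mid \theta) = \arg\max_{x_t \in X^\pi_t(\theta)} \bar C^\pi(S_t, x_t \mid \theta)$
Lookahead approximations:
3) Policies based on value function approximations (VFAs)
» $X^{VFA}_t(S_t) = \arg\max_{x_t} \left( C(S_t, x_t) + \bar V^x_t(S^x_t(S_t, x_t)) \right)$
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon procedure/model predictive control:
$X^{LA-D}_t(S_t) = \arg\max_{\tilde x_{tt}, \ldots, \tilde x_{t,t+H}} \left( C(\tilde S_{tt}, \tilde x_{tt}) + \sum_{t'=t+1}^{T} C(\tilde S_{tt'}, \tilde x_{tt'}) \right)$
» Chance constrained programming: $P[A_t x_t \le f(W)] \ge 1 - \delta$
» Stochastic lookahead/stochastic programming/Monte Carlo tree search:
$X^{LA-S}_t(S_t) = \arg\max_{\tilde x_{tt}, \tilde x_{t,t+1}, \ldots, \tilde x_{t,t+T}} \left( C(\tilde S_{tt}, \tilde x_{tt}) + \sum_{\tilde\omega \in \tilde\Omega_t} p(\tilde\omega) \sum_{t'=t+1}^{T} C(\tilde S_{tt'}(\tilde\omega), \tilde x_{tt'}(\tilde\omega)) \right)$
» Robust optimization:
$X^{LA-RO}_t(S_t) = \arg\max_{\tilde x_{tt}, \ldots, \tilde x_{t,t+H}} \min_{w \in W_t(\theta)} \left( C(\tilde S_{tt}, \tilde x_{tt}) + \sum_{t'=t+1}^{T} C(\tilde S_{tt'}(w), \tilde x_{tt'}(w)) \right)$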

50 Four (meta)classes of policies
Same list, annotated "Function approx.": 1) policy function approximations (PFAs); 2) cost function approximations (CFAs); 3) policies based on value function approximations (VFAs); 4) direct lookahead policies (DLAs)

51 Four (meta)classes of policies
Same list, annotated "Imbedded optimization": 1) policy function approximations (PFAs); 2) cost function approximations (CFAs); 3) policies based on value function approximations (VFAs); 4) direct lookahead policies (DLAs)

52 Policies for pure learning problems
1) Policy function approximation (PFA)
» Revenue maximization problem
Demand function: $D(p \mid \theta^n) = \theta^n_1 - \theta^n_2 p$
Revenue: $R(p \mid \theta^n) = p\,D(p) = \theta^n_1 p - \theta^n_2 p^2$
Need to tune p
PFA policy, pure exploitation: $p^n = \dfrac{\theta^n_1}{2\theta^n_2}$
PFA policy with active exploration ("excitation policy"): $p^n = \dfrac{\theta^n_1}{2\theta^n_2} + \varepsilon^n$, $\varepsilon^n \sim N(0, \sigma_\varepsilon^2)$
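The two pricing policies on this slide can be sketched as follows. This is a hedged illustration, assuming the linear demand model above with positive `theta2`; the function names and the idea of passing a random generator are my own choices, not the speaker's.

```python
import random


def price_exploit(theta1, theta2):
    """Price maximizing R(p) = theta1*p - theta2*p**2 under current estimates."""
    return theta1 / (2.0 * theta2)


def price_excite(theta1, theta2, sigma_eps, rng=random):
    """Exploitation price plus N(0, sigma_eps^2) noise to force exploration."""
    return price_exploit(theta1, theta2) + rng.gauss(0.0, sigma_eps)
```

Here `sigma_eps` is the tunable parameter of the excitation policy: zero recovers pure exploitation, while larger values explore more at the cost of revenue in the short run.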

53 Policies for pure learning problems
1) Policy function approximation (PFA)
» Linear decision rules ("affine policies"):
$X^{PFA}(S^n \mid \theta) = \theta_0 + \theta_1 \phi_1(S^n) + \theta_2 \phi_2(S^n) + \cdots + \theta_F \phi_F(S^n)$
» Neural networks

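54 Policies for pure learning problems
2) Cost function approximations (CFA)
» Upper confidence bounding:
$X^{UCB}(S^n \mid \theta^{UCB}) = \arg\max_x \left( \bar\mu^n_x + \theta^{UCB} \sqrt{\dfrac{\log n}{N^n_x}} \right)$
» Interval estimation:
$X^{IE}(S^n \mid \theta^{IE}) = \arg\max_x \left( \bar\mu^n_x + \theta^{IE} \bar\sigma^n_x \right)$
» Boltzmann exploration ("soft max"): choose $x$ with probability
$P^n(x) = \dfrac{e^{\theta \bar\mu^n_x}}{\sum_{x'} e^{\theta \bar\mu^n_{x'}}}$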
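A small Python sketch of the three policies on this slide, using plain lists of estimates. Two details are assumptions added for the sketch rather than part of the slide: the UCB function tries any arm with zero observations first (to avoid dividing by zero), and the Boltzmann weights subtract the largest estimate before exponentiating for numerical stability.

```python
import math


def ucb(mu, counts, n, theta):
    """argmax_x mu[x] + theta * sqrt(log n / N_x); untried arms are chosen first."""
    for x, c in enumerate(counts):
        if c == 0:
            return x
    scores = [m + theta * math.sqrt(math.log(n) / c)
              for m, c in zip(mu, counts)]
    return max(range(len(mu)), key=scores.__getitem__)


def interval_estimation(mu, sigma, theta):
    """argmax_x mu[x] + theta * sigma[x]."""
    scores = [m + theta * s for m, s in zip(mu, sigma)]
    return max(range(len(mu)), key=scores.__getitem__)


def boltzmann_probs(mu, theta):
    """Probability of choosing each x, proportional to exp(theta * mu[x])."""
    top = max(mu)  # shifting by the max leaves the probabilities unchanged
    weights = [math.exp(theta * (m - top)) for m in mu]
    total = sum(weights)
    return [w / total for w in weights]
```

In all three, `theta` is a tunable parameter that controls how much weight goes to uncertainty or exploration.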

55 Policies for pure learning problems
A learning problem with correlated beliefs

56 Policies for pure learning problems
Picking $\theta^{IE} = 0$ means we are evaluating each choice at the mean.

57 Policies for pure learning problems
Picking $\theta^{IE} = \ldots$ means we are evaluating each choice at the 95th percentile.

58 Policies for pure learning problems
PFAs and CFAs have to be tuned
» Final reward ("offline learning"):
$\max_{\theta^{IE}} \mathbb{E}\, F(x^{\pi,N}(\theta^{IE}), \hat W)$, with the expectation taken over $W^1, \ldots, W^N$ and $\hat W$
» Cumulative reward ("online learning"):
$\max_{\theta^{IE}} \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^{IE}(S_t \mid \theta^{IE}), W_{t+1}) \mid S_0 \right\}$
» Both require searching over tunable parameters. Offline tuning is classical stochastic search. Online tuning is a relatively open research area.

59 Cost function approximations
Tuning the interval estimation policy:
$X^{IE}(S^n \mid \theta^{IE}) = \arg\max_x \left( \bar\mu^n_x + \theta^{IE} \bar\sigma^n_x \right)$
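One simple way to tune the interval estimation parameter is a grid search in a simulator. The Python sketch below is only an illustration under stated assumptions: normally distributed rewards with known noise, a fixed vector of true means `truth` (a real study would also sample the truth from a prior), nearly uninformative prior beliefs, and a cumulative-reward objective. All names are hypothetical.

```python
import random


def run_ie(theta, truth, noise_sd, N, rng):
    """Cumulative reward of the IE policy over N experiments on one sample path."""
    K = len(truth)
    mu = [0.0] * K       # prior means (assumed)
    beta = [1e-2] * K    # prior precisions (assumed, nearly uninformative)
    beta_w = 1.0 / noise_sd ** 2
    total = 0.0
    for _ in range(N):
        # sigma_x is the belief standard deviation, beta_x ** -0.5
        scores = [mu[x] + theta * beta[x] ** -0.5 for x in range(K)]
        x = max(range(K), key=scores.__getitem__)
        w = rng.gauss(truth[x], noise_sd)
        total += w
        mu[x] = (beta[x] * mu[x] + beta_w * w) / (beta[x] + beta_w)
        beta[x] += beta_w
    return total


def tune_ie(thetas, truth, noise_sd=1.0, N=50, n_paths=200, seed=0):
    """Grid search: average cumulative reward for each candidate theta."""
    results = {}
    for theta in thetas:
        rng = random.Random(seed)  # same random stream for every theta
        results[theta] = sum(run_ie(theta, truth, noise_sd, N, rng)
                             for _ in range(n_paths)) / n_paths
    best = max(results, key=results.get)
    return best, results
```

Reusing the same seed for every candidate reduces the noise in the comparison, which matters because the differences between neighboring values of `theta` are often small.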

60 Policies for pure learning problems
3) Policies based on value function approximations
» VFAs using a physical state problem:
$V^n(S^n) = \max_x \left( C(S^n, x) + \mathbb{E}\{ V^{n+1}(S^{n+1}) \mid S^n \} \right)$
[Figure label: current node (e.g. node 2)]

61 Policies for pure learning problems
3) Policies based on value function approximations
» VFAs using a physical state problem:
$V^n(S^n) = \max_x \left( C(S^n, x) + \mathbb{E}\{ V^{n+1}(S^{n+1}) \mid S^n \} \right)$
[Figure labels: decision to go to a node (e.g. 5); downstream node (e.g. 5)]

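62 Policies for pure learning problems
3) Policies based on value function approximations
» VFAs using a learning problem, with belief $S^n_5 = N(\bar\mu^n_5, \sigma^{2,n}_5)$ about option 5 and $S^n = (S^n_1, \ldots, S^n_5)$:
$V^n(S^n) = \max_x \left( C(S^n, x) + \mathbb{E}\{ V^{n+1}(S^{n+1}) \mid S^n \} \right)$
[Figure labels: current state of knowledge; decision to make a measurement; new state of knowledge]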

63 Policies for pure learning problems
3) Policies based on value function approximations
» Illustration: finding the best drug in the set
» After a test we observe success or failure: if $x^n = x$, then $W^{n+1}_x = 1$ on success and $W^{n+1}_x = 0$ on failure.
» Let $\rho_x$ = probability that drug $x$ is successful. We assume that $\rho_x \mid S^n \sim \mathrm{Beta}(\alpha^n_x, \beta^n_x)$, where $S^n = (\alpha^n_x, \beta^n_x)_x$ is our belief state, with updating equations:
$\alpha^{n+1}_x = \alpha^n_x + W^{n+1}_x$, $\beta^{n+1}_x = \beta^n_x + (1 - W^{n+1}_x)$
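The beta updating equations translate directly into code. A minimal Python sketch, assuming the belief is stored as two lists indexed by drug; the function names are illustrative.

```python
def update_beta(alpha, beta, x, w):
    """Return new (alpha, beta) lists after testing drug x with outcome w in {0, 1}."""
    alpha, beta = list(alpha), list(beta)  # copy so the old belief state is kept
    alpha[x] += w
    beta[x] += 1 - w
    return alpha, beta


def mean_success(alpha, beta):
    """Posterior mean of each success probability under Beta(alpha_x, beta_x)."""
    return [a / (a + b) for a, b in zip(alpha, beta)]
```

Only the tested drug's parameters change, which is what makes the belief state decompose by arm.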

64 Policies for pure learning problems
3) Policies based on value function approximations
» Bellman's equation:
$V^n(\alpha^n, \beta^n) = \max_x \mathbb{E}\left[ W^{n+1}_x + \gamma V^{n+1}(\alpha^n + W^{n+1}, \beta^n + 1 - W^{n+1}) \mid S^n \right]$
» This can be solved for a stopping problem to determine when to stop testing a single drug.
» Problematic if $\alpha$ and $\beta$ are vectors. Gittins developed a novel decomposition that allows us to solve this problem for one drug ("arm") at a time.

65 Policies for pure learning problems
3) Policies based on value function approximations
» For normally distributed rewards, Gittins (1974) showed that we can solve dynamic programs for each alternative.
» Produces a policy that looks like
$X^{Gitt}(S^n) = \arg\max_x \left( \bar\mu^n_x + \sigma_W\, G\!\left( \dfrac{\bar\sigma^n_x}{\sigma_W}, \gamma \right) \right)$
where $G\!\left( \dfrac{\bar\sigma^n_x}{\sigma_W}, \gamma \right)$ is the Gittins index obtained by solving a dynamic program for whether to continue or stop testing a single drug.
» Considered a computational breakthrough, but computing Gittins indices is still a challenge, and only applies to special cases.

66 Policies for pure learning problems
4) Policies based on direct lookaheads (DLA)
» The knowledge gradient for offline (final reward):
$\nu^{KG,n}_x = \mathbb{E}\left\{ \max_y F(y, B^{n+1}(x)) \mid S^n \right\} - \max_y F(y, B^n)$
where:
» $x$ is the proposed experiment
» $S^n$ is the current belief state
» $B^{n+1}(x)$ is the updated parameter estimates after running experiment with density $x$
» $\mathbb{E}$ averages over the possible outcomes of the experiment (and our different beliefs about parameters)
» $\max_y F(y, B^{n+1}(x))$ finds the new design with our new belief (but without knowing the outcome of the experiment)
» $\max_y F(y, B^n)$ chooses the best design given what we know now.
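For one special case the knowledge gradient has a well-known closed form: a lookup-table belief model with independent normal beliefs, normally distributed measurement noise, and $F(y, B) = \bar\mu_y$. The Python sketch below implements that special case only; it is not the general definition on the slide, and the function name and arguments are illustrative.

```python
import math


def knowledge_gradient(mu, sigma, noise_sd):
    """KG value of one more experiment on each alternative.

    mu, sigma : current means and standard deviations of the beliefs
    noise_sd  : standard deviation of the measurement noise
    Assumes independent normal beliefs and at least two alternatives.
    """
    kg = []
    for x in range(len(mu)):
        var = sigma[x] ** 2
        # standard deviation of the change in mu[x] caused by one measurement
        sigma_tilde = var / math.sqrt(var + noise_sd ** 2)
        if sigma_tilde == 0.0:
            kg.append(0.0)  # nothing left to learn about x
            continue
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        pdf = math.exp(-0.5 * zeta ** 2) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(zeta / math.sqrt(2.0)))
        kg.append(sigma_tilde * (zeta * cdf + pdf))
    return kg
```

The offline KG policy then runs the experiment with the largest value in the returned list.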

67 The knowledge gradient
4) Policies based on direct lookaheads (DLA)
» The knowledge gradient computes the expected improvement from a single experiment
[Figure labels: change in estimate of value of option 5 due to measurement; change which produces a change in the decision]

68 The knowledge gradient
[Figure panels: estimated value of alternative; standard deviation; knowledge gradient]

69 The knowledge gradient

70 The knowledge gradient

71 The knowledge gradient
Some properties of the knowledge gradient for offline (final reward) problems:
» Asymptotically optimal (finds best x in the limit)
» Optimal (by construction) if budget = 1.
» Optimal for all n if number of alternatives = 2 (e.g. A/B testing).
» Only stationary policy that is both myopically and asymptotically optimal.
For online problems:
» Asymptotically optimal (finds best x in the limit) as


73 The knowledge gradient
Different belief models
» Lookup tables: independent beliefs; correlated beliefs
» Linear parametric models: linear models; sparse-linear; tree regression
» Nonlinear parametric models: logistic regression; neural networks
» Nonparametric models: Gaussian process regression; kernel regression; support vector machines; deep neural networks

74 The knowledge gradient
The marginal value of information
» Repeatedly sampling the same alternative
[Figure: value of information vs. number of times we sample the same alternative]

75 The knowledge gradient
The marginal value of information
» The value of information may be concave if an experiment is noisy
[Figure: value of information vs. number of times we sample the same alternative]


77 The knowledge gradient
From offline to online learning
» The knowledge gradient computes the value of information for a terminal reward objective:
$\nu^{KG,n}_x = \mathbb{E}\left\{ \max_y F(y, B^{n+1}(x)) \mid S^n \right\} - \max_y F(y, B^n)$
» Imagine that we have a budget of N experiments, and that we are summing rewards over this horizon. The value of information from a single experiment is now
$\nu^{OLKG,n}_x = \bar\mu^n_x + (N - n)\,\nu^{KG,n}_x$
where $\bar\mu^n_x$ is the expected reward, $N - n$ is the remaining horizon, and $\nu^{KG,n}_x$ is the offline KG.
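The online rule is a one-line change once the offline knowledge gradient values are available. A minimal Python sketch, assuming the offline values are supplied in a list `kg` computed elsewhere; the function name is illustrative.

```python
def online_kg_choice(mu, kg, n, N):
    """Pick argmax_x of mu[x] + (N - n) * kg[x].

    mu : current estimates of expected reward
    kg : offline knowledge gradient value for each alternative
    n  : number of experiments run so far
    N  : total experiment budget
    """
    scores = [m + (N - n) * v for m, v in zip(mu, kg)]
    return max(range(len(mu)), key=scores.__getitem__)
```

Early in the budget the information term dominates; when `n` reaches `N` the rule reduces to picking the alternative with the highest estimated reward.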

78 The knowledge gradient
Knowledge gradient for offline and online learning
» Offline learning: $\nu^{KG,n}_x$
» Online learning: $\nu^{OLKG,n}_x = \bar\mu^n_x + (N - n)\,\nu^{KG,n}_x$
» This bridges what have historically been fundamentally different fields.

79 Outline
» Elements of a sequential decision model
» Mixed state problems
» Designing policies
» Searching for the best policy

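80 Designing policies
Finding the best policy
» We have to first articulate our classes of policies:
$f \in F = \{\text{PFAs}, \text{CFAs}, \text{VFAs}, \text{DLAs}\}$
$\theta \in \Theta^f$ = parameters that characterize each family.
» So optimizing over $\pi \in \Pi$ means searching over $\Pi = \{ f \in F,\ \theta \in \Theta^f \}$
» We then have to pick an objective such as
$\max_\pi \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^\pi_t(S_t)) \mid S_0 \right\}$
or
$\max_\pi \mathbb{E}\left\{ F(x^{\pi,N}, \hat W) \mid S_0 \right\}$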

81 Multiarmed bandit problems
Policy search class
» Policies tend to be relatively simple and easy to compute
» Well suited to rapid (e.g. internet speed) learning applications needing fast computation.
» Tuning is important, and typically requires a realistic simulator.
Lookahead class
» Policies can be relatively complex to compute.
» Well suited to problems with expensive experiments.
» Typically avoids tuning, but may require a prior.

82 Multiarmed bandit problems
Notes:
» Any of the four classes of policies may be appropriate depending on the characteristics of the problem.
» Active learning arises in many applications, but is often overlooked.
» The bandit culture of coming up with problem variations should be inherited by other communities.
» Bandit researchers often focus on good but not optimal policies (e.g. UCB policies) with good characteristics (e.g. robust across a wide range of distributions).

83 MOLTE
Modular, optimal learning testing environment
» MATLAB-based environment with modular library of problems and algorithms, each in its own .m file.
» User specifies in a spreadsheet which algorithms are run on which problems
http://

84 MOLTE Comparison on library problems

85 Princeton ad-click game
In collaboration with Roomsage.com

86 Princeton ad-click game
Learning the bid-response curve [Figure: Clicks vs. Bid ($/click)]
» Varies by hour of week
» Response depends on location, age, gender, device

87 Princeton ad-click game
The ad-click game:
» Learn the best policy for bidding for ads
» Bids compete in a simulated auction following the rules used by Google

88 Thank you!
For more information, please visit: http://
See "Courses" or the "jungle" webpages.

Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty

Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty Tuorial: A Unified Modeling and Algorihmic Framework for Opimizaion under Uncerainy Informs Opimizaion Sociey Meeing March 23, 2018 Warren B. Powell Princeon Universiy Deparmen of Operaions Research and

More information

Tutorial: A Unified Framework for Optimization under Uncertainty

Tutorial: A Unified Framework for Optimization under Uncertainty Tuorial: A Unified Framework for Opimizaion under Uncerainy Informs Annual Meeing - Nashville November 13, 2016 Warren B. Powell Princeon Universiy Deparmen of Operaions Research and Financial Engineering

More information

Energy Storage and Renewables in New Jersey: Complementary Technologies for Reducing Our Carbon Footprint

Energy Storage and Renewables in New Jersey: Complementary Technologies for Reducing Our Carbon Footprint Energy Sorage and Renewables in New Jersey: Complemenary Technologies for Reducing Our Carbon Fooprin ACEE E-filliaes workshop November 14, 2014 Warren B. Powell Daniel Seingar Harvey Cheng Greg Davies

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LDA, logisic

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LTU, decision

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Subway stations energy and air quality management

Subway stations energy and air quality management Subway saions energy and air qualiy managemen wih sochasic opimizaion Trisan Rigau 1,2,4, Advisors: P. Carpenier 3, J.-Ph. Chancelier 2, M. De Lara 2 EFFICACITY 1 CERMICS, ENPC 2 UMA, ENSTA 3 LISIS, IFSTTAR

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017
