ARTIFICIAL INTELLIGENCE. Markov decision processes
- Helena Alexander
1 INFOB2KI, Utrecht University, The Netherlands. ARTIFICIAL INTELLIGENCE: Markov decision processes. Lecturer: Silja Renooij. These slides are part of the INFOB2KI Course Notes available from
2 PageRank (Google)
PageRank can be understood as
a) A Markov Chain
b) A Markov Decision Process
c) A Partially Observable Markov Decision Process
d) None of the above
3 Markov models
- Markov model = stochastic model that assumes the Markov property.
- Stochastic model: models a process where the state depends on previous states in a non-deterministic way.
- Markov property: the probability distribution of future states, conditioned on both past and present values, depends only upon the present state. Given the present, the future does not depend on the past.
- Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable.
4 Markov model types
                      Prediction            Planning
Fully observable      Markov chain          MDP (Markov decision process)
Partially observable  Hidden Markov model   POMDP (Partially observable Markov decision process)
Planning models are typically used for optimisation purposes.
Prediction models can be represented at variable level by a (Dynamic) Bayesian network: a chain S1 → S2 → S3 for a Markov chain, and the same chain with observations O1, O2, O3 attached to the states for a hidden Markov model.
5 MDPs: outline
- Search in non-deterministic environments
- Solution: optimal policy (plan) of actions that maximizes rewards (decision-theoretic planning)
- Bellman equations and value iteration
- Link with learning
6 Running example: Grid World
- A maze-like problem: the agent lives in a grid, where walls block the agent's path.
- Noisy movement: actions do not always go as planned.
  - 80% of the time, the action North takes the agent North.
  - 10% of the time, North takes the agent West; 10% East (same deviation for other actions).
  - If there is a wall in the chosen direction, the agent stays put.
- The agent receives a reward each time step: a small living reward each step (can be negative) and a big reward at the end (good or bad).
- Goal: maximize the sum of rewards.
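The noisy movement rule can be sketched as a function that returns the distribution over next cells. This is only an illustration: the names `MOVES`, `SIDEWAYS`, `step` and `transition` are hypothetical, cells are assumed to be (row, column) pairs with row 0 at the top, the grid border is assumed to behave like a wall, and the "stay put" rule is assumed to apply to whichever direction the agent actually ends up moving in.

```python
# Hypothetical sketch of the Grid World dynamics; not taken from the slides.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
# For each intended action, the two perpendicular directions it can slip into.
SIDEWAYS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def step(state, direction, walls, n_rows, n_cols):
    """Move one cell in `direction`; stay put when blocked (assumption: border = wall)."""
    r, c = state
    dr, dc = MOVES[direction]
    nxt = (r + dr, c + dc)
    inside = 0 <= nxt[0] < n_rows and 0 <= nxt[1] < n_cols
    return nxt if inside and nxt not in walls else state


def transition(state, action, walls, n_rows, n_cols, noise=0.2):
    """Return {next_state: probability} for taking `action` in `state`."""
    left, right = SIDEWAYS[action]
    dist = {}
    outcomes = ((action, 1 - noise), (left, noise / 2), (right, noise / 2))
    for direction, p in outcomes:
        nxt = step(state, direction, walls, n_rows, n_cols)
        dist[nxt] = dist.get(nxt, 0.0) + p  # merge outcomes that land on the same cell
    return dist


# Agent at (1, 0) in a 3x4 grid with a wall at (1, 1), intending to go North:
# North succeeds with 0.8; both sideways slips are blocked, so 0.2 stays put.
print(transition((1, 0), "N", {(1, 1)}, 3, 4))
```

With noise = 0.2 this reproduces the 80% / 10% / 10% split described on the slide.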
7 Grid World Actions
[Figure: two panels, Deterministic Grid World and Stochastic Grid World]
8 Goals, rewards and optimality criteria
- Traditional planning goals can be encoded in the reward function; the effect of transitions is uncertain.
- Example: achieving a state satisfying property P at minimal cost is encoded by making any state satisfying P a zero-reward absorbing state, and assigning all other states negative reward.
- Rewards are additive and time-separable, and the objective is to maximize expected total reward; future rewards may be discounted.
- The planning horizon can be finite, infinite or indefinite (a special case of infinite: guaranteed to reach a terminal state).
9 Markov Decision Processes
MDPs are non-deterministic search problems. An MDP is defined by:
- A set of states s ∈ S
- A set of actions a ∈ A
- A transition function T(s, a, s'): the probability that a from s leads to s', i.e., P(s' | s, a). Also called the model or the dynamics.
- A reward function R(s, a, s'). Sometimes just R(s) or R(s').
- A start state
- Sometimes a terminal state
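The ingredients listed above map naturally onto a small container type. The sketch below is one possible encoding, not the course's own: the class name `MDP`, its field layout (dictionaries keyed by tuples) and the three-state toy problem are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MDP:
    """Hypothetical container for the components of an MDP."""
    states: list
    actions: list
    T: dict        # T[(s, a)] = {s_next: P(s_next | s, a)}
    R: dict        # R[(s, a, s_next)] = reward; missing entries are read as 0
    start: str
    terminals: set = field(default_factory=set)

    def is_well_formed(self, tol=1e-9):
        """Check that every T(s, a, .) is a probability distribution."""
        return all(abs(sum(dist.values()) - 1.0) < tol for dist in self.T.values())


# Made-up toy problem: "go" usually advances from s0 to s1, "exit" ends the game.
toy = MDP(
    states=["s0", "s1", "end"],
    actions=["go", "exit"],
    T={("s0", "go"): {"s1": 0.8, "s0": 0.2}, ("s1", "exit"): {"end": 1.0}},
    R={("s1", "exit", "end"): 1.0},
    start="s0",
    terminals={"end"},
)
print(toy.is_well_formed())
```

Storing T as a dictionary of distributions keeps the sum over s' in later formulas a simple loop over `T[(s, a)].items()`.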
10 What is Markov about MDPs?
- Recall: "Markov" generally means that given the present state, the future and the past are independent.
- For Markov decision processes, "Markov" means action outcomes depend only on the current state.
- This is just like search, where the successor function could only depend on the current state (not the history).
[Portrait: Andrey Markov ( )]
11 MDP Search Trees
Each MDP state projects a search tree:
- s is a state
- (s, a) is a q-state
- (s, a, s') is called a transition, with T(s, a, s') = P(s' | s, a) and reward R(s, a, s')
12 Policies
- In deterministic search problems, we wanted an optimal plan: a sequence of actions, from start to a goal.
- For MDPs, we want an optimal policy π*: S → A.
  - A policy π gives an action for each state.
  - An optimal policy is one that maximizes expected utility (reward) if followed.
  - Note: an explicit policy defines a reflex agent.
- Example (figure): optimal policy when R(s, a, s') = 0.03 for all non-terminal states s.
13 Optimal Policies: example
[Figure panels: R(s) = 0.01, R(s) = 0.03, R(s) = 0.4, R(s) =]
14 Utilities of Reward Sequences
What preferences should an agent have over reward sequences?
- More or less? [1, 2, 2] or [2, 3, 4]
- Now or later? [0, 0, 1] or [1, 0, 0]
It's reasonable to maximize the sum of rewards. It's also reasonable to prefer rewards now to rewards later. A solution: values of rewards decay exponentially.
15 Discounting
[Figure labels: Worth Now, Worth Next Step, Worth In Two Steps]
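16 Returns in the long run
- Episodic tasks: interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze. The return gives the total reward from time t to time T, ending an episode:
  $R_t = r_{t+1} + r_{t+2} + \dots + r_T$
- Continuing tasks: interaction does not have natural episodes. Here we use the discounted return:
  $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$
  where γ, 0 ≤ γ ≤ 1, is the discount rate (γ → 0: shortsighted; γ → 1: farsighted).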
17 Discounting: implementation
- How to discount? Each time we descend a level, we multiply in the discount once.
- Why discount? Sooner rewards probably do have higher utility than later rewards. It also helps our algorithms converge.
- Example: the value of receiving [1, 2, 3] with a discount of 0.5 is 1*1 + 0.5*2 + 0.25*3 = 2.75, which is less than that of [3, 2, 1] (3*1 + 0.5*2 + 0.25*1 = 4.25).
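The "multiply in the discount once per level" rule can be checked with a few lines of code. The function name `discounted_return` is hypothetical; the sketch simply folds the reward list from the last reward back to the first.

```python
def discounted_return(rewards, gamma):
    """Return r_1 + gamma * r_2 + gamma^2 * r_3 + ... for a finite reward list."""
    total = 0.0
    # Work backwards: each step up one level multiplies the tail by gamma once.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1, 2, 3], 0.5))  # 2.75
print(discounted_return([3, 2, 1], 0.5))  # 4.25
```

The backward fold is the same recursion used later for values: the return from a step is the immediate reward plus the discounted return of what follows.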
18 Solving MDPs
19 Optimal Quantities
- States have values V(s): V*(s) = expected return starting in s and thereafter acting optimally.
- (Intermediate q-states have values Q(s, a).)
- The optimal policy: π*(s) = optimal action from state s.
- Any policy that is greedy with respect to V* is an optimal policy.
[Diagram labels: s is a state; (s, a) is a q-state; (s, a, s') is a transition]
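20 Characterizing V^π(s)
Consider an arbitrary policy π prescribing action π(s) in state s. What is the value of following this policy when in state s?
First, let's consider the deterministic situation:
$V^\pi(s_k) = R_k = r_{k+1} + \gamma r_{k+2} + \gamma^2 r_{k+3} + \dots = r_{k+1} + \gamma (r_{k+2} + \gamma r_{k+3} + \dots) = r_{k+1} + \gamma R_{k+1} = R(s_k, \pi(s_k), s_{k+1}) + \gamma V^\pi(s_{k+1})$
With noise, take the expected value over all possible next states:
$V^\pi(s) = E[R] = \sum_{s'} P(s' \mid s, \pi(s)) \, [R(s, \pi(s), s') + \gamma V^\pi(s')]$
where P(·) is given by the transition function T(·).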
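The expectation over next states can be turned into an iterative procedure: start from zero and apply the right-hand side repeatedly until the values stop changing. The sketch below assumes a dictionary encoding (`T[(s, a)]` maps next states to probabilities, `R[(s, a, s')]` holds rewards with missing entries read as 0); the function name `evaluate_policy` and the two-state example are made up for illustration.

```python
def evaluate_policy(states, policy, T, R, gamma, tol=1e-10):
    """Iteratively compute V^pi for a fixed policy (states absent from `policy` get value 0)."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {}
        for s in states:
            if s not in policy:          # terminal state: nothing more to collect
                new_V[s] = 0.0
                continue
            a = policy[s]
            # Expected value over all possible next states s2.
            new_V[s] = sum(
                p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                for s2, p in T[(s, a)].items()
            )
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            return V


# Made-up example: from "A", action "go" ends the game half of the time,
# and every transition pays reward 1.
states = ["A", "end"]
T = {("A", "go"): {"A": 0.5, "end": 0.5}}
R = {("A", "go", "A"): 1.0, ("A", "go", "end"): 1.0}
V = evaluate_policy(states, {"A": "go"}, T, R, gamma=0.9)
print(V["A"])  # close to 1 / 0.55, the solution of V = 1 + 0.45 * V
```

Because the policy is fixed, each sweep considers only one action per state, which is what makes this step cheap compared with a full maximization over actions.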
21 Example: Policy Evaluation
[Figure: two panels, π: Always Go Right and π: Always Go Forward; V^π is shown for each state (indicated in the cells)]
22 Characterizing optimal V*(s)
The expected return from state s is maximized by acting optimally in s and thereafter, so the optimal value for a state is obtained when following the optimal policy π*:
$V^*(s) = \max_\pi V^\pi(s) = V^{\pi^*}(s) = \sum_{s'} T(s, \pi^*(s), s') \, [R(s, \pi^*(s), s') + \gamma V^*(s')] = \max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V^*(s')] \; (= \max_a Q^*(s, a))$
This equation is called the Bellman equation.
23 Using V*(s) to obtain π*(s)
The optimal policy can be extracted from V*(s):
$\pi^*(s) = \arg\max_\pi V^\pi(s) = \arg\max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V^*(s')]$
using one-step look-ahead, i.e.
- use the Bellman equation once more to compute the given summation for all actions;
- rather than returning the max value, return the action that gives the max value.
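One-step look-ahead amounts to computing the bracketed sum for every action and keeping the argmax instead of the max. The sketch below assumes the same dictionary encoding as before (`T[(s, a)]` maps next states to probabilities, missing rewards count as 0); `extract_policy` and the three-state example with its values are hypothetical.

```python
def extract_policy(states, actions, T, R, V, gamma):
    """Return the greedy policy with respect to V (None where no action is available)."""
    policy = {}
    for s in states:
        best_action, best_q = None, float("-inf")
        for a in actions:
            if (s, a) not in T:
                continue
            # Same summation as in the Bellman equation, for this one action.
            q = sum(
                p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                for s2, p in T[(s, a)].items()
            )
            if q > best_q:
                best_action, best_q = a, q
        policy[s] = best_action
    return policy


# Made-up example: "left" surely leads to a bad state, "right" usually to a good one.
states = ["s", "good", "bad"]
actions = ["left", "right"]
T = {("s", "left"): {"bad": 1.0}, ("s", "right"): {"good": 0.8, "bad": 0.2}}
V = {"s": 0.0, "good": 1.0, "bad": -1.0}   # assumed given, e.g. from value iteration
print(extract_policy(states, actions, T, {}, V, gamma=0.9))
```

In the example Q(s, left) = -0.9 and Q(s, right) = 0.54, so the extracted action for s is "right".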
24 Using V*(s) to obtain π*(s)
Back to Gridworld:
- Noise = 0.2 (i.e. a move is successful with p = 0.8; deviation to left/right both with p = 0.1)
- Discount γ = 0.9
- Living reward R(s, a, s') = 0
Optimal policy? Given V* (shown in the cells), one-step look-ahead produces the long-term optimal action (shown as small arrowheads).
25 Value Iteration
A Dynamic Programming algorithm for computing V*
26 Value Iteration (VI)
- "Tree backup": define V_k(s) as the optimal value of s still to be obtained if the game ends in k more time steps.
- Start with V_0(s) = 0 for all s (including terminal s). Terminal state rewards (if any) are added at k = 1.
- Given the V_k(s') values, compute for each s:
  $V_{k+1}(s) = \max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V_k(s')] \; (= \max_a Q_{k+1}(s, a))$
- Repeat until convergence of the V values.
- Theorem: VI will converge to unique optimal values.
- Basic idea: approximations get refined towards optimal values.
[Diagram: one level of the search tree, with V_{k+1}(s) at the root, an action a, and V_k(s') at the leaves]
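The update rule above can be sketched in a few lines. This is an illustrative implementation under assumptions, not the course's code: the MDP is encoded with dictionaries (`T[(s, a)]` maps next states to probabilities, missing rewards count as 0), states without any available action are treated as terminal, and the function name `value_iteration` and the two-state example are made up.

```python
def value_iteration(states, actions, T, R, gamma, tol=1e-10):
    """Repeat the Bellman update until the largest change drops below `tol`."""
    V = {s: 0.0 for s in states}                     # V_0(s) = 0 for all s
    while True:
        new_V = {}
        for s in states:
            q_values = [
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in T[(s, a)].items())
                for a in actions if (s, a) in T
            ]
            # Assumption: a state with no available action is terminal, value 0.
            new_V[s] = max(q_values) if q_values else 0.0
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            return V


# Made-up example: "safe" ends the game for reward 1; "risky" ends it for
# reward 3 half of the time and otherwise returns to s with reward 0.
states = ["s", "end"]
actions = ["safe", "risky"]
T = {("s", "safe"): {"end": 1.0}, ("s", "risky"): {"end": 0.5, "s": 0.5}}
R = {("s", "safe", "end"): 1.0, ("s", "risky", "end"): 3.0}
V = value_iteration(states, actions, T, R, gamma=0.9)
print(V["s"])  # close to 1.5 / 0.55, the fixed point of V = max(1, 1.5 + 0.45 * V)
```

Each sweep builds `new_V` from the previous `V` only, which matches the definition of V_{k+1} in terms of V_k.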
27 VI init: k=0
Noise = 0.2, Discount = 0.9, Living reward (R) = 0
Policy: based on one-step look-ahead, i.e. the action that gives the max V_{k+1} value
- not used in computing V values!
- not yet interesting (only shown to demonstrate changes)
- default policy: N
28 k=1
Noise = 0.2, Discount = 0.9, Living reward (R) = 0
Implementation of a terminal state e with reward r:
- 1 action: x (exit)
- T(e, x) = 1
- R(e, x) = r
In k=1 terminal states get their associated reward, and there is no change in their V value after that.
29-40 [Figures: value iteration snapshots for k = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and k = 100, each with Noise = 0.2, Discount = 0.9, Living reward = 0]
41 Problems with Value Iteration
Value iteration repeats the Bellman updates:
$V_{k+1}(s) = \max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V_k(s')]$
- Problem 1: It's slow: O(S²A) per iteration
- Problem 2: The "max" at each state rarely changes
- Problem 3: The policy often converges long before the values
42 Policy Iteration
Alternative approach for optimal values:
- Step 1: Policy evaluation: calculate returns for some fixed policy until convergence
- Step 2: Policy improvement: update the policy using one-step look-ahead with the resulting converged (but not optimal!) returns as future values
- Repeat the steps until the policy converges
This is policy iteration.
- It's still optimal!
- Can converge (much) faster under some conditions
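The two steps can be sketched as a loop that alternates evaluation and improvement. As before, this is an illustration under assumptions: dictionaries encode the MDP (`T[(s, a)]` maps next states to probabilities, missing rewards count as 0), the initial policy simply picks the first available action, and the names `policy_iteration`, `available` and `q` as well as the two-state example are hypothetical.

```python
def policy_iteration(states, actions, T, R, gamma, tol=1e-10):
    """Alternate policy evaluation and one-step look-ahead improvement."""
    def available(s):
        return [a for a in actions if (s, a) in T]

    def q(s, a, V):
        return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                   for s2, p in T[(s, a)].items())

    # Arbitrary initial policy: the first available action in each state.
    policy = {s: available(s)[0] for s in states if available(s)}
    while True:
        # Step 1: policy evaluation with the policy held fixed.
        V = {s: 0.0 for s in states}
        while True:
            new_V = {s: q(s, policy[s], V) if s in policy else 0.0 for s in states}
            delta = max(abs(new_V[s] - V[s]) for s in states)
            V = new_V
            if delta < tol:
                break
        # Step 2: policy improvement; switch only when strictly better.
        stable = True
        for s in policy:
            best = max(available(s), key=lambda a: q(s, a, V))
            if q(s, best, V) > q(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Made-up example: "safe" ends the game for reward 1; "risky" ends it for
# reward 3 half of the time and otherwise returns to s with reward 0.
states = ["s", "end"]
actions = ["safe", "risky"]
T = {("s", "safe"): {"end": 1.0}, ("s", "risky"): {"end": 0.5, "s": 0.5}}
R = {("s", "safe", "end"): 1.0, ("s", "risky", "end"): 3.0}
policy, V = policy_iteration(states, actions, T, R, gamma=0.9)
print(policy, V["s"])
```

On this example the initial policy "safe" is evaluated to V(s) = 1, the improvement step switches to "risky", and the second round leaves the policy unchanged, so the loop stops after two evaluations.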
43 Recall: Policy Evaluation
[Figure: two panels, π: Always Go Right and π: Always Go Forward; V^π is shown for each state]
44 Comparison VI and PI
Both are dynamic programs for solving MDPs and compute the same thing (all optimal values).
- In value iteration:
  - Every iteration updates both the values and (implicitly) the policy.
  - We don't track the policy: taking the max over actions implicitly recomputes it.
- In policy iteration:
  - We do several passes that update returns with a fixed policy (each pass is fast: we consider only one action, not all).
  - After the policy is evaluated, a new policy is chosen (slow, like a value iteration pass).
  - The new policy will be better (or we're done).
45 Double Bandits
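46 Double-Bandit MDP
- Actions: Blue, Red
- States: Win (W), Lose (L)
- No discount, 100 time steps
- Both states have the same value
[Diagram: transitions between W and L, labelled with probabilities 0.25, 0.75 and 1.0 and payoffs $0, $1 and $2]
Note the representation at value rather than variable level!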
47 Offline Planning
Solving MDPs is offline planning:
- You determine all quantities through computation
- You need to know the details of the MDP
- You do not actually play the game!
[Figure: the double-bandit diagram (no discount, 100 time steps, both states have the same value) next to a table of values for Play Red and Play Blue]
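As a small illustration of "determining all quantities through computation", the sketch below evaluates both arms without playing. The payoff table is an assumption read off the diagram labels: Blue is taken to pay $1 with probability 1.0, and Red $2 with probability 0.75 and $0 with probability 0.25. The names `ARMS`, `expected_payoff` and `value_of_always_playing` are hypothetical.

```python
# Assumed payoff distributions per arm: lists of (probability, payoff in dollars).
ARMS = {
    "Blue": [(1.0, 1.0)],
    "Red": [(0.75, 2.0), (0.25, 0.0)],
}


def expected_payoff(arm):
    """Expected payoff of a single pull of `arm`."""
    return sum(p * payoff for p, payoff in ARMS[arm])


def value_of_always_playing(arm, horizon=100):
    """Expected total payoff over `horizon` steps with no discount.

    The payoff does not depend on the current state (Win or Lose), so both
    states have the same value and the total is just horizon * expected payoff.
    """
    return horizon * expected_payoff(arm)


for arm in ARMS:
    print(arm, value_of_always_playing(arm))
```

Under these assumed numbers, always playing Red is worth 150 and always playing Blue 100, so the offline plan is to play Red at every step.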
48 Let's Play!
$2 $2 $0 $2 $2 $2 $2 $0 $0 $0
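49 Online Planning
Rules changed! Red's win chance is different.
[Diagram: the double-bandit MDP in which the $1 transition keeps probability 1.0 and the probabilities of the $2 and $0 transitions are replaced by ??]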
50 Let's Play again!
$0 $0 $0 $2 $0 $2 $0 $0 $0 $0
51 What Just Happened?
That wasn't planning, it was learning! Specifically, reinforcement learning.
- There was an MDP, but you couldn't solve it with just computation.
- You needed to actually act to figure it out.
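"Acting to figure it out" can be illustrated with the simplest possible estimator: the observed win frequency. The sketch below uses the ten payouts from the second round of play and assumes, purely for illustration, that every one of them came from pulling Red; the function name `estimate_win_chance` is hypothetical.

```python
def estimate_win_chance(payouts, win_amount=2):
    """Estimate the win probability as the fraction of pulls that paid `win_amount`."""
    wins = sum(1 for x in payouts if x == win_amount)
    return wins / len(payouts)


# Payouts observed in the second round (assumption: all pulls were Red).
observed = [0, 0, 0, 2, 0, 2, 0, 0, 0, 0]
print(estimate_win_chance(observed))  # 0.2
```

With only ten samples the estimate is rough, but it already suggests that Red's expected payoff (about 0.2 * $2 = $0.40) has dropped below a sure $1, which is information no amount of offline computation could have provided.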
52 PageRank (Google)
PageRank can be understood as
a) A Markov Chain
b) A Markov Decision Process
c) A Partially Observable Markov Decision Process
d) None of the above
L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)
More informationAn introduction to the theory of SDDP algorithm
An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking
More informationZürich. ETH Master Course: L Autonomous Mobile Robots Localization II
Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),
More informationKINEMATICS IN ONE DIMENSION
KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec
More information( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively:
XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of
More informationAdmin MAX FLOW APPLICATIONS. Flow graph/networks. Flow constraints 4/30/13. CS lunch today Grading. in-flow = out-flow for every vertex (except s, t)
/0/ dmin lunch oday rading MX LOW PPLIION 0, pring avid Kauchak low graph/nework low nework direced, weighed graph (V, ) poiive edge weigh indicaing he capaciy (generally, aume ineger) conain a ingle ource
More informationSimulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010
Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid
More informationChapter 6. Laplace Transforms
6- Chaper 6. Laplace Tranform 6.4 Shor Impule. Dirac Dela Funcion. Parial Fracion 6.5 Convoluion. Inegral Equaion 6.6 Differeniaion and Inegraion of Tranform 6.7 Syem of ODE 6.4 Shor Impule. Dirac Dela
More informationWhat is maximum Likelihood? History Features of ML method Tools used Advantages Disadvantages Evolutionary models
Wha i maximum Likelihood? Hiory Feaure of ML mehod Tool ued Advanage Diadvanage Evoluionary model Maximum likelihood mehod creae all he poible ree conaining he e of organim conidered, and hen ue he aiic
More informationCS 473G Lecture 15: Max-Flow Algorithms and Applications Fall 2005
CS 473G Lecure 1: Max-Flow Algorihm and Applicaion Fall 200 1 Max-Flow Algorihm and Applicaion (November 1) 1.1 Recap Fix a direced graph G = (V, E) ha doe no conain boh an edge u v and i reveral v u,
More informationParticle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems
Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,
More informationChapter 7: Solving Trig Equations
Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions
More informationChapter 6. Laplace Transforms
Chaper 6. Laplace Tranform Kreyzig by YHLee;45; 6- An ODE i reduced o an algebraic problem by operaional calculu. The equaion i olved by algebraic manipulaion. The reul i ranformed back for he oluion of
More informationIntroduction to SLE Lecture Notes
Inroducion o SLE Lecure Noe May 13, 16 - The goal of hi ecion i o find a ufficien condiion of λ for he hull K o be generaed by a imple cure. I urn ou if λ 1 < 4 hen K i generaed by a imple curve. We will
More informationSelfish Routing and the Price of Anarchy. Tim Roughgarden Cornell University
Selfih Rouing and he Price of Anarchy Tim Roughgarden Cornell Univeriy 1 Algorihm for Self-Inereed Agen Our focu: problem in which muliple agen (people, compuer, ec.) inerac Moivaion: he Inerne decenralized
More information