An Introduction to COMPUTATIONAL REINFORCEMENT LEARNING. Andrew G. Barto. Department of Computer Science, University of Massachusetts Amherst


1 An Introduction to COMPUTATIONAL REINFORCEMENT LEARNING. Andrew G. Barto, Department of Computer Science, University of Massachusetts Amherst. UPF Lecture 1. Autonomous Learning Laboratory, Department of Computer Science. (Barcelona Lectures; slides based on R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press.)

2 [Diagram: Computational Reinforcement Learning (RL) and related fields: Artificial Intelligence, Psychology, Control Theory and Operations Research, Neuroscience, Artificial Neural Networks.]

3 The Overall Plan. Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback. Markov decision processes. Lecture 2: Dynamic Programming. Basic Monte Carlo methods. Temporal Difference methods. A unified perspective. Connections to neuroscience. Lecture 3: Function approximation. Model-based methods. Dimensions of Reinforcement Learning.

4 What is Reinforcement Learning? Learning from interaction. Goal-oriented learning. Learning about, from, and while interacting with an external environment. Learning what to do, how to map situations to actions, so as to maximize a numerical reward signal.

5 Supervised Learning. Training Info = desired (target) outputs. Inputs → Supervised Learning System → Outputs. Error = (target output − actual output)

6 Reinforcement Learning. Training Info = evaluations ("rewards" / "penalties"). Inputs → RL System → Outputs ("actions"). Objective: get as much reward as possible

7 Key Features of RL. Learner is not told which actions to take. Trial-and-Error search. Possibility of delayed reward: sacrifice short-term gains for greater long-term gains. The need to explore and exploit. Considers the whole problem of a goal-directed agent interacting with an uncertain environment.

8 Complete Agent. Temporally situated. Continual learning and planning. Object is to affect the environment. Environment is stochastic and uncertain. [Diagram: Agent and Environment linked by state, action, and reward.]

9 A Less Misleading View. [Diagram: RL agent with internal state and memory, receiving external sensations, internal sensations, and reward, and producing actions.]

10 Elements of RL. Policy: what to do. Reward: what is good. Value: what is good because it predicts reward. Model: what follows what. [Diagram: Policy, Reward, Value, Model of environment.]

11 An Extended Example: Tic-Tac-Toe. [Figure: a sequence of board positions, alternating x's moves and o's moves.] Assume an imperfect opponent: he/she sometimes makes mistakes.

12 An RL Approach to Tic-Tac-Toe. 1. Make a table with one entry per state: State → V(s), the estimated probability of winning (.5?, .5?, ..., 1 for a win, 0 for a loss or a draw). 2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states. Just pick the next state with the highest estimated prob. of winning, the largest V(s): a greedy move. But 10% of the time pick a move at random: an exploratory move.

13 RL Learning Rule for Tic-Tac-Toe. [Figure: game tree with greedy and exploratory moves marked.] s: the state before our greedy move; s': the state after our greedy move. We increment each V(s) toward V(s'), a "backup": V(s) ← V(s) + α[V(s') − V(s)], where α is a small positive fraction, e.g., α = .1, the step-size parameter.
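The backup above amounts to a one-line table update. Below is a minimal Python sketch, assuming states are stored as hashable board descriptions and values live in a dictionary; the helper names and the default value of 0.5 for unseen states are illustrative, not from the slides.

import random

V = {}            # state -> estimated probability of winning
ALPHA = 0.1       # step-size parameter
EPSILON = 0.1     # fraction of exploratory moves

def value(state):
    # Unseen states default to 0.5, as in the table initialization above.
    return V.get(state, 0.5)

def choose_next_state(candidate_next_states):
    # Greedy move most of the time, exploratory move 10% of the time.
    if random.random() < EPSILON:
        return random.choice(candidate_next_states)
    return max(candidate_next_states, key=value)

def backup(state, next_state):
    # V(s) <- V(s) + alpha * [V(s') - V(s)]
    V[state] = value(state) + ALPHA * (value(next_state) - value(state))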

14 How can we improve this T.T.T. player? Take advantage of symmetries: representation/generalization. How might this backfire? Do we need random moves? Why? Do we always need a full 10%? Can we learn from random moves? Can we learn offline? Pre-training from self play? Using learned models of opponent? ...

15 How is Tic-Tac-Toe Too Easy? Finite, small number of states. One-step look-ahead is always possible. State completely observable...

16 Some Notable RL Applications. TD-Gammon (Tesauro): world's best backgammon program. Elevator Control (Crites & Barto): high-performance down-peak elevator controller. Inventory Management (Van Roy, Bertsekas, Lee & Tsitsiklis): 10-15% improvement over industry standard methods. Dynamic Channel Assignment (Singh & Bertsekas, Nie & Haykin): high-performance assignment of radio channels to mobile telephone calls.

17 TD-Gammon (Tesauro). [Network diagram: value output; action selection by 2-3 ply search; TD error V_{t+1} − V_t.] Start with a random network. Play very many games against self. Learn a value function from this simulated experience. This produces arguably the best player in the world.

18 Elevator Dispatching (Crites and Barto, 1996). 10 floors, 4 elevator cars. STATES: button states; positions, directions, and motion states of cars; passengers in cars & in halls. ACTIONS: stop at, or go by, next floor. REWARDS: roughly, −1 per time step for each person waiting. Conservatively about ... states.

19 Autonomous Helicopter Flight. A. Ng, Stanford; H. Kim, M. Jordan, S. Sastry, Berkeley.

20 Quadrupedal Locomotion. Nate Kohl & Peter Stone, Univ. of Texas at Austin. All training done with physical robots: Sony Aibo ERS-210A. Before Learning. After 1000 trials, or about 3 hours.

21 Learning Control for Dynamically Stable Walking Robots. Russ Tedrake, Teresa Zhang, H. Sebastian Seung, MIT. Start with a Passive Walker.

22

23 Grasp Control. R. Platt, A. Fagg, R. Grupen, Univ. of Mass. UMass Torso: Dexter.

24 Some RL History. Three threads: Trial-and-Error learning; Temporal-difference learning; Optimal control, value functions. [Timeline: Thorndike (Ψ) 1911; Minsky; Secondary reinforcement (Ψ); Samuel; Hamilton (Physics) 1800s; Shannon; Bellman/Howard (OR); Klopf; Holland; Barto et al.; Witten; Sutton; Werbos; Watkins.]

25 Samuel's Checkers Player (Arthur Samuel, 1959, 1967). Score board configurations by a scoring polynomial (after Shannon, 1950). Minimax to determine backed-up score of a position. Alpha-beta cutoffs. Rote learning: save each board config encountered together with backed-up score; needed a "sense of direction": like discounting. Learning by generalization: similar to TD algorithm.

26 Samuel's Backups.

27 The Basic Idea. "... we are attempting to make the score, calculated for the current board position, look like that calculated for the terminal board positions of the chain of moves which most probably occur during actual play." A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," 1959.

28 MENACE (Michie 1961): Matchbox Educable Noughts and Crosses Engine.

29 The Overall Plan. Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback. Markov decision processes. Lecture 2: Dynamic Programming. Basic Monte Carlo methods. Temporal Difference methods. A unified perspective. Connections to neuroscience. Lecture 3: Function approximation. Model-based methods. Dimensions of Reinforcement Learning.

30 Lecture 1, Part 2: Evaluative Feedback. Evaluating actions vs. instructing by giving correct actions. Pure evaluative feedback depends totally on the action taken. Pure instructive feedback depends not at all on the action taken. Supervised learning is instructive; optimization is evaluative. Associative vs. Nonassociative: Associative: inputs mapped to outputs; learn the best output for each input. Nonassociative: learn (find) one best output. The n-armed bandit (at least how we treat it) is: Nonassociative, Evaluative feedback.

31 The n-Armed Bandit Problem. Choose repeatedly from one of n actions; each choice is called a play a_t. After each play, you get a reward r_t, where E{r_t | a_t} = Q*(a_t). These are unknown action values. The distribution of r_t depends only on a_t. The objective is to maximize the reward in the long term, e.g., over 1000 plays. To solve the n-armed bandit problem, you must explore a variety of actions and then exploit the best of them.

32 The Exploration/Exploitation Dilemma. Suppose you form action value estimates Q_t(a) ≈ Q*(a). The greedy action at t is a_t* = argmax_a Q_t(a). Choosing a_t = a_t* is exploitation; choosing a_t ≠ a_t* is exploration. You can't exploit all the time; you can't explore all the time. You can never stop exploring; but you should always reduce exploring.

33 Action-Value Methods. Methods that adapt action-value estimates and nothing else, e.g.: suppose by the t-th play, action a had been chosen k_a times, producing rewards r_1, r_2, ..., r_{k_a}; then Q_t(a) = (r_1 + r_2 + ... + r_{k_a}) / k_a, the sample average, and lim_{k_a → ∞} Q_t(a) = Q*(a).

34 ε-Greedy Action Selection. Greedy action selection: a_t = a_t* = argmax_a Q_t(a). ε-greedy: a_t = a_t* with probability 1 − ε, or a random action with probability ε. ... the simplest way to try to balance exploration and exploitation.
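A minimal sketch of ε-greedy selection with sample-average estimates, run on a bandit like the 10-armed testbed of the next slide; the reward model and all names here are illustrative assumptions.

import random

def epsilon_greedy(Q, epsilon):
    # Random (exploratory) action with probability epsilon, greedy otherwise.
    if random.random() < epsilon:
        return random.randrange(len(Q))
    return max(range(len(Q)), key=lambda a: Q[a])

n, epsilon, plays = 10, 0.1, 1000
true_values = [random.gauss(0.0, 1.0) for _ in range(n)]   # unknown Q*(a)
Q = [0.0] * n        # action-value estimates
counts = [0] * n     # number of times each action was taken
for _ in range(plays):
    a = epsilon_greedy(Q, epsilon)
    r = random.gauss(true_values[a], 1.0)                   # reward ~ N(Q*(a), 1)
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]                          # incremental sample average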

35 10-Armed Testbed. n = 10 possible actions. Each Q*(a) is chosen randomly from a normal distribution N(0, 1). Each reward r_t is also normal: N(Q*(a_t), 1). 1000 plays per run; repeat the whole thing 2000 times and average the results.

36 ε-Greedy Methods on the 10-Armed Testbed.

37 Softmax Action Selection. Softmax action selection methods grade action probs. by estimated values. The most common softmax uses a Gibbs, or Boltzmann, distribution: choose action a on play t with probability e^{Q_t(a)/τ} / Σ_{b=1}^{n} e^{Q_t(b)/τ}, where τ is the "computational temperature".
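A minimal sketch of softmax action selection using the Gibbs distribution above; the temperature values are illustrative, and a production version would subtract the maximum estimate before exponentiating for numerical stability.

import math
import random

def softmax_action(Q, tau):
    # Sample an action with probability proportional to exp(Q[a] / tau).
    weights = [math.exp(q / tau) for q in Q]
    return random.choices(range(len(Q)), weights=weights, k=1)[0]

Q = [0.2, 1.0, 0.5]
print(softmax_action(Q, tau=0.1))    # low temperature: almost always action 1
print(softmax_action(Q, tau=10.0))   # high temperature: nearly uniform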

38 Linear Learning Automata. Let π_t(a) = Pr{a_t = a} be the only adapted parameter. L_{R-I} (linear, reward-inaction): on success, π_{t+1}(a_t) = π_t(a_t) + α(1 − π_t(a_t)), 0 < α < 1 (the other action probs. are adjusted to still sum to 1); on failure, no change. L_{R-P} (linear, reward-penalty): on success, π_{t+1}(a_t) = π_t(a_t) + α(1 − π_t(a_t)), 0 < α < 1 (the other action probs. are adjusted to still sum to 1); on failure, π_{t+1}(a_t) = π_t(a_t) + α(0 − π_t(a_t)), 0 < α < 1. For two actions, a stochastic, incremental version of the supervised algorithm.
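A minimal sketch of the L_{R-I} and L_{R-P} updates above for a two-action automaton; the renormalization of the other action's probability reflects the "adjusted to still sum to 1" remark, and the function and argument names are illustrative assumptions.

def update_probs(probs, chosen, success, alpha, reward_penalty=False):
    # Move the chosen action's probability toward 1 on success,
    # and (for L_R-P only) toward 0 on failure.
    if success:
        probs[chosen] += alpha * (1.0 - probs[chosen])
    elif reward_penalty:
        probs[chosen] += alpha * (0.0 - probs[chosen])
    probs[1 - chosen] = 1.0 - probs[chosen]   # keep the two probabilities summing to 1
    return probs

probs = [0.5, 0.5]
print(update_probs(probs, chosen=0, success=True, alpha=0.1))   # [0.55, 0.45]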

39 Incremental Implementation. Recall the sample-average estimation method: the average of the first k rewards is (dropping the dependence on a): Q_k = (r_1 + r_2 + ... + r_k) / k. Can we do this incrementally (without storing all the rewards)? We could keep a running sum and count, or, equivalently: Q_{k+1} = Q_k + (1/(k+1))[r_{k+1} − Q_k]. This is a common form for update rules: NewEstimate = OldEstimate + StepSize[Target − OldEstimate].
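A minimal sketch showing that the incremental form reproduces the batch average without storing past rewards; the reward list is an illustrative example.

def incremental_average(rewards):
    Q, k = 0.0, 0
    for r in rewards:
        k += 1
        Q += (r - Q) / k    # NewEstimate = OldEstimate + StepSize * (Target - OldEstimate)
    return Q

rewards = [1.0, 0.0, 2.0, 1.0]
assert abs(incremental_average(rewards) - sum(rewards) / len(rewards)) < 1e-12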

40 Tracking a Nonstationary Problem. Choosing Q_k to be a sample average is appropriate in a stationary problem, i.e., when none of the Q*(a) change over time, but not in a nonstationary problem. Better in the nonstationary case is: Q_{k+1} = Q_k + α[r_{k+1} − Q_k], for constant α, 0 < α ≤ 1, which equals (1 − α)^k Q_0 + Σ_{i=1}^{k} α(1 − α)^{k−i} r_i: an exponential, recency-weighted average.
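A minimal sketch of the constant-step-size update together with a numerical check that it matches the exponential, recency-weighted average on the slide; the rewards and α are illustrative.

def constant_alpha_estimate(rewards, alpha, Q0=0.0):
    Q = Q0
    for r in rewards:
        Q += alpha * (r - Q)       # Q_{k+1} = Q_k + alpha * (r_{k+1} - Q_k)
    return Q

def recency_weighted_average(rewards, alpha, Q0=0.0):
    k = len(rewards)
    total = (1 - alpha) ** k * Q0
    for i, r in enumerate(rewards, start=1):
        total += alpha * (1 - alpha) ** (k - i) * r
    return total

rewards, alpha = [1.0, 0.0, 2.0], 0.3
assert abs(constant_alpha_estimate(rewards, alpha) - recency_weighted_average(rewards, alpha)) < 1e-12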

41 Optimistic Initial Values. All methods so far depend on Q_0(a), i.e., they are biased. Suppose instead we initialize the action values optimistically, i.e., on the 10-armed testbed, use Q_0(a) = 5 for all a.

42 Reinforcement Comparison. Compare rewards to a reference reward r̄_t, e.g., an average of observed rewards. Strengthen or weaken the action taken depending on r_t − r̄_t. Let p_t(a) denote the preference for action a. Preferences determine action probabilities, e.g., by the Gibbs distribution: π_t(a) = Pr{a_t = a} = e^{p_t(a)} / Σ_{b=1}^{n} e^{p_t(b)}. Then: p_{t+1}(a_t) = p_t(a_t) + β[r_t − r̄_t] and r̄_{t+1} = r̄_t + α[r_t − r̄_t].
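A minimal sketch of one reinforcement-comparison step: preferences drive a Gibbs (softmax) policy and are nudged by how the latest reward compares to a running reference reward. The step sizes and the reward_fn callable are illustrative assumptions.

import math
import random

def gibbs_probs(prefs):
    weights = [math.exp(p) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]

def reinforcement_comparison_step(prefs, ref_reward, reward_fn, alpha=0.1, beta=0.1):
    probs = gibbs_probs(prefs)
    a = random.choices(range(len(prefs)), weights=probs, k=1)[0]
    r = reward_fn(a)                          # reward_fn is a stand-in for the bandit
    prefs[a] += beta * (r - ref_reward)       # strengthen or weaken the action taken
    ref_reward += alpha * (r - ref_reward)    # update the reference reward
    return prefs, ref_reward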

43 Associative Search. Imagine switching bandits at each play. [Figure: a bandit with 3 actions.]

44 Conclusions. These are all very simple methods, but they are complicated enough that we will build on them. Ideas for improvements: estimating uncertainties... interval estimation; approximating Bayes optimal solutions; Gittins indices. The full RL problem offers some ideas for solution...

45 The Overall Plan. Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback. Markov decision processes. Lecture 2: Dynamic Programming. Basic Monte Carlo methods. Temporal Difference methods. A unified perspective. Connections to neuroscience. Lecture 3: Function approximation. Model-based methods. Dimensions of Reinforcement Learning.

46 Lecture 1, Part 3: Markov Decision Processes. Objectives of this part: describe the RL problem in terms of MDPs; present the idealized form of the RL problem for which we have precise theoretical results; introduce key components of the mathematics: value functions and Bellman equations; describe trade-offs between applicability and mathematical tractability.

47 The Agent-Environment Interface. Agent and environment interact at discrete time steps t = 0, 1, 2, .... The agent observes the state at step t: s_t ∈ S; produces an action at step t: a_t ∈ A(s_t); gets the resulting reward r_{t+1} ∈ ℝ and the resulting next state s_{t+1}. [Diagram: ... s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...]

48 The Agent Learns a Policy. Policy at step t, π_t: a mapping from states to action probabilities; π_t(s, a) = probability that a_t = a when s_t = s. Reinforcement learning methods specify how the agent changes its policy as a result of experience. Roughly, the agent's goal is to get as much reward as it can over the long run.

49 Getting the Degree of Abstraction Right. Time steps need not refer to fixed intervals of real time. Actions can be low level (e.g., voltages to motors), high level (e.g., accept a job offer), mental (e.g., shift in focus of attention), etc. States can be low-level sensations, or they can be abstract, symbolic, based on memory, or subjective (e.g., the state of being "surprised" or "lost"). An RL agent is not like a whole animal or robot, which consist of many RL agents as well as other components. The environment is not necessarily unknown to the agent, only incompletely controllable. Reward computation is in the agent's environment because the agent cannot change it arbitrarily.

50 Goals and Rewards. Is a scalar reward signal an adequate notion of a goal? Maybe not, but it is surprisingly flexible. A goal should specify what we want to achieve, not how we want to achieve it. A goal must be outside the agent's direct control, thus outside the agent. The agent must be able to measure success: explicitly; frequently during its lifespan.

51 Returns. Suppose the sequence of rewards after step t is r_{t+1}, r_{t+2}, r_{t+3}, .... What do we want to maximize? In general, we want to maximize the expected return, E{R_t}, for each step t. Episodic tasks: interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze. R_t = r_{t+1} + r_{t+2} + ... + r_T, where T is a final time step at which a terminal state is reached, ending an episode.

52 Returns for Continuing Tasks. Continuing tasks: interaction does not have natural episodes. Discounted return: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1}, where γ, 0 ≤ γ ≤ 1, is the discount rate (γ near 0: shortsighted; γ near 1: farsighted).
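A minimal sketch of computing the discounted return from a finite list of observed rewards (a truncation of the infinite sum; the numbers are illustrative):

def discounted_return(rewards, gamma):
    # R_t = sum_k gamma^k * r_{t+k+1}
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1, 1, 1, 1], gamma=0.9))   # 1 + 0.9 + 0.81 + 0.729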

53 An Example. Avoid failure: the pole falling beyond a critical angle or the cart hitting the end of the track. As an episodic task where the episode ends upon failure: reward = +1 for each step before failure, so return = number of steps before failure. As a continuing task with discounted return: reward = −1 upon failure, 0 otherwise, so the return is related to −γ^k for k steps before failure. In either case, return is maximized by avoiding failure for as long as possible.

54 Another Example. Get to the top of the hill as quickly as possible. Reward = −1 for each step where not at the top of the hill, so return = −(number of steps before reaching the top of the hill). Return is maximized by minimizing the number of steps to reach the top of the hill.

55 A Unified Notation. In episodic tasks, we number the time steps of each episode starting from zero. We usually do not have to distinguish between episodes, so we write s_t instead of s_{t,j} for the state at step t of episode j. Think of each episode as ending in an absorbing state that always produces a reward of zero. We can cover all cases by writing R_t = Σ_{k=0}^{∞} γ^k r_{t+k+1}, where γ can be 1 only if a zero-reward absorbing state is always reached.

56 The Markov Property. By "the state" at step t, we mean whatever information is available to the agent at step t about its environment. The state can include immediate sensations, highly processed sensations, and structures built up over time from sequences of sensations. Ideally, a state should summarize past sensations so as to retain all "essential" information, i.e., it should have the Markov Property: Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0} = Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t} for all s', r, and histories s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0.

57 Markov Decision Processes. If a reinforcement learning task has the Markov Property, it is basically a Markov Decision Process (MDP). If the state and action sets are finite, it is a finite MDP. To define a finite MDP, you need to give: the state and action sets; the one-step dynamics defined by transition probabilities P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a} for all s, s' ∈ S, a ∈ A(s); and reward expectations R^a_{ss'} = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s'} for all s, s' ∈ S, a ∈ A(s).

58 An Example Finite MDP: Recycling Robot. At each step, the robot has to decide whether it should (1) actively search for a can, (2) wait for someone to bring it a can, or (3) go to home base and recharge. Searching is better but runs down the battery; if it runs out of power while searching, it has to be rescued (which is bad). Decisions are made on the basis of the current energy level: high, low. Reward = number of cans collected.

59 Recycling Robot MDP. S = {high, low}. A(high) = {search, wait}. A(low) = {search, wait, recharge}. R^search = expected no. of cans while searching. R^wait = expected no. of cans while waiting. R^search > R^wait.

60 Value Functions. The value of a state is the expected return starting from that state; it depends on the agent's policy. State-value function for policy π: V^π(s) = E_π{R_t | s_t = s} = E_π{Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s}. The value of taking an action in a state under policy π is the expected return starting from that state, taking that action, and thereafter following π. Action-value function for policy π: Q^π(s, a) = E_π{R_t | s_t = s, a_t = a} = E_π{Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s, a_t = a}.

61 Bellman Equation for a Policy π. The basic idea: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + γ³ r_{t+4} + ... = r_{t+1} + γ(r_{t+2} + γ r_{t+3} + γ² r_{t+4} + ...) = r_{t+1} + γ R_{t+1}. So: V^π(s) = E_π{R_t | s_t = s} = E_π{r_{t+1} + γ V^π(s_{t+1}) | s_t = s}. Or, without the expectation operator: V^π(s) = Σ_a π(s, a) Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V^π(s')].

62 More on the Bellman Equation. V^π(s) = Σ_a π(s, a) Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V^π(s')]. This is a set of equations (in fact, linear), one for each state. The value function for π is its unique solution. Backup diagrams: for V^π and for Q^π.
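One way to see this equation in action is iterative policy evaluation: repeatedly apply the Bellman equation as an update until the values stop changing. Below is a minimal sketch on a tiny two-state MDP loosely inspired by the recycling robot; all transition probabilities and rewards are illustrative assumptions, not from the slides.

GAMMA = 0.9

# P[s][a] is a list of (probability, next_state, expected_reward) triples.
P = {
    "high": {"search": [(0.7, "high", 2.0), (0.3, "low", 2.0)],
             "wait":   [(1.0, "high", 1.0)]},
    "low":  {"wait":     [(1.0, "low", 1.0)],
             "recharge": [(1.0, "high", 0.0)]},
}
# Equiprobable random policy: pi[s][a] = probability of taking a in s.
pi = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}

V = {s: 0.0 for s in P}
for _ in range(1000):   # sweep until (approximately) converged
    V = {s: sum(pi[s][a] * sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])
                for a in P[s])
         for s in P}
print(V)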

63 What about a Bellman Equation for Q^π? Q^π(s, a) = Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ Σ_{a'} π(s', a') Q^π(s', a')].

64 Gridworld. Actions: north, south, east, west; deterministic. If an action would take the agent off the grid: no move, but reward = −1. Other actions produce reward = 0, except actions that move the agent out of the special states A and B as shown. State-value function for the equiprobable random policy; γ = 0.9. Note: A's value is less than its immediate reward; B's value is more than its immediate reward.

65 Optimal Value Functions. For finite MDPs, policies can be partially ordered: π ≥ π' if and only if V^π(s) ≥ V^π'(s) for all s ∈ S. There is always at least one (and possibly many) policy that is better than or equal to all the others. This is an optimal policy. We denote them all π*. Optimal policies share the same optimal state-value function: V*(s) = max_π V^π(s) for all s ∈ S. Optimal policies also share the same optimal action-value function: Q*(s, a) = max_π Q^π(s, a) for all s ∈ S and a ∈ A(s). This is the expected return for taking action a in state s and thereafter following an optimal policy.

66 Bellman Optimality Equation for V*. The value of a state under an optimal policy must equal the expected return for the best action from that state: V*(s) = max_{a ∈ A(s)} Q^{π*}(s, a) = max_{a ∈ A(s)} E{r_{t+1} + γ V*(s_{t+1}) | s_t = s, a_t = a} = max_{a ∈ A(s)} Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')]. [The relevant backup diagram.] V* is the unique solution of this system of nonlinear equations.
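Turning this equation into an update rule gives value iteration. Below is a minimal sketch on the same illustrative two-state MDP used in the policy-evaluation sketch above; again, all numbers are assumptions, not from the slides.

GAMMA = 0.9

# P[s][a] is a list of (probability, next_state, expected_reward) triples.
P = {
    "high": {"search": [(0.7, "high", 2.0), (0.3, "low", 2.0)],
             "wait":   [(1.0, "high", 1.0)]},
    "low":  {"wait":     [(1.0, "low", 1.0)],
             "recharge": [(1.0, "high", 0.0)]},
}

V = {s: 0.0 for s in P}
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a sum_{s'} P [R + gamma * V(s')]
    V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
         for s in P}

# A policy that is greedy with respect to V* is optimal (see slide 68).
policy = {s: max(P[s], key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)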

67 Bellman Optimality Equation for Q*. Q*(s, a) = E{r_{t+1} + γ max_{a'} Q*(s_{t+1}, a') | s_t = s, a_t = a} = Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ max_{a'} Q*(s', a')]. [The relevant backup diagram.] Q* is the unique solution of this system of nonlinear equations.

68 Why Optimal State-Value Functions are Useful. Any policy that is greedy with respect to V* is an optimal policy. Therefore, given V*, one-step-ahead search produces the long-term optimal actions. E.g., back to the gridworld.

69 Car-on-the-Hill Optimal Value Function. Predicted minimum time to goal (negated). Get to the top of the hill as quickly as possible (roughly). Munos & Moore, "Variable resolution discretization for high-accuracy solutions of optimal control problems," IJCAI 99.

70 What About Optimal Action-Value Functions? Given Q*, the agent does not even have to do a one-step-ahead search: π*(s) = argmax_{a ∈ A(s)} Q*(s, a).

71 Solving the Bellman Optimality Equation. Finding an optimal policy by solving the Bellman Optimality Equation requires the following: accurate knowledge of environment dynamics; enough space and time to do the computation; the Markov Property. How much space and time do we need? Polynomial in the number of states (via dynamic programming methods; next lecture), BUT the number of states is often huge (e.g., backgammon has about 10^20 states). We usually have to settle for approximations. Many RL methods can be understood as approximately solving the Bellman Optimality Equation.

72 Semi-Markov Decision Processes (SMDPs). Generalization of an MDP where there is a waiting, or dwell, time τ in each state. Transition probabilities generalize to P(s', τ | s, a). Bellman equations generalize, e.g., for a discrete-time SMDP: V*(s) = max_{a ∈ A(s)} Σ_{s', τ} P(s', τ | s, a) [R^a_{ss'} + γ^τ V*(s')], where R^a_{ss'} is now the amount of discounted reward expected to accumulate over the waiting time in s upon doing a and ending up in s'.

73 Summary. Agent-environment interaction: states, actions, rewards. Policy: stochastic rule for selecting actions. Return: the function of future rewards the agent tries to maximize. Episodic and continuing tasks. Markov Property. Markov Decision Process: transition probabilities, expected rewards. Value functions: state-value function for a policy, action-value function for a policy, optimal state-value function, optimal action-value function. Optimal value functions. Optimal policies. Bellman Equations. The need for approximation. Semi-Markov Decision Processes.

74 Edward L. Thorndike (1874-1949). Learning by Trial-and-Error. Puzzle box.

75 Law of Effect. "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that when it recurs, they will be less likely to occur." Edward Thorndike, 1911.

76 Search + Memory. Search: Trial-and-Error, Generate-and-Test, Variation-and-Selection. Memory: remember what worked best for each situation and start from there next time.

77 Credit Assignment Problem (Marvin Minsky, 1961). Getting useful training information to the right places at the right times. Spatial. Temporal.

78 The Overall Plan. Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback. Markov decision processes. Lecture 2: Basic Monte Carlo methods. Dynamic Programming. Temporal Difference methods. A unified perspective. Connections to neuroscience. Lecture 3: Function approximation. Model-based methods. Dimensions of Reinforcement Learning.


More information

Eric Klein and Ning Sa

Eric Klein and Ning Sa Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure

More information

We can see from the graph above that the intersection is, i.e., [ ).

We can see from the graph above that the intersection is, i.e., [ ). MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with

More information

Lecture 7: Damped and Driven Oscillations

Lecture 7: Damped and Driven Oscillations Lecture 7: Damped and Driven Oscillatins Last time, we fund fr underdamped scillatrs: βt x t = e A1 + A csω1t + i A1 A sinω1t A 1 and A are cmplex numbers, but ur answer must be real Implies that A 1 and

More information

Competency Statements for Wm. E. Hay Mathematics for grades 7 through 12:

Competency Statements for Wm. E. Hay Mathematics for grades 7 through 12: Cmpetency Statements fr Wm. E. Hay Mathematics fr grades 7 thrugh 12: Upn cmpletin f grade 12 a student will have develped a cmbinatin f sme/all f the fllwing cmpetencies depending upn the stream f math

More information

BLAST / HIDDEN MARKOV MODELS

BLAST / HIDDEN MARKOV MODELS CS262 (Winter 2015) Lecture 5 (January 20) Scribe: Kat Gregry BLAST / HIDDEN MARKOV MODELS BLAST CONTINUED HEURISTIC LOCAL ALIGNMENT Use Cmmnly used t search vast bilgical databases (n the rder f terabases/tetrabases)

More information

A - LEVEL MATHEMATICS 2018/2019

A - LEVEL MATHEMATICS 2018/2019 A - LEVEL MATHEMATICS 2018/2019 STRUCTURE OF THE COURSE Yur maths A-Level Maths curse cvers Pure Mathematics, Mechanics and Statistics. Yu will be eamined at the end f the tw-year curse. The assessment

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

The Law of Total Probability, Bayes Rule, and Random Variables (Oh My!)

The Law of Total Probability, Bayes Rule, and Random Variables (Oh My!) The Law f Ttal Prbability, Bayes Rule, and Randm Variables (Oh My!) Administrivia Hmewrk 2 is psted and is due tw Friday s frm nw If yu didn t start early last time, please d s this time. Gd Milestnes:

More information

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

AP Statistics Notes Unit Two: The Normal Distributions

AP Statistics Notes Unit Two: The Normal Distributions AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).

More information

BASD HIGH SCHOOL FORMAL LAB REPORT

BASD HIGH SCHOOL FORMAL LAB REPORT BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is Length L>>a,b,c Phys 232 Lab 4 Ch 17 Electric Ptential Difference Materials: whitebards & pens, cmputers with VPythn, pwer supply & cables, multimeter, crkbard, thumbtacks, individual prbes and jined prbes,

More information

Lab 11 LRC Circuits, Damped Forced Harmonic Motion

Lab 11 LRC Circuits, Damped Forced Harmonic Motion Physics 6 ab ab 11 ircuits, Damped Frced Harmnic Mtin What Yu Need T Knw: The Physics OK this is basically a recap f what yu ve dne s far with circuits and circuits. Nw we get t put everything tgether

More information

CESAR Science Case The differential rotation of the Sun and its Chromosphere. Introduction. Material that is necessary during the laboratory

CESAR Science Case The differential rotation of the Sun and its Chromosphere. Introduction. Material that is necessary during the laboratory Teacher s guide CESAR Science Case The differential rtatin f the Sun and its Chrmsphere Material that is necessary during the labratry CESAR Astrnmical wrd list CESAR Bklet CESAR Frmula sheet CESAR Student

More information