An Introduction to COMPUTATIONAL REINFORCEMENT LEARNING. Andrew G. Barto. Department of Computer Science, University of Massachusetts Amherst
1 An Introduction to COMPUTATIONAL REINFORCEMENT LEARNING Andrew G. Barto Department of Computer Science University of Massachusetts Amherst UPF Lecture 1 Autonomous Learning Laboratory Department of Computer Science
2 Artificial Intelligence Psychology Computational Reinforcement Learning (RL) Control Theory and Operations Research Neuroscience Artificial Neural Networks
3 The Overall Plan Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback Markov decision processes Lecture 2: Dynamic Programming Basic Monte Carlo methods Temporal Difference methods A unified perspective Connections to neuroscience Lecture 3: Function approximation Model-based methods Dimensions of Reinforcement Learning A. G. Barto, Barcelona Lectures, April. Based on R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction, MIT Press.
4 What is Reinforcement Learning? Learning from interaction Goal-oriented learning Learning about, from, and while interacting with an external environment Learning what to do: how to map situations to actions so as to maximize a numerical reward signal
5 Supervised Learning Training Info = desired (target) outputs Inputs Supervised Learning System Outputs Error = (target output − actual output)
6 Reinforcement Learning Training Info = evaluations ( rewards / penalties ) Inputs RL System Outputs ( actions ) Objective: get as much reward as possible
7 Key Features of RL Learner is not told which actions to take Trial-and-error search Possibility of delayed reward Sacrifice short-term gains for greater long-term gains The need to explore and exploit Considers the whole problem of a goal-directed agent interacting with an uncertain environment
8 Complete Agent Temporally situated Continual learning and planning Object is to affect the environment Environment is stochastic and uncertain Environment state action reward Agent
9 A Less Misleading View external sensations memory reward RL agent state internal sensations actions
10 Elements of RL Policy: what to do Reward: what is good Value: what is good because it predicts reward Model: what follows what Policy Reward Value Model of environment
11 An Extended Example: Tic-Tac-Toe (game-tree diagram: positions alternating x's moves and o's moves) Assume an imperfect opponent: he/she sometimes makes mistakes
12 An RL Approach to Tic-Tac-Toe 1. Make a table with one entry per state: V(s) = estimated probability of winning; .5 initially, 1 for a win, 0 for a loss or a draw. 2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states. Just pick the next state with the highest estimated prob. of winning, the largest V(s): a greedy move. But 10% of the time pick a move at random: an exploratory move.
13 RL Learning Rule for Tic-Tac-Toe Exploratory move; s: the state before our greedy move; s′: the state after our greedy move. We increment each V(s) toward V(s′), a backup: V(s) ← V(s) + α[V(s′) − V(s)], where α is a small positive fraction, e.g., α = .1, the step-size parameter
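The backup rule above can be sketched in a few lines. This is a minimal illustration, not the full player: the board encoding and the initial value entries are invented for the example, and α = 0.1 as on the slide.

```python
# Sketch of the tic-tac-toe backup V(s) <- V(s) + alpha * (V(s') - V(s)).
# The string board encoding and initial values are illustrative assumptions.

ALPHA = 0.1  # the step-size parameter from the slide

def backup(values, s, s_next):
    """Move the estimate for state s a fraction ALPHA toward the estimate for s_next."""
    values[s] = values[s] + ALPHA * (values[s_next] - values[s])
    return values[s]

# Example: a pre-win state initially valued 0.5, backed up toward a won state (value 1.0).
V = {"XX.OO....": 0.5, "XXXOO....": 1.0}
backup(V, "XX.OO....", "XXXOO....")  # V["XX.OO...."] becomes 0.55
```

Repeated over many games, each state's value drifts toward the value of the states that tend to follow it, which is what makes the greedy one-step look-ahead informative.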
14 How can we improve this T.T.T. player? Take advantage of symmetries representation/generalization How might this backfire? Do we need random moves? Why? Do we always need a full 10%? Can we learn from random moves? Can we learn offline?... Pre-training from self play? Using learned models of opponent?
15 How is Tic-Tac-Toe Too Easy? Finite, small number of states One-step look-ahead is always possible State completely observable...
16 Some Notable RL Applications TD-Gammon: Tesauro world's best backgammon program Elevator Control: Crites & Barto high performance down-peak elevator controller Inventory Management: Van Roy, Bertsekas, Lee & Tsitsiklis 10-15% improvement over industry standard methods Dynamic Channel Assignment: Singh & Bertsekas, Nie & Haykin high performance assignment of radio channels to mobile telephone calls
17 TD-Gammon Tesauro Value Action selection by 2-3 ply search TD error V_{t+1} − V_t Start with a random network Play very many games against self Learn a value function from this simulated experience This produces arguably the best player in the world
18 10 floors, 4 elevator cars Elevator Dispatching Crites and Barto, 1996 STATES: button states; positions, directions, and motion states of cars; passengers in cars & in halls ACTIONS: stop at, or go by, next floor REWARDS: roughly, −1 per time step for each person waiting Conservatively about states
19 Autonomous Helicopter Flight A. Ng, Stanford; H. Kim, M. Jordan, S. Sastry, Berkeley
20 Quadrupedal Locomotion Nate Kohl & Peter Stone, Univ of Texas at Austin All training done with physical robots: Sony Aibo ERS-210A Before Learning After 1000 trials, or about 3 hours
21 Learning Control for Dynamically Stable Walking Robots Russ Tedrake, Teresa Zhang, H. Sebastian Seung, MIT Start with a Passive Walker
23 Grasp Control R. Platt, A. Fagg, R. Grupen, Univ of Mass UMass Torso: Dexter
24 Some RL History Trial-and-error learning Temporal-difference learning Optimal control, value functions Thorndike (Ψ) 1911 Minsky Secondary reinforcement (Ψ) Samuel Hamilton (Physics) 1800s Shannon Bellman/Howard (OR) Klopf Holland Barto et al. Witten Sutton Werbos Watkins
25 Samuel's Checkers Player Arthur Samuel 1959, 1967 Score board configurations by a scoring polynomial (after Shannon, 1950) Minimax to determine backed-up score of a position Alpha-beta cutoffs Rote learning: save each board config encountered together with backed-up score needed a sense of direction: like discounting Learning by generalization: similar to TD algorithm
26 Samuel's Backups
27 The Basic Idea "... we are attempting to make the score, calculated for the current board position, look like that calculated for the terminal board positions of the chain of moves which most probably occur during actual play." A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, 1959
28 MENACE (Michie 1961) Matchbox Educable Noughts and Crosses Engine
29 The Overall Plan Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback Markov decision processes Lecture 2: Dynamic Programming Basic Monte Carlo methods Temporal Difference methods A unified perspective Connections to neuroscience Lecture 3: Function approximation Model-based methods Dimensions of Reinforcement Learning
30 Lecture 1, Part 2: Evaluative Feedback Evaluating actions vs. instructing by giving correct actions Pure evaluative feedback depends totally on the action taken. Pure instructive feedback depends not at all on the action taken. Supervised learning is instructive; optimization is evaluative Associative vs. Nonassociative: Associative: inputs mapped to outputs; learn the best output for each input Nonassociative: learn (find) one best output The n-armed bandit (at least how we treat it) is: Nonassociative Evaluative feedback
31 The n-Armed Bandit Problem Choose repeatedly from one of n actions; each choice is called a play After each play a_t, you get a reward r_t, where E{r_t | a_t} = Q*(a_t) These are the unknown action values The distribution of r_t depends only on a_t Objective is to maximize the reward in the long term, e.g., over 1000 plays To solve the n-armed bandit problem, you must explore a variety of actions and then exploit the best of them
32 The Exploration/Exploitation Dilemma Suppose you form action value estimates Q_t(a) ≈ Q*(a) The greedy action at t is a_t* = argmax_a Q_t(a) a_t = a_t* → exploitation; a_t ≠ a_t* → exploration You can't exploit all the time; you can't explore all the time You can never stop exploring; but you should always reduce exploring
33 Action-Value Methods Methods that adapt action-value estimates and nothing else, e.g.: suppose that by the t-th play, action a had been chosen k_a times, producing rewards r_1, r_2, ..., r_{k_a}; then Q_t(a) = (r_1 + r_2 + ... + r_{k_a}) / k_a, the sample average, and lim_{k_a→∞} Q_t(a) = Q*(a)
34 ε-Greedy Action Selection Greedy action selection: a_t = a_t* = argmax_a Q_t(a) ε-greedy: a_t = a_t* with probability 1 − ε, a random action with probability ε ... the simplest way to try to balance exploration and exploitation
35 10-Armed Testbed n = 10 possible actions Each Q*(a) is chosen randomly from a normal distribution N(0, 1) Each reward r_t is also normal: N(Q*(a_t), 1) 1000 plays Repeat the whole thing 2000 times and average the results
36 ε-Greedy Methods on the 10-Armed Testbed
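A minimal re-creation of one run of the testbed with ε-greedy sample-average learning can look as follows. The parameters follow the slides (n = 10 arms, Q*(a) ~ N(0, 1), rewards ~ N(Q*(a), 1), 1000 plays, ε = 0.1); for a plot like the slide's you would average 2000 such runs, which is omitted here.

```python
# One run of the 10-armed testbed with epsilon-greedy action selection
# and incrementally computed sample-average action-value estimates.
import random

def run_bandit(n=10, plays=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q_star = [rng.gauss(0, 1) for _ in range(n)]   # true (unknown) action values
    Q = [0.0] * n                                  # action-value estimates
    counts = [0] * n
    total = 0.0
    for _ in range(plays):
        if rng.random() < epsilon:
            a = rng.randrange(n)                   # exploratory move
        else:
            a = max(range(n), key=lambda i: Q[i])  # greedy move
        r = rng.gauss(q_star[a], 1)
        counts[a] += 1
        Q[a] += (r - Q[a]) / counts[a]             # incremental sample average
        total += r
    return total / plays, q_star

avg_reward, q_star = run_bandit()
```

With ε = 0, the agent often locks onto a suboptimal arm early; ε = 0.1 keeps sampling all arms, which is exactly the trade-off the figure on this slide illustrates.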
37 Softmax Action Selection Softmax action selection methods grade action probs. by estimated values. The most common softmax uses a Gibbs, or Boltzmann, distribution: Choose action a on play t with probability e^{Q_t(a)/τ} / Σ_{b=1}^{n} e^{Q_t(b)/τ}, where τ is the computational temperature
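The Gibbs/Boltzmann rule above is easy to implement directly; this sketch subtracts the maximum before exponentiating for numerical stability (a standard trick, not part of the slide), and the temperature value in the example is an arbitrary illustration.

```python
# Softmax (Gibbs/Boltzmann) action selection over action-value estimates Q.
import math
import random

def softmax_probs(Q, tau):
    """P(a) = exp(Q[a]/tau) / sum_b exp(Q[b]/tau), computed stably."""
    m = max(q / tau for q in Q)                  # subtract max to avoid overflow
    exps = [math.exp(q / tau - m) for q in Q]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_select(Q, tau, rng=random):
    """Sample an action index from the softmax distribution."""
    r, cum = rng.random(), 0.0
    for a, p in enumerate(softmax_probs(Q, tau)):
        cum += p
        if r < cum:
            return a
    return len(Q) - 1

probs = softmax_probs([1.0, 2.0, 3.0], tau=1.0)  # highest Q gets highest probability
```

As τ → 0 the distribution concentrates on the greedy action; as τ → ∞ it approaches uniform, so the temperature plays the role ε plays in ε-greedy, but with graded rather than all-or-nothing exploration.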
38 Linear Learning Automata Let π_t(a) = Pr{a_t = a} be the only adapted parameter L_{R−I} (linear, reward-inaction): On success: π_{t+1}(a_t) = π_t(a_t) + α(1 − π_t(a_t)), 0 < α < 1 (the other action probs. are adjusted to still sum to 1) On failure: no change L_{R−P} (linear, reward-penalty): On success: as above On failure: π_{t+1}(a_t) = π_t(a_t) + α(0 − π_t(a_t)), 0 < α < 1 For two actions, a stochastic, incremental version of the supervised algorithm
39 Incremental Implementation Recall the sample-average estimation method: the average of the first k rewards is (dropping the dependence on a): Q_k = (r_1 + r_2 + ... + r_k) / k Can we do this incrementally (without storing all the rewards)? We could keep a running sum and count, or, equivalently: Q_{k+1} = Q_k + (1/(k+1)) [r_{k+1} − Q_k] This is a common form for update rules: NewEstimate = OldEstimate + StepSize [Target − OldEstimate]
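The incremental form above reproduces the sample average exactly while storing only the current estimate and a count; a small sketch:

```python
# Incremental sample average: Q_{k+1} = Q_k + (1/(k+1)) (r_{k+1} - Q_k).
# Yields the same values as averaging all rewards seen so far.
def incremental_average(rewards):
    Q, k = 0.0, 0
    for r in rewards:
        k += 1
        Q += (r - Q) / k   # NewEstimate = OldEstimate + StepSize [Target - OldEstimate]
        yield Q

rewards = [1.0, 0.0, 1.0, 1.0]
estimates = list(incremental_average(rewards))   # last value is 3/4 = 0.75
```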
40 Tracking a Nonstationary Problem Choosing Q_k to be a sample average is appropriate in a stationary problem, i.e., when none of the Q*(a) change over time, but not in a nonstationary problem. Better in the nonstationary case is: Q_{k+1} = Q_k + α [r_{k+1} − Q_k] for constant α, 0 < α ≤ 1 = (1 − α)^k Q_0 + Σ_{i=1}^{k} α (1 − α)^{k−i} r_i, an exponential, recency-weighted average
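The identity on this slide (the constant-step-size recursion equals the exponential, recency-weighted average) can be checked numerically; both forms below are direct transcriptions of the two sides of the equation, with the reward sequence and α chosen arbitrarily.

```python
# Check: Q_{k+1} = Q_k + alpha (r_{k+1} - Q_k) unrolls to
# (1-alpha)^k Q_0 + sum_{i=1}^{k} alpha (1-alpha)^{k-i} r_i.

def constant_alpha(Q0, rewards, alpha):
    """Left-hand side: apply the recursion step by step."""
    Q = Q0
    for r in rewards:
        Q += alpha * (r - Q)
    return Q

def recency_weighted(Q0, rewards, alpha):
    """Right-hand side: the closed-form exponential recency-weighted average."""
    k = len(rewards)
    total = (1 - alpha) ** k * Q0
    for i, r in enumerate(rewards, start=1):
        total += alpha * (1 - alpha) ** (k - i) * r
    return total

rs = [0.0, 1.0, 1.0, 0.0, 1.0]
a = constant_alpha(0.0, rs, 0.2)
b = recency_weighted(0.0, rs, 0.2)   # the two forms agree
```

Because recent rewards get geometrically larger weight, the estimate keeps tracking Q*(a) even when it drifts, at the cost of never converging exactly in the stationary case.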
41 Optimistic Initial Values All methods so far depend on Q_0(a), i.e., they are biased. Suppose instead we initialize the action values optimistically, i.e., on the 10-armed testbed, use Q_0(a) = 5 for all a
42 Reinforcement Comparison Compare rewards to a reference reward r̄_t, e.g., an average of observed rewards Strengthen or weaken the action taken depending on r_t − r̄_t Let p_t(a) denote the preference for action a Preferences determine action probabilities, e.g., by a Gibbs distribution: π_t(a) = Pr{a_t = a} = e^{p_t(a)} / Σ_{b=1}^{n} e^{p_t(b)} Then: p_{t+1}(a_t) = p_t(a_t) + β [r_t − r̄_t] and r̄_{t+1} = r̄_t + α [r_t − r̄_t]
43 Associative Search Imagine switching bandits at each play Bandit 3 actions
44 Conclusions These are all very simple methods but they are complicated enough, and we will build on them Ideas for improvements: estimating uncertainties... interval estimation approximating Bayes optimal solutions Gittins indices The full RL problem offers some ideas for solution...
45 The Overall Plan Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback Markov decision processes Lecture 2: Dynamic Programming Basic Monte Carlo methods Temporal Difference methods A unified perspective Connections to neuroscience Lecture 3: Function approximation Model-based methods Dimensions of Reinforcement Learning
46 Lecture 1, Part 3: Markov Decision Processes Objectives of this part: describe the RL problem in terms of MDPs; present the idealized form of the RL problem for which we have precise theoretical results; introduce key components of the mathematics: value functions and Bellman equations; describe trade-offs between applicability and mathematical tractability.
47 The Agent-Environment Interface Agent and environment interact at discrete time steps t = 0, 1, 2, ... Agent observes state at step t: s_t ∈ S; produces action at step t: a_t ∈ A(s_t); gets resulting reward r_{t+1} ∈ ℝ and resulting next state s_{t+1}. The interaction generates a trajectory s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...
48 The Agent Learns a Policy Policy at step t, π_t: a mapping from states to action probabilities; π_t(s, a) = probability that a_t = a when s_t = s Reinforcement learning methods specify how the agent changes its policy as a result of experience. Roughly, the agent's goal is to get as much reward as it can over the long run.
49 Getting the Degree of Abstraction Right Time steps need not refer to fixed intervals of real time. Actions can be low level (e.g., voltages to motors), or high level (e.g., accept a job offer), mental (e.g., shift in focus of attention), etc. States can be low-level sensations, or they can be abstract, symbolic, based on memory, or subjective (e.g., the state of being surprised or lost). An RL agent is not like a whole animal or robot, which consist of many RL agents as well as other components. The environment is not necessarily unknown to the agent, only incompletely controllable. Reward computation is in the agent's environment because the agent cannot change it arbitrarily.
50 Goals and Rewards Is a scalar reward signal an adequate notion of a goal? Maybe not, but it is surprisingly flexible. A goal should specify what we want to achieve, not how we want to achieve it. A goal must be outside the agent's direct control, thus outside the agent. The agent must be able to measure success: explicitly; frequently during its lifespan.
51 Returns Suppose the sequence of rewards after step t is: r_{t+1}, r_{t+2}, r_{t+3}, ... What do we want to maximize? In general, we want to maximize the expected return, E{R_t}, for each step t. Episodic tasks: interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze. R_t = r_{t+1} + r_{t+2} + ... + r_T, where T is a final time step at which a terminal state is reached, ending an episode.
52 Returns for Continuing Tasks Continuing tasks: interaction does not have natural episodes. Discounted return: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1}, where γ, 0 ≤ γ ≤ 1, is the discount rate (γ near 0: shortsighted; γ near 1: farsighted)
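For a finite reward sequence, the discounted return above is a one-line sum; the reward values here are just an illustration.

```python
# Discounted return R_t = sum_k gamma^k r_{t+k+1}, computed for a finite
# sequence of rewards observed after step t.
def discounted_return(rewards, gamma):
    R = 0.0
    for k, r in enumerate(rewards):
        R += (gamma ** k) * r
    return R

R = discounted_return([1.0, 1.0, 1.0], gamma=0.9)  # 1 + 0.9 + 0.81 = 2.71
```

With γ < 1 and bounded rewards, the infinite sum converges (a constant reward of 1 gives R_t = 1/(1 − γ)), which is why discounting makes continuing tasks well-defined.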
53 An Example Avoid failure: the pole falling beyond a critical angle or the cart hitting the end of the track. As an episodic task where the episode ends upon failure: reward = +1 for each step before failure → return = number of steps before failure. As a continuing task with discounted return: reward = −1 upon failure, 0 otherwise → return is related to −γ^k, for k steps before failure. In either case, return is maximized by avoiding failure for as long as possible.
54 Another Example Get to the top of the hill as quickly as possible. reward = −1 for each step where not at top of hill → return = −(number of steps before reaching top of hill) Return is maximized by minimizing the number of steps to reach the top of the hill.
55 A Unified Notation In episodic tasks, we number the time steps of each episode starting from zero. We usually do not have to distinguish between episodes, so we write s_t instead of s_{t,j} for the state at step t of episode j. Think of each episode as ending in an absorbing state that always produces reward of zero. We can cover all cases by writing R_t = Σ_{k=0}^{∞} γ^k r_{t+k+1}, where γ can be 1 only if a zero-reward absorbing state is always reached.
56 The Markov Property By the state at step t, we mean whatever information is available to the agent at step t about its environment. The state can include immediate sensations, highly processed sensations, and structures built up over time from sequences of sensations. Ideally, a state should summarize past sensations so as to retain all essential information, i.e., it should have the Markov Property: Pr{s_{t+1} = s′, r_{t+1} = r | s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0} = Pr{s_{t+1} = s′, r_{t+1} = r | s_t, a_t} for all s′, r, and histories s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0.
57 Markov Decision Processes If a reinforcement learning task has the Markov Property, it is basically a Markov Decision Process (MDP). If state and action sets are finite, it is a finite MDP. To define a finite MDP, you need to give: state and action sets; one-step dynamics defined by transition probabilities: P^a_{ss′} = Pr{s_{t+1} = s′ | s_t = s, a_t = a} for all s, s′ ∈ S, a ∈ A(s); reward expectations: R^a_{ss′} = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s′} for all s, s′ ∈ S, a ∈ A(s).
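Concretely, writing down a finite MDP just means writing down these two tables. The two-state, two-action MDP below is invented purely for illustration; any representation that stores P^a_{ss′} and R^a_{ss′} would do.

```python
# A finite MDP as plain data: transition probabilities P[s][a][s'] and
# expected one-step rewards R[s][a][s']. States, actions, and all numbers
# here are hypothetical.
P = {
    "s0": {"go":   {"s0": 0.2, "s1": 0.8},
           "stay": {"s0": 1.0, "s1": 0.0}},
    "s1": {"go":   {"s0": 0.6, "s1": 0.4},
           "stay": {"s0": 0.0, "s1": 1.0}},
}
R = {
    "s0": {"go":   {"s0": 0.0, "s1": 1.0},
           "stay": {"s0": 0.1, "s1": 0.0}},
    "s1": {"go":   {"s0": -1.0, "s1": 0.0},
           "stay": {"s0": 0.0, "s1": 0.5}},
}

# Sanity check: each P[s][a][.] must be a probability distribution over next states.
for s in P:
    for a in P[s]:
        assert abs(sum(P[s][a].values()) - 1.0) < 1e-9
```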
58 An Example Finite MDP Recycling Robot At each step, the robot has to decide whether it should (1) actively search for a can, (2) wait for someone to bring it a can, or (3) go to home base and recharge. Searching is better but runs down the battery; if it runs out of power while searching, it has to be rescued (which is bad). Decisions made on basis of current energy level: high, low. Reward = number of cans collected
59 Recycling Robot MDP S = {high, low} A(high) = {search, wait} A(low) = {search, wait, recharge} R^search = expected no. of cans while searching R^wait = expected no. of cans while waiting R^search > R^wait
60 Value Functions The value of a state is the expected return starting from that state; it depends on the agent's policy: State-value function for policy π: V^π(s) = E_π{R_t | s_t = s} = E_π{Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s} The value of taking an action in a state under policy π is the expected return starting from that state, taking that action, and thereafter following π: Action-value function for policy π: Q^π(s, a) = E_π{R_t | s_t = s, a_t = a} = E_π{Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s, a_t = a}
61 Bellman Equation for a Policy π The basic idea: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + γ³ r_{t+4} + ... = r_{t+1} + γ (r_{t+2} + γ r_{t+3} + γ² r_{t+4} + ...) = r_{t+1} + γ R_{t+1} So: V^π(s) = E_π{R_t | s_t = s} = E_π{r_{t+1} + γ V^π(s_{t+1}) | s_t = s} Or, without the expectation operator: V^π(s) = Σ_a π(s, a) Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ V^π(s′)]
62 More on the Bellman Equation V^π(s) = Σ_a π(s, a) Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ V^π(s′)] This is a set of equations (in fact, linear), one for each state. The value function for π is its unique solution. Backup diagrams: for V^π and for Q^π
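Because the system is linear with a unique solution for γ < 1, V^π can be found either with a linear solver or, as sketched below, by simply iterating the Bellman backup until it stops changing. The two-state MDP and the equiprobable policy here are invented for illustration.

```python
# Iterative policy evaluation for the linear Bellman system
# V(s) = sum_a pi(s,a) sum_s' P^a_{ss'} [R^a_{ss'} + gamma V(s')].
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
# Hypothetical dynamics: P[s][a][s2] and R[s][a][s2].
P = [[[0.5, 0.5], [1.0, 0.0]],
     [[0.0, 1.0], [0.9, 0.1]]]
R = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 2.0], [-1.0, 0.0]]]
pi = [[0.5, 0.5], [0.5, 0.5]]   # equiprobable random policy

V = [0.0, 0.0]
for _ in range(1000):           # the backup is a gamma-contraction, so this converges
    V = [sum(pi[s][a] * sum(P[s][a][s2] * (R[s][a][s2] + GAMMA * V[s2])
                            for s2 in STATES)
             for a in ACTIONS)
         for s in STATES]
```

At convergence V satisfies the Bellman equation exactly (to numerical precision): applying one more backup leaves it unchanged.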
63 What about the Bellman Equation for Q^π? Q^π(s, a) = Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ Σ_{a′} π(s′, a′) Q^π(s′, a′)]
64 Gridworld Actions: north, south, east, west; deterministic. If an action would take the agent off the grid: no move, but reward = −1 Other actions produce reward = 0, except actions that move the agent out of special states A and B as shown. State-value function for the equiprobable random policy; γ = 0.9 Note: A's value is less than its immediate reward; B's value is more than its immediate reward
65 Optimal Value Functions For finite MDPs, policies can be partially ordered: π ≥ π′ if and only if V^π(s) ≥ V^{π′}(s) for all s ∈ S There is always at least one (and possibly many) policies that is better than or equal to all the others. This is an optimal policy. We denote them all π*. Optimal policies share the same optimal state-value function: V*(s) = max_π V^π(s) for all s ∈ S Optimal policies also share the same optimal action-value function: Q*(s, a) = max_π Q^π(s, a) for all s ∈ S and a ∈ A(s) This is the expected return for taking action a in state s and thereafter following an optimal policy.
66 Bellman Optimality Equation for V* The value of a state under an optimal policy must equal the expected return for the best action from that state: V*(s) = max_{a∈A(s)} Q^{π*}(s, a) = max_{a∈A(s)} E{r_{t+1} + γ V*(s_{t+1}) | s_t = s, a_t = a} = max_{a∈A(s)} Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ V*(s′)] The relevant backup diagram: V* is the unique solution of this system of nonlinear equations.
67 Bellman Optimality Equation for Q* Q*(s, a) = E{r_{t+1} + γ max_{a′} Q*(s_{t+1}, a′) | s_t = s, a_t = a} = Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ max_{a′} Q*(s′, a′)] The relevant backup diagram: Q* is the unique solution of this system of nonlinear equations.
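Although the optimality equations are nonlinear (because of the max), the same fixed-point iteration works: repeatedly applying the V* backup is value iteration, previewing the dynamic programming material of the next lecture. The small two-state MDP below is invented for illustration.

```python
# Value iteration: iterate the Bellman optimality backup
# V(s) <- max_a sum_s' P^a_{ss'} [R^a_{ss'} + gamma V(s')].
GAMMA = 0.9
# Hypothetical dynamics: P[s][a][s2] and R[s][a][s2].
P = [[[0.5, 0.5], [1.0, 0.0]],
     [[0.0, 1.0], [0.9, 0.1]]]
R = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 2.0], [-1.0, 0.0]]]
n_states, n_actions = 2, 2

V = [0.0] * n_states
for _ in range(500):            # contraction with factor gamma -> converges to V*
    V = [max(sum(P[s][a][s2] * (R[s][a][s2] + GAMMA * V[s2])
                 for s2 in range(n_states))
             for a in range(n_actions))
         for s in range(n_states)]

# Any policy greedy with respect to the converged V is optimal.
policy = [max(range(n_actions),
              key=lambda a: sum(P[s][a][s2] * (R[s][a][s2] + GAMMA * V[s2])
                                for s2 in range(n_states)))
          for s in range(n_states)]
```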
68 Why Optimal State-Value Functions are Useful Any policy that is greedy with respect to V* is an optimal policy. Therefore, given V*, one-step-ahead search produces the long-term optimal actions. E.g., back to the gridworld:
69 Car-on-the-Hill Optimal Value Function Predicted minimum time to goal (negated) Get to the top of the hill as quickly as possible (roughly) Munos & Moore, Variable resolution discretization for high-accuracy solutions of optimal control problems, IJCAI 99.
70 What About Optimal Action-Value Functions? Given Q*, the agent does not even have to do a one-step-ahead search: π*(s) = argmax_{a∈A(s)} Q*(s, a)
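With Q* in hand, acting optimally reduces to an argmax over the actions available in the current state, with no model of P or R needed. The sketch uses the recycling robot's state and action names from the earlier slides, but the Q values themselves are invented for illustration.

```python
# pi*(s) = argmax_a Q*(s, a): no one-step look-ahead, no model required.
# The Q* values below are hypothetical, not derived from the robot's dynamics.
Q_star = {
    ("high", "search"):   4.0,
    ("high", "wait"):     3.0,
    ("low",  "search"):   1.0,
    ("low",  "wait"):     2.0,
    ("low",  "recharge"): 2.5,
}

def greedy_action(Q, state):
    """Return the action maximizing Q(state, .) among actions defined for state."""
    actions = [a for (s, a) in Q if s == state]
    return max(actions, key=lambda a: Q[(state, a)])

greedy_action(Q_star, "low")   # -> "recharge"
```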
71 Solving the Bellman Optimality Equation Finding an optimal policy by solving the Bellman Optimality Equation requires the following: accurate knowledge of environment dynamics; enough space and time to do the computation; the Markov Property. How much space and time do we need? Polynomial in the number of states (via dynamic programming methods; next lecture), BUT the number of states is often huge (e.g., backgammon has about 10^20 states). We usually have to settle for approximations. Many RL methods can be understood as approximately solving the Bellman Optimality Equation.
72 Semi-Markov Decision Processes (SMDPs) Generalization of an MDP where there is a waiting, or dwell, time τ in each state Transition probabilities generalize to P(s′, τ | s, a) Bellman equations generalize, e.g., for a discrete-time SMDP: V*(s) = max_{a∈A(s)} Σ_{s′,τ} P(s′, τ | s, a) [R^a_{ss′} + γ^τ V*(s′)] where R^a_{ss′} is now the amount of discounted reward expected to accumulate over the waiting time in s upon doing a and ending up in s′
73 Summary Agent-environment interaction: States, Actions, Rewards Policy: stochastic rule for selecting actions Return: the function of future rewards the agent tries to maximize Episodic and continuing tasks Markov Property Markov Decision Process: Transition probabilities, Expected rewards Value functions: State-value function for a policy, Action-value function for a policy, Optimal state-value function, Optimal action-value function Optimal value functions Optimal policies Bellman Equations The need for approximation Semi-Markov Decision Processes
74 Edward L. Thorndike (1874-1949) Learning by Trial-and-Error puzzle box
75 Law of Effect "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that when it recurs, they will be less likely to occur." Edward Thorndike, 1911
76 Search + Memory Search: Trial-and-Error, Generate-and-Test, Variation-and-Selection Memory: remember what worked best for each situation and start from there next time
77 Credit Assignment Problem Marvin Minsky, 1961 Getting useful training information to the right places at the right times Spatial Temporal
78 The Overall Plan Lecture 1: What is Computational Reinforcement Learning? Learning from evaluative feedback Markov decision processes Lecture 2: Basic Monte Carlo methods Dynamic Programming Temporal Difference methods A unified perspective Connections to neuroscience Lecture 3: Function approximation Model-based methods Dimensions of Reinforcement Learning
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationDifferentiation Applications 1: Related Rates
Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm
More informationPattern Recognition 2014 Support Vector Machines
Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft
More informationx 1 Outline IAML: Logistic Regression Decision Boundaries Example Data
Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares
More informationActivity Guide Loops and Random Numbers
Unit 3 Lessn 7 Name(s) Perid Date Activity Guide Lps and Randm Numbers CS Cntent Lps are a relatively straightfrward idea in prgramming - yu want a certain chunk f cde t run repeatedly - but it takes a
More informationMath Foundations 20 Work Plan
Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant
More informationSPH3U1 Lesson 06 Kinematics
PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.
More informationA New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation
III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.
More informationBootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) >
Btstrap Methd > # Purpse: understand hw btstrap methd wrks > bs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(bs) > mean(bs) [1] 21.64625 > # estimate f lambda > lambda = 1/mean(bs);
More informationLab 1 The Scientific Method
INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific
More informationCAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank
CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal
More informationDepartment of Electrical Engineering, University of Waterloo. Introduction
Sectin 4: Sequential Circuits Majr Tpics Types f sequential circuits Flip-flps Analysis f clcked sequential circuits Mre and Mealy machines Design f clcked sequential circuits State transitin design methd
More informationEdward L. Thorndike #1874$1949% Lecture 2: Learning from Evaluative Feedback. or!bandit Problems" Learning by Trial$and$Error.
Lecture 2: Learning from Evaluative Feedback Edward L. Thorndike #1874$1949% or!bandit Problems" Puzzle Box 1 2 Learning by Trial$and$Error Law of E&ect:!Of several responses to the same situation, those
More informationModelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA
Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview
More informationCOMP9414/ 9814/ 3411: Artificial Intelligence. 14. Course Review. COMP3411 c UNSW, 2014
COMP9414/ 9814/ 3411: Artificial Intelligence 14. Curse Review COMP9414/9814/3411 14s1 Review 1 Assessment Assessable cmpnents f the curse: Assignment 1 10% Assignment 2 8% Assignment 3 12% Written Eam
More information, which yields. where z1. and z2
The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin
More informationChapter 3: Cluster Analysis
Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA
More informationTree Structured Classifier
Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients
More informationResampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017
Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with
More informationFive Whys How To Do It Better
Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex
More informationPhysics 2010 Motion with Constant Acceleration Experiment 1
. Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin
More informationCOMP9444 Neural Networks and Deep Learning 3. Backpropagation
COMP9444 Neural Netwrks and Deep Learning 3. Backprpagatin Tetbk, Sectins 4.3, 5.2, 6.5.2 COMP9444 17s2 Backprpagatin 1 Outline Supervised Learning Ockham s Razr (5.2) Multi-Layer Netwrks Gradient Descent
More informationCourse basics. CSE 190: Reinforcement Learning: An Introduction. Last Time. Course goals. The website for the class is linked off my homepage.
Course basics CSE 190: Reinforcement Learning: An Introduction The website for the class is linked off my homepage. Grades will be based on programming assignments, homeworks, and class participation.
More informationDetermining Optimum Path in Synthesis of Organic Compounds using Branch and Bound Algorithm
Determining Optimum Path in Synthesis f Organic Cmpunds using Branch and Bund Algrithm Diastuti Utami 13514071 Prgram Studi Teknik Infrmatika Seklah Teknik Elektr dan Infrmatika Institut Teknlgi Bandung,
More informationThe blessing of dimensionality for kernel methods
fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented
More informationPhysical Layer: Outline
18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin
More informationIAML: Support Vector Machines
1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int
More informationIn SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:
In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin
More informationInternal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.
Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.
More informationTuring Machines. Human-aware Robotics. 2017/10/17 & 19 Chapter 3.2 & 3.3 in Sipser Ø Announcement:
Turing Machines Human-aware Rbtics 2017/10/17 & 19 Chapter 3.2 & 3.3 in Sipser Ø Annuncement: q q q q Slides fr this lecture are here: http://www.public.asu.edu/~yzhan442/teaching/cse355/lectures/tm-ii.pdf
More informationCHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India
CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce
More informationSupport-Vector Machines
Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material
More informationComputational modeling techniques
Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins
More informationDead-beat controller design
J. Hetthéssy, A. Barta, R. Bars: Dead beat cntrller design Nvember, 4 Dead-beat cntrller design In sampled data cntrl systems the cntrller is realised by an intelligent device, typically by a PLC (Prgrammable
More informationAssessment Primer: Writing Instructional Objectives
Assessment Primer: Writing Instructinal Objectives (Based n Preparing Instructinal Objectives by Mager 1962 and Preparing Instructinal Objectives: A critical tl in the develpment f effective instructin
More informationCS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007
CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is
More informationMODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b
. REVIEW OF SOME BASIC ALGEBRA MODULE () Slving Equatins Yu shuld be able t slve fr x: a + b = c a d + e x + c and get x = e(ba +) b(c a) d(ba +) c Cmmn mistakes and strategies:. a b + c a b + a c, but
More informationENSC Discrete Time Systems. Project Outline. Semester
ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding
More informationMedium Scale Integrated (MSI) devices [Sections 2.9 and 2.10]
EECS 270, Winter 2017, Lecture 3 Page 1 f 6 Medium Scale Integrated (MSI) devices [Sectins 2.9 and 2.10] As we ve seen, it s smetimes nt reasnable t d all the design wrk at the gate-level smetimes we just
More informationPart 3 Introduction to statistical classification techniques
Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms
More informationHypothesis Tests for One Population Mean
Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be
More informationmaking triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y=
Intrductin t Vectrs I 21 Intrductin t Vectrs I 22 I. Determine the hrizntal and vertical cmpnents f the resultant vectr by cunting n the grid. X= y= J. Draw a mangle with hrizntal and vertical cmpnents
More informationLecture 13: Markov Chain Monte Carlo. Gibbs sampling
Lecture 13: Markv hain Mnte arl Gibbs sampling Gibbs sampling Markv chains 1 Recall: Apprximate inference using samples Main idea: we generate samples frm ur Bayes net, then cmpute prbabilities using (weighted)
More informationIntroduction to Spacetime Geometry
Intrductin t Spacetime Gemetry Let s start with a review f a basic feature f Euclidean gemetry, the Pythagrean therem. In a twdimensinal crdinate system we can relate the length f a line segment t the
More informationk-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels
Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t
More informationReinforcement Learning (1)
Reinforcement Learning 1 Reinforcement Learning (1) Machine Learning 64-360, Part II Norman Hendrich University of Hamburg, Dept. of Informatics Vogt-Kölln-Str. 30, D-22527 Hamburg hendrich@informatik.uni-hamburg.de
More informationAP Physics Kinematic Wrap Up
AP Physics Kinematic Wrap Up S what d yu need t knw abut this mtin in tw-dimensin stuff t get a gd scre n the ld AP Physics Test? First ff, here are the equatins that yu ll have t wrk with: v v at x x
More informationWhat is Statistical Learning?
What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,
More informationSequential Allocation with Minimal Switching
In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University
More information1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp
THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*
More informationLab #3: Pendulum Period and Proportionalities
Physics 144 Chwdary Hw Things Wrk Spring 2006 Name: Partners Name(s): Intrductin Lab #3: Pendulum Perid and Prprtinalities Smetimes, it is useful t knw the dependence f ne quantity n anther, like hw the
More informationDepartment of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets
Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0
More informationRelationship Between Amplifier Settling Time and Pole-Zero Placements for Second-Order Systems *
Relatinship Between Amplifier Settling Time and Ple-Zer Placements fr Secnd-Order Systems * Mark E. Schlarmann and Randall L. Geiger Iwa State University Electrical and Cmputer Engineering Department Ames,
More informationDataflow Analysis and Abstract Interpretation
Dataflw Analysis and Abstract Interpretatin Cmputer Science and Artificial Intelligence Labratry MIT Nvember 9, 2015 Recap Last time we develped frm first principles an algrithm t derive invariants. Key
More informationCOMP 551 Applied Machine Learning Lecture 4: Linear classification
COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted
More informationDetermining the Accuracy of Modal Parameter Estimation Methods
Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system
More informationLecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff
Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised
More informationNUMBERS, MATHEMATICS AND EQUATIONS
AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t
More informationINSTRUMENTAL VARIABLES
INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment
More informationLecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff
Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised
More informationCOMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)
COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise
More informationinitially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur
Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract
More informationBasics. Primary School learning about place value is often forgotten and can be reinforced at home.
Basics When pupils cme t secndary schl they start a lt f different subjects and have a lt f new interests but it is still imprtant that they practise their basic number wrk which may nt be reinfrced as
More informationThermodynamics and Equilibrium
Thermdynamics and Equilibrium Thermdynamics Thermdynamics is the study f the relatinship between heat and ther frms f energy in a chemical r physical prcess. We intrduced the thermdynamic prperty f enthalpy,
More informationThis section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving.
Sectin 3.2: Many f yu WILL need t watch the crrespnding vides fr this sectin n MyOpenMath! This sectin is primarily fcused n tls t aid us in finding rts/zers/ -intercepts f plynmials. Essentially, ur fcus
More informationThermodynamics Partial Outline of Topics
Thermdynamics Partial Outline f Tpics I. The secnd law f thermdynamics addresses the issue f spntaneity and invlves a functin called entrpy (S): If a prcess is spntaneus, then Suniverse > 0 (2 nd Law!)
More informationx x
Mdeling the Dynamics f Life: Calculus and Prbability fr Life Scientists Frederick R. Adler cfrederick R. Adler, Department f Mathematics and Department f Bilgy, University f Utah, Salt Lake City, Utah
More informationKinetic Model Completeness
5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins
More informationElements of Machine Intelligence - I
ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin
More informationExample 1. A robot has a mass of 60 kg. How much does that robot weigh sitting on the earth at sea level? Given: m. Find: Relationships: W
Eample 1 rbt has a mass f 60 kg. Hw much des that rbt weigh sitting n the earth at sea level? Given: m Rbt = 60 kg ind: Rbt Relatinships: Slutin: Rbt =589 N = mg, g = 9.81 m/s Rbt = mrbt g = 60 9. 81 =
More informationSynchronous Motor V-Curves
Synchrnus Mtr V-Curves 1 Synchrnus Mtr V-Curves Intrductin Synchrnus mtrs are used in applicatins such as textile mills where cnstant speed peratin is critical. Mst small synchrnus mtrs cntain squirrel
More informationChapter 3 Digital Transmission Fundamentals
Chapter 3 Digital Transmissin Fundamentals Errr Detectin and Crrectin CSE 3213, Winter 2010 Instructr: Frhar Frzan Mdul-2 Arithmetic Mdul 2 arithmetic is perfrmed digit y digit n inary numers. Each digit
More information2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS
2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS 6. An electrchemical cell is cnstructed with an pen switch, as shwn in the diagram abve. A strip f Sn and a strip f an unknwn metal, X, are used as electrdes.
More informationEric Klein and Ning Sa
Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure
More informationWe can see from the graph above that the intersection is, i.e., [ ).
MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with
More informationLecture 7: Damped and Driven Oscillations
Lecture 7: Damped and Driven Oscillatins Last time, we fund fr underdamped scillatrs: βt x t = e A1 + A csω1t + i A1 A sinω1t A 1 and A are cmplex numbers, but ur answer must be real Implies that A 1 and
More informationCompetency Statements for Wm. E. Hay Mathematics for grades 7 through 12:
Cmpetency Statements fr Wm. E. Hay Mathematics fr grades 7 thrugh 12: Upn cmpletin f grade 12 a student will have develped a cmbinatin f sme/all f the fllwing cmpetencies depending upn the stream f math
More informationBLAST / HIDDEN MARKOV MODELS
CS262 (Winter 2015) Lecture 5 (January 20) Scribe: Kat Gregry BLAST / HIDDEN MARKOV MODELS BLAST CONTINUED HEURISTIC LOCAL ALIGNMENT Use Cmmnly used t search vast bilgical databases (n the rder f terabases/tetrabases)
More informationA - LEVEL MATHEMATICS 2018/2019
A - LEVEL MATHEMATICS 2018/2019 STRUCTURE OF THE COURSE Yur maths A-Level Maths curse cvers Pure Mathematics, Mechanics and Statistics. Yu will be eamined at the end f the tw-year curse. The assessment
More informationCOMP 551 Applied Machine Learning Lecture 11: Support Vector Machines
COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse
More informationThe Law of Total Probability, Bayes Rule, and Random Variables (Oh My!)
The Law f Ttal Prbability, Bayes Rule, and Randm Variables (Oh My!) Administrivia Hmewrk 2 is psted and is due tw Friday s frm nw If yu didn t start early last time, please d s this time. Gd Milestnes:
More informationCOMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification
COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551
More informationMODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:
MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use
More informationLHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers
LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the
More informationAP Statistics Notes Unit Two: The Normal Distributions
AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).
More informationBASD HIGH SCHOOL FORMAL LAB REPORT
BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used
More informationFall 2013 Physics 172 Recitation 3 Momentum and Springs
Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.
More informationI. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is
Length L>>a,b,c Phys 232 Lab 4 Ch 17 Electric Ptential Difference Materials: whitebards & pens, cmputers with VPythn, pwer supply & cables, multimeter, crkbard, thumbtacks, individual prbes and jined prbes,
More informationLab 11 LRC Circuits, Damped Forced Harmonic Motion
Physics 6 ab ab 11 ircuits, Damped Frced Harmnic Mtin What Yu Need T Knw: The Physics OK this is basically a recap f what yu ve dne s far with circuits and circuits. Nw we get t put everything tgether
More informationCESAR Science Case The differential rotation of the Sun and its Chromosphere. Introduction. Material that is necessary during the laboratory
Teacher s guide CESAR Science Case The differential rtatin f the Sun and its Chrmsphere Material that is necessary during the labratry CESAR Astrnmical wrd list CESAR Bklet CESAR Frmula sheet CESAR Student
More information