Reinforcement learning
1 Reinforcement learning
Nathaniel Daw, Gatsby Computational Neuroscience Unit, gatsby.ucl.ac.uk
Mostly adapted from Andrew Moore's tutorials, copyright 2002, 2004 by Andrew Moore. His originals, and many more tutorials, are available at:
The problem
Decision-making in a situation that may be:
1. Sequential (like chess, a maze)
2. Stochastic (like backgammon, the stock market)
2 The plan
We discuss:
1. How to evaluate long-term payoffs: Markov systems
2. How to find optimal decisions: Markov decision processes, dynamic programming
3. How to learn these on the fly: reinforcement learning, semi-supervised learning
4. Extensions
Discounted rewards
An assistant professor gets paid, say, 20K per year. How much, in total, will the A.P. earn in their life? 20 + 20 + 20 + ... = Infinity. What is wrong with this argument?
3 Discounted rewards
A reward (payment) in the future is not worth quite as much as a reward now:
Because of chance of obliteration
Because of inflation
Example: Being promised $10,000 next year is worth only 90% as much as receiving $10,000 right now. Assuming payment n years in the future is worth only (0.9)^n of payment now, what is the A.P.'s Future Discounted Sum of Rewards?
Discount factors
People in economics and probabilistic decision-making do this all the time. The discounted sum of future rewards using discount factor γ is
(reward now) + γ (reward in 1 time step) + γ^2 (reward in 2 time steps) + γ^3 (reward in 3 time steps) + ... (infinite sum) = E[ Σ_{t=0..∞} γ^t r(s_t) ]
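A quick numerical check of the idea above, using the slide's numbers (20K per year, discount factor 0.9): with discounting, the infinite salary stream sums to a finite value, 20 / (1 - 0.9) = 200.

```python
# Discounted sum of a constant reward stream: 20 per year, gamma = 0.9.
gamma = 0.9
reward = 20.0

# A long partial sum approaches the geometric-series closed form.
partial = sum(reward * gamma ** t for t in range(1000))
closed_form = reward / (1 - gamma)  # finite, unlike the undiscounted sum
```

So the A.P.'s Future Discounted Sum of Rewards is 200K, not infinity.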
4 The academic life
[State diagram, partially recoverable from transcription: states A. Assistant Prof (reward 20), B. Assoc. Prof, T. Tenured Prof (reward 400), S. On the Street, D. Dead (reward 0), with transition probabilities such as 0.6 and 0.2 between them.]
Assume discount factor γ = 0.9. Define:
V_A = expected discounted future rewards starting in state A
V_B = expected discounted future rewards starting in state B
V_T, V_S, V_D defined likewise for states T, S, D
How do we compute V_A, V_B, V_T, V_S, V_D?
A Markov system with rewards
Has a set of states {s_1, s_2, ..., s_N}
Has a transition probability matrix T, where T_ij = Prob(next state s_{t+1} = s_j | this state s_t = s_i)
Each state has a reward: {r_1, r_2, ..., r_N}
There is a discount factor γ, 0 < γ < 1
On each time step:
0. Assume your state is s_t = s_i
1. You get given reward r(s_t)
2. You randomly move to another state: P(s_{t+1} = s_j | s_t = s_i) = T_ij
3. All future rewards are discounted by γ
5 Solving a Markov system
Write V(s_t) = expected discounted sum of future rewards starting in state s_t = s_i.
V(s_t) = r(s_t) + E[ γ r(s_{t+1}) + γ^2 r(s_{t+2}) + ... ]
       = r(s_t) + γ (expected future rewards starting from the next state)
       = r(s_t) + γ (T_i1 V(s_1) + T_i2 V(s_2) + ... + T_iN V(s_N))
Using vector notation, write V = (V(s_1), ..., V(s_N))^T, R = (r_1, ..., r_N)^T, and T = the N×N transition matrix. Then V = R + γ T V.
Question: can you invent a closed-form expression for V in terms of R, T, and γ?
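One answer to the slide's question, sketched in NumPy: rearranging V = R + γTV gives (I - γT)V = R, so V = (I - γT)^{-1} R. The 3-state transition matrix and reward vector below are invented example numbers, not from the lecture.

```python
import numpy as np

# Closed form for a Markov system: V = R + gamma*T*V  =>  (I - gamma*T) V = R.
# The 3-state T and R below are invented for illustration.
gamma = 0.9
T = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.5, 0.5]])
R = np.array([4.0, 0.0, -8.0])

# Solving this linear system is the "matrix inversion" method from the slides.
V = np.linalg.solve(np.eye(3) - gamma * T, R)
```

`np.linalg.solve` is preferred over explicitly inverting the matrix for numerical stability, but the math is the same.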
6 Solving a Markov system with matrix inversion
Upside: You get an exact answer.
Downside: If you have 100,000 states, you're solving a 100,000 by 100,000 system of equations.
Value iteration: another way to solve a Markov system
Let s_t = s_i. Define:
V^1(s_i) = expected discounted sum of rewards over the next 1 time step
V^2(s_i) = expected discounted sum of rewards during the next 2 steps
V^3(s_i) = expected discounted sum of rewards during the next 3 steps
:
V^k(s_i) = expected discounted sum of rewards during the next k steps
V^1(s_i) = (what?)
V^2(s_i) = (what?)
V^{k+1}(s_i) = (what?)
7 Value iteration: another way to solve a Markov system
Let s_t = s_i. Define V^k(s_i) = expected discounted sum of rewards during the next k steps. Then:
V^1(s_i) = r(s_i)
V^2(s_i) = r(s_i) + E[ γ r(s_{t+1}) ]
V^{k+1}(s_i) = r(s_i) + E[ γ r(s_{t+1}) + ... + γ^k r(s_{t+k}) ] = r(s_i) + γ Σ_{j=1..N} T_ij V^k(s_j)
where N = number of states.
Let's do value iteration
[Figure: three weather states SUN (reward +4), WIND (reward 0), HAIL (reward -8), with transition arrows; γ = 0.5. Table to fill in: k versus V^k(SUN), V^k(WIND), V^k(HAIL).]
8 Let's do value iteration
[Same SUN/WIND/HAIL example as the previous slide, with the table of V^k values filled in; the numbers did not survive transcription.]
Value iteration for solving Markov systems
Compute V^1(s_j) for each j
Compute V^2(s_j) for each j
:
Compute V^k(s_j) for each j
As k → ∞, V^k(s_j) → V(s_j). Why? When to stop? When max_j |V^{k+1}(s_j) - V^k(s_j)| < ξ.
(This is faster than simple matrix inversion if the transition matrix is sparse)
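A sketch of this loop in Python. The rewards +4, 0, -8 and γ = 0.5 come from the slide, but the SUN/WIND/HAIL transition probabilities are assumed, since the original figure did not survive transcription.

```python
import numpy as np

# Value iteration for a Markov system, stopping when the largest
# per-state change drops below xi. Rewards and gamma are from the
# slide; the transition matrix is an assumption.
gamma, xi = 0.5, 1e-10
T = np.array([[0.5, 0.5, 0.0],   # SUN
              [0.5, 0.0, 0.5],   # WIND
              [0.0, 0.5, 0.5]])  # HAIL
R = np.array([4.0, 0.0, -8.0])

V = np.zeros(3)
while True:
    V_next = R + gamma * T @ V          # one backup for every state
    if np.max(np.abs(V_next - V)) < xi:
        break                           # converged to within xi
    V = V_next
```

Each pass touches only the nonzero entries of T, which is why this beats matrix inversion on sparse systems.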
9 A Markov decision process
You run a startup company. In every state you must choose between Saving money or Advertising.
[State diagram: Poor & Unknown (+0), Poor & Famous (+0), Rich & Unknown (+10), Rich & Famous (+10), with Save/Advertise transition arrows; γ = 0.9.]
Markov decision processes
An MDP has:
A set of states {s_1, ..., s_N}
A set of actions {a_1, ..., a_M}
A set of rewards {r_1, ..., r_N} (one for each state)
A transition probability function T_ij^k = Prob(s_{t+1} = s_j | s_t = s_i and a_t = a_k)
On each step:
0. Call the current state s_i
1. Receive reward r_i
2. Choose an action from {a_1, ..., a_M}
3. If you choose action a_k, you'll move to state s_j with probability T_ij^k
4. All future rewards are discounted by γ
10 A policy
A policy is a mapping from states to actions. E.g.:
Policy Number 1: PU → S, PF → A, RU → S, RF → A
Policy Number 2: PU → A, PF → A, RU → A, RF → A
[Diagrams of the transitions each policy induces did not survive transcription.]
Which of the above two policies is best?
A policy
Following a policy reduces an MDP to a Markov system. With what transition probabilities? How many possible policies are there in our example? (In general, we might also consider stochastic policies.) How do you compute the optimal policy?
11 Interesting fact
For every MDP there exists at least one deterministic optimal policy. It is a policy such that for every possible start state there is no better option than to follow the policy. (Not proved in this lecture.)
More formally
V^π(s_i): expected discounted future reward following policy π from state s_i
V*(s_i): expected discounted future reward following the optimal policy π* from state s_i
V*(s_i) ≥ V^π(s_i) for all states and policies
12 Computing the optimal policy
Idea One: Run through all possible policies. Select the best. What is the problem??
Optimal value function
[Figure: a small three-state MDP with states s_1, s_2 (reward +3), s_3, actions A and B, and transition probabilities of 1/3; details did not survive transcription.]
Question: What (by inspection) is an optimal policy for that MDP? (Assume γ = 0.9.) What is V*(s_1)? What is V*(s_2)? What is V*(s_3)?
13 Computing the optimal value function with value iteration
Define V^k(s_i) = maximum possible expected sum of discounted rewards I can get if I start at state s_i and I live for k time steps.
Note that V^1(s_i) = r(s_i).
Let's compute V^k(s_i) for our example
[Table to fill in: k versus V^k(PU), V^k(PF), V^k(RU), V^k(RF).]
14 Let's compute V^k(s_i) for our example
[Same table as the previous slide, filled in; the numbers did not survive transcription.]
Bellman equation
V^{n+1}(s_i) = max_k [ r(s_i) + γ Σ_{j=1..N} T_ij^k V^n(s_j) ]
Value iteration for solving MDPs
Compute V^1(s_i) for all i
Compute V^2(s_i) for all i
... until converged: converged when max_i |V^{n+1}(s_i) - V^n(s_i)| < ξ
Also known as dynamic programming. Can also update values asynchronously (i.e., in any order).
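The Bellman backup above can be sketched directly in NumPy: compute a Q-value for every (action, state) pair, then take the max over actions. The 2-action, 3-state MDP below is invented for illustration; the lecture's startup example's transitions were in a figure that did not survive.

```python
import numpy as np

# MDP value iteration with the Bellman max:
#   V_{n+1}(s_i) = max_k [ r(s_i) + gamma * sum_j T^k_ij V_n(s_j) ]
# T[k] is the transition matrix under action k. T and R are assumed.
gamma, xi = 0.9, 1e-10
T = np.array([[[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]]])
R = np.array([0.0, 1.0, 10.0])

V = np.zeros(3)
while True:
    Q = R + gamma * T @ V               # Q[k, i]: value of action k in state i
    V_next = Q.max(axis=0)              # Bellman max over actions
    if np.max(np.abs(V_next - V)) < xi:
        break
    V = V_next
```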
15 Finding the optimal policy
Given V*, it is easy to find π*. How?
Finding the optimal policy
Compute V*(s_i) for all i using value iteration. Define the best action in state s_i as
argmax_k [ r_i + γ Σ_j T_ij^k V*(s_j) ]
(for all i; the greedy policy under V*)
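Reading the greedy policy off V* is one argmax per state. The tiny 2-state, 2-action T, R, and V_star below are invented placeholders, just to show the shape of the computation.

```python
import numpy as np

# Greedy policy extraction: pick argmax_k [ r_i + gamma * sum_j T^k_ij V*(s_j) ]
# in every state. All numbers here are invented for illustration.
gamma = 0.9
T = np.array([[[1.0, 0.0], [0.5, 0.5]],    # action 0
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1
R = np.array([0.0, 5.0])
V_star = np.array([10.0, 50.0])

Q = R + gamma * T @ V_star     # Q[k, i]
policy = Q.argmax(axis=0)      # best action index for each state
```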
16 Policy iteration
Another way to compute optimal policies. Algorithm:
Initialize π_0 = any randomly chosen policy
Alternate:
Policy evaluation: compute V^{π_k} (expected rewards using policy π_k)
Policy improvement: π_{k+1}(s_i) = argmax_a [ r(s_i) + γ Σ_j T_ij^a V^{π_k}(s_j) ]
until π_k = π_{k+1}. You now have an optimal policy. This will converge in a finite number of iterations (why?)
Where we are
Formalisms: Markov system and Markov decision process; Markov system = MDP + policy
Value iteration for finding expected (optionally optimal) future rewards
Optimal values provide the optimal policy
Policy iteration for finding optimal policies
Next: online versions of these algorithms
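The evaluate/improve alternation can be sketched as follows. Since a fixed policy reduces the MDP to a Markov system, the evaluation step is an exact matrix solve; the small MDP (T, R) is invented for illustration.

```python
import numpy as np

# Policy iteration: exact evaluation of the current policy, then
# greedy improvement; stop when the policy is stable. T and R are
# assumed example numbers.
gamma = 0.9
T = np.array([[[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]]])
R = np.array([0.0, 1.0, 10.0])
n_states = len(R)

policy = np.zeros(n_states, dtype=int)        # arbitrary initial policy
while True:
    # Policy evaluation: V = (I - gamma * T_pi)^{-1} R
    T_pi = T[policy, np.arange(n_states)]     # row i follows policy[i]
    V = np.linalg.solve(np.eye(n_states) - gamma * T_pi, R)
    # Policy improvement: greedy with respect to V
    new_policy = (R + gamma * T @ V).argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                                 # pi_{k+1} = pi_k: done
    policy = new_policy
```

Each improvement step either strictly improves the policy or leaves it unchanged, and there are finitely many deterministic policies, which is why the loop must terminate.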
17 Online reinforcement learning
Imagine you are a robot in a world controlled by an MDP. You are not given the functions R, T, etc. You must learn values and policies from experience with individual states, rewards and actions. As before, let's start with the Markov system (no action) case.
Learning delayed rewards
[Diagram: states s_a through s_f, each with unknown reward R = ?]
All you can see is a series of states and rewards:
s_a (r=0) → s_b (r=0) → s_c (r=4) → s_b (r=0) → s_d (r=0) → s_e (r=0)
Task: Based on this sequence, estimate V(s_a), V(s_b), ..., V(s_f)
18 Idea 1: Certainty-Equivalent learning
Idea: Use your data to estimate the underlying Markov system, then use the previous offline methods to solve it.
s_a (r=0) → s_b (r=0) → s_c (r=4) → s_b (r=0) → s_d (r=0) → s_e (r=0)
You draw in the estimated Markov system: transitions + probabilities, with r_est = 0 for s_a, s_b, s_d, s_e and r_est = 4 for s_c. What are the estimated V values?
C.E. for Markov systems
Estimate T_ij, r_i by counting transitions and averaging rewards. At each step, solve the new estimated system with, for instance, value iteration. (Why do we want new estimates at each step?)
Slow, memory intensive. Variations (e.g. prioritized sweeping) minimize computation by taking shortcuts on the value iteration step. Can be data-inefficient.
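The estimation half of certainty equivalence is just counting. A minimal sketch on the slide's sequence (note the two observed transitions out of s_b split its estimated probability 50/50):

```python
from collections import Counter, defaultdict

# Certainty-equivalent estimation: count observed transitions to
# estimate T, and average observed rewards to estimate r, using the
# sequence from the slide.
trajectory = [('a', 0), ('b', 0), ('c', 4), ('b', 0), ('d', 0), ('e', 0)]

counts = defaultdict(Counter)
rewards = defaultdict(list)
for (s, r), (s_next, _) in zip(trajectory, trajectory[1:]):
    counts[s][s_next] += 1
    rewards[s].append(r)
rewards[trajectory[-1][0]].append(trajectory[-1][1])  # last state's reward

T_est = {s: {s2: n / sum(c.values()) for s2, n in c.items()}
         for s, c in counts.items()}
r_est = {s: sum(v) / len(v) for s, v in rewards.items()}
```

The estimated system (T_est, r_est) can then be handed to any of the offline solvers above.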
19 Idea 2: Value sampling
Idea: Sample long-term values directly from the observed sequence, without estimating T or r.
s_a (r=0) → s_b (r=0) → s_c (r=4) → s_b (r=0) → s_d (r=0) → s_e (r=0)
At t=1 we were in state s_a and eventually got a long-term discounted reward (ltdr) of 0 + γ·0 + γ^2·4 + γ^3·0 + γ^4·0 + γ^5·0. Assume γ = 1. Then:
At t=1, in state s_a: ltdr = 4. At t=2, in state s_b: ltdr = 4. At t=3, in state s_c: ltdr = 4. At t=4, in state s_b: ltdr = 0. At t=5, in state s_d: ltdr = 0. At t=6, in state s_e: ltdr = 0.
Mean LTDR per state: s_a observed {4}, so V_est(s_a) = 4; s_b observed {4, 0}, so V_est(s_b) = 2; s_c observed {4}, so V_est(s_c) = 4; s_d observed {0}, so V_est(s_d) = 0; s_e observed {0}, so V_est(s_e) = 0.
(This algorithm is also called Monte Carlo sampling, or TD(1).)
Idea 3: Temporal Difference learning (Sutton/Barto)
Idea: a sampling version of value iteration. Only maintain a V_est array, nothing else. So you've got V_est(s_1), V_est(s_2), ..., V_est(s_N), and you observe a transition from i that receives an immediate reward of r and jumps to j. What should you do? Can you guess?
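The value-sampling table above can be reproduced in a few lines: from each visit, sum the discounted rewards to the end of the episode, then average the returns per state (γ = 1 as on the slide).

```python
# Monte Carlo value sampling on the slide's observed sequence.
gamma = 1.0
trajectory = [('a', 0), ('b', 0), ('c', 4), ('b', 0), ('d', 0), ('e', 0)]

returns = {}
for t, (s, _) in enumerate(trajectory):
    # Discounted sum of rewards from time t to the end of the episode.
    ltdr = sum(r * gamma ** (k - t)
               for k, (_, r) in enumerate(trajectory[t:], start=t))
    returns.setdefault(s, []).append(ltdr)

V_est = {s: sum(g) / len(g) for s, g in returns.items()}
```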
20 TD learning
Value iteration update: V^{k+1}(s_i) = E[ r(s_i) + γ V^k(s_{t+1}) ]
TD learning uses the observed r as a sample of the reward, the observed s_j as a sample of the next state, and V_est(s_j) to estimate its value ("bootstrapping"). Learning rule: nudge V_est(s_i) toward the sampled value with learning rate α:
V_est(s_i) ← (1-α) V_est(s_i) + α (sampled future reward) = (1-α) V_est(s_i) + α (r + γ V_est(s_j))
(This is actually an algorithm called TD(0).)
TD convergence
Dayan (1992) showed that for a more general family of TD rules, as the number of observations goes to infinity, V_est(s_i) → V(s_i), PROVIDED:
All states are visited infinitely often
Decaying learning rates: Σ_{t=1..∞} α_t = ∞ (this means: for every k there is a T such that Σ_{t=1..T} α_t > k) and Σ_{t=1..∞} α_t^2 < ∞ (this means: there is a k such that Σ_{t=1..T} α_t^2 < k for all T)
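One pass of the TD(0) rule over the same observed sequence can be sketched as below. The constant learning rate α = 0.5 is an assumption for illustration (the convergence conditions above call for a decaying α).

```python
# TD(0) on the slide's sequence: after seeing s_i --r--> s_j, nudge
# V_est(s_i) toward r + gamma * V_est(s_j). gamma = 1 as on the slide;
# alpha = 0.5 is an assumed constant learning rate.
gamma, alpha = 1.0, 0.5
trajectory = [('a', 0), ('b', 0), ('c', 4), ('b', 0), ('d', 0), ('e', 0)]

V_est = {s: 0.0 for s, _ in trajectory}
for (s, r), (s_next, _) in zip(trajectory, trajectory[1:]):
    target = r + gamma * V_est[s_next]              # sampled future reward
    V_est[s] = (1 - alpha) * V_est[s] + alpha * target
```

After this single episode only s_c has moved (to 2.0): unlike Monte Carlo sampling, the zero bootstrap estimates have not yet propagated the reward backward, which takes repeated passes.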
21 Online policy learning
The task:
World: You are in state 34. Your immediate reward is 3. You have 3 actions.
Robot: I'll take action 2.
World: You are in state 77. Your immediate reward is -7. You have 2 actions.
Robot: I'll take action 1.
World: You're in state 34 (again). Your immediate reward is 3. You have 3 actions.
The Credit Assignment Problem
I'm in state 43, 39, 22, 21, 21, 13, 54, 26; reward = 0, 0, 0, 0, 0, 0, 0, 100; action = 2, 4, 1, 1, 1, 2, 2.
Yippee! I got to a state with a big reward! But which of my actions along the way actually helped me get there?? This is the Credit Assignment problem. The MDP machinery we have developed helps address this problem.
22 Idea 1: Certainty-Equivalent learning
Idea: Use your data to estimate the underlying MDP, then use the previous offline methods to solve it. Same as before, except now solve the estimated MDP using the MDP version of value iteration:
V_est(s_i) = max_a [ r_i^est + γ Σ_j T_ij^{est,a} V_est(s_j) ]
or policy iteration.
The explore/exploit problem
We're in state s_i. We can estimate r_est, T_est, V_est. So what action should we choose?
IDEA 1: a = argmax_{a'} [ r_est(s_i) + γ Σ_j T_ij^{est,a'} V_est(s_j) ]
IDEA 2: a = random
Any problems with these ideas? Any other suggestions? Could we be optimal?
23 Idea 2: Actor/Critic (Sutton)
Idea: an approximate, sampling version of policy iteration, using TD for policy evaluation and stochastic gradient ascent for policy improvement.
Assume a stochastic policy, parameterized by w. Prob(choose action a in state s_i): π(s_i, a) ∝ exp(β w(s_i, a)).
Observe a transition s_i --(r, a)--> s_j. Update V_est(s_i) with TD. How do we improve the policy w? What policy does V_est evaluate?
Actor/critic
Policy improvement: π_{k+1}(s_i) = argmax_a E[ r(s_i) + γ V^{π_k}(s_j) ]
Observed: s_i --(r, a)--> s_j. Update rule: for all k,
w(s_i, a_k) ← w(s_i, a_k) + ν (r + γ V_est(s_j) - V_est(s_i)) (δ_{a_k,a} - π(s_i, a_k))
where ν is a learning rate, r + γ V_est(s_j) is a sample estimate, and δ is the Kronecker delta: 1 if a_k = a, 0 if a_k ≠ a.
Performs stochastic gradient ascent on V_est: increase the probability of action a if the result was better than expected, i.e. if r + γ V_est(s_j) > V_est(s_i); decrease it otherwise.
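A minimal sketch of this update loop, assuming an invented 2-state, 2-action world in which only action 1 taken in state 0 is rewarded; all constants (γ, α, ν, β) and the dynamics are assumptions, not from the lecture. The critic learns V_est by TD(0) while the actor's softmax weights w move by the rule above.

```python
import math, random

# Actor/critic sketch on an invented 2-state, 2-action world.
random.seed(0)
gamma, alpha, nu, beta = 0.9, 0.1, 0.1, 1.0
n_states, n_actions = 2, 2
V = [0.0] * n_states                              # critic
w = [[0.0] * n_actions for _ in range(n_states)]  # actor weights

def pi(s):
    """Boltzmann policy: P(a) proportional to exp(beta * w[s][a])."""
    e = [math.exp(beta * x) for x in w[s]]
    z = sum(e)
    return [p / z for p in e]

def step(s, a):
    """Invented dynamics: only action 1 in state 0 reaches the rewarding state."""
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(2000):
    probs = pi(s)
    a = random.choices(range(n_actions), weights=probs)[0]
    s_next, r = step(s, a)
    delta = r + gamma * V[s_next] - V[s]      # TD error (critic)
    V[s] += alpha * delta
    for k in range(n_actions):                # actor: gradient-style nudge
        w[s][k] += nu * delta * ((1.0 if k == a else 0.0) - probs[k])
    s = s_next
```

After training, the weight for the rewarded action in state 0 dominates, so the softmax policy prefers it, which is the "increase the probability of better-than-expected actions" behavior described above.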
24 Actor/critic
A sampling version of policy iteration.
Disadvantages: No convergence guarantees; somewhat flaky in practice. Problems with V_est not tracking the current policy.
Advantages: May be better with function approximation (we will return to this).
It is not obvious how to make a sampling version of MDP value iteration for optimal values V*. (Why?)
Idea 3: Q-learning (Watkins)
Idea: TD with a redefined value function, which can be learned independent of the exploration policy.
Define Q*(s_i, a) = expected sum of discounted future rewards if I start in state s_i, I then take action a, and I am subsequently optimal.
Questions: Define Q*(s_i, a) in terms of V*. Define V*(s_i) in terms of Q*.
25 Q-learning
Q version of the Bellman equation:
Q*(s_i, a) = r_i + γ Σ_j T_ij^a max_{a'} Q*(s_j, a')
We maintain Q_est instead of V_est values. The TD update, on seeing s_i --(r, a)--> s_j, is:
Q_est(s_i, a) ← α (r + γ max_{a'} Q_est(s_j, a')) + (1-α) Q_est(s_i, a)
This is even cleverer than it looks: the Q_est values are not biased by any particular exploration policy. Q-learning is proved to converge.
Q-learning: choosing actions
Same issues as for C.E. choosing actions. The optimal action is argmax_a Q*(s_i, a). Don't always choose optimally according to Q_est. Don't always be random (otherwise it will take a long time to reach somewhere exciting). Boltzmann exploration, as in actor/critic: Prob(choose action a) ∝ exp(β Q_est(s_i, a))
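The update and the Boltzmann action choice fit in one loop. The 2-state world and all constants below are invented for illustration; the only piece taken from the slides is the update rule itself and the exploration scheme.

```python
import math, random

# Tabular Q-learning with Boltzmann exploration:
#   Q(s,a) <- (1-alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
#   P(choose a) proportional to exp(beta * Q(s,a))
random.seed(0)
gamma, alpha, beta = 0.9, 0.1, 2.0
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(s, a):
    """Invented dynamics: only action 1 in state 0 reaches the rewarding state."""
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(5000):
    weights = [math.exp(beta * q) for q in Q[s]]          # Boltzmann exploration
    a = random.choices(range(n_actions), weights=weights)[0]
    s_next, r = step(s, a)
    target = r + gamma * max(Q[s_next])                   # sampled Bellman backup
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
    s = s_next

# Greedy policy read off the learned Q values.
greedy = [max(range(n_actions), key=lambda a: Q[si][a]) for si in range(n_states)]
```

Because the backup uses max over next actions rather than the action actually taken, the learned Q values are unbiased by the exploration policy, which is the point the slide makes.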
26 Where we are
Formalism
Offline algorithms
Online algorithms for value estimation: C.E. (model-based), sampling, TD (model-free)
Online algorithms for policy learning: C.E., actor/critic, Q-learning
If we had time
Avoid lookup tables for the value function: use function approximation/regression. Convergence guarantees go out the window. This may favor policy gradient methods: actor/critic is one example, but policy gradients can also be estimated directly, i.e. without using values.
Optimal exploration/exploitation tradeoffs: Gittins indices, E^3 (Kearns)
27 If we had time
TD(λ): TD generalized with multi-step backups; Monte Carlo trajectory sampling is a special case.
Applications: backgammon, elevator scheduling, neuroscientific modeling.
If we had time
Partially Observable MDPs: RL when the state is not observable (like a hidden Markov model). Extremely intractable; some cases are proved uncomputable. Approximate offline value iteration methods exist; there is no policy iteration, and minimal online methods.
28 For more
Review paper: Kaelbling et al., Reinforcement Learning: A Survey, Journal of AI Research, 1996.
Two excellent books:
Sutton & Barto, Reinforcement Learning: informal and readable.
Tsitsiklis & Bertsekas, Neuro-Dynamic Programming: formal and less readable, full of delightful proofs.
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationCredit Card Pricing and Impact of Adverse Selection
Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n
More informationOther NN Models. Reinforcement learning (RL) Probabilistic neural networks
Other NN Models Renforcement learnng (RL) Probablstc neural networks Support vector machne (SVM) Renforcement learnng g( (RL) Basc deas: Supervsed dlearnng: (delta rule, BP) Samples (x, f(x)) to learn
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More informationMultilayer Perceptrons and Backpropagation. Perceptrons. Recap: Perceptrons. Informatics 1 CG: Lecture 6. Mirella Lapata
Multlayer Perceptrons and Informatcs CG: Lecture 6 Mrella Lapata School of Informatcs Unversty of Ednburgh mlap@nf.ed.ac.uk Readng: Kevn Gurney s Introducton to Neural Networks, Chapters 5 6.5 January,
More informationNeuro-Adaptive Design II:
Lecture 37 Neuro-Adaptve Desgn II: A Robustfyng Tool for Any Desgn Dr. Radhakant Padh Asst. Professor Dept. of Aerospace Engneerng Indan Insttute of Scence - Bangalore Motvaton Perfect system modelng s
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More informationParametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010
Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationClustering with Gaussian Mixtures
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More informationMultilayer neural networks
Lecture Multlayer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Mdterm exam Mdterm Monday, March 2, 205 In-class (75 mnutes) closed book materal covered by February 25, 205 Multlayer
More informationComputational issues surrounding the management of an ecological food web
Computatonal ssues surroundng the management of an ecologcal food web Wllam J M Probert, Eve McDonald-Madden, Nathale Peyrard, Régs Sabbadn AIGM 12, ECAI2012 Montpeller, France Ratonale Ecology has many
More informationLecture 10: May 6, 2013
TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,
More informationStatistics for Economics & Business
Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable
More informationContinuous Time Markov Chains
Contnuous Tme Markov Chans Brth and Death Processes,Transton Probablty Functon, Kolmogorov Equatons, Lmtng Probabltes, Unformzaton Chapter 6 1 Markovan Processes State Space Parameter Space (Tme) Dscrete
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationMarkov decision processes
IMT Atlantque Technopôle de Brest-Irose - CS 83818 29238 Brest Cedex 3 Téléphone: +33 0)2 29 00 13 04 Télécope: +33 0)2 29 00 10 12 URL: www.mt-atlantque.fr Markov decson processes Lecture notes therry.chonavel@mt-atlantque.fr
More informationOutline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique
Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationGeorgia Tech PHYS 6124 Mathematical Methods of Physics I
Georga Tech PHYS 624 Mathematcal Methods of Physcs I Instructor: Predrag Cvtanovć Fall semester 202 Homework Set #7 due October 30 202 == show all your work for maxmum credt == put labels ttle legends
More informationCS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras
CS4495/6495 Introducton to Computer Vson 3C-L3 Calbratng cameras Fnally (last tme): Camera parameters Projecton equaton the cumulatve effect of all parameters: M (3x4) f s x ' 1 0 0 0 c R 0 I T 3 3 3 x1
More informationCS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016
CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.
More informationVapnik-Chervonenkis theory
Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown
More informationReview of Taylor Series. Read Section 1.2
Revew of Taylor Seres Read Secton 1.2 1 Power Seres A power seres about c s an nfnte seres of the form k = 0 k a ( x c) = a + a ( x c) + a ( x c) + a ( x c) k 2 3 0 1 2 3 + In many cases, c = 0, and the
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More information10.34 Fall 2015 Metropolis Monte Carlo Algorithm
10.34 Fall 2015 Metropols Monte Carlo Algorthm The Metropols Monte Carlo method s very useful for calculatng manydmensonal ntegraton. For e.g. n statstcal mechancs n order to calculate the prospertes of
More information