Applying Q-Learning to Flappy Bird
Moritz Ebeling-Rump, Manfred Kao, Zachary Hervieux-Moore

Abstract

The field of machine learning is an interesting and relatively new area of research in artificial intelligence. In this paper, a special type of reinforcement learning, Q-Learning, was applied to the popular mobile game Flappy Bird. The Q-Learning algorithm was tested on two different environments: the original version and a simplified version. The maximum scores achieved on the original version and the simplified version were 169 and 28,851, respectively. The trade-off between runtime and accuracy was investigated. Using appropriate settings, the Q-Learning algorithm was proven to be successful with a relatively quick convergence time.

I. INTRODUCTION

Q-Learning is a form of reinforcement learning that does not require the agents to have prior knowledge of the environment dynamics [1]. The agents do not have access to the cost function or the transition probabilities. Reinforcement learning is a type of learning strategy where the agents are not told which actions to take. Instead, the agents discover which action yields the highest reward through experimentation [2]. Q-Learning was first introduced by Watkins in 1989 [3], and it was not until 1992 that convergence was shown by Watkins and Dayan [1].

Flappy Bird is a mobile game developed in 2013 by Dong Nguyen [4] and published by dotGears, a small independent game developer company based in Vietnam [5]. Flappy Bird is a two-dimensional side-scrolling game, illustrated in Figure 1, featuring retro style graphics. The goal of the game is to direct the bird through a series of pipes. If the bird touches the floor or a pipe, then the bird will die and the game restarts. The only action that players are able to perform is to make the bird jump. Otherwise, the bird will fall due to gravity. Each time the bird successfully clears a pipe, the score is incremented. In the original game, the gap between the pipes is held constant but the height of the gap is uniformly distributed. In the simplified version that was considered, the height of the gap is held constant. The game is meant to test players' endurance, as there is no end to the game.

Fig. 1. The gameplay of Flappy Bird

II. THEORY

Watkins and Dayan first showed convergence of the Q-Learning algorithm in [1]. However, Tsitsiklis gives an alternate proof of convergence with stronger results in [6]. It is Tsitsiklis's results that will be shown here. To begin, define the standard form of stochastic approximation algorithms:

$$x_i := x_i + \alpha_i \left( F_i(x) - x_i + \omega_i \right) \qquad (1)$$

where $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, $F_1, F_2, \dots, F_n$ are mappings from $\mathbb{R}^n$ to $\mathbb{R}$, $\omega_i$ is a noise term, and $\alpha_i$ is the step size. Assumptions on equations of this form are now listed. Note that, in his paper, Tsitsiklis allows for outdated information. For simplicity, it is assumed that all information available to the controller is up to date, which simplifies this list. This is justified because the application considered has a controller that never uses outdated information.

Assumption 1:
(a) $x(0)$ is $\mathcal{F}(0)$-measurable.
(b) $\forall i$ and $t$, $\omega_i(t)$ is $\mathcal{F}(t+1)$-measurable.
(c) $\forall i$ and $t$, $\alpha_i(t)$ is $\mathcal{F}(t)$-measurable.
(d) $\forall i$ and $t$, we have $E[\omega_i(t) \mid \mathcal{F}(t)] = 0$.
(e) $\exists$ constants $A$ and $B$ such that

$$E[\omega_i^2(t) \mid \mathcal{F}(t)] \le A + B \max_j \max_{\tau \le t} |x_j(\tau)|^2, \quad \forall i, t.$$

Assumption 1 collects the typical restrictions imposed on stochastic systems. First, the present can be determined entirely by previous information. Second, the noise term has mean 0 and bounded variance.

Assumption 2:
(a) $\forall i$, $\sum_{t=0}^{\infty} \alpha_i(t) = \infty$, w.p. 1.
(b) $\exists C$ such that $\forall i$, $\sum_{t=0}^{\infty} \alpha_i^2(t) \le C$, w.p. 1.

Assumption 2 restricts the step size so that it approaches 0 at a slow enough speed to ensure convergence of the algorithm. The following conventions are now used. If $x \le y$ then $x_i \le y_i$ $\forall i$. The norm $\|\cdot\|_v$ is defined as:

$$\|x\|_v = \max_i \frac{|x_i|}{v_i}, \quad x \in \mathbb{R}^n$$

Assumption 3:
(a) The mapping $F$ is monotone. That is, if $x \le y$ then $F(x) \le F(y)$.
(b) The mapping $F$ is continuous.
(c) The mapping $F$ has a unique fixed point $x^*$.
(d) If $e \in \mathbb{R}^n$ is the vector whose components are all equal to 1, and $r > 0$, then

$$F(x) - re \le F(x - re) \le F(x + re) \le F(x) + re$$

It is not hard to imagine how Assumption 3 might be used to generate inequalities that are essential in a proof.

Assumption 4: $\exists$ a vector $x^* \in \mathbb{R}^n$, a positive vector $v$, and $\beta \in [0, 1)$ such that

$$\|F(x) - x^*\|_v \le \beta \|x - x^*\|_v, \quad \forall x \in \mathbb{R}^n$$

It can be seen that the previous assumption provides the framework for a contraction argument (since $\beta \in [0, 1)$). This can be used to prove convergence to a stationary point.

Assumption 5: $\exists$ a positive vector $v$, $\beta \in [0, 1)$, and $D$ such that

$$\|F(x)\|_v \le \beta \|x\|_v + D, \quad \forall x \in \mathbb{R}^n$$

Tsitsiklis goes on to prove the following three theorems.

Theorem 1: Let Assumptions 1, 2, and 5 hold. Then the sequence $x(t)$ in the stochastic approximation is bounded with probability 1.

Theorem 2: Let Assumptions 1, 2, and 3 hold, and suppose that the sequence $x(t)$ in the stochastic approximation is bounded with probability 1. Then $x(t)$ converges to $x^*$ with probability 1.

Theorem 3: Let Assumptions 1, 2, and 4 hold. Then the sequence $x(t)$ in the stochastic approximation converges to $x^*$ with probability 1.

From these theorems, it is clear that Theorem 3 is the strongest in the sense that we need the fewest assumptions to guarantee convergence. Also, Theorem 2 requires boundedness, which is the result of Theorem 1. Thus, one can think of these as: to guarantee convergence, one needs Assumptions 1, 2, 3, and 5, or Assumptions 1, 2, and 4. With regards to Q-Learning, the way these theorems are important depends on writing $F(x)$ as follows:

$$F_{iu}(Q) = E[c_{iu}] + \beta E\left[ \min_{v \in U(s(i,u))} Q_{s(i,u),v} \right]$$

When $F$ is written in this form with $\beta < 1$, it can be shown that Assumptions 1 and 4 are met.
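The claim about Assumption 4 can be made concrete with a short calculation. The sketch below covers only the discounted case $\beta < 1$, takes $v = e$ (so the weighted norm is the ordinary maximum norm), writes $s$ for $s(i,u)$, and writes $Q^*$ for the fixed point of $F$; these choices are made here for illustration and are not spelled out in the text. It relies on the elementary inequality $|\min_v a_v - \min_v b_v| \le \max_v |a_v - b_v|$.

```latex
\begin{align*}
\left| F_{iu}(Q) - F_{iu}(Q') \right|
  &= \beta \left| E\!\left[ \min_{v \in U(s)} Q_{s,v} - \min_{v \in U(s)} Q'_{s,v} \right] \right| \\
  &\le \beta \, E\!\left[ \max_{v \in U(s)} \left| Q_{s,v} - Q'_{s,v} \right| \right] \\
  &\le \beta \, \| Q - Q' \|_{\infty}.
\end{align*}
% Take the maximum over (i,u), then set Q' = Q^* and use F(Q^*) = Q^*:
\[
\| F(Q) - Q^* \|_{\infty} \le \beta \, \| Q - Q^* \|_{\infty}.
\]
```

The last line is Assumption 4 with $v = e$. The first chain of inequalities also shows that $F$ is a contraction in the maximum norm, which is what guarantees that the fixed point $Q^*$ exists and is unique.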
Hence, by Theorem 3, we only need Assumption 2 for convergence to be guaranteed. This is an assumption on the step size $\alpha$, so it is very easy for the algorithm user to impose this restriction in practice. For our application, we will rewrite this cost minimization as a reward maximization and use different notation as shorthand:

$$F(Q_t(s_t, a_t)) = E[R_{t+1} \mid I_t] + \gamma E\left[ \max_a Q_t(s_{t+1}, a) \mid I_t \right]$$

where $I_t$ denotes the information available to the controller at time $t$. We now start with the Q-Learning algorithm and work our way backwards to the stochastic approximation algorithm in Equation 1.

$$\begin{aligned}
Q_{t+1}(s_t, a_t) &= Q_t(s_t, a_t) + \alpha_t(s_t, a_t) \left\{ R_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right\} \\
&= Q_t(s_t, a_t) + \alpha_t(s_t, a_t) \Big\{ R_{t+1} + E[R_{t+1} \mid I_t] - E[R_{t+1} \mid I_t] \\
&\qquad + \gamma E\left[ \max_a Q_t(s_{t+1}, a) \mid I_t \right] - \gamma E\left[ \max_a Q_t(s_{t+1}, a) \mid I_t \right] \\
&\qquad + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \Big\}
\end{aligned}$$

Define the noise term

$$w_t(s_t, a_t) = R_{t+1} - E[R_{t+1} \mid I_t] + \gamma \max_a Q_t(s_{t+1}, a) - \gamma E\left[ \max_a Q_t(s_{t+1}, a) \mid I_t \right].$$

Rewriting the expression for $Q$ we get

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t(s_t, a_t) \left( F(Q_t(s_t, a_t)) - Q_t(s_t, a_t) + w_t(s_t, a_t) \right)$$

$Q$ is now in the desired form. Thus, we only need to impose the restrictions on $\alpha(t)$ to guarantee convergence of $Q$. The last case we wish to consider is when the problem is undiscounted ($\beta = 1$). For this, one further assumption is needed.

Assumption 6:
(a) $\exists$ at least one proper stationary policy.
(b) Every improper stationary policy yields infinite expected cost for at least one initial state.

Here a proper policy is defined to be a stationary policy under which the probability of reaching an absorbing state with 0 reward converges to 1. Otherwise, the policy is improper. With this, we get the last theorem:

Theorem 4: The Q-Learning algorithm converges with probability 1 if we impose Assumption 2 and:
(a) $\beta < 1$,
(b) $\beta = 1$, $Q_{iu}(0) = 0$, and all policies are proper, or
(c) $\beta = 1$, $Q_{iu}(0) = 0$, Assumption 6 holds, and $Q(t)$ is guaranteed to be bounded with probability 1.
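The rewriting of the Q-Learning update into the form of Equation 1 can be checked numerically. The following Python sketch is an illustration only (the authors' own code is in Matlab): the function names, the toy Q table, and the two-outcome transition model are assumptions made for this example. It performs one tabular update in the reward-maximization form, confirms that it equals $Q + \alpha (F(Q) - Q + w)$, and confirms that the noise term $w$ has zero mean under the assumed model.

```python
def q_learning_step(Q, s, a, r, s_next, alpha, gamma=1.0):
    """Return the updated Q(s, a) after one sampled transition (Q is not modified)."""
    target = r + gamma * max(Q[s_next])
    return Q[s][a] + alpha * (target - Q[s][a])


def expected_target(Q, outcomes, gamma=1.0):
    """F(Q)(s, a) for a known model; outcomes = [(prob, reward, next_state), ...]."""
    return sum(p * (r + gamma * max(Q[s2])) for p, r, s2 in outcomes)


def noise_term(Q, r, s_next, outcomes, gamma=1.0):
    """w = sampled target minus its expectation F(Q)(s, a)."""
    return r + gamma * max(Q[s_next]) - expected_target(Q, outcomes, gamma)


# Hypothetical model: from state 0 with action 1 the bird survives (reward 15,
# next state 1) or dies (reward -1000, absorbing state 2), each with probability 0.5.
Q = [[0.0, 0.0], [4.0, 8.0], [0.0, 0.0]]
outcomes = [(0.5, 15.0, 1), (0.5, -1000.0, 2)]
alpha = 0.5

F = expected_target(Q, outcomes)                 # -488.5
w = noise_term(Q, 15.0, 1, outcomes)             # 511.5 for the "survive" sample
direct = q_learning_step(Q, 0, 1, 15.0, 1, alpha)
sa_form = Q[0][1] + alpha * (F - Q[0][1] + w)    # same update, written as in Equation 1
mean_noise = sum(p * noise_term(Q, r, s2, outcomes) for p, r, s2 in outcomes)
print(direct, sa_form, mean_noise)
```

In a real run the model in `outcomes` is unknown to the agent; it appears here only so that $F$ and $w$ can be computed explicitly and compared with the sampled update.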
III. IMPLEMENTATION

The key to a successful implementation of Q-Learning is modeling the problem efficiently. As long as the state space is finite, Q-Learning will converge. In practical applications, we are looking to obtain results in a limited time frame. If the state space is very large, the algorithm will take too long to converge. On the other hand, if the state space is very small, accuracy will be lost. Thus, there is a trade-off between run-time and accuracy. The dimension of the state space will be kept to a minimum by only considering three attributes:

- X distance from the next pipe
- Y distance from the next pipe
- Life of the bird

A visual representation of the first two attributes is shown in Figure 2.

Fig. 2. The first two attributes: X distance & Y distance

Only integers are considered to ensure a finite state space. The distance in the X direction is bounded by 0 from below and by 300 from above. The Y distance ranges from -200 to 200. The state of the bird is characterized by dead as 0 and alive as 1. Thus the state space is defined as:

$$X = [0, 300] \times [-200, 200] \times \{0, 1\}.$$

At every point in time, there are two possible actions: jump or don't jump. Thus the action space is defined as:

$$U = \{0, 1\}.$$

After the previous action has been applied, the resulting state is analyzed. A reward function is invoked, which reinforces or punishes the previous action. If the bird survived, the previous action is seen as positive and is reinforced. However, if the bird died, we apply the punishment. The exact values of the reward function are not important as long as reinforcement is positive and punishment is negative. We weigh the benefit of survival lower than the punishment of death, as the goal is to avoid death.

$$R_{t+1} = \begin{cases} 15, & \text{if bird alive} \\ -1000, & \text{if bird dead} \end{cases}$$

The learning rate determines to what extent new information overrides old information. The following learning rate was chosen:

$$\alpha_t(s_t, a_t) = \frac{1}{1 + \sum_{k=0}^{t} 1_{\{s_k = s_t,\, a_k = a_t\}}}$$

The discount factor determines the importance of future rewards. This value is given below.

$$\gamma = 1.$$

These are the essential definitions that are needed to implement the Q-Learning algorithm. The Q-Learning pseudocode that we used in our implementation is as follows.

1) Initialize Q matrix
2) Repeat
   a) Determine the action, a, in state, s, based on Q matrix
   b) Take the action, a, and observe the outcome state, s', and reward, r
   c) Update Q matrix based on Equation 2

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t(s_t, a_t) \left( R_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right) \qquad (2)$$

IV. DISCUSSION

When starting the Q-Learning algorithm, the Q matrix is initialized to zero. The Q matrix can be seen as the brain of the artificial intelligence. At the beginning, it does not have any knowledge about its environment. When applying Equation 2 for the first time step, the maximum is taken between 0 and 0. In other words, faced with the decision of jumping or not jumping, both actions have the same Q values and a tie occurs. We decided to choose the action don't jump as a tie breaker because it is the more human-like option. The bird was expected to fall to the ground in the first couple of trials, but to learn to avoid the ground fairly quickly since falling to the ground is not a good option.

The Q matrix described before is of size 300 x 400 x 2. Even though that doesn't seem that big, it is still a total of 240,000 different states. That means that every time the bird is positioned slightly differently relative to the pipe, it does not have any knowledge. However, the algorithm still converges. It just takes a while to fill the Q matrix. That is why the state space was discretized further for states where the bird is far away from the initial pipe.

The first intelligent behavior is that the bird learns that falling to the ground is a bad idea. Note that the bird doesn't have
information about what the ground is or even where it is located. The only available information is the distance to the next pipe and the current life status.

The next stage in the bird's learning process is that it will crash into the lower pipe until it learns that this leads to a negative reward. With the tie breaker being don't jump, it will fall whenever it encounters a new state. After getting feedback, it will realize that falling was not a good move in the prior situation. If the bird is located in the same position relative to a pipe again, it will use this information and jump instead. This propagation continues, leading to the bird eventually clearing the first pipe.

Another problem arose: even after flying through the gap the bird would eventually die. The bird learned that flying through the pipe will lead to death. Due to the back propagation, the Q matrix had negative entries for states close to the gap. Following the algorithm, the states above the gap with Q values of zero would get explored. For all the AI knows, there could be a better option than flying through the gap. It continues to go through all other states trying to find an alternative before realizing that flying through the gap was the best way.

It was observed that the speed of the bird makes a big difference. The game engine works with gravity. The longer the bird is falling, the faster it gets. This influences the choice of the optimal policy. We could include the speed of the bird in the state space, but that would add another dimension to the state space, which would slow down the learning process tremendously. That's why a different approach was taken. The learning rate was lowered with regard to how often a certain state was visited before. The dynamic learning rate takes care of both problems described above. The AI learns that flying closely above the pipe leads to positive benefits. Even if it does eventually crash, it doesn't disregard this information. The case of the bird crashing because of missing information regarding speed is rather rare.
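The damping effect of the visit-count learning rate can be illustrated in isolation. The Python sketch below is a simplification and not the authors' Matlab code: it drops the bootstrapped max term of Equation 2 and tracks a single state-action pair that receives a stream of rewards, so the function names and the reward stream are assumptions. With a step size of 1/(1 + n) on the n-th visit and a zero initial value, the estimate equals the sum of the rewards divided by n + 1, so one rare crash after many successful passes moves it only slightly, whereas a constant learning rate lets the single crash dominate.

```python
def count_based_estimate(rewards):
    """Apply x <- x + (r - x) / (1 + n), where n is the 1-based visit count and x starts at 0."""
    x = 0.0
    for n, r in enumerate(rewards, start=1):
        x += (r - x) / (1 + n)
    return x


def constant_rate_estimate(rewards, alpha):
    """Apply x <- x + alpha * (r - x) with a fixed learning rate; x starts at 0."""
    x = 0.0
    for r in rewards:
        x += alpha * (r - x)
    return x


# Hypothetical stream: 99 survivals (reward 15) followed by one crash (reward -1000).
rewards = [15.0] * 99 + [-1000.0]

damped = count_based_estimate(rewards)            # stays positive: (99 * 15 - 1000) / 101
undamped = constant_rate_estimate(rewards, 0.5)   # pulled far below zero by the single crash
print(damped, undamped)
```

The constant rate of 0.5 is chosen only to make the contrast visible; any fixed rate gives the most recent reward a weight that does not shrink with experience.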
The adjusted learning rate makes sure that these improbable events do not have too big of an impact on the Q matrix. The bird successfully learned how to play the game using Q-Learning without any prior knowledge of the game dynamics. High scores of 169 and 28,851 were achieved on the original and simplified game, respectively. Figure 3 shows the learning progress of the bird by comparing the number of trials and their respective scores.

Fig. 3. The score plotted against the number of trials of the AI

The convergence of the algorithm is ensured by the choice of a suitable $\alpha(t)$ and the appropriateness of the state space, since the discount factor $\gamma = 1$ was chosen. We know that Assumption 2 on $\alpha$ is satisfied because, for a state-action pair visited infinitely often, with $n$ counting its visits,

$$\sum_t \alpha_t(s_t, a_t) = \sum_t \frac{1}{1 + \sum_{k=0}^{t} 1_{\{s_k = s_t,\, a_k = a_t\}}} = \sum_{n=1}^{\infty} \frac{1}{1 + n} = \infty.$$

Likewise,

$$\sum_t \alpha_t(s_t, a_t)^2 = \sum_{n=1}^{\infty} \frac{1}{(1 + n)^2} \le \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \le 2.$$

Since $\gamma = 1$ is used, the algorithm needs to satisfy the conditions of Theorem 4, in particular part (b) or (c). We initialize the Q matrix to 0, which is a requirement for both. Next, we impose additional structure on Q. We bound Q from above by 10,000. That is, the reward becomes 0 once the Q matrix entry becomes sufficiently large. Also, it is known a priori that the game is solvable. That is, there is an optimal policy that will guarantee the bird's survival for eternity. This is known by the construction of the game. The code does not allow for a sequence of pipes that will ensure death. Hence, we have two absorbing states. One is the death of the bird, as the bird cannot revive. The other is the collection of states that make the bird live forever. Since we bound the Q matrix, we have a proper stationary policy that has 0 reward, and all improper policies (those leading to death) have negative infinite expected reward. Thus, we satisfy the conditions for part (c) of Theorem 4 and, as a consequence, have convergence of the Q matrix.

Although convergence is guaranteed, in practice, one does not have the luxury of running the algorithm forever. Thus, the greatest critique of this algorithm is that convergence may take a long time, which is undesirable.
For instance, if the controller learns a good but sub-optimal behavior, it will continue to perform this action, even though better actions exist, until those actions are explored. In this application, the majority of the states in the state space are undesirable. If the controller misses the small window of desirable states, it might have to explore the entire state space before returning to that small window, which will greatly increase the time of convergence.

V. POSSIBLE IMPROVEMENTS

If given more time, many ideas could have been implemented to improve the Q-Learning algorithm. Some of these ideas include techniques for faster convergence and methods to strengthen the survivability rate of the bird.

In the project, the graphics were displayed throughout the learning process of the bird. This significantly slows down the algorithm. If the graphics were removed from the code, the optimal policy could be achieved in a significantly reduced time frame.

Only one bird was used to find the optimal policy throughout the project. If multiple birds were learning simultaneously and collectively updated the matrix Q, this would greatly reduce the time needed to fill the Q matrix. The birds would then converge to the optimal policy much more quickly.

When the bird dies, the game restarts with the bird at the same position every time. If a random starting position were implemented, the bird would be able to explore the state space more efficiently. This may reduce the time it takes for the bird to converge to the optimal policy.

To improve the bird's survivability, a possible solution is to add another dimension to the state space. The bird's state is highly dependent on the velocity of the bird. If velocity is included in the state space, then the transition probability would be deterministic. This would allow the bird to learn precisely what the next state is going to be given its current state and action.

After the bird had converged to the optimal policy, it was noticed that there was only one case that led to the death of the bird.
This case was when the height of the gap of the current pipe was low to the ground while the height of the gap of the next pipe was near the ceiling. This case did not give the bird enough room to make it to the next pipe. A solution to this would be to include the position of the next pipe in the state space. This would allow the bird to plan ahead, which in turn would increase its survivability rate.

VI. CONCLUSION

The big advantage of the Q-Learning algorithm is that it converges without knowledge of the system. However, the data given to the algorithm highly influences the run-time. The key to receiving meaningful results in a limited time frame is the choice of an efficient state space. Following this observation, appropriate spaces and parameters were chosen. Even with a short learning time, the artificial intelligence exceeded expectations by performing better than a beginner human player. The Q-Learning algorithm can be improved upon by implementing the improvements outlined in Section V.

APPENDIX

The Matlab code for the Q-Learning algorithm is shown below. Note that only the relevant part of the code is included.

    % Distances from the bird to the next pipe.
    % Arithmetic operators between terms are assumed to be subtractions.
    x_metric = round(Tubes.ScreenX(Tubes.FrontP) - Bird.ScreenPos(1));
    y_metric = round(177 - (Tubes.VOffset(Tubes.FrontP) - 1) - Bird.ScreenPos(2));

    % If the x metric is negative (passed the pipe), look at the next pipe
    if x_metric < 0
        x_metric = round(Tubes.ScreenX(mod(Tubes.FrontP, 3) + 1) ...
            - Bird.ScreenPos(1));
        % Operators before "1", "Bird.ScreenPos(2)" and "10" are assumed subtractions
        y_metric = round((Tubes.VOffset(mod(Tubes.FrontP, 3) + 1) ...
            - 1) - Bird.ScreenPos(2) + 5 - 10);
    end

    % Find indices of the current state in the discretization grids x and y
    for k = 1:length(x)
        if x_metric < x(k)
            x_ind = k;
            break;
        end
    end
    for k = 1:length(y)
        if y_metric < y(k)
            y_ind = k;
            break;
        end
    end

    % Find indices of the previous state
    x_ind_o = 1;
    for k = 1:length(x)
        if x_metric_old < x(k)
            x_ind_o = k;
            break;
        end
    end
    y_ind_o = 1;
    for k = 1:length(y)
        if y_metric_old < y(k)
            y_ind_o = k;
            break;
        end
    end

    % Calculate alpha from the visit count of the previous state
    stateCount(x_ind_o, y_ind_o) = ...
        stateCount(x_ind_o, y_ind_o) + 1;
    alpha = 1 / (1 + stateCount(x_ind_o, y_ind_o));  % indexed so that alpha is a scalar
    action_index = action + 1;
    if deathFlag == true
        R = -1000;
    else
        R = 15;
    end

    % Q-learning algorithm (Equation 2 with gamma = 1)
    Q_temp = ...
        Q(x_ind_o, y_ind_o, action_index) + ...
        alpha * (R + max(Q(x_ind, y_ind, :)) - ...
        Q(x_ind_o, y_ind_o, action_index));
    if Q_temp <= 10000  % bound assumed from the limit of 10,000 stated in Section IV
        Q(x_ind_o, y_ind_o, action_index) ...
            = Q_temp;
    end
    deathFlag = false;

    % Take action based on Q matrix (ties resolve to "don't jump")
    if Q(x_ind, y_ind, 1) < Q(x_ind, y_ind, 2)
        action = 1;
        FlyKeyStatus = true;
    else
        action = 0;
    end
    x_metric_old = x_metric;
    y_metric_old = y_metric;

REFERENCES

[1] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[3] C. J. C. H. Watkins, Learning from Delayed Rewards. PhD thesis, University of Cambridge, England, 1989.
[4] R. Williams, "What is Flappy Bird? The game taking the App Store by storm," Jan.
[5] M. Bertha, "Everything you need to know about your new favorite cell phone game, Flappy Bird," Jan.
[6] J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, no. 3, pp. 185-202, 1994.
More informationA recursive construction of efficiently decodable list-disjunct matrices
CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More informationQuantum Physics II (8.05) Fall 2013 Assignment 2
Quntum Physics II (8.05) Fll 2013 Assignment 2 Msschusetts Institute of Technology Physics Deprtment Due Fridy September 20, 2013 September 13, 2013 3:00 pm Suggested Reding Continued from lst week: 1.
More informationJonathan Mugan. July 15, 2013
Jonthn Mugn July 15, 2013 Imgine rt in Skinner box. The rt cn see screen of imges, nd dot in the lower-right corner determines if there will be shock. Bottom-up methods my not find this dot, but top-down
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More informationNew data structures to reduce data size and search time
New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute
More informationChapter 5 : Continuous Random Variables
STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll