Games Against Nature
Advanced Course in Machine Learning, Spring 2010
Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz

In the previous lectures we talked about experts in different setups and analyzed the regret of the algorithm by comparing its performance to the performance of the best fixed expert (and later the best shifting expert). In this lecture we consider the game theory connection and present games against Nature. Along the way, we present one of the most common tools to analyze prediction problems: approachability theory. The setup in today's lecture is that of full information. The next lecture will be devoted to the partial information setup. We start from a more general model for the game and then show how to apply it to different online learning setups.

1 The Model

The model is comprised of a single player playing against Nature. The game is repeated in time, and at stage $t$ the decision maker has to choose an action $a_t \in A$ and Nature chooses (simultaneously) an action $b_t \in B$. As a result the decision maker obtains a reward $r_t \sim R(a_t, b_t)$ (that is, the reward can be stochastic: we will only need finite second moments). The game continues ad infinitum. We let the average reward be denoted by

$$\hat r_t = \frac{1}{t} \sum_{\tau=1}^{t} r_\tau .$$

Note: There is no reward for Nature, therefore this is not a game in the standard sense of the word (or, one can say this is a zero-sum game).

The decision maker keeps track of the rewards and of Nature's actions. We consider the empirical frequency of Nature's actions as:

$$\hat q_t(b) = \frac{1}{t} \sum_{\tau=1}^{t} 1\{b_\tau = b\},$$

and note that $\hat q_t \in \Delta(B)$, the set of distributions over $B$.

1.1 The stationary case

If Nature is stationary (i.e., the actions are generated from an IID source $q^*$) then $\hat q_t \to q^*$ a.s. (In fact, we have exponentially fast convergence: $\Pr(\|\hat q_t - q^*\| > \epsilon) \le C \exp(-C' t \epsilon^2)$.) In that case, one can hope to obtain a reward as high as the best response reward:

$$r^*(q) = \max_{p \in \Delta(A)} \sum_{a,b} p(a)\, q(b)\, r(a, b) = \max_{a \in A} \sum_{b} q(b)\, r(a, b).$$

By obtaining we mean: $\hat r_t \to r^*(q^*)$ a.s.

Here is a simple fictitious play algorithm that obtains that:
1. Observe $b_t$ and form an estimate: $\hat q_t(b) = \frac{1}{t} \sum_{\tau=1}^{t} 1\{b_\tau = b\}$.
2. Play $a_{t+1} \in \arg\max_{a} r(a, \hat q_t)$, where $r(a, q) = \sum_b q(b)\, r(a, b)$.

This algorithm is also based on the celebrated certainty equivalence scheme.

Theorem 1 The Fictitious Play algorithm satisfies $\hat r_t \to r^*(q^*)$ a.s.

But what happens if Nature is not stationary?

1.2 Arbitrary source

Suppose now that the sequence $b_1, b_2, \ldots$ is generated by an arbitrary process. Arbitrary here means not necessarily stochastic. Clearly, we cannot assume that $\hat q_t$ converges. Our objective of having the average reward converge to $r^*(q^*)$ is not well defined anymore since $q^*$ may not exist. We can define the average regret as:

$$R_t = r^*(\hat q_t) - \hat r_t .$$

This is a random variable. Randomness is determined by randomness in the algorithm. The basic question is therefore: can we find an algorithm such that $\limsup_{t \to \infty} R_t \le 0$ a.s.? If such an algorithm exists we call it 0-regret (we will later call such an algorithm 0 external regret, but this is sufficient for now). This is, of course, the same notion from the previous two lectures, where we consider the average regret as opposed to the cumulative regret.

Nature models.
1. Oblivious. Nature writes down the sequence $b_1, b_2, \ldots$ at time 0 (not disclosing it).
2. Non-oblivious. Nature is adversarial and tries to maximize the regret. Nature may even be aware of any randomization the decision maker does (but not the value of private coin tosses).

Observations:
1. A non-oblivious opponent is a very strong model: it encompasses a worst case view on disturbances in many systems and it generalizes play against an adversary.
2. Fictitious play would fail since randomization is needed. Fictitious play is called here follow the leader (FL).
3. If the leader does not change (asymptotically), FL does have 0 regret. More interestingly, as long as there are not many switches, FL works. More precisely, we say that FL switches from action $a$ to $a'$ at time $t$ if $a_{t-1} = a$ and $a_t = a'$. We let the number of switches up to time $t$ be $N_t$. We say that FL exhibits infrequent switches along a history if for every $\epsilon > 0$ there exists $T$ such that $N_t / t < \epsilon$ for all $t \ge T$.
Theorem 2 If FL exhibits infrequent switches along a history, it satisfies $\limsup_{t \to \infty} R_t \le 0$ along that history.

Proof: Home exercise. (Note that we do not use almost sure quantifiers since clearly FL is not optimal for every history.)
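The observations about follow the leader can be checked on a small example. The following is a minimal sketch, assuming deterministic rewards given by a matrix `r` of shape |A| x |B| and ties broken toward the lowest action index; the function name `follow_the_leader` and the test game are illustrative choices, not part of the handout.

```python
import numpy as np

def follow_the_leader(r, b_seq):
    """Run FL on reward matrix r (|A| x |B|) against Nature's actions b_seq.

    Returns (R_t, N_t): the average regret r*(q_hat_t) - r_hat_t and the
    number of switches after t = len(b_seq) stages.
    """
    n_b = r.shape[1]
    counts = np.zeros(n_b)   # unnormalized empirical frequencies of Nature's actions
    total = 0.0              # cumulative reward of FL
    prev = None
    switches = 0
    for b in b_seq:
        # Best response to the history so far (np.argmax returns the first maximizer).
        a = int(np.argmax(r @ counts))
        if prev is not None and a != prev:
            switches += 1
        prev = a
        total += r[a, b]
        counts[b] += 1
    t = len(b_seq)
    q_hat = counts / t
    regret = float(np.max(r @ q_hat) - total / t)
    return regret, switches

# Matching game: reward 1 when the decision maker guesses Nature's action.
r = np.eye(2)
print(follow_the_leader(r, [0] * 100))     # leader never changes
print(follow_the_leader(r, [1, 0] * 500))  # sequence built to fool FL
```

On the constant sequence the leader never changes and the regret is zero. On the alternating sequence FL switches at every stage and collects no reward, while the best fixed action earns 1/2 on average, which is the failure mode that motivates randomization.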
1.3 A generalized notion of regret

In general, regret can be defined as the difference between the obtained (cumulative) reward and the reward that would have been obtained by the best strategy in a reference set. That is:

$$R_t = \sup_{\text{strategy } \sigma} r(\sigma, \text{history}) - \hat r_t ,$$

where $r(\sigma, \text{history})$ is an estimate of the average reward if playing $\sigma$. This is not always well defined or achievable. In the example above, the set of strategies is simply the set of stationary strategies. One can easily think of other sets of strategies, such as the set of strategies that depend on the last observation from Nature. In that case the set of strategies is identified with $p = p(a \mid b_{t-1}) \in \Delta(A)^B$ and the reward as a function of history is defined as:

$$r(\sigma, \text{history}) = \frac{1}{t} \sum_{\tau=1}^{t} \sum_{a} p(a \mid b_{\tau-1})\, r(a, b_\tau),$$

where $b_0$ is defined as one of the members of $B$. We observe that this comparison class is richer than the comparison class we considered above, which can be identified with $p(a) \in \Delta(A)$. We will show later that there is an asymptotically 0-regret strategy against this particular comparison class.

2 Blackwell's Approachability

We now introduce a useful tool in the analysis of repeated games against Nature called Blackwell's approachability theory. Let us define a vector-valued two-player game. We call the players P1 and P2 to distinguish them from the decision makers above. We consider a two-player vector-valued repeated game where both P1 and P2 choose actions as before from finite sets $A$ and $B$. The reward is now a $k$-dimensional vector, $m(a, b) \in \mathbb{R}^k$. As before, the stage game reward is $m_t \sim m(a_t, b_t)$ (the reward can be a random vector). The average reward is $\hat m_t = \frac{1}{t} \sum_{\tau=1}^{t} m_\tau$. P1's task is to approach a target set $T$, namely to ensure convergence of the average reward vector to this set irrespective of P2's actions. Formally, let $T \subseteq \mathbb{R}^k$ denote the target set. In the following, $d$ is the Euclidean distance in $\mathbb{R}^k$. The set-to-point distance between a point $x$ and a set $T$ is $d(x, T) = \inf_{y \in T} d(x, y)$. (We let $P_{\pi,\sigma}$ denote the probability measure when P1 plays the policy $\pi$ and P2 plays the policy $\sigma$.)
Definition 1 A policy $\pi^*$ of P1 approaches a set $T \subseteq \mathbb{R}^k$ if

$$\lim_{n \to \infty} d(\hat m_n, T) = 0 \quad P_{\pi^*,\sigma}\text{-a.s., for every } \sigma \in \Sigma.$$

A policy $\sigma^* \in \Sigma$ of P2 excludes a set $T$ if for some $\delta > 0$,

$$\liminf_{n \to \infty} d(\hat m_n, T) > \delta \quad P_{\pi,\sigma^*}\text{-a.s., for every } \pi \in \Pi.$$

The policy $\pi^*$ ($\sigma^*$) will be called an approaching (excluding) policy for P1 (P2). A set is approachable if there exists an approaching policy. Noting that approaching a set and its topological closure are the same, we shall henceforth suppose that the set $T$ is closed. The notion of approachability and excludability assumes uniformity with respect to time (and the strategy of P2 (approachability) or P1 (excludability)).
2.1 The projected game

Let $u$ be a unit vector in the reward space $\mathbb{R}^k$. We often consider the projected game in direction $u$ as the zero-sum game with the same dynamics as above, and scalar rewards $r_n = m_n \cdot u$. Here $\cdot$ stands for the standard inner product in $\mathbb{R}^k$. Denote this game by $\Gamma(u)$.

2.2 The Basic Approachability Results

For any $x \notin T$, denote by $C_x$ a closest point in $T$ to $x$, and let $u_x$ be the unit vector in the direction of $C_x - x$, which points from $x$ to the goal set $T$. The following theorem requires, geometrically, that there exists a (mixed) action $p(x)$ such that the set of all possible (vector-valued) expected rewards is on the other side of the hyperplane supported by $C_x$ in direction $u_x$.

Theorem 3 Assume that for every point $x \notin T$ there exists a strategy $p(x)$ such that:

$$(m(p(x), q) - C_x) \cdot u_x \ge 0, \quad \forall q \in \Delta(B). \qquad (1)$$

Then $T$ is approachable by P1. An approaching policy is given as follows: if $\hat m_n \notin T$, play $p(\hat m_n)$; otherwise, play arbitrarily.

Proof: Let $y_n = C_{\hat m_n}$ and denote by $F_n$ the filtration generated by the history up to time $n$. We further let $d_n = \|\hat m_n - y_n\|$. We want to prove that $d_n \to 0$ a.s. We have that:

$$\begin{aligned}
E(d_{n+1}^2 \mid F_n) &= E(\|\hat m_{n+1} - y_{n+1}\|^2 \mid F_n) \\
&\le E(\|\hat m_{n+1} - y_n\|^2 \mid F_n) \\
&= E(\|\hat m_{n+1} - \hat m_n + \hat m_n - y_n\|^2 \mid F_n) \\
&= \|\hat m_n - y_n\|^2 + E(\|\hat m_{n+1} - \hat m_n\|^2 \mid F_n) + 2 E((\hat m_n - y_n) \cdot (\hat m_{n+1} - \hat m_n) \mid F_n).
\end{aligned}$$

Now, since $\hat m_{n+1} - \hat m_n = m_{n+1}/(n+1) - \hat m_n/(n+1)$, we have that:

$$E(d_{n+1}^2 \mid F_n) \le d_n^2 + \frac{C}{n^2} + 2 E((\hat m_n - y_n) \cdot (\hat m_{n+1} - \hat m_n) \mid F_n).$$

Expanding the last term we obtain:

$$\begin{aligned}
(\hat m_n - y_n) \cdot (\hat m_{n+1} - \hat m_n) &= (\hat m_n - y_n) \cdot \left( \frac{m_{n+1}}{n+1} - \frac{\hat m_n}{n+1} \right) \\
&= (\hat m_n - y_n) \cdot \left( \frac{y_n}{n+1} - \frac{\hat m_n}{n+1} + \frac{m_{n+1}}{n+1} - \frac{y_n}{n+1} \right) \\
&= -\frac{d_n^2}{n+1} + \frac{1}{n+1} (\hat m_n - y_n) \cdot (m_{n+1} - y_n).
\end{aligned}$$

Now, by condition (1) the conditional expectation of the last term is non-positive (recall that $\hat m_n - y_n$ points in the direction $-u_{\hat m_n}$), so we obtain:

$$E(d_{n+1}^2 \mid F_n) \le \left(1 - \frac{2}{n+1}\right) d_n^2 + \frac{C}{n^2}.$$

It follows by Lemma 1 that $d_n \to 0$ almost surely.

Remarks:
1. Convergence Rates. The convergence rate of the above policy is $O(1/\sqrt{T})$ (here $T$ denotes the number of stages, not the target set) and is independent of the dimension. The only dependence kicks in through the magnitude of the randomness (the second moment, to be exact).
2. Complexity. There are two distinct elements to computing an approaching strategy as in Theorem 3. The first is finding the closest point $C_x$ and the second is solving the projected game. Solving the projected 0-sum game can be easily done using linear programming (or other methods) with polynomial dependence on the number of actions of both players. Finding $C_x$, however, can in general be a very hard problem, as finding the closest point in a non-convex set is NP-hard. There are, however, some easy instances, such as the case where $T$ is convex and described in some compact form. In fact, it is enough to assume that a convex $T$ has a separation oracle (i.e., we can query in polynomial time whether a point belongs to $T$ or not).
3. Is a set approachable? In general, it is NP-hard even to determine if a point is approachable, where hardness here is measured in the dimension (if the dimension is fixed it is not hard to decide if a point is approachable).
4. The game theory connection. The above result generalizes the celebrated min-max theorem. To observe that, take a one-dimensional problem. In that case the approachable set is the segment $[v, \infty)$, where $v$ is the value of the game.

For convex target sets, the condition of the last theorem turns out to be both sufficient and necessary. Moreover, this condition may be expressed in a simpler form, which may be considered as a generalization of the minimax theorem for scalar games. Given a stationary policy $q \in \Delta(B)$ for P2, let $\Phi(A, q) = \mathrm{co}(\{m(p, q)\}_{p \in \Delta(A)})$, where $\mathrm{co}$ is the convex hull operator. The Euclidean unit sphere in $\mathbb{R}^k$ is denoted by $\mathbb{B}^k$. The following theorem characterizes convex approachable sets in an elegant way.

Theorem 4 Let $T$ be a closed convex set in $\mathbb{R}^k$.
(i) $T$ is approachable if and only if $\Phi(A, q) \cap T \ne \emptyset$ for every stationary policy $q \in \Delta(B)$.
(ii) If $T$ is not approachable then it is excludable by P2. In fact, any stationary policy $q$ that violates (i) is an excluding policy.
(iii) $T$ is approachable if and only if $\mathrm{val}\,\Gamma(u) \ge \inf_{m \in T} u \cdot m$ for every $u \in \mathbb{B}^k$, where $\mathrm{val}$ is the value of the (scalar) 0-sum game.
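The approaching policy of Theorem 3 can be sketched on one target set where the closest point is trivial to compute. The example below is an illustration under stated assumptions, not the construction used in this handout: the target set is the nonpositive orthant in $\mathbb{R}^{|A|}$, and the vector reward has $i$-th coordinate $r(i, b) - r(a, b)$. For this set $C_x = \min(x, 0)$ coordinatewise, and playing $p(x)$ proportional to the positive part of $x$ satisfies condition (1) with equality (this particular choice is commonly known as regret matching). The function name, the uniform fallback inside the set, and the seed are hypothetical choices.

```python
import numpy as np

def approach_orthant(r, b_seq, seed=0):
    """Blackwell-style policy for T = nonpositive orthant (a sketch).

    r is a deterministic reward matrix (|A| x |B|); b_seq is Nature's sequence.
    Returns the Euclidean distance d(m_hat_t, T) after t = len(b_seq) stages.
    """
    rng = np.random.default_rng(seed)
    n_a = r.shape[0]
    m_sum = np.zeros(n_a)  # cumulative vector reward; m_sum / t is the average
    for b in b_seq:
        # x - C_x for the orthant is the positive part of x (scaling by t is irrelevant).
        excess = np.maximum(m_sum, 0.0)
        if excess.sum() > 0:
            p = excess / excess.sum()      # mixed action p(x) satisfying condition (1)
        else:
            p = np.full(n_a, 1.0 / n_a)    # inside T: play arbitrarily (uniform here)
        a = rng.choice(n_a, p=p)
        m_sum += r[:, b] - r[a, b]         # vector reward for this stage
    t = len(b_seq)
    return float(np.linalg.norm(np.maximum(m_sum / t, 0.0)))

# Same alternating sequence that defeats a deterministic best-response rule.
print(approach_orthant(np.eye(2), [1, 0] * 2500))
```

The printed distance should be small and shrink as the horizon grows, in line with the $1/\sqrt{T}$ rate discussed in the remarks; the exact value depends on the random seed.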
Condition (i) in Theorem 4 is sometimes very easy to check, as we see below.

3 Back to regret

We are now ready to use approachability for proving we can minimize the regret. Consider the following vector-valued game. When the decision maker plays $a_t$, Nature plays $b_t$, and a reward $r_t$ is obtained, the vector-valued reward is $m_t = (r_t, e_{b_t})$, where $e_b$ is a vector of zeros except for the $b$-th entry, which is one. It holds that $\hat m_t = (\hat r_t, \hat q_t)$. Now, define the following target set $T \subseteq \mathbb{R} \times \Delta(B)$:

$$T = \{(r, q) : r \ge r^*(q),\ q \in \Delta(B)\}.$$

We claim that $T$ is convex. Indeed, $r^*(q)$ is convex as a maximum of linear functions, and the set $T$ is convex as the epigraph of a convex function.

We now claim that $T$ is approachable. By Theorem 4, a necessary and sufficient condition is that $\Phi(A, q) \cap T \ne \emptyset$ for every $q$. This is easy to show: fix some $q$ and let $p^* \in \Delta(A)$ be a member of the argmax of $r$, that is, $p^* \in \arg\max_{p} r(p, q)$. Then $m(p^*, q) \in \Phi(A, q)$ and $m(p^*, q) = (r^*(q), q) \in T$.

This means that by using approachability we have that $d(\hat m_t, T) \to 0$. What is left is to argue that approaching $T$ implies that $\hat r_t - r^*(\hat q_t) \ge 0$ asymptotically. This holds since $r^*$ is a uniformly continuous function (it is convex, continuous and on a compact domain). We have thus proved:
Theorem 5 There exists a strategy that guarantees that

$$\limsup_{t \to \infty} \left( r^*(\hat q_t) - \hat r_t \right) \le 0 \quad \text{a.s.}$$

In fact, we have proved that the convergence rate is $O(1/\sqrt{T})$, where $T$ here denotes the number of stages.

We now return to the problem where we considered generalized regret. We claim a 0-regret strategy does exist. Indeed, consider the target set of the form:

$$T = \Big\{ (r, \pi) \in \mathbb{R} \times \Delta(B^2) : r \ge \max_{p \in \Delta(A)^B} \sum_{b, b' \in B} \pi(b, b') \sum_{a \in A} p(a \mid b)\, r(a, b') \Big\},$$

where we identify $p$ with a conditional probability of choosing an action given the past observation (note that it suffices to choose a pure action). It is easy to see that $T$ is convex as an epigraph of a convex function. Now, we need to define the game: when P1 chooses $a$, P2 chooses $b'$ and the previous action chosen by P2 was $b$, the reward is a vector whose first coordinate is $r(a, b')$ and whose remaining coordinates are zero except for a one at the $b \cdot |B| + b'$ coordinate. It remains an easy exercise to show that the set $T$ is approachable. (We note that a slight extension of approachability is needed: see The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, MOR 28(1), S. Mannor and N. Shimkin.)

4 Calibration

The definition of calibration and a very easy proof using approachability is provided in the attached note.

A Appendix

Lemma 1 Assume $e_t$ is a non-negative random variable, measurable according to the sigma algebra $F_t$ ($F_t \subset F_{t+1}$), and that

$$E(e_{t+1} \mid F_t) \le (1 - d_t)\, e_t + c\, d_t^2. \qquad (2)$$

Further assume that $\sum_{t=1}^{\infty} d_t = \infty$, $d_t \ge 0$, and that $d_t \to 0$. Then $e_t \to 0$ P-a.s.

Proof: First note that by taking the expectation of Eq. (2) we get:

$$E e_{t+1} \le (1 - d_t)\, E e_t + c\, d_t^2.$$

According to Bertsekas and Tsitsiklis (Neuro-dynamic programming, page 117) it follows that $E e_t \to 0$. Since $e_t$ is non-negative it suffices to show that $e_t$ converges. Fix $\epsilon > 0$ and let $V_t^\epsilon = \max\{\epsilon, e_t\}$. Since $d_t \to 0$ there exists $T(\epsilon)$ such that $c\, d_t < \epsilon$ for $t > T(\epsilon)$. Restrict attention to $t > T(\epsilon)$. If $e_t < \epsilon$ then

$$E(V_{t+1}^\epsilon \mid F_t) \le (1 - d_t)\,\epsilon + c\, d_t^2 \le \epsilon \le V_t^\epsilon.$$

If $e_t > \epsilon$ we have:

$$E(V_{t+1}^\epsilon \mid F_t) \le (1 - d_t)\, e_t + d_t\, e_t \le V_t^\epsilon.$$

So $V_t^\epsilon$ is a super-martingale, and by a standard convergence argument we get $V_t^\epsilon \to V_\infty^\epsilon$. By definition $V_t^\epsilon \ge \epsilon$ and therefore $E V_t^\epsilon \ge \epsilon$. Since $E[\max(X, Y)] \le E X + E Y$ it follows that $E V_t^\epsilon \le E e_t + \epsilon$, so that $E V_\infty^\epsilon = \epsilon$.
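Now we have a positive random variable $V_\infty^\epsilon$ with expectation $\epsilon$ which is above $\epsilon$ with probability 1. It follows that $V_\infty^\epsilon = \epsilon$. To summarize, we have shown that for every $\epsilon > 0$, with probability 1:

$$\limsup_{t \to \infty} e_t \le \limsup_{t \to \infty} V_t^\epsilon = \lim_{t \to \infty} V_t^\epsilon = \epsilon.$$

Since $\epsilon$ is arbitrary and $e_t$ is non-negative, it follows that $e_t \to 0$ almost surely.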
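To get a feel for the recursion that Lemma 1 controls, the sketch below iterates the bound from the proof of Theorem 3 as a deterministic worst case, that is, it assumes equality in $e_{n+1} = (1 - \frac{2}{n+1})\, e_n + \frac{c}{n^2}$ with $e_n$ standing in for $d_n^2$. The function name and the constants are illustrative assumptions.

```python
def iterate_bound(e1=1.0, c=1.0, steps=10_000):
    """Iterate e_{n+1} = (1 - 2/(n+1)) * e_n + c / n**2 and return e_{steps+1}."""
    e = e1
    for n in range(1, steps + 1):
        e = (1.0 - 2.0 / (n + 1)) * e + c / n**2
    return e

for steps in (10, 100, 1_000, 10_000):
    # The product (steps + 1) * e stays bounded, i.e. e decays like 1/n.
    print(steps, iterate_bound(steps=steps), (steps + 1) * iterate_bound(steps=steps))
```

A short induction shows $e_n \le 2c/n$ for $n \ge 2$ under this recursion, so the squared distance decays like $1/n$ and the distance itself like $1/\sqrt{n}$. Note that the first step multiplies the initial value by zero, so the starting point $e_1$ does not affect later iterates.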
More informationSimulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010
Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid
More informationLie Derivatives operator vector field flow push back Lie derivative of
Lie Derivaives The Lie derivaive is a mehod of compuing he direcional derivaive of a vecor field wih respec o anoher vecor field We already know how o make sense of a direcional derivaive of real valued
More informationHamilton Jacobi equations
Hamilon Jacobi equaions Inoducion o PDE The rigorous suff from Evans, mosly. We discuss firs u + H( u = 0, (1 where H(p is convex, and superlinear a infiniy, H(p lim p p = + This by comes by inegraion
More informationMath 527 Lecture 6: Hamilton-Jacobi Equation: Explicit Formulas
Mah 527 Lecure 6: Hamilon-Jacobi Equaion: Explici Formulas Sep. 23, 2 Mehod of characerisics. We r o appl he mehod of characerisics o he Hamilon-Jacobi equaion: u +Hx, Du = in R n, u = g on R n =. 2 To
More informationAn Introduction to Malliavin calculus and its applications
An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214
More informationThe Strong Law of Large Numbers
Lecure 9 The Srong Law of Large Numbers Reading: Grimme-Sirzaker 7.2; David Williams Probabiliy wih Maringales 7.2 Furher reading: Grimme-Sirzaker 7.1, 7.3-7.5 Wih he Convergence Theorem (Theorem 54) and
More informationACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.
ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple
More informationAn introduction to the theory of SDDP algorithm
An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking
More informationDIFFERENTIAL GEOMETRY HW 5
DIFFERENTIAL GEOMETRY HW 5 CLAY SHONKWILER 3. Le M be a complee Riemannian manifold wih non-posiive secional curvaure. Prove ha d exp p v w w, for all p M, all v T p M and all w T v T p M. Proof. Le γ
More informationTHE MYSTERY OF STOCHASTIC MECHANICS. Edward Nelson Department of Mathematics Princeton University
THE MYSTERY OF STOCHASTIC MECHANICS Edward Nelson Deparmen of Mahemaics Princeon Universiy 1 Classical Hamilon-Jacobi heory N paricles of various masses on a Euclidean space. Incorporae he masses in he
More informationThe consumption-based determinants of the term structure of discount rates: Corrigendum. Christian Gollier 1 Toulouse School of Economics March 2012
The consumpion-based deerminans of he erm srucure of discoun raes: Corrigendum Chrisian Gollier Toulouse School of Economics March 0 In Gollier (007), I examine he effec of serially correlaed growh raes
More informationLongest Common Prefixes
Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,
More informationLecture 9: September 25
0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:
More informationOnline Appendix for "Customer Recognition in. Experience versus Inspection Good Markets"
Online Appendix for "Cusomer Recogniion in Experience versus Inspecion Good Markes" Bing Jing Cheong Kong Graduae School of Business Beijing, 0078, People s Republic of China, bjing@ckgsbeducn November
More informationMath 334 Fall 2011 Homework 11 Solutions
Dec. 2, 2 Mah 334 Fall 2 Homework Soluions Basic Problem. Transform he following iniial value problem ino an iniial value problem for a sysem: u + p()u + q() u g(), u() u, u () v. () Soluion. Le v u. Then
More informationPredator - Prey Model Trajectories and the nonlinear conservation law
Predaor - Prey Model Trajecories and he nonlinear conservaion law James K. Peerson Deparmen of Biological Sciences and Deparmen of Mahemaical Sciences Clemson Universiy Ocober 28, 213 Ouline Drawing Trajecories
More informationSOLUTIONS TO ECE 3084
SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no
More informationHeat kernel and Harnack inequality on Riemannian manifolds
Hea kernel and Harnack inequaliy on Riemannian manifolds Alexander Grigor yan UHK 11/02/2014 onens 1 Laplace operaor and hea kernel 1 2 Uniform Faber-Krahn inequaliy 3 3 Gaussian upper bounds 4 4 ean-value
More informationt is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...
Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger
More informationRobotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.
Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of
More informationOnline Appendix to Solution Methods for Models with Rare Disasters
Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,
More informationEnsamble methods: Boosting
Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room
More informationA Bayesian Approach to Spectral Analysis
Chirped Signals A Bayesian Approach o Specral Analysis Chirped signals are oscillaing signals wih ime variable frequencies, usually wih a linear variaion of frequency wih ime. E.g. f() = A cos(ω + α 2
More informationDecentralized Stochastic Control with Partial History Sharing: A Common Information Approach
1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model
More informationLet ( α, β be the eigenvector associated with the eigenvalue λ i
ENGI 940 4.05 - Sabiliy Analysis (Linear) Page 4.5 Le ( α, be he eigenvecor associaed wih he eigenvalue λ i of he coefficien i i) marix A Le c, c be arbirary consans. a b c d Case of real, disinc, negaive
More information10. State Space Methods
. Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he
More informationRL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and
More informationO Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t.
Econ. 5b Spring 999 C. Sims Discree-Time Sochasic Dynamic Programming 995, 996 by Chrisopher Sims. This maerial may be freely reproduced for educaional and research purposes, so long as i is no alered,
More informationRandom Walk with Anti-Correlated Steps
Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and
More informationSZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1
SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision
More information