Multi-Armed Bandits: Non-adaptive and Adaptive Sampling
- Tobias Turner
- 5 years ago
Transcription
CSE 547/Stat 548: Machine Learning for Big Data
Lecture: Multi-Armed Bandits: Non-adaptive and Adaptive Sampling
Instructor: Sham Kakade

1 The (stochastic) multi-armed bandit problem

The basic paradigm is as follows:

- $K$ independent arms: $a \in \{1, \ldots, K\}$.
- Each arm $a$ returns a random reward $R_a$ if pulled. (Simpler case) assume $R_a$ is not time varying.
- Game: you choose arm $a_t$ at time $t$. You then observe $X_t = R_{a_t}$, where $R_{a_t}$ is sampled from the underlying distribution of that arm.

Critically, the distribution over $R_a$ is not known.

1.1 Regret: an online performance measure

Our objective is to maximize our long-term reward. We have a (possibly randomized) sequential strategy/algorithm $\mathcal{A}$, which is of the form:

$$a_t = \mathcal{A}(a_1, X_1, a_2, X_2, \ldots, a_{t-1}, X_{t-1}).$$

In $T$ rounds, our reward is:

$$\mathbb{E}\Big[\sum_{t=1}^{T} X_t \,\Big|\, \mathcal{A}\Big],$$

where the expectation is with respect to the reward process and our algorithm. Suppose $\mu_a = \mathbb{E}[R_a]$, and let us assume $0 \le \mu_a \le 1$. Also, define:

$$\mu^* = \max_a \mu_a.$$

In $T$ rounds and in expectation, the best we can do is obtain $\mu^* T$. We will measure our performance by our expected regret, defined as follows. In $T$ rounds, our (observed) regret is:

$$\mu^* T - \sum_{t=1}^{T} X_t$$
and our expected regret is:

$$\mu^* T - \mathbb{E}\Big[\sum_{t=1}^{T} X_t \,\Big|\, \mathcal{A}\Big],$$

where the expectation is with respect to the randomness in our outcomes (and possibly our algorithm, if it is randomized).

1.2 Caveat

Our presentation in these notes will be loose in terms of $\log(\cdot)$ factors, in both $K$ and $T$. There are multiple good treatments that provide improvements in terms of these factors.

2 Review: Hoeffding's bound

With $N$ samples, denote the sample mean as:

$$\hat{\mu} = \frac{1}{N} \sum_{t=1}^{N} X_t.$$

Lemma 2.1. Suppose that the $X_t$'s are i.i.d. and bounded between 0 and 1. Then, with probability greater than $1 - \delta$, we have that

$$|\hat{\mu} - \mu| \le \sqrt{\frac{\log(2/\delta)}{2N}}.$$

3 Warmup: A non-adaptive strategy

Suppose we first pull each arm $\tau$ times, in an exploration phase. Then, for the remainder of the $T$ steps, we pull the arm which had the best observed reward during the exploration phase.

By the union bound, with probability greater than $1 - \delta$, for all actions $a$,

$$|\hat{\mu}_a - \mu_a| \le O\Big(\sqrt{\frac{\log(K/\delta)}{\tau}}\Big).$$

To see this, we simply set the error probability for each arm to $\delta/K$, so the total error probability is $\delta$. Thus all the confidence intervals will hold.

During the exploration rounds, our cumulative regret is at most $K\tau$, a trivial upper bound.

During the exploitation rounds, let us bound our cumulative regret for the remaining $T - K\tau$ steps. Note that for the arm $i$ that we pull, we must have that $\hat{\mu}_i \ge \hat{\mu}_{i^*}$, where $i^*$ is an optimal arm. This implies that

$$\mu_i \ge \mu^* - c\,\sqrt{\frac{\log(K/\delta)}{\tau}},$$

where $c$ is a universal constant. To see this, note that by construction of the algorithm $\hat{\mu}_i - \hat{\mu}_{i^*} \ge 0$, which implies

$$\mu_i - \mu_{i^*} = (\mu_i - \hat{\mu}_i) + (\hat{\mu}_i - \hat{\mu}_{i^*}) + (\hat{\mu}_{i^*} - \mu_{i^*}) \ge -|\hat{\mu}_i - \mu_i| - |\hat{\mu}_{i^*} - \mu_{i^*}|,$$

and the claim follows using the confidence interval bounds.
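The explore-then-commit strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: it assumes Bernoulli rewards, and the function names (`hoeffding_radius`, `explore_then_commit`) and the example means are hypothetical. It reports the pseudo-regret $\sum_t (\mu^* - \mu_{a_t})$ rather than the observed regret, which removes the reward noise from the reported number.

```python
import math
import random


def hoeffding_radius(n, delta):
    """Confidence half-width from Lemma 2.1 for n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def explore_then_commit(means, horizon, tau, rng):
    """Pull every arm tau times, then commit to the empirically best arm.

    `means` holds the (unknown to the learner) Bernoulli means; they are
    used here only to sample rewards and to score the pseudo-regret.
    Returns the committed arm and the pseudo-regret over `horizon` rounds.
    """
    k = len(means)
    if k * tau > horizon:
        raise ValueError("exploration phase is longer than the horizon")
    best_mean = max(means)
    totals = [0.0] * k
    regret = 0.0
    # Exploration phase: K * tau pulls in total.
    for arm in range(k):
        for _ in range(tau):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            totals[arm] += reward
            regret += best_mean - means[arm]
    # Exploitation phase: commit for the remaining T - K * tau rounds.
    estimates = [total / tau for total in totals]
    chosen = max(range(k), key=lambda a: estimates[a])
    regret += (horizon - k * tau) * (best_mean - means[chosen])
    return chosen, regret


if __name__ == "__main__":
    rng = random.Random(0)
    chosen, regret = explore_then_commit([0.2, 0.5, 0.8], 10_000, 200, rng)
    print("committed arm:", chosen, "pseudo-regret:", round(regret, 1))
```

Varying `tau` in this sketch shows the trade-off analyzed next: a small `tau` risks committing to the wrong arm, while a large `tau` pays the exploration cost $K\tau$ for longer.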
Hence, with probability greater than $1 - \delta$, our total regret is:

$$\mu^* T - \sum_{t=1}^{T} X_t \le K\tau + O\Big(\sqrt{\frac{\log(K/\delta)}{\tau}}\Big)\,(T - K\tau).$$

Now let us optimize for $\tau$.

Lemma 3.1. (Regret of the non-adaptive strategy) The total expected regret of the non-adaptive strategy is:

$$\mu^* T - \mathbb{E}\Big[\sum_{t=1}^{T} X_t\Big] \le c\,K^{1/3}\,T^{2/3}\,(\log T)^{1/3},$$

where $c$ is a universal constant.

Proof. Choose $\tau = T^{2/3}/K^{2/3}$ (up to a log factor: including a factor of $(\log(KT))^{1/3}$ in $\tau$ balances the two terms above and gives the stated power of the logarithm) and $\delta = 1/T^2$. Note that with probability greater than $1 - 1/T^2$, our regret is bounded by $O\big(K^{1/3} T^{2/3} (\log(KT))^{1/3}\big)$. Also, if we fail, the largest regret we can pay is $T$, and this occurs with probability less than $1/T^2$, so the expected regret satisfies:

$$\text{exp. regret} \le \Pr(\text{no failure event})\; c\,K^{1/3} T^{2/3} (\log(KT))^{1/3} + \Pr(\text{failure event})\; T \le c\,(1 - 1/T^2)\,K^{1/3} T^{2/3} (\log(KT))^{1/3} + \frac{1}{T}.$$

This shows that the regret is bounded as $O\big(K^{1/3} T^{2/3} (\log(KT))^{1/3}\big)$. For $T > K$, $\log(KT) \le 2\log T$ (and for $T \le K$, the claimed regret bound is trivially true, since the regret is never more than $T$). This completes the proof (for a different universal constant).

3.1 A (minimax) optimal adaptive algorithm

We will now provide an algorithm that is optimal up to log factors (optimal under the assumption that the rewards are i.i.d. and upper bounded by 1).

Let $N_{a,t}$ be the number of times we have pulled arm $a$ up to time $t$. The question is: which arm should we pull at time $t + 1$?

3.2 Confidence bounds

If we don't care about log factors, then the following is a straightforward argument to see that our confidence bounds will simultaneously hold for all times $t$ (from $0$ to $\infty$) and all $K$ arms.

Lemma 3.2. With probability greater than $1 - \delta$, we will have that for all times $t \ge K$ and all $a \in [K]$,

$$|\hat{\mu}_{a,t} - \mu_a| \le c\,\sqrt{\frac{\log(t/\delta)}{N_{a,t}}},$$

where $c$ is a universal constant.

Proof. We will actually prove a stronger statement. Suppose that we observe the outcome of every arm; we will first provide a probabilistic statement for the confidence intervals of all the arms (and for all sample sizes). Let us apply Hoeffding's bound with an error probability of $\delta/(K N^2)$. Specifically, for arm $a$ with $N$ samples, we have that with probability greater than $1 - \delta/(K N^2)$:

$$|\hat{\mu}_{a,N} - \mu_a| \le c\,\sqrt{\frac{\log(KN/\delta)}{N}}$$
(by a straightforward application of Hoeffding's bound). Now the total error probability over all arms and over all sample sizes is:

$$\sum_{a=1}^{K} \sum_{N=1}^{\infty} \frac{\delta}{K N^2} = \delta\,\pi^2/6$$

(the $\pi^2/6$ is from the Basel problem). Note the sum is finite, which means the total error probability for all of these confidence intervals is less than a constant times $\delta$. We have thus shown the following (note the quantifiers): with probability greater than $1 - \delta$, for all arms $a$ and all sample sizes $N \ge 1$:

$$|\hat{\mu}_{a,N} - \mu_a| \le c\,\sqrt{\frac{\log(KN/\delta)}{N}}$$

(for a possibly different constant $c$).

Observe that the confidence bound that any algorithm uses at time $t$ is due to having $N_{a,t}$ samples, so we can now apply the above bound in this case, where:

$$c\,\sqrt{\frac{\log(N_{a,t} K/\delta)}{N_{a,t}}} \le c\,\sqrt{\frac{\log(tK/\delta)}{N_{a,t}}},$$

since $N_{a,t} \le t$. This shows that these confidence bounds are valid for all times $t$ and all arms. The proof is completed by noting that for $t \ge K$, $\log(Kt) \le 2\log t$.

3.3 The Upper Confidence Bound (UCB) Algorithm

At each time $t$:

- Pull arm:
  $$a_t = \arg\max_a \; \hat{\mu}_{a,t} + c\,\sqrt{\frac{\log(t/\delta)}{N_{a,t}}} := \arg\max_a \; \hat{\mu}_{a,t} + \mathrm{ConfBound}_{a,t}$$
  (where $c \le 10$ is a constant).
- Observe reward $X_t$.
- Update $\hat{\mu}_{a,t}$, $N_{a,t}$, and $\mathrm{ConfBound}_{a,t}$.

With probability greater than $1 - \delta$, all the confidence bounds will hold for all arms and all times $t$.

3.4 Analysis of UCB

If we pull arm $a$ at time $t$, what is our instantaneous regret, i.e. what is $\mu^* - \mu_{a_t}$?

Let $i^*$ be an optimal arm. Note that, by construction of the algorithm, if we pull arm $a$ at time $t$, then:

$$\hat{\mu}_{a,t} + \mathrm{ConfBound}_{a,t} \ge \hat{\mu}_{i^*,t} + \mathrm{ConfBound}_{i^*,t} \ge \mu_{i^*},$$

where the last step follows because $\mu_{i^*}$ is contained within the confidence interval for $i^*$. Using this, we have that:

$$\mu_{a_t} \ge \hat{\mu}_{a,t} - \mathrm{ConfBound}_{a,t} \ge \mu_{i^*} - 2\,\mathrm{ConfBound}_{a,t}.$$
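The UCB rule above translates fairly directly into code. The following is a sketch under stated assumptions rather than a reference implementation: rewards are Bernoulli, each arm is pulled once at the start so that $N_{a,t} \ge 1$ (consistent with the bounds being stated for $t \ge K$), and the default `c=1.0` is an arbitrary illustrative choice within the notes' range $c \le 10$. The function name `ucb` and its parameters are hypothetical.

```python
import math
import random


def ucb(means, horizon, delta, c=1.0, rng=None):
    """Run the UCB rule on Bernoulli arms.

    `means` are the true arm means, used only to sample rewards and to
    score the pseudo-regret sum_t (mu* - mu_{a_t}).
    Returns the per-arm pull counts N_{a,T} and the pseudo-regret.
    """
    rng = rng or random.Random()
    k = len(means)
    best_mean = max(means)
    counts = [0] * k    # N_{a,t}
    sums = [0.0] * k    # running reward totals, so mu_hat = sums / counts
    regret = 0.0

    def index(a, t):
        # Empirical mean plus ConfBound_{a,t} = c * sqrt(log(t/delta) / N_{a,t}).
        bonus = c * math.sqrt(math.log(t / delta) / counts[a])
        return sums[a] / counts[a] + bonus

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # assumed initialization: one pull per arm
        else:
            arm = max(range(k), key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb([0.2, 0.5, 0.8], horizon=5_000, delta=0.01,
                         rng=random.Random(0))
    print("pull counts:", counts, "pseudo-regret:", round(regret, 1))
```

Unlike the non-adaptive strategy, there is no separate exploration phase here: an arm keeps being pulled only while its confidence bound is wide enough for its index to compete with the best arm.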
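Theorem 3.3. (UCB regret) The total expected regret of UCB is:

$$\mu^* T - \mathbb{E}\Big[\sum_{t=1}^{T} X_t\Big] \le c\,\sqrt{K T \log T}$$

for an appropriately chosen universal constant $c$.

Proof. On the event that all the confidence bounds hold, the regret is bounded as:

$$\mu^* T - \mathbb{E}\Big[\sum_{t=1}^{T} X_t\Big] \le 2 \sum_{t} \mathrm{ConfBound}_{a_t,t} \le 2c \sum_{t} \sqrt{\frac{\log(T/\delta)}{N_{a_t,t}}} \le 2c \sum_{a} \sqrt{\log(T/\delta)\, N_{a,T}}. \qquad (1)$$

The last step groups the rounds by arm and uses $\sum_{n=1}^{N} n^{-1/2} \le 2\sqrt{N}$, so the constant $c$ may change between steps. Note that the following constraint on the $N_{a,T}$'s must hold:

$$\sum_{a} N_{a,T} = T.$$

One can now show that the worst-case setting of the $N_{a,T}$'s, the one that makes Equation 1 as large as possible subject to this constraint, is $N_{a,T} = T/K$. Finally, to obtain the expected regret bound, the proof is identical to that of the previous argument (in the non-adaptive case, where we choose $\delta = 1/T^2$).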
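For readers who want the worst-case step spelled out, here is one way to fill it in. The notes only assert the result; using the Cauchy-Schwarz inequality is an assumption about the intended argument, and the constants are illustrative.

$$\sum_{a=1}^{K} \sqrt{N_{a,T}} \;\le\; \sqrt{K \sum_{a=1}^{K} N_{a,T}} \;=\; \sqrt{KT},$$

with equality exactly when $N_{a,T} = T/K$ for every arm. Substituting into Equation 1 gives a bound of $2c\,\sqrt{K T \log(T/\delta)}$ on the event that all confidence bounds hold. With $\delta = 1/T^2$ we have $\log(T/\delta) = 3\log T$, and the failure event contributes at most $T \cdot (1/T^2) = 1/T$, so

$$\mu^* T - \mathbb{E}\Big[\sum_{t=1}^{T} X_t\Big] \;\le\; 2\sqrt{3}\,c\,\sqrt{K T \log T} + \frac{1}{T}.$$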