An Analysis of Reinforcement Learning with Function Approximation

Francisco S. Melo, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Sean P. Meyn, Coordinated Science Laboratory, Urbana, IL 61801, USA
M. Isabel Ribeiro, Institute for Systems and Robotics, Lisboa, Portugal

Abstract

We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion on the general applicability of our results and compare them with several related works.

1. Introduction

Convergence of Q-learning with function approximation has been a long-standing open question in reinforcement learning (Sutton, 1999). In general, value-based reinforcement learning (RL) methods for optimal control behave poorly when combined with function approximation (Baird, 1995; Tsitsiklis & Van Roy, 1996a). In this paper, we address this problem by analyzing the convergence of Q-learning when combined with linear function approximation. We identify a set of conditions that imply the convergence of this approximation method with probability 1 (w.p.1) when a fixed learning policy is used, and provide an interpretation of the resulting approximation as the fixed point of a Bellman-like operator. This motivates the analysis of several variations of Q-learning when combined with linear function approximation. In particular, we study a variation of Q-learning using importance sampling and an on-policy variant of Q-learning (SARSA).

The paper is organized as follows. We start in Section 2 by describing Markov decision problems. We proceed with our analysis of the Q-learning algorithm and its variants, and produce our main results in Section 3. We also compare our results with other related works in the RL literature. We conclude with some further discussion in Section 4.

2. Markov Decision Problems

Let (X, A, P, r, γ) be a Markov decision problem (MDP) with a compact state-space X ⊂ R^p and a finite action set A. The action-dependent kernel P_a defines the transition probabilities for the underlying controlled Markov chain {X_t} as

P[X_{t+1} ∈ U | X_t = x, A_t = a] = P_a(x, U),

where U is any measurable subset of X. The A-valued process {A_t} represents the control process: A_t is the control action at time instant t.¹ Solving the MDP consists in determining the control process {A_t} maximizing the expected total discounted reward

V({A_t}, x) = E[ Σ_{t=0}^∞ γ^t R(X_t, A_t) | X_0 = x ],

where 0 ≤ γ < 1 is a discount factor and R(x, a) represents a random reward received for taking action a ∈ A in state x ∈ X. For simplicity of notation, we consider a bounded deterministic function r : X × A × X → R assigning a reward r(x, a, y) every time a transition from x to y occurs after taking action a. This means that

E[R(x, a)] = ∫_X r(x, a, y) P_a(x, dy).

The optimal value function V* is defined for each state x ∈ X as

V*(x) = max_{{A_t}} V({A_t}, x) = max_{{A_t}} E[ Σ_{t=0}^∞ γ^t R(X_t, A_t) | X_0 = x ]

and verifies the Bellman optimality equation

V*(x) = max_{a ∈ A} ∫_X [ r(x, a, y) + γ V*(y) ] P_a(x, dy).    (1)

V*(x) represents the expected total discounted reward received along an optimal trajectory starting at state x. We can also define the optimal Q-function Q* as

Q*(x, a) = ∫_X [ r(x, a, y) + γ V*(y) ] P_a(x, dy),    (2)

representing the expected total discounted reward along a trajectory starting at state x, obtained by choosing a as the first action and following the optimal policy thereafter. The control process {A*_t} defined as

A*_t = arg max_{a ∈ A} Q*(X_t, a),  for all t,

is optimal in the sense that V({A*_t}, x) = V*(x), and defines a mapping π* : X → A known as the optimal policy. The optimal policy determines the optimal decision rule for a given MDP. More generally, a (Markov) policy is any mapping π_t defined over X × A generating a control process {A_t} verifying, for all t,

P[A_t = a | X_t = x] = π_t(x, a).

We write V^{π_t}(x) instead of V({A_t}, x) if the control process {A_t} is generated by policy π_t. A policy π_t is stationary if it does not depend on t, and deterministic if it assigns probability 1 to a single action in each state, being thus represented as a map π_t : X → A for every t.

Notice that the optimal control process can be obtained from the optimal (stationary, deterministic) policy π*, which can in turn be obtained from Q*. Therefore, the optimal control problem is solved once the function Q* is known for all pairs (x, a).

Now, given any real function q defined over X × A, we define the Bellman operator

(Hq)(x, a) = ∫_X [ r(x, a, y) + γ max_{u ∈ A} q(y, u) ] P_a(x, dy).    (3)

The function Q* in (2) is the fixed point of H and, since this operator is a contraction in the sup-norm, a fixed-point iteration can be used to determine Q* (at least theoretically).

¹ We take the control process {A_t} to be adapted to the σ-algebra induced by {X_t}.
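As a concrete illustration of this fixed-point iteration, the following sketch runs Q ← HQ on a small finite toy MDP. The paper's setting is a compact (infinite) state-space, so the two-state example below, including its transition kernel and reward numbers, is an illustrative assumption, not taken from the text:

```python
import numpy as np

# Fixed-point iteration Q <- HQ (Eq. 3) on a finite toy MDP. The paper
# treats compact (infinite) state-spaces; this 2-state, 2-action example
# and its transition/reward numbers are illustrative assumptions only.

gamma = 0.9
# P[a, x, y]: probability of moving from state x to state y under action a
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.1, 0.9],
               [0.6, 0.4]]])
# r[x, a, y]: reward for the transition x -> y under action a
r = np.random.default_rng(0).uniform(0.0, 1.0, size=(2, 2, 2))

Q = np.zeros((2, 2))  # Q[x, a], arbitrary initial estimate
for _ in range(1000):
    # (HQ)(x, a) = sum_y P_a(x, y) [ r(x, a, y) + gamma * max_u Q(y, u) ]
    target = r + gamma * Q.max(axis=1)[None, None, :]   # indexed (x, a, y)
    Q_next = np.einsum('axy,xay->xa', P, target)
    if np.abs(Q_next - Q).max() < 1e-10:                # sup-norm convergence
        break
    Q = Q_next

print(Q)  # approximation of Q*
```

Because H is a γ-contraction in the sup-norm, the loop converges geometrically to Q* regardless of the initial estimate.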

2.1. The Q-Learning Algorithm

We previously suggested that a fixed-point iteration could be used to determine the function Q*. In practice, this requires two important conditions:

- The kernel P and the reward function r are known;
- The successive estimates for Q* can be represented compactly and stored in a computer with finite memory.

If P and/or r are not known, a fixed-point iteration using H is not possible. To solve this problem, Watkins proposed in 1989 the Q-learning algorithm (Watkins, 1989). Q-learning proceeds as follows: consider an MDP M = (X, A, P, r, γ) and suppose that {x_t} is an infinite sample trajectory of the underlying Markov chain obtained with some policy π_t. The corresponding sample control process is denoted as {a_t} and the sequence of obtained rewards as {r_t}. Given any initial estimate Q_0, Q-learning successively updates this estimate using the rule

Q_{t+1}(x, a) = Q_t(x, a) + α_t(x, a) δ_t,    (4)

where {α_t} is a step-size sequence and δ_t is the temporal difference at time t,

δ_t = r_t + γ max_{b ∈ A} Q_t(x_{t+1}, b) − Q_t(x_t, a_t).    (5)

If both X and A are finite sets, each estimate Q_t is simply a |X| × |A| matrix and can be represented explicitly in a computer. In that case, the convergence of Q-learning and several other related algorithms (such as TD(λ) or SARSA) has been thoroughly studied (see, for example, (Bertsekas & Tsitsiklis, 1996) and references therein). However, if either X or A is infinite or very large, explicitly representing each Q_t becomes infeasible and some form of compact representation is needed (e.g., using function approximation).
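For reference, the update (4)-(5) amounts to the short procedure below. The environment interface (env.reset(), env.step(a) returning a next state and reward) and the choice of step-sizes α_t(x, a) = 1/n_t(x, a) are hypothetical stand-ins, not prescribed by the paper:

```python
import numpy as np

# Minimal sketch of tabular Q-learning, Eqs. (4)-(5). The environment
# interface below is a hypothetical stand-in, not part of the paper.

def q_learning(env, n_states, n_actions, gamma=0.9, steps=100_000,
               epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))  # counts for the step-sizes
    x = env.reset()
    for _ in range(steps):
        # epsilon-greedy exploration so every action keeps being tried
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[x].argmax())
        y, reward = env.step(a)
        visits[x, a] += 1
        alpha = 1.0 / visits[x, a]  # satisfies sum alpha = inf, sum alpha^2 < inf
        delta = reward + gamma * Q[y].max() - Q[x, a]  # temporal difference, Eq. (5)
        Q[x, a] += alpha * delta                       # update rule, Eq. (4)
        x = y
    return Q
```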
In this paper, we address how several RL methods, such as Q-learning and SARSA, can be combined with function approximation and still retain their main convergence properties.

3. Reinforcement Learning with Linear Function Approximation

In this section, we address the problem of determining the optimal Q-function for MDPs with infinite state-space X. Let Q = {Q_θ} be a family of real-valued functions defined in X × A. It is assumed that the function class is linearly parameterized, so that Q can be expressed as the linear span of a fixed set of M linearly independent functions φ_i : X × A → R. For each M-dimensional parameter vector θ ∈ R^M, the function Q_θ ∈ Q is defined as

Q_θ(x, a) = Σ_{i=1}^M φ_i(x, a) θ(i) = φ⊤(x, a) θ,

where ⊤ represents the transpose operator. We will also denote the above function by Q(θ) to emphasize the dependence on θ over the dependence on (x, a).

Let π be a fixed stochastic, stationary policy and suppose that {x_t}, {a_t} and {r_t} are sample trajectories of states, actions and rewards obtained from the MDP using policy π. In the original Q-learning algorithm, the Q-values are updated according to (4). The temporal difference δ_t can be interpreted as a 1-step estimation error with respect to the optimal function Q*. The update rule in Q-learning moves the estimates Q_t closer to the desired function Q*, minimizing the expected value of δ_t. In our approximate setting, we apply the same underlying idea to obtain the update rule for approximate Q-learning:

θ_{t+1} = θ_t + α_t ∇_θ Q_θ(x_t, a_t) δ_t = θ_t + α_t φ(x_t, a_t) δ_t,    (6)

where, as above, δ_t is the temporal difference at time t defined in (5). Notice that (6) updates θ_t using the temporal difference δ_t as the error. The gradient ∇_θ Q_θ provides the direction in which this update is performed.
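A minimal sketch of a single update (6) follows. Since Q_θ(x, a) = φ⊤(x, a)θ, the gradient ∇_θ Q_θ(x, a) is simply φ(x, a). The feature map phi (returning a NumPy vector) and the finite action set actions are our stand-ins for the paper's abstract objects:

```python
import numpy as np

# One step of approximate Q-learning, Eq. (6):
#   theta_{t+1} = theta_t + alpha_t * phi(x_t, a_t) * delta_t.
# `phi`, `actions` and the transition sample (x, a, reward, x_next) are
# hypothetical stand-ins; only the update rule itself comes from the text.

def approx_q_step(theta, phi, x, a, reward, x_next, actions, alpha, gamma=0.9):
    q_next = max(phi(x_next, b) @ theta for b in actions)  # max_b Q_theta(x', b)
    delta = reward + gamma * q_next - phi(x, a) @ theta    # Eq. (5) with Q = phi' theta
    return theta + alpha * phi(x, a) * delta               # Eq. (6)
```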

To establish convergence of the algorithm (6), we adopt an ODE argument, establishing that the trajectories of the algorithm closely follow those of an associated ODE with a globally asymptotically stable equilibrium point. This will require several regularity properties of the policy π and of its induced Markov chain that will, in turn, motivate the study of the on-policy version of the algorithm. This on-policy algorithm can be seen as an extension of SARSA to infinite settings.

3.1. Convergence of Q-Learning

We now proceed by identifying conditions that ensure the convergence of Q-learning with linear function approximation as described by (6). Due to space limitations, we omit some of the technical details in the proofs; these can easily be filled in.

We start by introducing some notation that will greatly simplify the presentation. Given an MDP M = (X, A, P, r, γ) with compact state-space X ⊂ R^p, let (X, P_π) be the Markov chain induced by a fixed policy π. We assume the chain (X, P_π) to be uniformly ergodic with invariant probability measure µ_X, and the policy π to verify π(x, a) > 0 for all a ∈ A and µ_X-almost all x ∈ X. We denote by µ_π the probability measure defined for each measurable set U ⊂ X and each action a ∈ A as

µ_π(U × {a}) = ∫_U π(x, a) µ_X(dx).

Let now {φ_i, i = 1, ..., M} be a set of bounded, linearly independent basis functions to be used in our approximate Q-learning algorithm. We denote by Σ_π the matrix defined as

Σ_π = E_π[ φ(x, a) φ⊤(x, a) ] = ∫_{X×A} φ φ⊤ dµ_π.

Notice that the above expression is well-defined and independent of the initial distribution for the chain, due to our assumption of uniform ergodicity. For fixed θ ∈ R^M and x ∈ X, define the set of maximizing actions at x as

A^θ_x = { a* ∈ A : φ⊤(x, a*) θ = max_{a ∈ A} φ⊤(x, a) θ }

and the greedy policy with respect to θ as any policy that, at each state x, assigns positive probability only to actions in A^θ_x. Finally, let φ_x denote the row-vector φ⊤(x, a), where a is a random action generated according to the policy π at state x; likewise, let φ^θ_x denote the row-vector φ⊤(x, a^θ_x), where a^θ_x is now any action in A^θ_x. We now introduce the θ-dependent matrix

Σ*_π(θ) = E_π[ (φ^θ_x)⊤ φ^θ_x ].

By construction, both Σ_π and Σ*_π(θ) are positive definite, since the functions φ_i are assumed linearly independent. Notice also the difference between Σ_π and each Σ*_π(θ): the actions in the definition of the former are taken according to π, while in the latter they are taken greedily with respect to a particular θ. We are now in position to introduce our first result.

Theorem 1. Let M, π and {φ_i, i = 1, ..., M} be as defined above. If, for all θ,

Σ_π > γ² Σ*_π(θ)    (7)

and the step-size sequence verifies

Σ_t α_t = ∞,  Σ_t α_t² < ∞,

then the algorithm in (6) converges w.p.1 and the limit point θ* verifies the recursive relation

Q(θ*) = Π_Q H Q(θ*),

where Π_Q is the orthogonal projection onto Q.²

Proof. We establish the main statement of the theorem using a standard ODE argument. The assumptions on the chain (X, P_π) and basis functions {φ_i, i = 1, ..., M}, and the fact that π(x, a) > 0 for all a ∈ A and µ_X-almost every x ∈ X, ensure the applicability of Theorem 17 on page 239 of (Benveniste et al., 1990). Therefore, the convergence of the algorithm can be analyzed in terms of the stability of the equilibrium points of the associated ODE

θ̇ = E_π[ φ_x⊤ ( r(x, a, y) + γ φ^θ_y θ − φ_x θ ) ],    (8)

where we omitted the explicit dependence of θ on t to avoid excessively cluttering the expression. If the ODE (8) has a globally asymptotically stable equilibrium point, this implies that the algorithm (6) converges w.p.1 (Benveniste et al., 1990).

Let then θ₁(t) and θ₂(t) be two trajectories of the ODE starting at different initial conditions, and let θ̃(t) = θ₁(t) − θ₂(t). From (8), we get

d/dt ‖θ̃‖₂² = −2 θ̃⊤ Σ_π θ̃ + 2γ E_π[ (φ_x θ̃)( φ^{θ₁}_y θ₁ − φ^{θ₂}_y θ₂ ) ].

Notice now that, from the definition of φ^{θ₁}_y and φ^{θ₂}_y,

φ^{θ₁}_y θ₂ ≤ φ^{θ₂}_y θ₂;  φ^{θ₂}_y θ₁ ≤ φ^{θ₁}_y θ₁.

Taking this into account and defining the sets S₊ = {(x, a) : φ⊤(x, a) θ̃ > 0} and S₋ = X × A − S₊, the previous expression becomes

d/dt ‖θ̃‖₂² ≤ −2 θ̃⊤ Σ_π θ̃ + 2γ E_π[ (φ_x θ̃)( φ^{θ₁}_y θ̃ ) I_{S₊} ] + 2γ E_π[ (φ_x θ̃)( φ^{θ₂}_y θ̃ ) I_{S₋} ],

where I_S represents the indicator function for the set S. Applying Hölder's inequality to each of the expectations above, we get

d/dt ‖θ̃‖₂² ≤ −2 θ̃⊤ Σ_π θ̃ + 2γ √(E_π[(φ_x θ̃)² I_{S₊}]) √(E_π[(φ^{θ₁}_x θ̃)² I_{S₊}]) + 2γ √(E_π[(φ_x θ̃)² I_{S₋}]) √(E_π[(φ^{θ₂}_x θ̃)² I_{S₋}]),

and a few simple computations finally yield

d/dt ‖θ̃‖₂² ≤ −2 θ̃⊤ Σ_π θ̃ + 2γ √(θ̃⊤ Σ_π θ̃) · max( √(θ̃⊤ Σ*_π(θ₁) θ̃), √(θ̃⊤ Σ*_π(θ₂) θ̃) ).

Since, by assumption, Σ_π > γ² Σ*_π(θ), we can conclude from the expression above that d/dt ‖θ̃‖₂² < 0. This means, in particular, that θ̃(t) converges asymptotically to the origin, i.e., the ODE (8) is globally asymptotically stable. Since the ODE is autonomous (i.e., time-invariant), there exists one globally asymptotically stable equilibrium point for the ODE, which verifies the recursive relation

θ* = Σ_π^{-1} E_π[ φ_x⊤ ( r(x, a, y) + γ φ^{θ*}_y θ* ) ].    (9)

Since Σ_π is, by construction, positive definite, the inverse in (9) is well-defined. Multiplying (9) by φ⊤(x, a) on both sides yields the desired result. ∎

It is now important to observe that condition (7) is quite restrictive: since γ is usually taken close to 1, condition (7) essentially requires that, for every θ,

max_{a ∈ A} φ⊤(x, a) θ ≈ Σ_{a ∈ A} π(x, a) φ⊤(x, a) θ.

Therefore, such a condition will seldom be met in practice, since it implies that the learning policy π is already close to the policy that the algorithm is meant to compute. In other words, the maximization above yields a policy close to the policy used during learning; and, when this is the case, the algorithm essentially behaves like an on-policy algorithm.

On the other hand, the above condition can be ensured by considering only a local maximization around the learning policy π. This is the most interesting aspect of the above result: it explicitly relates how much information the learning policy provides about greedy policies, as a function of γ. To better understand this, notice that each policy π is associated with a particular invariant measure on the induced chain (X, P_π). In particular, the measure associated with the learning policy may be very different from the one induced by the greedy/optimal policy. Taking into account the fact that γ measures, in a sense, the importance of the future, Theorem 1 basically states that:

- In problems where the performance of the agent greatly depends on future rewards (γ ≈ 1), the information provided by the learning policy can only be safely generalized to nearby greedy policies, in the sense of (7).
- In problems where the performance of the agent is less dependent on future rewards (γ ≪ 1), the information provided by the learning policy can be safely generalized to more general greedy policies.

² The orthogonal projection is naturally defined in the (infinite-dimensional) Hilbert space containing Q with inner-product given by ⟨f, g⟩ = ∫_{X×A} f g dµ_π.
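Condition (7) can, at least in principle, be probed numerically for a given θ: estimate Σ_π and Σ*_π(θ) by Monte Carlo from samples of the stationary measure µ_π, then test whether Σ_π − γ²Σ*_π(θ) is positive definite. The sketch below assumes a sampler sample_xa() for µ_π and a feature map phi; both names are hypothetical stand-ins for the paper's abstract objects:

```python
import numpy as np

# Monte Carlo sketch of checking condition (7) for one fixed theta:
# estimate Sigma_pi = E_pi[phi phi'] and Sigma*_pi(theta), then test
# positive definiteness of Sigma_pi - gamma^2 * Sigma*_pi(theta).
# `sample_xa` (a draw (x, a) from mu_pi), `phi` and `actions` are
# hypothetical stand-ins, not part of the paper.

def condition_7_holds(theta, phi, actions, sample_xa, gamma=0.9, n=50_000):
    M = len(theta)
    sigma = np.zeros((M, M))        # estimate of Sigma_pi
    sigma_star = np.zeros((M, M))   # estimate of Sigma*_pi(theta)
    for _ in range(n):
        x, a = sample_xa()          # action drawn from the learning policy pi
        f = phi(x, a)
        sigma += np.outer(f, f) / n
        # greedy action at x with respect to theta
        a_star = max(actions, key=lambda b: phi(x, b) @ theta)
        g = phi(x, a_star)
        sigma_star += np.outer(g, g) / n
    # positive definiteness <=> smallest eigenvalue of the difference > 0
    diff = sigma - gamma**2 * sigma_star
    return np.linalg.eigvalsh(diff).min() > 0.0
```

Since (7) must hold for all θ, a check of this kind can only refute the condition for particular parameter vectors, never certify it globally.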

Suppose then that the maximization in the update equation (6) is to be replaced by a local maximization. In other words, instead of maximizing over all actions in A, the algorithm should maximize over a small neighborhood of the learning policy π (in policy space). The difficulty with this approach is that such a maximization can be hard to implement. The use of importance sampling can readily overcome this difficulty, by making the maximization in policy-space implicit. The algorithm thus obtained, which resembles in many aspects the one proposed in (Precup et al., 2001), is described by the update rule

θ_{t+1} = θ_t + α_t φ(x_t, a_t) δ̂_t,    (10)

where the modified temporal difference δ̂_t is given by

δ̂_t = r_t + γ Σ_b [ π^θ(x_{t+1}, b) / π(x_{t+1}, b) ] Q_{θ_t}(x_{t+1}, b) − Q_{θ_t}(x_t, a_t),    (11)

where π^θ is, for example, a θ-dependent ε-greedy policy close to the learning policy π.³ A possible implementation of such an algorithm is sketched in Algorithm 1. We denote by π_N the behavior policy at iteration N of the algorithm; in the stopping condition of the algorithm, any adequate policy norm can be used.

Algorithm 1 Modified Q-learning.
Require: Initial policy π_0;
 1: Initialize X_0 = x_0 and set N = 0;
 2: for t = 0 until T do
 3:   Sample A_t ~ π_N(x_t, ·);
 4:   Sample next-state X_{t+1} ~ P_{a_t}(x_t, ·);
 5:   r_t = r(x_t, a_t, x_{t+1});
 6:   Update θ_t according to (10);
 7: end for
 8: Set π_{N+1}(x, a) = π^θ(x, a);
 9: N = N + 1;
10: if π_N ≈ π_{N−1} then
11:   return π_N;
12: else
13:   Goto 2;
14: end if

³ Given ε > 0, a policy π is ε-greedy with respect to a function Q_θ ∈ Q if, at each state x ∈ X, it chooses a random action with probability ε and a greedy action a ∈ A^θ_x with probability (1 − ε).
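A minimal sketch of one parameter update of this modified algorithm, Eqs. (10)-(11), follows. Here pi (the learning policy), pi_theta (a θ-dependent ε-greedy policy close to π) and phi are hypothetical stand-ins, and the reconstruction of the reweighting in (11) follows the formula as printed above:

```python
import numpy as np

# One step of the modified update, Eqs. (10)-(11). The correction term sums
# over next-state actions, reweighting Q-values by pi_theta / pi so that the
# maximization over policies near pi stays implicit. `phi`, `actions`, `pi`
# and `pi_theta` are hypothetical stand-ins, not part of the paper.

def modified_q_step(theta, phi, x, a, reward, x_next,
                    actions, pi, pi_theta, alpha, gamma=0.9):
    q = lambda s, b: phi(s, b) @ theta          # Q_theta(s, b) = phi'(s, b) theta
    corrected = sum(pi_theta(x_next, b) / pi(x_next, b) * q(x_next, b)
                    for b in actions)           # reweighted term of Eq. (11)
    delta_hat = reward + gamma * corrected - q(x, a)
    return theta + alpha * phi(x, a) * delta_hat  # Eq. (10)
```

Note that π(x, a) > 0 for all actions, as required of the learning policy, is exactly what keeps the ratios in (11) well-defined.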
3.2. SARSA with Linear Function Approximation

The analysis in the previous subsection suggests that on-policy algorithms may potentially yield more reliable convergence properties. Such a fact has already been observed in (Tsitsiklis & Van Roy, 1996a; Perkins & Pendrith, 2002). In this subsection we thus focus on on-policy algorithms, and analyze the convergence of SARSA when combined with linear function approximation. In our main result, we recover the essence of the result in (Perkins & Precup, 2003), although in a somewhat different setting. The main differences between our work and that in (Perkins & Precup, 2003) are discussed further ahead.

Once again, we consider a family Q of real-valued functions, the linear span of a fixed set of M linearly independent functions φ_i : X × A → R, and derive an on-policy algorithm to compute a parameter vector θ such that φ⊤(x, a)θ approximates the optimal Q-function. To this purpose, and unlike what has been done so far, we now consider a θ-dependent learning policy π_θ verifying π_θ(x, a) > 0 for all θ. In particular, we consider at each time step a learning policy π_{θ_t} that is ε-greedy with respect to φ⊤(x, a)θ_t and Lipschitz continuous with respect to θ, with Lipschitz constant C (with respect to some preferred metric). We further assume that, for every fixed θ, the Markov chain (X, P_θ) induced by such a policy is uniformly ergodic.

Let then {x_t}, {a_t} and {r_t} be sample trajectories of states, actions and rewards obtained from the MDP M = (X, A, P, r, γ) using (at each time-step) the θ-dependent policy π_{θ_t}. The update rule for our approximate SARSA algorithm is

θ_{t+1} = θ_t + α_t φ(x_t, a_t) δ_t,    (12)

where δ_t is the temporal difference at time t,

δ_t = r_t + γ φ⊤(x_{t+1}, a_{t+1}) θ_t − φ⊤(x_t, a_t) θ_t.

In order to use the SARSA algorithm above to approximate the optimal Q-function, it is necessary to slowly decay the exploration rate ε to zero, while guaranteeing that the learning policy verifies the necessary regularity conditions (namely, Lipschitz continuity w.r.t. θ).
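A minimal sketch of the on-policy update (12) follows, together with one possible way of obtaining a learning policy that explores like an ε-greedy policy while remaining Lipschitz continuous in θ: smoothing the greedy choice with a softmax, whose temperature then controls the Lipschitz constant C. The softmax construction is our illustration, not the paper's; the paper only requires the Lipschitz property.

```python
import numpy as np

# Sketch of approximate SARSA, Eq. (12), plus a smoothed epsilon-greedy
# policy that is Lipschitz continuous in theta. The softmax smoothing is
# our illustrative choice; `phi` and `actions` are hypothetical stand-ins.

def policy_probs(theta, phi, x, actions, epsilon=0.1, temperature=1.0):
    q = np.array([phi(x, b) @ theta for b in actions])
    soft = np.exp((q - q.max()) / temperature)  # smooth stand-in for argmax
    soft /= soft.sum()
    # mix with the uniform distribution so pi_theta(x, a) > 0 everywhere
    return (1 - epsilon) * soft + epsilon / len(actions)

def sarsa_step(theta, phi, x, a, reward, x_next, a_next, alpha, gamma=0.9):
    delta = (reward + gamma * phi(x_next, a_next) @ theta
             - phi(x, a) @ theta)             # on-policy temporal difference
    return theta + alpha * phi(x, a) * delta  # Eq. (12)
```

Lowering the temperature makes the policy closer to greedy but increases its Lipschitz constant, which is precisely the trade-off behind the constant C₀ in the result below.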

However, as will soon become apparent, decreasing the exploration rate to zero will render our convergence result (and other related results) not applicable. We are now in position to introduce our main result.

Theorem 2. Let M, π_{θ_t} and {φ_i, i = 1, ..., M} be as defined above. Let C be the Lipschitz constant of the learning policy π_θ with respect to θ. Assume that the step-size sequence verifies

Σ_t α_t = ∞,  Σ_t α_t² < ∞.

Then, there is C₀ > 0 such that, if C < C₀, the algorithm in (12) converges w.p.1.

Proof. We again use an ODE argument to establish the statement of the theorem. As before, the assumptions on (X, P_θ) and basis functions {φ_i, i = 1, ..., M}, and the fact that the learning policy is Lipschitz continuous with respect to θ and verifies π(x, a) > 0, ensure the applicability of Theorem 17 on page 239 of (Benveniste et al., 1990). Therefore, the convergence of the algorithm can be analyzed in terms of the stability of the associated ODE:

θ̇ = E_θ[ φ_x⊤ ( r(x, a, y) + γ φ_y θ − φ_x θ ) ].    (13)

Notice that the expectation is taken with respect to the invariant measure of the chain and the learning policy, both θ-dependent. To establish global asymptotic stability, we re-write (13) as

θ̇(t) = A_θ θ(t) + b_θ,

where

A_θ = E_θ[ φ_x⊤ ( γ φ_y − φ_x ) ];  b_θ = E_θ[ φ_x⊤ r(x, a, y) ].

An equilibrium point of (13) must verify θ* = −A_{θ*}^{-1} b_{θ*}, and the existence of such an equilibrium point has been established in (de Farias & Van Roy, 2000) (Theorem 5.1). Let θ̃(t) = θ(t) − θ*. Then,

d/dt ‖θ̃‖₂² = 2 θ̃⊤ ( A_θ θ + b_θ ) = 2 θ̃⊤ ( A_θ θ − A_{θ*} θ* + b_θ − b_{θ*} ).

Let

λ_A = sup_{θ ≠ θ*} ‖A_θ − A_{θ*}‖₂ / ‖θ − θ*‖₂;  λ_b = sup_{θ ≠ θ*} ‖b_θ − b_{θ*}‖₂ / ‖θ − θ*‖₂,

where the norm in the definition of λ_A is the induced operator norm and the one in the definition of λ_b is the regular Euclidean norm. The previous expression thus becomes

d/dt ‖θ̃‖₂² = 2 θ̃⊤ A_{θ*} θ̃ + 2 θ̃⊤ (A_θ − A_{θ*}) θ + 2 θ̃⊤ (b_θ − b_{θ*}) ≤ 2 θ̃⊤ A_{θ*} θ̃ + 2 (λ_A + λ_b) ‖θ̃‖₂².

Letting λ = λ_A + λ_b, the above expression can be written as

d/dt ‖θ̃‖₂² ≤ 2 θ̃⊤ ( A_{θ*} + λI ) θ̃.

The fact that the learning policy is assumed Lipschitz w.r.t. θ, together with the uniform ergodicity of the corresponding induced chain, implies that A_θ and b_θ are also Lipschitz w.r.t. θ (with a different constant). This means that λ goes to zero with C and, therefore, for C sufficiently small, (A_{θ*} + λI) is a negative definite matrix.⁴ Therefore, the ODE (13) is globally asymptotically stable and the conclusion of the theorem follows. ∎

Several remarks are now in order. First of all, Theorem 2 basically states that, for fixed ε, if the dependence of the learning policy π_θ on θ can be made sufficiently smooth, then SARSA converges w.p.1. This result is similar to the one in (Perkins & Precup, 2003), although the algorithms are not exactly alike: we consider a continuing task, while the algorithm featured in (Perkins & Precup, 2003) is implemented in an episodic fashion. Furthermore, in our case, convergence was established using an ODE argument, instead of the contraction argument in (Perkins & Precup, 2003). Nevertheless, both methods of proof are, in essence, equivalent, and the results in both papers are concordant.

A second remark is related to the implementation of SARSA with a decaying exploration policy. The analysis of one such algorithm could be conducted using, once again, an ODE argument. In particular, SARSA could be described as a two-time-scale algorithm: the iterations of the main algorithm (corresponding to (12)) would develop on a faster time-scale, and the decaying exploration rate would develop on a slower time-scale. The analysis in (Borkar, 1997) could then be replicated. However, it is well known that, as ε approaches zero, the learning policy will approach the greedy policy w.r.t. θ, which is, in general, discontinuous. Therefore, there is little hope that the smoothness condition in Theorem 2 (or its equivalent in (Perkins & Precup, 2003)) can be met as ε approaches zero.

⁴ The fact that A_θ is negative definite has been established in several works. See, for example, Lemma 3 in (Perkins & Precup, 2003) or, in a slightly different setting (easily extendable to our setting), the proof of Theorem 1 in (Tsitsiklis & Van Roy, 1996a).

4. Discussion

We now briefly discuss some of the assumptions in the above theorems. We start by emphasizing that all stated conditions are only sufficient, meaning that convergence may occur even if some (or all) of them fail to hold. We also discuss the relation between our results and other related works from the literature.

Secondly, uniform ergodicity of a Markov chain essentially means that the chain quickly converges to a stationary behavior uniformly over the state-space, and that we can study any properties of the stationary chain by direct sampling.⁵ This property, together with the requirement that π(x, a) > 0 for all a ∈ A and µ_X-almost all x ∈ X, can be interpreted as a continuous counterpart to the usual condition that all state-action pairs are visited infinitely often. In fact, uniform ergodicity implies that all regions of the state-space with positive µ_X measure are sufficiently visited (Meyn & Tweedie, 1993), and the condition π(x, a) > 0 ensures that, at each state, every action is sufficiently tried. This appears to be a standard requirement in this continuous scenario, as it has also been used in other works (Tsitsiklis & Van Roy, 1996a; Singh et al., 1994; Perkins & Precup, 2003).⁶ The requirement that π(x, a) > 0 for all a ∈ A and µ_X-almost all x ∈ X also corresponds to the concept of fair control as introduced in (Borkar, 2000).

We also remark that the divergence example in (Gordon, 1996) is due to the fact that the learning policy fails to verify the Lipschitz continuity condition stated in Theorem 2 (as discussed in (Perkins & Pendrith, 2002)).

⁵ Explicit bounds on the rate of convergence to stationarity are available in the literature (Meyn & Tweedie, 1994; Diaconis & Saloff-Coste, 1996; Rosenthal, 2002). However, for general chains, such bounds tend to be loose.
⁶ Most of these works make use of geometric ergodicity which, since we admit a compact state-space, is a consequence of uniform ergodicity.

4.1. Related Work

In this paper, we analyzed how RL algorithms can be combined with linear function approximation to approximate the optimal Q-function in MDPs with infinite state-spaces. In the last decade or so, several authors have addressed this same problem from different perspectives. We now briefly discuss several such approaches and their relation to the results in this paper.

One possible approach is to rely on soft-state aggregation (Singh et al., 1994; Gordon, 1995; Tsitsiklis & Van Roy, 1996b), partitioning the state-space into soft regions. Treating the soft regions as hyper-states, these methods then use standard learning methods (such as Q-learning or SARSA) to approximate the optimal Q-function. The main differences between such methods and those using linear function approximation (such as the ones portrayed here) are that, in the former, only one component of the parameter vector is updated at each iteration, and the basis functions are, by construction, constrained to be positive and to add to one at each point of the state-space.

Sample-based methods (Ormoneit & Sen, 2002; Szepesvári & Smart, 2004) further generalize the applicability of soft-state aggregation methods by using spreading functions/kernels (Ribeiro & Szepesvári, 1996). Sample-based methods thus exhibit a superior convergence rate when compared with simple soft-state aggregation methods, although under somewhat more restrictive conditions.

Finally, RL with general linear function approximation was thoroughly studied in (Tsitsiklis & Van Roy, 1996a; Tadić, 2001). Posterior works extended the applicability of such results. In (Precup et al., 2001), an off-policy convergent algorithm was proposed that uses an importance-sampling principle similar to the one described in Section 3. In (Perkins & Precup, 2003), the authors established the convergence of SARSA with linear function approximation.

4.2. Concluding Remarks

We conclude by observing that all methods analyzed here, as well as those surveyed above, experience a degradation in performance as the distance between the target function and the chosen linear space increases. If the functions in the chosen linear space provide only a poor approximation of the desired function, there are no practical guarantees on the usefulness of such an approximation. The error bounds derived in (Tsitsiklis & Van Roy, 1996a) are reassuring in that they state that the performance of approximate TD degrades gracefully as the distance between the target function and the chosen linear space increases. Although we have not addressed this topic in our analysis, we expect the error bounds in (Tsitsiklis & Van Roy, 1996a) to carry over with little change to our setting.

Finally, we make no use of eligibility traces in our algorithms. However, it is to be expected that the methods described herein can easily be adapted to accommodate eligibility traces, eventually yielding better approximations (with tighter error bounds) (Tsitsiklis & Van Roy, 1996a).

Acknowledgements

The authors would like to acknowledge the many useful comments from Doina Precup and the anonymous reviewers. Francisco S. Melo acknowledges the support of the ICTI and the Portuguese FCT, under the Carnegie Mellon-Portugal Program. Sean Meyn gratefully acknowledges the financial support from the National Science Foundation (ECS). Isabel Ribeiro was supported by the FCT in the frame of ISR-Lisboa pluriannual funding. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

References

Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. 12th Int. Conf. Machine Learning.

Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations, vol. 22. Springer-Verlag.

Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific.

Borkar, V. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29.

Borkar, V. (2000). A learning algorithm for discrete-time stochastic control. Probability in the Engineering and Informational Sciences, 14.

de Farias, D., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105.

Diaconis, P., & Saloff-Coste, L. (1996). Logarithmic Sobolev inequalities for finite Markov chains. Annals of Applied Probability, 6.

Gordon, G. (1995). Stable function approximation in dynamic programming (Technical Report CMU-CS). School of Computer Science, Carnegie Mellon University.

Gordon, G. (1996). Chattering in SARSA(λ) (Technical Report). CMU Learning Lab Internal Report.

Meyn, S., & Tweedie, R. (1993). Markov chains and stochastic stability. Springer-Verlag.

Meyn, S., & Tweedie, R. (1994). Computable bounds for geometric convergence rates of Markov chains. Annals of Applied Probability, 4.

Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49.

Perkins, T., & Pendrith, M. (2002). On the existence of fixed-points for Q-learning and SARSA in partially observable domains. Proc. 19th Int. Conf. Machine Learning.

Perkins, T., & Precup, D. (2003). A convergent form of approximate policy iteration. Adv. Neural Information Proc. Systems.

Precup, D., Sutton, R., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. Proc. 18th Int. Conf. Machine Learning.

Ribeiro, C., & Szepesvári, C. (1996). Q-learning combined with spreading: Convergence and results. Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems.

Rosenthal, J. (2002). Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability, 7.

Singh, S., Jaakkola, T., & Jordan, M. (1994). Reinforcement learning with soft state aggregation. Adv. Neural Information Proc. Systems.

Sutton, R. (1999). Open theoretical questions in reinforcement learning. Lecture Notes in Computer Science, 1572.

Szepesvári, C., & Smart, W. (2004). Interpolation-based Q-learning. Proc. 21st Int. Conf. Machine Learning.

Tadić, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42.

Tsitsiklis, J., & Van Roy, B. (1996a). An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control, 42.

Tsitsiklis, J., & Van Roy, B. (1996b). Feature-based methods for large scale dynamic programming. Machine Learning, 22.

Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, University of Cambridge.


there is no special reason why the value of y should be fixed at y = 0.3. Any y such that 25. More on bivariate functions: partial erivatives integrals Although we sai that the graph of photosynthesis versus temperature in Lecture 16 is like a hill, in the real worl hills are three-imensional

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

IPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy

IPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy IPA Derivatives for Make-to-Stock Prouction-Inventory Systems With Backorers Uner the (Rr) Policy Yihong Fan a Benamin Melame b Yao Zhao c Yorai Wari Abstract This paper aresses Infinitesimal Perturbation

More information

Learning Automata in Games with Memory with Application to Circuit-Switched Routing

Learning Automata in Games with Memory with Application to Circuit-Switched Routing Learning Automata in Games with Memory with Application to Circuit-Switche Routing Murat Alanyali Abstract A general setting is consiere in which autonomous users interact by means of a finite-state controlle

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

On the number of isolated eigenvalues of a pair of particles in a quantum wire

On the number of isolated eigenvalues of a pair of particles in a quantum wire On the number of isolate eigenvalues of a pair of particles in a quantum wire arxiv:1812.11804v1 [math-ph] 31 Dec 2018 Joachim Kerner 1 Department of Mathematics an Computer Science FernUniversität in

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

OPTIMAL CONTROL OF A PRODUCTION SYSTEM WITH INVENTORY-LEVEL-DEPENDENT DEMAND

OPTIMAL CONTROL OF A PRODUCTION SYSTEM WITH INVENTORY-LEVEL-DEPENDENT DEMAND Applie Mathematics E-Notes, 5(005), 36-43 c ISSN 1607-510 Available free at mirror sites of http://www.math.nthu.eu.tw/ amen/ OPTIMAL CONTROL OF A PRODUCTION SYSTEM WITH INVENTORY-LEVEL-DEPENDENT DEMAND

More information

ORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS. Gianluca Crippa

ORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS. Gianluca Crippa Manuscript submitte to AIMS Journals Volume X, Number 0X, XX 200X Website: http://aimsciences.org pp. X XX ORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS Gianluca Crippa Departement Mathematik

More information

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25

Combinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25 Combinatorica 9(1)(1989)91 99 A New Lower Boun for Snake-in-the-Box Coes Jerzy Wojciechowski Department of Pure Mathematics an Mathematical Statistics, University of Cambrige, 16 Mill Lane, Cambrige, CB2

More information