An Analysis of Reinforcement Learning with Function Approximation
Francisco S. Melo, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Sean P. Meyn, Coordinated Science Lab, Urbana, IL 61801, USA
M. Isabel Ribeiro, Institute for Systems and Robotics, Lisboa, Portugal

Abstract

We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion on the general applicability of our results and compare them with several related works.

1. Introduction

Convergence of Q-learning with function approximation has been a long-standing open question in reinforcement learning (Sutton, 1999). In general, value-based reinforcement learning (RL) methods for optimal control behave poorly when combined with function approximation (Baird, 1995; Tsitsiklis & Van Roy, 1996a). In this paper, we address this problem by analyzing the convergence of Q-learning when combined with linear function approximation. We identify a set of conditions that imply the convergence of this approximation method with probability 1 (w.p.1), when a fixed learning policy is used, and provide an interpretation of the resulting approximation as the fixed point of a Bellman-like operator. This motivates the analysis of several variations of Q-learning when combined with linear function approximation. In particular, we study a variation of Q-learning using importance sampling and an on-policy variant of Q-learning (SARSA).

Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).

The paper is organized as follows. We start in Section 2 by describing Markov decision problems.
We proceed with our analysis of the Q-learning algorithm and its variants, and produce our main results in Section 3. We also compare our results with other related works in the RL literature. We conclude with some further discussion in Section 4.

2. Markov Decision Problems

Let (X, A, P, r, γ) be a Markov decision problem (MDP) with a compact state-space X ⊂ R^p and a finite action set A. The action-dependent kernel P_a defines the transition probabilities for the underlying controlled Markov chain {X_t} as

    P[X_{t+1} ∈ U | X_t = x, A_t = a] = P_a(x, U),

where U is any measurable subset of X. The A-valued process {A_t} represents the control process: A_t is the control action at time instant t.¹ Solving the MDP consists in determining the control process {A_t} maximizing the expected total discounted reward

    V({A_t}, x) = E[ Σ_{t=0}^∞ γ^t R(X_t, A_t) | X_0 = x ],

where 0 ≤ γ < 1 is a discount factor and R(x, a) represents a random reward received for taking action a ∈ A in state x ∈ X. For simplicity of notation, we consider a bounded deterministic function r : X × A × X → R assigning a reward r(x, a, y) every time a transition from x to y occurs after taking action a. This means that

    E[R(x, a)] = ∫_X r(x, a, y) P_a(x, dy).

¹ We take the control process {A_t} to be adapted to the σ-algebra induced by {X_t}.

The optimal value function V* is defined for each state x ∈ X as

    V*(x) = max_{A_t} V({A_t}, x) = max_{A_t} E[ Σ_{t=0}^∞ γ^t R(X_t, A_t) | X_0 = x ]

and verifies the Bellman optimality equation

    V*(x) = max_{a ∈ A} ∫_X [ r(x, a, y) + γ V*(y) ] P_a(x, dy).   (1)

V*(x) represents the expected total discounted reward received along an optimal trajectory starting at state x. We can also define the optimal Q-function Q* as

    Q*(x, a) = ∫_X [ r(x, a, y) + γ V*(y) ] P_a(x, dy),   (2)

representing the expected total discounted reward along a trajectory starting at state x obtained by choosing a as the first action and following the optimal policy thereafter. The control process {A_t} defined as

    A_t = arg max_{a ∈ A} Q*(X_t, a),   for all t,

is optimal in the sense that V({A_t}, x) = V*(x) and defines a mapping π* : X → A known as the optimal policy. The optimal policy determines the optimal decision rule for a given MDP. More generally, a (Markov) policy is any mapping π_t defined over X × A generating a control process {A_t} verifying, for all t,

    P[A_t = a | X_t = x] = π_t(x, a).

We write V^{π_t}(x) instead of V({A_t}, x) if the control process {A_t} is generated by policy π_t. A policy π_t is stationary if it does not depend on t and deterministic if it assigns probability 1 to a single action in each state, and is thus represented as a map π_t : X → A for every t. Notice that the optimal control process can be obtained from the optimal (stationary, deterministic) policy π*, which can in turn be obtained from Q*. Therefore, the optimal control problem is solved once the function Q* is known for all pairs (x, a). Now given any real function q defined over X × A, we define the Bellman operator

    (Hq)(x, a) = ∫_X [ r(x, a, y) + γ max_{u ∈ A} q(y, u) ] P_a(x, dy).   (3)

The function Q* in (2) is the fixed point of H and, since this operator is a contraction in the sup-norm, a fixed-point iteration can be used to determine Q* (at least theoretically).

2.1. The Q-Learning Algorithm

We previously suggested that a fixed-point iteration could be used to determine the function Q*. In practice, this requires two important conditions:

- The kernel P and the reward function r are known;
- The successive estimates for Q* can be represented compactly and stored in a computer with finite memory.

If P and/or r are not known, a fixed-point iteration using H is not possible. To solve this problem, Watkins proposed in 1989 the Q-learning algorithm (Watkins, 1989). Q-learning proceeds as follows: consider an MDP M = (X, A, P, r, γ) and suppose that {x_t} is an infinite sample trajectory of the underlying Markov chain obtained with some policy π_t. The corresponding sample control process is denoted as {a_t} and the sequence of obtained rewards as {r_t}. Given any initial estimate Q_0, Q-learning successively updates this estimate using the rule

    Q_{t+1}(x, a) = Q_t(x, a) + α_t(x, a) δ_t,   (4)

where {α_t} is a step-size sequence and δ_t is the temporal difference at time t,

    δ_t = r_t + γ max_{b ∈ A} Q_t(x_{t+1}, b) − Q_t(x_t, a_t).   (5)

If both X and A are finite sets, each estimate Q_t is simply a |X| × |A| matrix and can be represented explicitly in a computer. In that case, the convergence of Q-learning and several other related algorithms (such as TD(λ) or SARSA) has been thoroughly studied (see, for example, (Bertsekas & Tsitsiklis, 1996) and references therein). However, if either X or A is infinite or very large, explicitly representing each Q_t becomes infeasible and some form of compact representation is needed (e.g., using function approximation). In this paper, we address how several RL methods such as Q-learning and SARSA can be combined with function approximation and still retain their main convergence properties.
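The fixed-point iteration on H and the tabular update (4)-(5) can be sketched together on a small example. The 2-state, 2-action MDP below (its transition probabilities and rewards) is purely an illustrative assumption:

```python
import numpy as np

# Sketch: (i) compute Q* by fixed-point iteration on the Bellman operator H
# of (3); (ii) approach the same Q* with the tabular Q-learning rule (4)-(5).
# The MDP below is a hypothetical example, not taken from the paper.
rng = np.random.default_rng(0)
gamma = 0.5
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[a][x, y]: transition kernel
              [[0.5, 0.5], [0.7, 0.3]]])
r = np.array([[[1.0, 0.0], [0.0, 2.0]],    # r[a][x, y]: deterministic rewards
              [[0.5, 0.5], [1.0, 0.0]]])
n_actions, n_states = P.shape[0], P.shape[1]

def bellman_H(q):
    """(Hq)(x, a) = sum_y P_a(x, y) [ r(x, a, y) + gamma * max_u q(y, u) ]."""
    v = q.max(axis=1)                      # max_u q(y, u) for each state y
    return np.stack([(P[a] * (r[a] + gamma * v)).sum(axis=1)
                     for a in range(n_actions)], axis=1)

# (i) H is a sup-norm contraction with modulus gamma, so iterating it
# converges geometrically to its fixed point Q*.
q_star = np.zeros((n_states, n_actions))
for _ in range(100):
    q_star = bellman_H(q_star)

# (ii) Tabular Q-learning along a sample trajectory under a uniform policy,
# with per-pair step sizes alpha_t(x, a) = 1 / (visits to (x, a)).
Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))
x = 0
for t in range(20000):
    a = rng.integers(n_actions)
    y = rng.choice(n_states, p=P[a, x])
    d_t = r[a, x, y] + gamma * Q[y].max() - Q[x, a]   # temporal difference (5)
    visits[x, a] += 1
    Q[x, a] += d_t / visits[x, a]                     # update (4)
    x = y
```

In such a run the Q-learning iterates approach the fixed point of H computed in step (i), illustrating why the tabular algorithm converges when every pair (x, a) is visited infinitely often.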
3. Reinforcement Learning with Linear Function Approximation

In this section, we address the problem of determining the optimal Q-function for MDPs with infinite state-space X. Let Q = {Q_θ} be a family of real-valued functions defined in X × A. It is assumed that the function class is linearly parameterized, so that Q can be expressed as the linear span of a fixed set of M linearly independent functions φ_i : X × A → R. For each M-dimensional parameter vector θ ∈ R^M, the function Q_θ ∈ Q is defined as

    Q_θ(x, a) = Σ_{i=1}^M φ_i(x, a) θ(i) = φᵀ(x, a) θ,

where ᵀ represents the transpose operator. We will also denote the above function by Q(θ) to emphasize the dependence on θ over the dependence on (x, a).

Let π be a fixed stochastic, stationary policy and suppose that {x_t}, {a_t} and {r_t} are sample trajectories of states, actions and rewards obtained from the MDP using policy π. In the original Q-learning algorithm, the Q-values are updated according to (4). The temporal difference δ_t can be interpreted as a 1-step estimation error with respect to the optimal function Q*. The update rule in Q-learning moves the estimates Q_t closer to the desired function Q*, minimizing the expected value of δ_t. In our approximate setting, we apply the same underlying idea to obtain the update rule for approximate Q-learning:

    θ_{t+1} = θ_t + α_t ∇_θ Q_θ(x_t, a_t) δ_t = θ_t + α_t φ(x_t, a_t) δ_t,   (6)

where, as above, δ_t is the temporal difference at time t defined in (5). Notice that (6) updates θ_t using the temporal difference δ_t as the error. The gradient ∇_θ Q_θ provides the direction in which this update is performed. To establish convergence of the algorithm (6) we adopt an ODE argument, establishing the trajectories of the algorithm to closely follow those of an associated ODE with a globally asymptotically stable equilibrium point. This will require several regularity properties on the policy π and on its induced Markov chain that will, in turn, motivate the study of the on-policy version of the algorithm. This on-policy algorithm can be seen as an extension of SARSA to infinite settings.

3.1. Convergence of Q-Learning

We now proceed by identifying conditions that ensure the convergence of Q-learning with linear function approximation as described by (6).
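The mechanics of the update (6) can be sketched as follows. The small MDP, the basis functions, and the uniform learning policy are all illustrative assumptions; without a condition such as (7) below, this recursion is not guaranteed to converge:

```python
import numpy as np

# Sketch of the approximate Q-learning update (6): theta moves along
# phi(x_t, a_t), scaled by the temporal difference. All quantities below
# (MDP, features, behavior policy) are hypothetical illustrations.
rng = np.random.default_rng(1)
gamma = 0.9
n_states, n_actions, M = 3, 2, 4
phi = rng.standard_normal((n_states, n_actions, M))   # phi(x, a) in R^M
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a][x] over y
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states, n_states))

theta = np.zeros(M)
x = 0
for t in range(20000):
    a = rng.integers(n_actions)               # fixed uniform learning policy
    y = rng.choice(n_states, p=P[a, x])
    # delta_t = r_t + gamma * max_b phi(y, b)^T theta - phi(x, a)^T theta
    d_t = r[a, x, y] + gamma * (phi[y] @ theta).max() - phi[x, a] @ theta
    theta = theta + (0.5 / (1 + t)) * phi[x, a] * d_t   # update (6)
    x = y
```

The decaying step sizes satisfy the usual conditions Σα_t = ∞, Σα_t² < ∞; whether the iterates actually converge depends on the interplay between the learning policy and the greedy maximization, which is exactly what the analysis below makes precise.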
Due to space limitations, we overlook some of the technical details in the proofs, which can easily be filled in.

We start by introducing some notation that will greatly simplify the presentation. Given an MDP M = (X, A, P, r, γ) with compact state-space X ⊂ R^p, let (X, P_π) be the Markov chain induced by a fixed policy π. We assume the chain (X, P_π) to be uniformly ergodic with invariant probability measure μ_X, and the policy π to verify π(x, a) > 0 for all a ∈ A and μ_X-almost all x ∈ X. We denote by μ_π the probability measure defined for each measurable set U ⊂ X and each action a ∈ A as

    μ_π(U × {a}) = ∫_U π(x, a) μ_X(dx).

Let now {φ_i, i = 1, ..., M} be a set of bounded, linearly independent basis functions to be used in our approximate Q-learning algorithm. We denote by Σ_π the matrix defined as

    Σ_π = E_π[ φ(x, a) φᵀ(x, a) ] = ∫_{X×A} φ φᵀ dμ_π.

Notice that the above expression is well-defined and independent of the initial distribution for the chain, due to our assumption of uniform ergodicity. For fixed θ ∈ R^M and x ∈ X, define the set of maximizing actions at x as

    A_x^θ = { a* ∈ A : φᵀ(x, a*) θ = max_{a} φᵀ(x, a) θ }

and the greedy policy with respect to θ as any policy that, at each state x, assigns positive probability only to actions in A_x^θ. Finally, let φ_x denote the row-vector φᵀ(x, a), where a is a random action generated according to the policy π at state x; likewise, let φ_x^θ denote the row-vector φᵀ(x, a_x^θ), where a_x^θ is now any action in A_x^θ. We now introduce the θ-dependent matrix

    Σ_π*(θ) = E_π[ (φ_x^θ)ᵀ φ_x^θ ].

By construction, both Σ_π and Σ_π*(θ) are positive definite, since the functions φ_i are assumed linearly independent. Notice also the difference between Σ_π and each Σ_π*(θ): the actions in the definition of the former are taken according to π, while in the latter they are taken greedily with respect to a particular θ. We are now in position to introduce our first result.

Theorem 1  Let M, π and {φ_i, i = 1, ..., M} be as defined above. If, for all θ,

    Σ_π > γ² Σ_π*(θ)   (7)

and the step-size sequence verifies

    Σ_t α_t = ∞,   Σ_t α_t² < ∞,

then the algorithm in (6) converges w.p.1 and the limit point θ* verifies the recursive relation

    Q(θ*) = Π_Q H Q(θ*),

where Π_Q is the orthogonal projection onto Q.²

² The orthogonal projection is naturally defined in the (infinite-dimensional) Hilbert space containing Q with inner product given by ⟨f, g⟩ = ∫_{X×A} f g dμ_π.

Proof  We establish the main statement of the theorem using a standard ODE argument. The assumptions on the chain (X, P_π) and basis functions {φ_i, i = 1, ..., M}, and the fact that π(x, a) > 0 for all a ∈ A and μ_X-almost every x ∈ X, ensure the applicability of Theorem 17 on page 239 of (Benveniste et al., 1990). Therefore, the convergence of the algorithm can be analyzed in terms of the stability of the equilibrium points of the associated ODE

    θ̇ = E_π[ φ_xᵀ ( r(x, a, y) + γ φ_y^θ θ − φ_x θ ) ],   (8)

where we omitted the explicit dependence of θ on t to avoid excessively cluttering the expression. If the ODE (8) has a globally asymptotically stable equilibrium point, this implies the algorithm (6) to converge w.p.1 (Benveniste et al., 1990).

Let then θ_1(t) and θ_2(t) be two trajectories of the ODE starting at different initial conditions, and let θ̃(t) = θ_1(t) − θ_2(t). From (8), we get

    (d/dt) ||θ̃||₂² = −2 θ̃ᵀ Σ_π θ̃ + 2γ E_π[ (φ_x θ̃)( φ_y^{θ₁} θ_1 − φ_y^{θ₂} θ_2 ) ].

Notice now that, from the definition of φ_y^{θ₁} and φ_y^{θ₂},

    φ_y^{θ₁} θ_2 ≤ φ_y^{θ₂} θ_2   and   φ_y^{θ₂} θ_1 ≤ φ_y^{θ₁} θ_1.

Taking this into account and defining the sets S₊ = { (x, a) : φᵀ(x, a) θ̃ > 0 } and S₋ = X × A \ S₊, the previous expression becomes

    (d/dt) ||θ̃||₂² ≤ −2 θ̃ᵀ Σ_π θ̃ + 2γ E_π[ (φ_x θ̃)( φ_y^{θ₁} θ̃ ) I_{S₊} ] + 2γ E_π[ (φ_x θ̃)( φ_y^{θ₂} θ̃ ) I_{S₋} ],

where I_S represents the indicator function for the set S. Applying Hölder's inequality to each of the expectations above, we get

    (d/dt) ||θ̃||₂² ≤ −2 θ̃ᵀ Σ_π θ̃ + 2γ √(E_π[ (φ_x θ̃)² I_{S₊} ]) √(E_π[ (φ_x^{θ₁} θ̃)² I_{S₊} ]) + 2γ √(E_π[ (φ_x θ̃)² I_{S₋} ]) √(E_π[ (φ_x^{θ₂} θ̃)² I_{S₋} ])

and a few simple computations finally yield

    (d/dt) ||θ̃||₂² ≤ −2 θ̃ᵀ Σ_π θ̃ + 2γ √(θ̃ᵀ Σ_π θ̃) · max( √(θ̃ᵀ Σ_π*(θ_1) θ̃), √(θ̃ᵀ Σ_π*(θ_2) θ̃) ).

Since, by assumption, Σ_π > γ² Σ_π*(θ), we can conclude from the expression above that (d/dt) ||θ̃||₂² < 0.
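This contraction can be observed numerically. In the sketch below, everything is an illustrative assumption: a small finite MDP, six evenly spaced unit feature vectors in R² (so that Σ_π = (1/2)I exactly under a uniform policy and state distribution), and a small γ, so that condition (7) holds. Two trajectories of the ODE (8), integrated with Euler steps, then approach each other:

```python
import numpy as np

# Euler integration of the ODE (8) from two initial conditions, under
# hypothetical choices for which condition (7) holds. mu below is a stand-in
# for the invariant measure mu_pi (uniform over state-action pairs).
rng = np.random.default_rng(4)
gamma, dt = 0.1, 0.01
n_states, n_actions, M = 3, 2, 2
angles = np.arange(6) * np.pi / 6                     # 6 evenly spaced unit vectors
phi = np.stack([np.cos(angles), np.sin(angles)], axis=1).reshape(n_states, n_actions, M)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states, n_states))
mu = 1.0 / (n_states * n_actions)

def ode_rhs(theta):
    """E_pi[ phi(x,a) ( r(x,a,y) + gamma * max_b phi(y,b)^T theta - phi(x,a)^T theta ) ]."""
    greedy_val = (phi @ theta).max(axis=1)            # max_b phi(y, b)^T theta
    out = np.zeros(M)
    for x in range(n_states):
        for a in range(n_actions):
            target = (P[a, x] * (r[a, x] + gamma * greedy_val)).sum()
            out += mu * phi[x, a] * (target - phi[x, a] @ theta)
    return out

theta1, theta2 = np.array([5.0, -3.0]), np.array([-4.0, 6.0])
d0 = np.linalg.norm(theta1 - theta2)
for _ in range(2000):                                 # integrate to time T = 20
    theta1 = theta1 + dt * ode_rhs(theta1)
    theta2 = theta2 + dt * ode_rhs(theta2)
d1 = np.linalg.norm(theta1 - theta2)
```

In such a run the distance between the two trajectories shrinks monotonically, matching the inequality (d/dt) ||θ̃||₂² < 0 derived above.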
This means, in particular, that θ̃(t) converges asymptotically to the origin, i.e., the ODE (8) is globally asymptotically stable. Since the ODE is autonomous (i.e., time-invariant), there exists one globally asymptotically stable equilibrium point for the ODE, which verifies the recursive relation

    θ* = Σ_π⁻¹ E_π[ φ_xᵀ ( r(x, a, y) + γ φ_y^{θ*} θ* ) ].   (9)

Since Σ_π is, by construction, positive definite, the inverse in (9) is well-defined. Multiplying (9) by φᵀ(x, a) on both sides yields the desired result. ∎

It is now important to observe that condition (7) is quite restrictive: since γ is usually taken close to 1, condition (7) essentially requires that, for every θ,

    max_{a ∈ A} φᵀ(x, a) θ ≈ Σ_{a ∈ A} π(x, a) φᵀ(x, a) θ.

Therefore, such a condition will seldom be met in practice, since it implies that the learning policy π is already close to the policy that the algorithm is meant to compute. In other words, the maximization above yields a policy close to the policy used during learning. And, when this is the case, the algorithm essentially behaves like an on-policy algorithm.

On the other hand, the above condition can be ensured by considering only a local maximization around the learning policy π. This is the most interesting aspect of the above result: it explicitly relates how much information the learning policy provides about greedy policies, as a function of γ. To better understand this, notice that each policy π is associated with a particular invariant measure on the induced chain (X, P_π). In particular, the measure associated with the learning policy may be very different from the one induced by the greedy/optimal policy. Taking into account the fact that γ measures, in a sense, the importance of the future, Theorem 1 basically states that:

- In problems where the performance of the agent greatly depends on future rewards (γ ≈ 1), the information provided by the learning policy can only be safely generalized to nearby greedy policies, in the sense of (7).
- In problems where the performance of the agent is less dependent on future rewards (γ ≪ 1), the information provided by the learning policy can be safely generalized to more general greedy policies.

Suppose then that the maximization in the update equation (6) is to be replaced by a local maximization. In other words, instead of maximizing over all actions in A, the algorithm should maximize over a small neighborhood of the learning policy π (in policy space). The difficulty with this approach is that such maximization can be hard to implement. The use of importance sampling can readily overcome such difficulty, by making the maximization in policy-space implicit. The algorithm thus obtained, which resembles in many aspects the one proposed in (Precup et al., 2001), is described by the update rule

    θ_{t+1} = θ_t + α_t φ(x_t, a_t) δ̂_t,   (10)

where the modified temporal difference δ̂_t is given by

    δ̂_t = r_t + γ [ π^θ(x_{t+1}, b) / π(x_{t+1}, b) ] Q_{θ_t}(x_{t+1}, b) − Q_{θ_t}(x_t, a_t),   (11)

where b is the action sampled from the learning policy π at state x_{t+1} and π^θ is, for example, a θ-dependent ε-greedy policy close to the learning policy π.³ A possible implementation of such algorithm is sketched in Figure 1.
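The reweighting in (11) can be sketched as follows. The next action is drawn from the learning policy π and its value is reweighted by the ratio π^θ/π, so the local maximization over policies never has to be carried out explicitly. The features, policies, and dynamics below are illustrative assumptions:

```python
import numpy as np

# Sketch of the importance-sampling variant (10)-(11). The epsilon-greedy
# pi_theta and the small MDP are hypothetical illustrations.
rng = np.random.default_rng(2)
gamma, eps = 0.9, 0.2
n_states, n_actions, M = 3, 2, 3
phi = rng.standard_normal((n_states, n_actions, M))
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states, n_states))
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform learning policy

def pi_theta(x, theta):
    """theta-dependent eps-greedy policy close to pi (everywhere positive)."""
    p = np.full(n_actions, eps / n_actions)
    p[(phi[x] @ theta).argmax()] += 1.0 - eps
    return p

theta = np.zeros(M)
x = 0
a = rng.choice(n_actions, p=pi[x])
for t in range(10000):
    y = rng.choice(n_states, p=P[a, x])
    b = rng.choice(n_actions, p=pi[y])                # next action drawn from pi
    rho = pi_theta(y, theta)[b] / pi[y, b]            # importance weight
    # Modified temporal difference (11)
    d_hat = r[a, x, y] + gamma * rho * (phi[y, b] @ theta) - phi[x, a] @ theta
    theta = theta + (0.5 / (1 + t)) * phi[x, a] * d_hat   # update (10)
    x, a = y, b
```

Because π^θ stays within ε of the learning policy, the importance weights remain bounded, which is what keeps the implicit maximization "local" in the sense discussed above.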
³ Given ε > 0, a policy π is ε-greedy with respect to a function Q_θ ∈ Q if, at each state x ∈ X, it chooses a random action with probability ε and a greedy action a ∈ A_x^θ with probability (1 − ε).

Algorithm 1 Modified Q-learning.
Require: Initial policy π_0;
 1: Initialize X_0 = x_0 and set N = 0;
 2: for t = 0 until T do
 3:   Sample A_t ~ π_N(x_t, ·);
 4:   Sample next-state X_{t+1} ~ P_{a_t}(x_t, ·);
 5:   r_t = r(x_t, a_t, x_{t+1});
 6:   Update θ_t according to (10);
 7: end for
 8: Set π_{N+1}(x, a) = π^θ(x, a);
 9: N = N + 1;
10: if π_N ≈ π_{N−1} then
11:   return π_N;
12: else
13:   Goto 2;
14: end if

We denote by π_N the behavior policy at iteration N of the algorithm; in the stopping condition for the algorithm, any adequate policy norm can be used.

3.2. SARSA with Linear Function Approximation

The analysis in the previous subsection suggests that on-policy algorithms may potentially yield more reliable convergence properties. Such a fact has already been observed in (Tsitsiklis & Van Roy, 1996a; Perkins & Pendrith, 2002). In this subsection we thus focus on on-policy algorithms. We analyze the convergence of SARSA when combined with linear function approximation. In our main result, we recover the essence of the result in (Perkins & Precup, 2003), although in a somewhat different setting. The main differences between our work and that in (Perkins & Precup, 2003) are discussed further ahead.

Once again, we consider a family Q of real-valued functions, the linear span of a fixed set of M linearly independent functions φ_i : X × A → R, and derive an on-policy algorithm to compute a parameter vector θ such that φᵀ(x, a)θ approximates the optimal Q-function. To this purpose, and unlike what has been done so far, we now consider a θ-dependent learning policy π_θ verifying π_θ(x, a) > 0 for all θ. In particular, we consider at each time step a learning policy π_{θ_t} that is ε-greedy with respect to φᵀ(x, a)θ_t and Lipschitz continuous with respect to θ, with Lipschitz constant C (with respect to some preferred metric).
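One way to obtain such an everywhere-positive learning policy that is Lipschitz continuous in θ is to smooth the ε-greedy rule with a softmax, whose temperature controls the Lipschitz constant. The sketch below drives the on-policy update analyzed in this subsection with such a policy; the MDP, features, and temperature are illustrative assumptions:

```python
import numpy as np

# Sketch of an on-policy (SARSA-style) update with linear features, using a
# smoothed eps-greedy policy: an eps-uniform term keeps pi_theta(x, a) > 0
# everywhere, and the softmax keeps it Lipschitz in theta. All quantities
# below are hypothetical illustrations.
rng = np.random.default_rng(3)
gamma, eps, tau = 0.9, 0.2, 2.0
n_states, n_actions, M = 3, 2, 3
phi = rng.standard_normal((n_states, n_actions, M))
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states, n_states))

def policy(x, theta):
    """Everywhere-positive, Lipschitz-in-theta policy; tau sets the smoothness."""
    q = tau * (phi[x] @ theta)
    p = np.exp(q - q.max())
    p /= p.sum()
    return eps / n_actions + (1.0 - eps) * p

theta = np.zeros(M)
x = 0
a = rng.choice(n_actions, p=policy(x, theta))
for t in range(10000):
    y = rng.choice(n_states, p=P[a, x])
    b = rng.choice(n_actions, p=policy(y, theta))     # on-policy next action
    # delta_t = r_t + gamma * phi(y, b)^T theta - phi(x, a)^T theta
    d_t = r[a, x, y] + gamma * (phi[y, b] @ theta) - phi[x, a] @ theta
    theta = theta + (0.5 / (1 + t)) * phi[x, a] * d_t  # SARSA-style update
    x, a = y, b
```

A sharper softmax (larger tau) makes the policy closer to greedy but increases its Lipschitz constant, which is precisely the trade-off that the constant C₀ in the theorem below constrains.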
We further assume that, for every fixed θ, the Markov chain (X, P_θ) induced by such a policy is uniformly ergodic.

Let then {x_t}, {a_t} and {r_t} be sample trajectories of states, actions and rewards obtained from the MDP M = (X, A, P, r, γ) using (at each time-step) the θ-dependent policy π_{θ_t}. The update rule for our approximate SARSA algorithm is:

    θ_{t+1} = θ_t + α_t φ(x_t, a_t) δ_t,   (12)

where δ_t is the temporal difference at time t,

    δ_t = r_t + γ φᵀ(x_{t+1}, a_{t+1}) θ_t − φᵀ(x_t, a_t) θ_t.

In order to use the SARSA algorithm above to approximate the optimal Q-function, it is necessary to slowly decay the exploration rate, ε, to zero, while guaranteeing the learning policy to verify the necessary regularity conditions (namely, Lipschitz continuity w.r.t. θ). However, as will soon become apparent, decreasing the exploration rate to zero will render our convergence result (and other related results) not applicable. We are now in position to introduce our main result.

Theorem 2  Let M, π_{θ_t} and {φ_i, i = 1, ..., M} be as defined above. Let C be the Lipschitz constant of the learning policy π_θ with respect to θ. Assume that the step-size sequence verifies

    Σ_t α_t = ∞,   Σ_t α_t² < ∞.

Then, there is C₀ > 0 such that, if C < C₀, the algorithm in (12) converges w.p.1.

Proof  We again use an ODE argument to establish the statement of the theorem. As before, the assumptions on (X, P_θ) and basis functions {φ_i, i = 1, ..., M}, and the fact that the learning policy is Lipschitz continuous with respect to θ and verifies π(x, a) > 0, ensure the applicability of Theorem 17 on page 239 of (Benveniste et al., 1990). Therefore, the convergence of the algorithm can be analyzed in terms of the stability of the associated ODE:

    θ̇ = E_θ[ φ_xᵀ ( r(x, a, y) + γ φ_y θ − φ_x θ ) ].   (13)

Notice that the expectation is taken with respect to the invariant measure of the chain and learning policy, both θ-dependent. To establish global asymptotic stability, we re-write (13) as

    θ̇(t) = A_θ θ(t) + b_θ,

where

    A_θ = E_θ[ φ_xᵀ ( γ φ_y − φ_x ) ];   b_θ = E_θ[ φ_xᵀ r(x, a, y) ].

An equilibrium point of (13) must verify θ* = −A_{θ*}⁻¹ b_{θ*}, and the existence of such an equilibrium point has been established in (de Farias & Van Roy, 2000) (Theorem 5.1). Let θ̃(t) = θ(t) − θ*. Then,

    (d/dt) ||θ̃||₂² = 2 θ̃ᵀ ( A_θ θ + b_θ ) = 2 θ̃ᵀ ( A_θ θ − A_{θ*} θ* + b_θ − b_{θ*} ).

Let

    λ_A = sup_{θ ≠ θ*} ||A_θ − A_{θ*}||₂ / ||θ − θ*||₂,   λ_b = sup_{θ ≠ θ*} ||b_θ − b_{θ*}||₂ / ||θ − θ*||₂,

where the norm in the definition of λ_A is the induced operator norm and the one in the definition of λ_b is the regular Euclidean norm. The previous expression thus becomes

    (d/dt) ||θ̃||₂² = 2 θ̃ᵀ A_{θ*} θ̃ + 2 θ̃ᵀ (A_θ − A_{θ*}) θ + 2 θ̃ᵀ (b_θ − b_{θ*}) ≤ 2 θ̃ᵀ A_{θ*} θ̃ + 2 (λ_A + λ_b) ||θ̃||₂².
Letting λ = λ_A + λ_b, the above expression can be written as

    (d/dt) ||θ̃||₂² ≤ 2 θ̃ᵀ ( A_{θ*} + λI ) θ̃.

The fact that the learning policy is assumed Lipschitz w.r.t. θ and the uniform ergodicity of the corresponding induced chain imply that A_θ and b_θ are also Lipschitz w.r.t. θ (with a different constant). This means that λ goes to zero with C and, therefore, for C sufficiently small, (A_{θ*} + λI) is a negative definite matrix.⁴ Therefore, the ODE (13) is globally asymptotically stable and the conclusion of the theorem follows. ∎

⁴ The fact that A_{θ*} is negative definite has been established in several works. See, for example, Lemma 3 in (Perkins & Precup, 2003) or, in a slightly different setting (easily extendable to our setting), the proof of Theorem 1 in (Tsitsiklis & Van Roy, 1996a).

Several remarks are now in order. First of all, Theorem 2 basically states that, for fixed ε, if the dependence of the learning policy π_θ on θ can be made sufficiently smooth, then SARSA converges w.p.1. This result is similar to the result in (Perkins & Precup, 2003), although the algorithms are not exactly similar: we consider a continuing task, while the algorithm featured in (Perkins & Precup, 2003) is implemented in an episodic fashion. Furthermore, in our case, convergence was established using an ODE argument, instead of the contraction argument in (Perkins & Precup, 2003). Nevertheless, both methods of proof are, in their essence, equivalent and the results in both papers concordant.

A second remark is related to the implementation of SARSA with a decaying exploration policy. The analysis of one such algorithm could be conducted using, once again, an ODE argument. In particular, SARSA could be described as a two-time-scale algorithm: the iterations of the main algorithm (corresponding to (12)) would develop on a faster time-scale and the decaying exploration rate would develop at a slower time-scale. The analysis in (Borkar, 1997) could then be replicated. However, it is well known that, as ε approaches zero, the learning policy will approach the greedy policy w.r.t. θ, which is, in general, discontinuous. Therefore, there is little hope that the smoothness condition in Theorem 2 (or its equivalent in (Perkins & Precup, 2003)) can be met as ε approaches zero.

4. Discussion

We now briefly discuss some of the assumptions in the above theorems. We start by emphasizing that all stated conditions are only sufficient, meaning that convergence may occur even if some (or all) of them fail to hold. We also discuss the relation between our results and other related works from the literature.

Secondly, uniform ergodicity of a Markov chain essentially means that the chain quickly converges to a stationary behavior uniformly over the state-space, and we can study any properties of the stationary chain by direct sampling.⁵ This property and the requirement that π(x, a) > 0 for all a ∈ A and μ_X-almost all x ∈ X can be interpreted as a continuous counterpart to the usual condition that all state-action pairs are visited infinitely often. In fact, uniform ergodicity implies that all the regions of the state-space with positive μ_X measure are sufficiently visited (Meyn & Tweedie, 1993), and the condition π(x, a) > 0 ensures that, at each state, every action is sufficiently tried. It appears to be a standard requirement in this continuous scenario, as it has also been used in other works (Tsitsiklis & Van Roy, 1996a; Singh et al., 1994; Perkins & Precup, 2003).⁶ The requirement that π(x, a) > 0 for all a ∈ A and μ_X-almost all x ∈ X also corresponds to the concept of fair control as introduced in (Borkar, 2000).

We also remark that the divergence example in (Gordon, 1996) is due to the fact that the learning policy fails to verify the Lipschitz continuity condition stated in Theorem 2 (as discussed in (Perkins & Pendrith, 2002)).

4.1. Related Work

In this paper, we analyzed how RL algorithms can be combined with linear function approximation to approximate the optimal Q-function in MDPs with infinite state-spaces. In the last decade or so, several authors have addressed this same problem from different perspectives.
We now briefly discuss several such approaches and their relation to the results in this paper.

⁵ Explicit bounds on the rate of convergence to stationarity are available in the literature (Meyn & Tweedie, 1994; Diaconis & Saloff-Coste, 1996; Rosenthal, 2002). However, for general chains, such bounds tend to be loose.

⁶ Most of these works make use of geometric ergodicity which, since we admit a compact state-space, is a consequence of uniform ergodicity.

One possible approach is to rely on soft-state aggregation (Singh et al., 1994; Gordon, 1995; Tsitsiklis & Van Roy, 1996b), partitioning the state-space into soft regions. Treating the soft regions as hyper-states, these methods then use standard learning methods (such as Q-learning or SARSA) to approximate the optimal Q-function. The main differences between such methods and those using linear function approximation (such as the ones portrayed here) are that, in the former, only one component of the parameter vector is updated at each iteration and the basis functions are, by construction, constrained to be positive and to add to one at each point of the state-space.

Sample-based methods (Ormoneit & Sen, 2002; Szepesvári & Smart, 2004) further generalize the applicability of soft-state aggregation methods by using spreading functions/kernels (Ribeiro & Szepesvári, 1996). Sample-based methods thus exhibit a superior convergence rate when compared with simple soft-state aggregation methods, although under somewhat more restrictive conditions.

Finally, RL with general linear function approximation was thoroughly studied in (Tsitsiklis & Van Roy, 1996a; Tadić, 2001). Posterior works extended the applicability of such results. In (Precup et al., 2001), an off-policy convergent algorithm was proposed that uses an importance-sampling principle similar to the one described in Section 3.
In (Perkins & Precup, 2003), the authors established the convergence of SARSA with linear function approximation.

4.2. Concluding Remarks

We conclude by observing that all methods analyzed here, as well as those surveyed above, experience a degradation in performance as the distance between the target function and the chosen linear space increases. If the functions in the chosen linear space provide only a poor approximation of the desired function, there are no practical guarantees on the usefulness of such approximation. The error bounds derived in (Tsitsiklis & Van Roy, 1996a) are reassuring in that they state that the performance of approximate TD gracefully degrades as the distance between the target function and the chosen linear space increases. Although we have not addressed such topic in our analysis, we expect the error bounds in (Tsitsiklis & Van Roy, 1996a) to carry over with little change to our setting.

Finally, we make no use of eligibility traces in our algorithms. However, it is to be expected that the methods described herein can easily be adapted to accommodate eligibility traces, eventually yielding better approximations (with tighter error bounds) (Tsitsiklis & Van Roy, 1996a).

Acknowledgements

The authors would like to acknowledge the many useful comments from Doina Precup and the anonymous reviewers. Francisco S. Melo acknowledges the support from the ICTI and the Portuguese FCT, under the Carnegie Mellon-Portugal Program. Sean Meyn gratefully acknowledges the financial support from the National Science Foundation (ECS ). Isabel Ribeiro was supported by the FCT in the frame of ISR-Lisboa pluriannual funding. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

References

Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. 12th Int. Conf. Machine Learning.

Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations, vol. 22. Springer-Verlag.

Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific.

Borkar, V. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29.

Borkar, V. (2000). A learning algorithm for discrete-time stochastic control. Probability in the Engineering and Informational Sciences, 14.

de Farias, D., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105.

Diaconis, P., & Saloff-Coste, L. (1996). Logarithmic Sobolev inequalities for finite Markov chains. Annals of Applied Probability, 6.

Gordon, G. (1995). Stable function approximation in dynamic programming (Technical Report CMU-CS ). School of Computer Science, Carnegie Mellon University.

Gordon, G. (1996). Chattering in SARSA(λ) (Technical Report). CMU Learning Lab Internal Report.

Meyn, S., & Tweedie, R. (1993). Markov chains and stochastic stability. Springer-Verlag.

Meyn, S., & Tweedie, R. (1994). Computable bounds for geometric convergence rates of Markov chains. Annals of Applied Probability, 4.

Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49.

Perkins, T., & Pendrith, M. (2002). On the existence of fixed-points for Q-learning and SARSA in partially observable domains. Proc. 19th Int. Conf. Machine Learning.

Perkins, T., & Precup, D. (2003). A convergent form of approximate policy iteration. Adv. Neural Information Proc. Systems.

Precup, D., Sutton, R., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. Proc. 18th Int. Conf. Machine Learning.

Ribeiro, C., & Szepesvári, C. (1996). Q-learning combined with spreading: Convergence and results. Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems.

Rosenthal, J. (2002). Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability, 7.

Singh, S., Jaakkola, T., & Jordan, M. (1994). Reinforcement learning with soft state aggregation. Adv. Neural Information Proc. Systems.

Sutton, R. (1999). Open theoretical questions in reinforcement learning. Lecture Notes in Computer Science, 1572.

Szepesvári, C., & Smart, W. (2004). Interpolation-based Q-learning. Proc. 21st Int. Conf. Machine Learning.

Tadić, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42.

Tsitsiklis, J., & Van Roy, B. (1996a). An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control, 42.

Tsitsiklis, J., & Van Roy, B. (1996b). Feature-based methods for large scale dynamic programming. Machine Learning, 22.

Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, University of Cambridge.
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationu!i = a T u = 0. Then S satisfies
Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More information19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control
19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior
More informationLeast-Squares Regression on Sparse Spaces
Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction
More information7.1 Support Vector Machine
67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to
More informationPDE Notes, Lecture #11
PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationImproved Rate-Based Pull and Push Strategies in Large Distributed Networks
Improve Rate-Base Pull an Push Strategies in Large Distribute Networks Wouter Minnebo an Benny Van Hout Department of Mathematics an Computer Science University of Antwerp - imins Mielheimlaan, B-00 Antwerp,
More informationLower Bounds for the Smoothed Number of Pareto optimal Solutions
Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationThe canonical controllers and regular interconnection
Systems & Control Letters ( www.elsevier.com/locate/sysconle The canonical controllers an regular interconnection A.A. Julius a,, J.C. Willems b, M.N. Belur c, H.L. Trentelman a Department of Applie Mathematics,
More informationELEC3114 Control Systems 1
ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.
More informationJUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson
JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises
More informationd dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1
Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of
More informationLyapunov Functions. V. J. Venkataramanan and Xiaojun Lin. Center for Wireless Systems and Applications. School of Electrical and Computer Engineering,
On the Queue-Overflow Probability of Wireless Systems : A New Approach Combining Large Deviations with Lyapunov Functions V. J. Venkataramanan an Xiaojun Lin Center for Wireless Systems an Applications
More informationLinear First-Order Equations
5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)
More informationLeaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes
Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,
More informationMath 1B, lecture 8: Integration by parts
Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores
More informationCalculus and optimization
Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function
More informationTractability results for weighted Banach spaces of smooth functions
Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March
More informationSystems & Control Letters
Systems & ontrol Letters ( ) ontents lists available at ScienceDirect Systems & ontrol Letters journal homepage: www.elsevier.com/locate/sysconle A converse to the eterministic separation principle Jochen
More informationNested Saturation with Guaranteed Real Poles 1
Neste Saturation with Guarantee Real Poles Eric N Johnson 2 an Suresh K Kannan 3 School of Aerospace Engineering Georgia Institute of Technology, Atlanta, GA 3332 Abstract The global stabilization of asymptotically
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More informationTEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE
TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationSwitching Time Optimization in Discretized Hybrid Dynamical Systems
Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set
More informationSummary: Differentiation
Techniques of Differentiation. Inverse Trigonometric functions The basic formulas (available in MF5 are: Summary: Differentiation ( sin ( cos The basic formula can be generalize as follows: Note: ( sin
More informationClosed and Open Loop Optimal Control of Buffer and Energy of a Wireless Device
Close an Open Loop Optimal Control of Buffer an Energy of a Wireless Device V. S. Borkar School of Technology an Computer Science TIFR, umbai, Inia. borkar@tifr.res.in A. A. Kherani B. J. Prabhu INRIA
More informationChaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena
Chaos, Solitons an Fractals (7 64 73 Contents lists available at ScienceDirect Chaos, Solitons an Fractals onlinear Science, an onequilibrium an Complex Phenomena journal homepage: www.elsevier.com/locate/chaos
More informationSlide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)
Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to
More informationOptimal Control of Spatially Distributed Systems
Optimal Control of Spatially Distribute Systems Naer Motee an Ali Jababaie Abstract In this paper, we stuy the structural properties of optimal control of spatially istribute systems. Such systems consist
More informationA Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential
Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix
More informationMulti-robot Formation Control Using Reinforcement Learning Method
Multi-robot Formation Control Using Reinforcement Learning Metho Guoyu Zuo, Jiatong Han, an Guansheng Han School of Electronic Information & Control Engineering, Beijing University of Technology, Beijing
More informationCascaded redundancy reduction
Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,
More informationPolynomial Inclusion Functions
Polynomial Inclusion Functions E. e Weert, E. van Kampen, Q. P. Chu, an J. A. Muler Delft University of Technology, Faculty of Aerospace Engineering, Control an Simulation Division E.eWeert@TUDelft.nl
More informationExponential Tracking Control of Nonlinear Systems with Actuator Nonlinearity
Preprints of the 9th Worl Congress The International Feeration of Automatic Control Cape Town, South Africa. August -9, Exponential Tracking Control of Nonlinear Systems with Actuator Nonlinearity Zhengqiang
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationSeparation of Variables
Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical
More informationPerturbation Analysis and Optimization of Stochastic Flow Networks
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004 1 Perturbation Analysis an Optimization of Stochastic Flow Networks Gang Sun, Christos G. Cassanras, Yorai Wari, Christos G. Panayiotou,
More informationChapter 6: Energy-Momentum Tensors
49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.
More informationExpected Value of Partial Perfect Information
Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo
More informationWUCHEN LI AND STANLEY OSHER
CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability
More informationOn combinatorial approaches to compressed sensing
On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu
More informationLecture 6: Calculus. In Song Kim. September 7, 2011
Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear
More informationConvergence of Random Walks
Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of
More informationII. First variation of functionals
II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent
More informationBEYOND THE CONSTRUCTION OF OPTIMAL SWITCHING SURFACES FOR AUTONOMOUS HYBRID SYSTEMS. Mauro Boccadoro Magnus Egerstedt Paolo Valigi Yorai Wardi
BEYOND THE CONSTRUCTION OF OPTIMAL SWITCHING SURFACES FOR AUTONOMOUS HYBRID SYSTEMS Mauro Boccaoro Magnus Egerstet Paolo Valigi Yorai Wari {boccaoro,valigi}@iei.unipg.it Dipartimento i Ingegneria Elettronica
More informationThe Exact Form and General Integrating Factors
7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily
More informationSurvey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013
Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing
More informationMulti-agent Systems Reaching Optimal Consensus with Time-varying Communication Graphs
Preprints of the 8th IFAC Worl Congress Multi-agent Systems Reaching Optimal Consensus with Time-varying Communication Graphs Guoong Shi ACCESS Linnaeus Centre, School of Electrical Engineering, Royal
More informationMath 342 Partial Differential Equations «Viktor Grigoryan
Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite
More informationSituation awareness of power system based on static voltage security region
The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran
More informationNumber of wireless sensors needed to detect a wildfire
Number of wireless sensors neee to etect a wilfire Pablo I. Fierens Instituto Tecnológico e Buenos Aires (ITBA) Physics an Mathematics Department Av. Maero 399, Buenos Aires, (C1106ACD) Argentina pfierens@itba.eu.ar
More informationConservation Laws. Chapter Conservation of Energy
20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action
More informationEuler equations for multiple integrals
Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................
More informationRobustness and Perturbations of Minimal Bases
Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important
More informationOptimal Control of Spatially Distributed Systems
Optimal Control of Spatially Distribute Systems Naer Motee an Ali Jababaie Abstract In this paper, we stuy the structural properties of optimal control of spatially istribute systems. Such systems consist
More informationGeneralized Tractability for Multivariate Problems
Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,
More informationarxiv: v1 [math.mg] 10 Apr 2018
ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic
More informationMulti-View Clustering via Canonical Correlation Analysis
Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in
More informationSharp Thresholds. Zachary Hamaker. March 15, 2010
Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior
More informationLECTURE NOTES ON DVORETZKY S THEOREM
LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.
More informationA new proof of the sharpness of the phase transition for Bernoulli percolation on Z d
A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition
More informationEstimation of the Maximum Domination Value in Multi-Dimensional Data Sets
Proceeings of the 4th East-European Conference on Avances in Databases an Information Systems ADBIS) 200 Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Eleftherios Tiakas, Apostolos.
More informationFLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction
FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number
More informationPerfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs
Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite
More informationConvergence rates of moment-sum-of-squares hierarchies for optimal control problems
Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Milan Kora 1, Diier Henrion 2,3,4, Colin N. Jones 1 Draft of September 8, 2016 Abstract We stuy the convergence rate
More informationA PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,
More informationPhysics 5153 Classical Mechanics. The Virial Theorem and The Poisson Bracket-1
Physics 5153 Classical Mechanics The Virial Theorem an The Poisson Bracket 1 Introuction In this lecture we will consier two applications of the Hamiltonian. The first, the Virial Theorem, applies to systems
More informationMonotonicity for excited random walk in high dimensions
Monotonicity for excite ranom walk in high imensions Remco van er Hofsta Mark Holmes March, 2009 Abstract We prove that the rift θ, β) for excite ranom walk in imension is monotone in the excitement parameter
More informationImplicit Differentiation
Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,
More informationθ x = f ( x,t) could be written as
9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)
More informationPermanent vs. Determinant
Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an
More informationthere is no special reason why the value of y should be fixed at y = 0.3. Any y such that
25. More on bivariate functions: partial erivatives integrals Although we sai that the graph of photosynthesis versus temperature in Lecture 16 is like a hill, in the real worl hills are three-imensional
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationAll s Well That Ends Well: Supplementary Proofs
All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationIPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy
IPA Derivatives for Make-to-Stock Prouction-Inventory Systems With Backorers Uner the (Rr) Policy Yihong Fan a Benamin Melame b Yao Zhao c Yorai Wari Abstract This paper aresses Infinitesimal Perturbation
More informationLearning Automata in Games with Memory with Application to Circuit-Switched Routing
Learning Automata in Games with Memory with Application to Circuit-Switche Routing Murat Alanyali Abstract A general setting is consiere in which autonomous users interact by means of a finite-state controlle
More informationBalancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling
Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January
More informationHow to Minimize Maximum Regret in Repeated Decision-Making
How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:
More informationEquilibrium in Queues Under Unknown Service Times and Service Value
University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University
More informationOn the number of isolated eigenvalues of a pair of particles in a quantum wire
On the number of isolate eigenvalues of a pair of particles in a quantum wire arxiv:1812.11804v1 [math-ph] 31 Dec 2018 Joachim Kerner 1 Department of Mathematics an Computer Science FernUniversität in
More informationSturm-Liouville Theory
LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory
More informationSchrödinger s equation.
Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of
More information'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21
Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More informationOPTIMAL CONTROL OF A PRODUCTION SYSTEM WITH INVENTORY-LEVEL-DEPENDENT DEMAND
Applie Mathematics E-Notes, 5(005), 36-43 c ISSN 1607-510 Available free at mirror sites of http://www.math.nthu.eu.tw/ amen/ OPTIMAL CONTROL OF A PRODUCTION SYSTEM WITH INVENTORY-LEVEL-DEPENDENT DEMAND
More informationORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS. Gianluca Crippa
Manuscript submitte to AIMS Journals Volume X, Number 0X, XX 200X Website: http://aimsciences.org pp. X XX ORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS Gianluca Crippa Departement Mathematik
More informationCombinatorica 9(1)(1989) A New Lower Bound for Snake-in-the-Box Codes. Jerzy Wojciechowski. AMS subject classification 1980: 05 C 35, 94 B 25
Combinatorica 9(1)(1989)91 99 A New Lower Boun for Snake-in-the-Box Coes Jerzy Wojciechowski Department of Pure Mathematics an Mathematical Statistics, University of Cambrige, 16 Mill Lane, Cambrige, CB2
More information