Convergence of reinforcement learning with general function approximators


Vassilis A. Papavassiliou and Stuart Russell
Computer Science Division, U. of California, Berkeley, CA

Abstract

A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approximate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified under which the Bridge algorithm converges for any hypothesis class. Finally, connections are made between the complexity of reinforcement learning and the PAC-learnability of the hypothesis class.

1 Introduction

Reinforcement learning (RL) is a widely used method for learning to make decisions in complex, uncertain environments. Typically, an RL agent perceives and acts in an environment, receiving rewards that provide some indication of the quality of its actions. The agent's goal is to maximize the sum of rewards received. RL algorithms work by learning a value function that describes the long-term expected sum of rewards from each state; alternatively, they can learn a Q-function describing the value of each action in each state. These functions can then be used to make decisions. Temporal-difference (TD) learning [Sutton, 1988] is a commonly used family of reinforcement learning methods. TD algorithms operate by adjusting the value function to be locally consistent. When used with function approximators, such as neural networks, that provide a compact parameterized representation of the value function, TD methods can solve real-world problems with very large state spaces. Because of this, one would like to know if such algorithms can be guaranteed to work, i.e., to converge and to return optimal solutions.

The theoretical study of RL algorithms usually divides the problem into two aspects: exploration policies that can guarantee complete coverage of the environment, and value determination to find the value function that corresponds to a given policy. This paper concentrates on the second aspect. Prior work [Jaakkola et al., 1995] has established convergence of TD-learning with probability 1 when the value function is represented as a table where each state has its own entry. For large state spaces, however, compact parametric representations are required; for such representations, we are interested in whether an algorithm will converge to the function that is closest, by some metric, to the true value function (a form of agnostic learning). Gordon [1995] proved that TD converges in this sense for representations called averagers, on which the TD update is a max-norm contraction (see Section 2). Tsitsiklis and Van Roy [1996] proved convergence and established error bounds for TD(λ) with linear combinations of fixed basis functions. With nonlinear representations, such as neural networks, TD has been observed to give suboptimal solutions [Bertsekas and Tsitsiklis, 1996] or even to diverge. This is a serious problem since most real problems require nonlinearity. Baird [1995] introduced residual algorithms, for which convergence can be proved when combined with a gradient descent learning method (such as used with neural networks).
Unfortunately, the error in the resulting approximation can be arbitrarily large, and furthermore the method requires two independent visits to each sampled state. This paper describes the Bridge algorithm, a new RL method for which we establish convergence and error bounds with any agnostically learnable representation. Section 2 provides the necessary definitions and notation. Section 3 explains the problem of nonconvergence and provides examples of this with TD. Section 4 outlines the Bridge algorithm, sketches the proof of convergence, and shows how it solves the examples for which TD fails. Section 5 briefly covers additional results on convergence to local optima for any representation and on the use of PAC-learning theory. Section 6 mentions some alternative techniques one might consider. The paper is necessarily technically dense given the space restrictions. The results, however, should be of broad interest to the AI and machine learning communities.

2 Definitions

2.1 MDP

A Markov decision process M = (S, A, p, r, γ) is a set of states S, a set of actions A, transition probability distributions p(·|x, a) that define the next-state distribution given a current state x and action a, reward distributions r(·|x, a) that define the distribution of real-valued reward received upon executing a in x, and a discount factor γ ∈ (0, 1).
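Under the fixed policy introduced next, the MDP collapses to a Markov chain, and for a small finite chain the value-determination problem defined below has a closed form. A minimal numeric sketch (the two-state chain here is the example used later in Section 3; NumPy assumed):

```python
import numpy as np

# The two-state chain of Section 3: each state moves deterministically to the
# other; P[i, j] = p(j|i), rbar[i] = E[r(.|i)], and gamma is the discount factor.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
rbar = np.array([10.0, -8.0])
gamma = 0.8

# Value determination in closed form: V* = E[R1 + gamma*R2 + ...] satisfies the
# fixed-point equation V* = rbar + gamma*P V*, so V* = (I - gamma*P)^{-1} rbar.
V_star = np.linalg.solve(np.eye(2) - gamma * P, rbar)
print(V_star)   # [10.  0.] -- the true value function used in Section 3
```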

Since we are interested in the problem of value determination, we assume we are given a fixed policy (choice of action at each state). When executing only this fixed policy, the MDP actually becomes a Markov chain, and we may therefore also write the transition probabilities as p(·|x) and the reward distributions as r(·|x). We assume that we are able to define the stationary distribution π of the resulting Markov chain, and also that the rewards lie in the range [−R_max, R_max].

Let (X₁, X₂, X₃, ...) be a random trace in the Markov chain starting from x, i.e. X₁ = x, X₂ has distribution p(·|X₁) and X_k has distribution p(·|X_{k−1}). Let (R₁, R₂, R₃, ...) be the observed random rewards, i.e. R_k has distribution r(·|X_k). Define the true value function V* at a state x to be the expected, discounted reward to go from state x:

  V*(x) = E[R₁ + γR₂ + γ²R₃ + ⋯]

The problem of value determination is to determine the true value function or a good approximation to it. Classical TD solutions have made use of the backup operator T, which takes an approximation V and produces a better approximation TV:

  TV(x) = E[R₁ + γV(X₂)] = E[R₁] + γ Σ_{x₂∈S} p(x₂|x) V(x₂)

An operator A is said to be a contraction with factor ζ < 1 under some norm ‖·‖ if for all V, V′: ‖AV − AV′‖ ≤ ζ‖V − V′‖. If ζ = 1, A is said to be a nonexpansion. If we define the max-norm to be ‖V‖_max = max_{x∈S} |V(x)| and the π-norm to be ‖V‖_π = [Σ_{x∈S} π(x)V(x)²]^{1/2}, then T is a contraction with factor γ and fixed point V* under both the max-norm and the π-norm [Tsitsiklis and Van Roy, 1996]. Therefore repeated application of T (i.e. the iterative process V_{n+1} = TV_n) converges to V*. We will use T^j to represent the operator T applied j times.

If the transition probabilities p and reward distributions r are known, then it is possible to compute TV directly from its definition. However, if p and r are not known, then it is not possible to compute the expectation in the definition of TV. In this case, by observing a sequence of states and rewards in the Markov chain, we can form an unbiased estimate of TV. Specifically, if we observe state x and reward r followed by state x₂, then the observed backed-up value, r + γV(x₂), is an unbiased estimate of TV(x). Formally we define P_T^V(·|x) to be the conditional probability density of observed backed-up values from state x:

  P_T^V(y|x) = Pr[R₁ + γV(X₂) = y | x]

where R₁ and X₂ are, as defined above, random variables with distributions r(·|x) and p(·|x) respectively. Thus if a random variable Y has associated density P_T^V(·|x) then E[Y] = TV(x). Similarly, we define P_{T^j}^V(·|x) to be the conditional probability density of j-step backed-up values observed from state x.

2.2 Function Approximation

As state spaces become large or even infinite, it becomes infeasible to tabulate the value function for each state x, and we must resort to function approximation. Our approximation scheme consists of a hypothesis class H of representable functions and a learning operator Π which maps arbitrary value functions to functions in H. The standard T-based approaches that use function approximation essentially compute or approximately compute the iterative process V_{n+1} = ΠTV_n. In practice, the Π mapping usually cannot be performed exactly because, even if we have access to the necessary expectation to compute TV(x) exactly, it is infeasible to do so for all states x. Thus we perform an approximate Π mapping using samples. We will take the state sample distribution to be the stationary distribution π. In general, when we cannot compute TV(x) exactly, we approximate ΠTV by generating samples (x, y) with sample distribution P_T^V(x, y) = π(x)P_T^V(y|x) and passing them to a learning algorithm for H.
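A sketch of the sampled backup just described: without access to p or r, averaging observed backed-up values y = r + γV(x₂) drawn from P_T^V(·|x) recovers TV(x). The Gaussian reward noise below is a hypothetical stand-in (the Section 3 example is deterministic):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.8
next_state = {0: 1, 1: 0}            # deterministic two-state chain (assumed)
reward_mean = {0: 10.0, 1: -8.0}     # mean one-step rewards (assumed)

def sample_backup(V, x):
    """One observed backed-up value r + gamma*V(x2): an unbiased estimate of TV(x)."""
    r = rng.normal(reward_mean[x], 1.0)  # R1 ~ r(.|x); Gaussian noise is an assumption
    x2 = next_state[x]                   # X2 ~ p(.|x); deterministic in this chain
    return r + gamma * V[x2]

V = np.array([0.0, 0.0])
# The empirical mean of many draws from P_T^V(.|x) converges to TV(x).
print(np.mean([sample_backup(V, 0) for _ in range(10000)]))  # close to TV(0) = 10.0
```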
The joint probability density P_T^V (from which we generate the samples) simply combines π (from which we sample the state x) with the conditional probability density P_T^V(·|x) (from which we generate an estimate y for TV(x)).

In this paper we focus on agnostic learning. In this case, the learning operator Π seeks the hypothesis h ∈ H that best matches the target function, even though typically the target function is not in the hypothesis class. If we measure distance using the π-norm, then we can define the learning operator for agnostic learning to be:

  ΠV = argmin_{h∈H} ‖h − V‖_π

As already mentioned, in the typical case we do not have access to the exact function V to be learned, but rather we can draw samples (x, y) from a sample distribution P such that the expected value of the conditional distribution P(·|x) is V(x). If, in addition, P samples the states according to π (or whatever distribution was used to measure distance in the previous definition), then an equivalent definition for agnostic learning is based on minimizing risk:

  Π(P) = argmin_{h∈H} r(h, P)

where we define the risk of a hypothesis h with respect to a distribution P to be:

  r(h, P) = ∫_{S×ℝ} (h(x) − y)² dP(x, y)

In practice, Π is approximately performed by generating enough samples (x, y) from the sample distribution P so as to be able to estimate risk well, and thus to be able to output the hypothesis in H that has minimal risk with respect to this distribution. In the algorithm we present, we assume the ability to compute Π exactly for the given hypothesis class H. This is certainly not a trivial assumption. In a later section, we briefly discuss an extension to our algorithm for the case where the learning step Π_{ε,δ} is a PAC-agnostic learning step rather than an exact agnostic learning step.

Finally, let us define our goal. V^H is defined to be the best approximation to V* possible using H:

  V^H = ΠV* = argmin_{h∈H} ‖h − V*‖_π

We seek techniques that return a value function Ṽ that minimizes a relative or absolute error bound:

  ‖Ṽ − V*‖ ≤ κ‖V^H − V*‖  or  ‖Ṽ − V*‖ ≤ κ₀
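A sketch of how Π is approximated in practice, as just described: estimate r(h, P) by the empirical risk of each hypothesis on samples (x, y) and return the minimizer. The tiny finite hypothesis set here is a hypothetical simplification; real classes are parameterized and minimized by search or gradient methods:

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_risk(h, samples):
    """Estimate r(h, P) = E[(h(x) - y)^2] from samples (x, y) drawn from P."""
    return np.mean([(h[x] - y) ** 2 for x, y in samples])

def Pi(hypotheses, samples):
    """Approximate agnostic learning: return the hypothesis of minimal empirical risk."""
    return min(hypotheses, key=lambda h: empirical_risk(h, samples))

# Hypothetical finite class over two states; each hypothesis is a value vector.
H = [np.array([0.0, 0.0]), np.array([9.0, -9.0]), np.array([10.0, 0.0])]

# Samples (x, y): x ~ pi (uniform here), y noisy with conditional mean V(x) = (10, 0).
target = np.array([10.0, 0.0])
samples = [(x, rng.normal(target[x], 1.0)) for x in rng.integers(0, 2, 500)]
print(Pi(H, samples))   # picks [10.  0.], the minimal-risk hypothesis
```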

3 Nonconvergence of TD

In this section, we examine the nonconvergence problem of TD when used with nonlinear function approximators. We present simple examples which we will reconsider with the Bridge algorithm in the next section. As mentioned above, standard TD with function approximation is based on the iterative process V_{n+1} = ΠTV_n. If Π is a nonexpansion under the same norm that makes T a contraction, then the composite operator ΠT is a contraction, and this process will converge to some error bound relative to V^H. For example, Tsitsiklis and Van Roy [1996] consider a linear hypothesis class, for which Π is simply a projection. If one uses a nonlinear hypothesis class for which Π is not a nonexpansion, then this iterative process can either diverge or get stuck in a local minimum arbitrarily far from V^H.

We now give simple examples demonstrating ways in which TD can fail when it is used with a nonlinear hypothesis class. Consider an MDP with two states where the probability of going from one state to the other is 1, and the rewards are also deterministic. The stationary distribution is 0.5 for each state, the discount factor γ is 0.8, and the hypothesis class H is the subset of the value function space ℝ² given by the v₁-axis, the v₂-axis and the two diagonals v₁ = v₂ and v₁ = −v₂. Formally, H = {(v₁, v₂) ∈ ℝ² : v₁ = 0 or v₂ = 0 or |v₁| = |v₂|}. The learning operator Π projects a function onto the nearest of the 4 axes of H. For example, Π(6, 4) = (5, 5), Π(6, −4) = (5, −5) and Π(−7, −1) = (−7, 0).

We first consider the case where the rewards of the two states are (r₁, r₂) = (10, −8). The true value function turns out to be V* = (10, 0). Note that in this case, the true value function is actually in the hypothesis class. Starting from V₀ = (0, 0), the result of repeated applications of ΠT is shown in Figure 1(a). For example, the first step is ΠTV₀ = Π(r₁ + γV₀(2), r₂ + γV₀(1)) = Π(10, −8) = (9, −9). This process converges to V = (5, −5), which is a fixed point because ΠT(5, −5) = Π(6, −4) = (5, −5). Remember that V* ∈ H, so the relative error bound ‖V − V*‖/‖V^H − V*‖ is infinite. Thus ΠT can converge arbitrarily far (in terms of relative error bound) from the best representation in H of the true value function.

[Figure 1: (a) Suboptimal Fixed Point and (b) Oscillation]

If we modify the rewards slightly to be (r₁, r₂) = (10, −7.8), then the true value function V* = (10 4/9, 5/9) is no longer in H. The best representation of V* is V^H = ΠV* = (10 4/9, 0). If we start from (0, 0) as above, we will again reach a suboptimal fixed point around (5, −5). However, starting from V₀ = (30, 0) (or even V₀ = (15, 0)), the result of repeated applications of ΠT as shown in Figure 1(b) displays a different type of failure: oscillation between points approaching (7.5, 7.5) and (16, 0). As in the previous example, ‖V^H − V*‖ is small, so the relative error bound is large.
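Both failures can be reproduced with a few lines, computing T exactly for the two-state chain and implementing Π as the nearest-of-four-axes projection defined above (a direct simulation, not the paper's code):

```python
import numpy as np

gamma = 0.8

def T(V, r):
    """Exact one-step backup for the two-state cycle: TV = (r1 + g*V2, r2 + g*V1)."""
    return np.array([r[0] + gamma * V[1], r[1] + gamma * V[0]])

def Pi(V):
    """Project V onto the nearest of the four axes of H (v2=0, v1=0, v1=v2, v1=-v2)."""
    candidates = [np.array([V[0], 0.0]), np.array([0.0, V[1]]),
                  np.full(2, (V[0] + V[1]) / 2),                 # line v1 = v2
                  np.array([1.0, -1.0]) * (V[0] - V[1]) / 2]     # line v1 = -v2
    return min(candidates, key=lambda c: np.linalg.norm(V - c))

V = np.zeros(2)
for _ in range(50):                      # rewards (10, -8): V* = (10, 0) is in H,
    V = Pi(T(V, (10.0, -8.0)))           # yet TD settles at the fixed point (5, -5)
print(V)

V = np.array([30.0, 0.0])
for n in range(6):                       # rewards (10, -7.8), started from (30, 0):
    V = Pi(T(V, (10.0, -7.8)))           # oscillates toward (7.5, 7.5) and (16, 0)
    print(n, V)
```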
4 The Bridge Algorithm

We begin with a high-level description of the algorithm (details are in the Appendix). This is followed by the convergence results and another look at the examples from the previous section. The main algorithm, BridgeValueDet, determines the value function within some error bound by making repeated calls to BridgeStep. We will now describe the first invocation of BridgeStep. Metaphorically, it consists of throwing a bridge across the treacherous terrain that is the hypothesis class, towards a point on the far side of the optimal solution. If the bridge lands somewhere close to where we aimed it, we will be able to walk along it in a productive direction (achieve a contraction). If the bridge lands far from our target, then we know that there isn't any H-expressible value function near our target on which the bridge could have landed (hence an error bound). This is made precise by Lemma 2 in the next section.

We are given an old approximation V from which we try to create a better approximation V_new. We basically have two tools to work with: T and Π. As can be seen in Figure 2 (and in the example in the previous section), if we combine these two operators in the standard way, V_new = ΠTV, we can get stuck in a local minimum. We will instead use them more creatively to guarantee progress or establish an error bound.

[Figure 2: Stuck in a local minimum]

We begin by using not T but rather T^j, where j is determined by the main algorithm BridgeValueDet before it calls BridgeStep. We can then ask the question: given we know where V and T^jV are, what does that tell us about the location of V*? It turns out that V* is restricted to lie in some hypersphere whose position can be defined in terms of the positions of V and T^jV. This is made precise by Lemma 1 in the next section. The hypersphere is depicted in Figure 3 and, as required, V* lies inside it.

[Figure 3: The bridge is aimed]

We now define a new operator B based on T^j and the identity operator I:

  B = I + (1/(1 − γ^j))(T^j − I)

B simply amplifies the Bellman residual by a factor of 1/(1 − γ^j). As can be seen in Figure 3, BV is the point on the far side

of the hypersphere from V. This operator is what we use to throw a bridge. We aim the bridge for BV, which is beyond anywhere our goal might be, i.e. the true value function lies somewhere between V and BV. The motivation for using B is in a sense to jump over all local minima between V and V*. Ideally we would be able to represent BV (just as in the standard approach we would want to represent TV), but this function is most likely not in our class of representable functions. Therefore we must apply the Π operator to map it into H. The result, W = ΠBV ∈ H, is shown in Figure 4. The bridge is supported by V and W and is shown as a line between them. In summary, we throw the bridge aiming for BV, but Π determines the point on which it actually lands.

[Figure 4: The bridge is established]

In practice we perform the Π mapping by generating samples from an appropriate distribution and passing them to a learning algorithm for H. In particular, to compute ΠBV, we generate samples (x, y) according to the distribution:

  P_B^V(x, y) = π(x)P_B^V(y|x) = π(x)P_{T^j}^V((y − V(x))(1 − γ^j) + V(x) | x)

The key feature of this distribution is that if a random variable Y has associated density P_B^V(·|x) then E[Y] = BV(x).

The final step is to walk along the bridge. The bridge is a line between V and W, and our new approximation V_new will be some point on this line (see Figure 5). This point is determined by projecting a point some fraction of the way from V to T^jV onto the line, where the fraction is a function of the input parameters. (We could just project T^jV, but this refinement yields a better guaranteed effective contraction factor.)

[Figure 5: The new approximation]

Thus the new approximation V_new, which is not necessarily in H, is a weighted average of the old approximation V and W ∈ H. Calculating the weights (λ and 1 − λ) in this average requires the ability to measure distance and risk. In particular, we need to measure the distance between V and W and the risks of V and W with respect to the distribution P_{T^j}^V. These three lengths (and the input parameters) determine the relative position of V_new with respect to V and W (see Figure 5). In practice we estimate the true risk with the empirical risk [Haussler, 1992], which we calculate using samples drawn from the distribution P_{T^j}^V.

We have just described a single invocation of BridgeStep, which represents the first iteration of the main algorithm. Each iteration builds a new bridge based on the previous one, so a generic iteration would begin with a V that was the V_new of the previous iteration (see Figure 6). In particular, the input of a generic iteration is not in H, but is rather a linear combination of the initial approximation V₀ and all previous W functions. Thus the final result is a tall weighted tree whose leaves are in H. If we insist on a final result that is in H, then we can apply a final Π mapping at the very end. Just as the standard TD algorithm was summarized as V_{n+1} = ΠTV_n, the Bridge algorithm can be essentially summarized as:

  V_{n+1} = (1 − λ_n)V_n + λ_n ΠBV_n = ((1 − λ_n)I + λ_n Π(I + (1/(1 − γ^j))(T^j − I)))V_n

[Figure 6: Generic iteration of BridgeStep]
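A minimal sketch of one BridgeStep update on the second example, with T^j and Π computed exactly (no sampling). The weight λ here is the plain projection of T^jV onto the bridge, the unrefined variant mentioned above; the paper's λ additionally uses the parameter chosen by BridgeValueDet:

```python
import numpy as np

gamma, j, rewards = 0.8, 3, (10.0, -7.8)

def T(V):
    return np.array([rewards[0] + gamma * V[1], rewards[1] + gamma * V[0]])

def Pi(V):   # nearest of the four axes of H, as in Section 3
    candidates = [np.array([V[0], 0.0]), np.array([0.0, V[1]]),
                  np.full(2, (V[0] + V[1]) / 2),
                  np.array([1.0, -1.0]) * (V[0] - V[1]) / 2]
    return min(candidates, key=lambda c: np.linalg.norm(V - c))

def bridge_step(V):
    TjV = V
    for _ in range(j):
        TjV = T(TjV)
    B_V = V + (TjV - V) / (1 - gamma ** j)   # aim the bridge at BV
    W = Pi(B_V)                              # the bridge lands on W = Pi(BV)
    # Walk along the bridge: project T^jV onto the line through V and W
    # (simple variant; the paper refines this weight, kept within the segment here).
    u = W - V
    lam = np.clip(np.dot(TjV - V, u) / np.dot(u, u), 0.0, 1.0)
    return (1 - lam) * V + lam * W

V = np.array([30.0, 0.0])
print(bridge_step(V))   # first step lands on W = Pi(BV) ~ (-16.2, 16.2), see Sec. 4.2
```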
4.1 Convergence of the Bridge algorithm

We will state the main convergence theorem for the Bridge algorithm, but space limitations allow us to state only the two most important lemmas used in the proof. We begin with a very useful observation about the geometric relationship between V, TV and V*.

Lemma 1. Let A be a contraction with contraction factor ζ under some norm. Let V* be the fixed point of A. For any point V, let O = V + (1/(1 − ζ²))(AV − V). Then

  ‖V* − O‖ ≤ (ζ/(1 − ζ²))‖AV − V‖

In words, given the positions of V and AV, let δ = ‖AV − V‖. Then we know that the position of V* has to be on or inside the hypersphere of radius ζδ/(1 − ζ²) centered at O (see Figure 7). This hypersphere is simply the set of points that are at least a factor of ζ closer to AV than to V. Note that the distance from V to the furthest point on the hypersphere is δ/(1 − ζ).

We apply Lemma 1 using T^j for A and γ^j for ζ. This defines a hypersphere inside of which the true value function V* must lie. Lemma 1 is used mainly to prove Lemma 2, which characterizes the behavior of BridgeStep and provides most of the meat of the convergence proof.
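The hypersphere of Lemma 1 is an Apollonius ball; the short derivation below (not spelled out in the paper, and valid because the π-norm is induced by an inner product) recovers the stated center and radius:

```latex
% Since V* is the fixed point of the contraction A,
\|AV - V^*\| = \|AV - AV^*\| \le \zeta\,\|V - V^*\|,
% so V* lies in the set \{P : \|P - AV\|^2 \le \zeta^2 \|P - V\|^2\}.
% Expanding both sides and completing the square:
\|P - AV\|^2 \le \zeta^2\|P - V\|^2
\;\Longleftrightarrow\;
\Big\|P - \big(V + \tfrac{1}{1-\zeta^2}(AV - V)\big)\Big\|
\le \tfrac{\zeta}{1-\zeta^2}\,\|AV - V\| .
% The farthest point of this ball from V lies at distance
% \tfrac{1+\zeta}{1-\zeta^2}\|AV - V\| = \tfrac{1}{1-\zeta}\|AV - V\|,
% which is exactly BV when A = T^j and \zeta = \gamma^j.
```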

[Figure 7: Hypersphere containing V*]

Lemma 2. Given an approximation V and parameters γ_c > 0 and j ≥ 1, BridgeStep(V, γ_c, j) returns a new approximation V_new that satisfies at least one of the following two conditions, where the error bound κ = errbound(γ_c, γ, j) is defined in the Appendix:

  ‖V_new − V*‖ ≤ γ_c‖V − V*‖  (Contraction)
  ‖V_new − V*‖ ≤ κ‖V^H − V*‖  (Error Bound)

Intuitively, if the bridge lands close to where we aimed it, we will achieve a contraction towards the goal. If the bridge lands far away, we will prove a relative error bound for the result. The key quantity that determines which of these two events happens is the angle θ formed between the bridge and the line from V to BV. If W = ΠBV is close to BV, then θ will be small, the bridge will lie close to the hypersphere, and we will be able to walk along the bridge and make progress. If instead W is far from BV, then θ will be large and walking along the bridge will not take us closer to the goal, but we will be able to prove that we are already close enough.

Figure 8 shows the case where the angle θ is small. As described previously, the small hypersphere represents the set of points that are at least a factor of γ^j closer to T^jV than they are to V. This follows from applying Lemma 1 to the operator T^j. Now think of BridgeStep as an operator that takes V and returns V_new, and ask the question: what set of points are at least a factor of γ_c (which is an input parameter to BridgeStep) closer to V_new than to V? Applying Lemma 1 to this question defines another, much larger hypersphere, which is depicted in Figure 8 with center at O for the case θ = arcsin γ_c − arcsin γ^j. Note that this larger hypersphere completely contains the smaller hypersphere which contains V*. Thus V* also lies inside the larger hypersphere, and so V_new is at least a factor of γ_c closer to V* than V is. This holds for θ = arcsin γ_c − arcsin γ^j. If θ is smaller than this, the achieved contraction is even better.

[Figure 8: Contraction is achieved when θ is small]

Figure 9 shows the case where the angle θ is large. θ is large when it is not possible to find a hypothesis in H close to BV. In fact we choose W = ΠBV to be the closest such hypothesis, so the rest of H must lie further away. In particular, H must lie completely outside the big hypersphere depicted in Figure 9 with center at BV, for otherwise W would not be the closest hypothesis to BV. Furthermore, we know that V* must lie on or inside the small hypersphere in Figure 9. Thus there is a separation between H and V*, and this separation allows us to prove, for any possible position of V*, an upper bound on the relative error ‖V_new − V*‖/‖V^H − V*‖.

[Figure 9: Relative error bound is established when θ is large]

It should be noted that in general we do not know V* and we cannot measure θ to determine which of the two conditions of Lemma 2 V_new satisfies. We only know that it satisfies at least one of them. By Lemma 2, if V already satisfies the relative error bound then so will V_new, because if V_new achieves a contraction over V, its error decreases. Thus each successive approximation is better than the one before, until we achieve the relative error bound, from which point every subsequent approximation will also achieve that bound.

We now give the main result, which is guaranteed convergence to a relative or absolute error bound. Moreover, the maximum number of invocations of BridgeStep, and thus the maximum number of hypotheses in the linear combination, can be specified.
Theorem 1. Let κ > 1 and κ₀ > 0 be the desired relative and absolute error bounds, respectively. Let N be an upper bound on the desired number of iterations. Then the algorithm BridgeValueDet(κ, κ₀, N) produces an approximation Ṽ, consisting of a linear combination of at most N + 1 hypotheses from H, that satisfies at least one of either the relative error bound κ or the absolute error bound κ₀:

  ‖Ṽ − V*‖ ≤ κ‖V^H − V*‖  or  ‖Ṽ − V*‖ ≤ κ₀

The proof of the theorem follows directly from Lemma 2: rewards are bounded, so the true value function is bounded, so the absolute error of the initial approximation can be bounded. If all N iterations achieve a contraction, then the absolute error will be smaller than requested. If at least one of the iterations failed to achieve a contraction, then it achieved a relative error bound, and all subsequent iterations, including the last one, will achieve the requested relative error bound. Again, since we do not know which of the two conditions of Lemma 2 each iteration satisfies, we do not know whether the final answer Ṽ satisfies the relative or the absolute error bound. We know only that it satisfies at least one of them.
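To see why N contractions suffice for the absolute bound: rewards in [−R_max, R_max] keep value functions within [−R_max/(1−γ), R_max/(1−γ)], so the initial error can be bounded by Δ_max = 2R_max/(1−γ) (assuming V₀ is kept within the same range); with the Appendix's choice γ_max = (κ₀/Δ_max)^{1/N}, N contracting iterations give

```latex
\|\tilde V - V^*\| \;\le\; \gamma_{\max}^{\,N}\,\|V_0 - V^*\|
\;\le\; \frac{\kappa_0}{\Delta_{\max}}\,\Delta_{\max} \;=\; \kappa_0 .
```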

Corollary 1. Let κ, κ₀ and N be as defined in Theorem 1. Let Ṽ = BridgeValueDet(κ, κ₀, N), a linear combination of hypotheses from H. Then ΠṼ, the result of mapping Ṽ back into H, satisfies at least one of either the relative error bound 2κ + 1 or the absolute error bound κ₀(2 + 1/κ).

4.2 The Examples Revisited

We now reconsider the examples from Section 3. The main algorithm BridgeValueDet takes parameters κ, κ₀ and N, from which it computes the number of lookahead steps j to use to achieve the requested error bounds. Also, for each iteration it chooses a parameter γ_n which determines the contraction factor achieved or relative error bound established for that iteration. These two parameters, j and γ_n, are passed to BridgeStep at each iteration. In this section, we examine the effect of repeated applications of BridgeStep, using j = 3 and γ_c = 0.99 for every iteration.

For the first example, with initial V₀ = (0, 0), the results of repeated applications of BridgeStep are shown in Figure 10(a). Because for this example ‖V^H − V*‖ = 0 (i.e. the true value function is in H), the relative error bound is always infinite. Therefore, by Lemma 2, every step achieves at least a contraction, and so the algorithm converges to the true value function V* = (10, 0).

[Figure 10: Examples revisited with Bridge]

For the second example, with V₀ = (30, 0), the results of repeated applications of BridgeStep are shown in Figure 10(b). Looking at the first step more closely, T^jV₀ = (10.2, 10.6), BV₀ = (−10.7, 21.7) and W₀ = ΠBV₀ = (−16.2, 16.2). The dotted line between V₀ and W₀ is the bridge. V₁, being a weighted average of V₀ and W₀, lies on this bridge. Similarly, V₂ lies on the bridge between V₁ and W₁.

[Figure 11: (a) Lemma 2 (relative error bound and effective contraction factor) applied to the second example (b) Linear combination]

Figure 11(a) demonstrates Lemma 2 on every fifth application of BridgeStep. In particular, note that the effective contraction factor only exceeds γ_c after the desired relative error bound errbound(γ_c, γ, j) = 4.3 has been achieved. In fact, on this example the algorithm performs far better than the theory guarantees. Figure 11(b) shows the weights of the averages and the structure of the resulting linear combination after 7 steps.
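The first-step quantities quoted above can be checked directly (exact T and Π for the two-state chain; an illustrative script, not the paper's implementation):

```python
import numpy as np

gamma, j, rewards = 0.8, 3, (10.0, -7.8)

def T(V):
    return np.array([rewards[0] + gamma * V[1], rewards[1] + gamma * V[0]])

def Pi(V):   # nearest-of-four-axes projection, as in Section 3
    candidates = [np.array([V[0], 0.0]), np.array([0.0, V[1]]),
                  np.full(2, (V[0] + V[1]) / 2),
                  np.array([1.0, -1.0]) * (V[0] - V[1]) / 2]
    return min(candidates, key=lambda c: np.linalg.norm(V - c))

V0 = np.array([30.0, 0.0])
TjV0 = V0
for _ in range(j):
    TjV0 = T(TjV0)
BV0 = V0 + (TjV0 - V0) / (1 - gamma ** j)
print(TjV0)      # [10.16  10.57]   ~ (10.2, 10.6)
print(BV0)       # [-10.66 21.66]   ~ (-10.7, 21.7)
print(Pi(BV0))   # [-16.16 16.16]   ~ (-16.2, 16.2)
```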
5 Extensions

It is possible to extend the algorithm in many ways. In particular, relying on an exact agnostic learning operator is not practical. Here we briefly discuss the use of two other learning operators, and we hope in the future to consider others still.

5.1 PAC(ε, δ) learning

Most significantly, we have extended our algorithm to the case where the learning step cannot be done exactly but is instead a PAC learning step Π_{ε,δ} (see [Papavassiliou and Russell, 1998]). We actually use the same ε and δ for every iteration, so the learning step Π_{ε,δ} has the same complexity for every iteration. This is simple but most likely not optimal. One appealing aspect of considering PAC agnostic learning is the potential availability of sample complexity results based on some measure of the complexity of H. Unfortunately, it is necessary to learn and estimate risk under the stationary distribution π of the Markov chain. Simply running the chain to generate samples will only generate them correctly in the steady-state limit. Therefore, computing sample complexity results for the risk estimation and agnostic learning steps requires extending the current state of the theory to the case where samples are generated from a Markov chain, rather than i.i.d. One would expect the sample complexity to depend on the mixing time of the Markov chain and the variance of the sample distribution. The form of these theorems will also determine the extent to which samples can be reused between the different risk estimation steps within an iteration, or even across iterations.

5.2 Suboptimal learning

Previous algorithms for this problem have been shown to converge for learning operators Π that are nonexpansions:

  ‖Πf − Πg‖ ≤ ‖f − g‖

The convergence results for Bridge hold for learning operators that perform agnostic learning. Unfortunately, there is a general lack of useful agnostic learning algorithms (the risk minimization step is typically intractable), so it would be beneficial to extend the results to learning systems that are not optimal. It is possible to weaken the conditions on the learning operator and give convergence results for Bridge that hold when Π satisfies the banana-fudge condition:

  ‖Πf − f‖ ≤ k₁(‖f − g‖)‖Πg − g‖ + k₂(‖f − g‖)

for some nondecreasing functions k₁ > 0 and k₂. The only modification necessary to Bridge is to include k₁ and k₂ in the calculation of the relative error bound errbound(γ_c, γ, j). Note that for k₁(z) = 1 and k₂(z) = z, this condition reduces to

  ‖Πf − f‖ − ‖Πg − g‖ ≤ ‖f − g‖

which is in fact the property of agnostic learning that is used to derive the results in this paper.
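The reduced condition is immediate for the exact agnostic Π: since Πf minimizes distance to f over H, comparing it against the candidate Πg gives

```latex
\|\Pi f - f\| \;\le\; \|\Pi g - f\| \;\le\; \|\Pi g - g\| + \|g - f\| .
```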

Intuitively, the nonexpansion condition for Π requires that two points that are close to each other are mapped close to each other. The banana-fudge condition requires that two points that are close to each other are mapped a similar distance away, but they can be mapped in opposite directions and so end up very far from each other. The banana-fudge condition is obviously the weaker one, requiring only similarity in level of success and not similarity in outcome. It disallows the case where one function is learned very well, but another function very close to the first is learned very poorly. We are currently searching for learning algorithms that satisfy the banana-fudge condition, but unfortunately it seems most common practical learning algorithms do not.

6 Other Approaches

We briefly discuss other known alternatives to Bridge, as well as mention some of the new directions one might consider.

6.1 Alternatives

If H is convex and Π is the agnostic learning operator, then Π is a nonexpansion and V_{n+1} = ΠTV_n converges. For nonconvex H, an alternative approach to Bridge is to PAC-agnostically learn the convex hull of H using H at each iteration [Lee et al., 1995]. The resulting iterated procedure V_{n+1} = Π_{convex-hull(H)}TV_n converges since Π_{convex-hull(H)} is a nonexpansion. Unfortunately, this algorithm requires many more agnostic learning steps per iteration than seems practical.

A noniterative method that returns the optimal answer V^H is to reduce the value determination problem to a single instance of supervised learning by using the operator T^∞ (otherwise known as TD(λ = 1)). It does unlimited lookahead, has contraction factor γ^∞ = 0, and so it generates V* after just one iteration. Looked at another way, the distribution P_{T^∞}^V(·|x) has mean V*(x), and so ΠT^∞V = ΠV* = V^H. Unfortunately, there is empirical evidence that suggests the sample distribution P_{T^∞}^V is very hard to learn and requires very many samples (perhaps because it can have high variance). Strictly speaking, it is not necessary to back up values beyond the ε-horizon, which is log_γ(ε(1 − γ)/R_max). Even this, however, may yield sample distributions with too much variance for practical use, although it is offset by the need to perform only a single learning step. Finally, it may be possible, using Lemma 1, to establish convergence rates and error bounds for the iterated procedure V_{n+1} = ΠT^mV_n, where m is less than the ε-horizon. However, m would probably have to be much larger than j, the number of lookahead steps used by Bridge, and so again we would expect bad sample complexity.

6.2 New Directions

There are many ways in which the basic tools used in constructing this algorithm might be used in constructing more powerful methods. Specifically, the geometric relationship between V, TV and V* established in Lemma 1 is very useful in (1) providing geometric intuition to design new methods and (2) proving performance guarantees for these methods. One can think of many different ways to throw a bridge, and many different kinds of bridges to throw. For example, we establish W, the other end of the bridge, by learning the point V + (1/(1 − γ^j))(T^jV − V). This choice is rather arbitrary, picked to simplify the error bound analysis. One might try instead learning a point further or a little closer to V. Once we establish W, we throw a one-dimensional, linear bridge from V to W and learn a point close to T^jV on this line (learning is equivalent to projection in linear hypothesis classes). One might try establishing more than two points with which to support the bridge. For example, given V and after establishing W(1), we could try establishing W(2) by learning a point strategically located far from both V and W(1) on the other side of the hypersphere defined by Lemma 1.
Then we could throw a two-dimensional, planar bridge across these three points and project T^jV (or a point close by) onto this plane. We can continue in this way, considering methods that establish W(1), ..., W(n) and use an n-dimensional hyperplane to learn T^jV. In the logical limit, this method looks like a local version of [Lee et al., 1995], which learns the full convex hull. It is local in that it only closes under weighted averaging those points of H that are closest to some point of the hypersphere defined by Lemma 1. Our current method, which only uses one-dimensional bridges, is effectively a light version of these convex-hull methods, in that before learning T^jV it closes under linear combinations only two points, namely V and W.

7 Conclusion

We have developed a method that reduces the value determination problem to the agnostic learning problem. Requesting that our algorithm halt in fewer iterations or with better error bounds pushes more of the complexity into the learning step, and in the limit effectively forces it to consider infinite lookahead, which is T^∞. Similarly, if we were to extend our algorithm to use more and more supports for the bridge, we suspect it would approximate the performance of the convex hull learning algorithm. Thus our method can be thought of as a more versatile and hopefully more efficient alternative to these aggressive methods. The key features that characterize our approach are (1) the complication of learning is abstracted into a learning operator, (2) we use a new operator B rather than being restricted to the backup operator T, (3) we form linear combinations of hypotheses from a class H rather than being limited to just H, and (4) we use Lemma 1 to prove convergence and error bound results. These techniques can be applied or modified to develop endless variations on Bridge as well as completely new algorithms. A big missing ingredient in justifying one method over another is sample complexity. In particular, we do not know how sample complexity depends on the lookahead j, or, in the case of Π_{ε,δ}, how it depends on ε and δ, and so we cannot properly trade off these parameters to achieve the best performance.

Our results are stated for the problem of value determination, but they apply to any situation with an operator that is a contraction with respect to a norm defined by a samplable distribution. For the problem of value determination, the operator is T, the one-step backup operator, and it is a contraction under the norm defined by the stationary distribution of the Markov chain. As stated previously, this distribution can only be sampled exactly in the steady-state limit, so improvements in the theory are necessary. Finally, a big hurdle to a practical, implementable algorithm is the lack of useful, well-behaved (agnostic or not) learning algorithms. By applying the techniques used in developing Bridge, we hope to bridge the gap between the available supervised learning algorithms and those needed by theoretically justified reinforcement learning methods.

References

[Baird, 1995] Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, 1995. Morgan Kaufmann.

[Bertsekas and Tsitsiklis, 1996] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-dynamic Programming. Athena Scientific, Belmont, Mass., 1996.

[Gordon, 1995] Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, July 1995. Morgan Kaufmann.

[Haussler, 1992] David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, 1992.

[Jaakkola et al., 1995] Tommi Jaakkola, Satinder P. Singh, and Michael I. Jordan. Reinforcement learning algorithm for partially observable Markov decision problems. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, Cambridge, Massachusetts, 1995. MIT Press.

[Lee et al., 1995] W. S. Lee, P. L. Bartlett, and R. C. Williamson. On efficient agnostic learning of linear combinations of basis functions. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995.

[Papavassiliou and Russell, 1998] V. Papavassiliou and S. Russell. Convergence of reinforcement learning with PAC function approximators. Technical Report UCB//CSD, University of California, Berkeley, 1998.

[Sutton, 1988] R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, August 1988.

[Tsitsiklis and Van Roy, 1996] John N. Tsitsiklis and Benjamin Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 1996.

A Detailed Algorithm

Here we give details of the algorithm, as well as define the relative error bound errbound(γ_c, γ, j). BridgeValueDet first calculates the necessary number of backup steps j in order to achieve the desired error bounds within the desired number of iterations. For each iteration it intelligently selects the parameter γ_n and calls the subroutine BridgeStep with the current approximation. Finally, it detects when it can halt successfully and returns a weighted tree of hypotheses.

BridgeValueDet(κ, κ₀, N):
  Δ_max = 2R_max/(1 − γ);  ε_goal = κ₀/Δ_max;  γ_max = ε_goal^{1/N}
  choose the smallest j such that errbound(γ_max, γ, j) ≤ κ
  n = 0;  ε_total = 1;  V₀ = some initial hypothesis
  LOOP UNTIL ε_total ≤ ε_goal:
    choose the smallest γ_n such that errbound(γ_n, γ, j) ≤ κ (so γ_n ≤ γ_max; the last iteration need only contract by about ε_goal/ε_total)
    V_{n+1} = BridgeStep(V_n, γ_n, j)
    ε_total = ε_total · γ_n;  n = n + 1
  RETURN V_n

BridgeStep(V, γ_c, j):
  B = I + (1/(1 − γ^j))(T^j − I)
  W = ΠBV
  u = ‖V − W‖
  v = r(V, P_{T^j}^V);  w = r(W, P_{T^j}^V)
  β = (v − w + u²)/(2u)
  λ = the walk length along the bridge, computed in closed form from u, β, γ_c and γ^j (through the quantities √(1 − γ_c²) and √(1 − γ^{2j}))
  RETURN V_new = (1 − λ)V + λW

errbound(γ_c, γ, j) is likewise given in closed form in terms of γ_c and γ^j: it defines intermediate quantities ζ₁, ..., ζ₅ and s from γ_c and γ^j, returns +∞ if ζ₄ ≤ ζ₅, and otherwise returns a finite bound through one of two cases according to whether s ≤ ζ₅ or s > ζ₅.


AC : A GRAPHICAL USER INTERFACE (GUI) FOR A UNIFIED APPROACH FOR CONTINUOUS-TIME COMPENSATOR DESIGN AC 28-1986: A GRAPHICAL USER INTERFACE (GUI) FOR A UNIFIED APPROACH FOR CONTINUOUS-TIME COMPENSATOR DESIGN Minh Cao, Wihita State University Minh Cao ompleted his Bahelor s of Siene degree at Wihita State

More information

Planning with Uncertainty in Position: an Optimal Planner

Planning with Uncertainty in Position: an Optimal Planner Planning with Unertainty in Position: an Optimal Planner Juan Pablo Gonzalez Anthony (Tony) Stentz CMU-RI -TR-04-63 The Robotis Institute Carnegie Mellon University Pittsburgh, Pennsylvania 15213 Otober

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilisti Graphial Models David Sontag New York University Leture 12, April 19, 2012 Aknowledgement: Partially based on slides by Eri Xing at CMU and Andrew MCallum at UMass Amherst David Sontag (NYU)

More information

Viewing the Rings of a Tree: Minimum Distortion Embeddings into Trees

Viewing the Rings of a Tree: Minimum Distortion Embeddings into Trees Viewing the Rings of a Tree: Minimum Distortion Embeddings into Trees Amir Nayyeri Benjamin Raihel Abstrat We desribe a 1+ε) approximation algorithm for finding the minimum distortion embedding of an n-point

More information

Aharonov-Bohm effect. Dan Solomon.

Aharonov-Bohm effect. Dan Solomon. Aharonov-Bohm effet. Dan Solomon. In the figure the magneti field is onfined to a solenoid of radius r 0 and is direted in the z- diretion, out of the paper. The solenoid is surrounded by a barrier that

More information

Remark 4.1 Unlike Lyapunov theorems, LaSalle s theorem does not require the function V ( x ) to be positive definite.

Remark 4.1 Unlike Lyapunov theorems, LaSalle s theorem does not require the function V ( x ) to be positive definite. Leture Remark 4.1 Unlike Lyapunov theorems, LaSalle s theorem does not require the funtion V ( x ) to be positive definite. ost often, our interest will be to show that x( t) as t. For that we will need

More information

Advanced Computational Fluid Dynamics AA215A Lecture 4

Advanced Computational Fluid Dynamics AA215A Lecture 4 Advaned Computational Fluid Dynamis AA5A Leture 4 Antony Jameson Winter Quarter,, Stanford, CA Abstrat Leture 4 overs analysis of the equations of gas dynamis Contents Analysis of the equations of gas

More information

THE METHOD OF SECTIONING WITH APPLICATION TO SIMULATION, by Danie 1 Brent ~~uffman'i

THE METHOD OF SECTIONING WITH APPLICATION TO SIMULATION, by Danie 1 Brent ~~uffman'i THE METHOD OF SECTIONING '\ WITH APPLICATION TO SIMULATION, I by Danie 1 Brent ~~uffman'i Thesis submitted to the Graduate Faulty of the Virginia Polytehni Institute and State University in partial fulfillment

More information

The Second Postulate of Euclid and the Hyperbolic Geometry

The Second Postulate of Euclid and the Hyperbolic Geometry 1 The Seond Postulate of Eulid and the Hyperboli Geometry Yuriy N. Zayko Department of Applied Informatis, Faulty of Publi Administration, Russian Presidential Aademy of National Eonomy and Publi Administration,

More information

c-perfect Hashing Schemes for Binary Trees, with Applications to Parallel Memories

c-perfect Hashing Schemes for Binary Trees, with Applications to Parallel Memories -Perfet Hashing Shemes for Binary Trees, with Appliations to Parallel Memories (Extended Abstrat Gennaro Cordaso 1, Alberto Negro 1, Vittorio Sarano 1, and Arnold L.Rosenberg 2 1 Dipartimento di Informatia

More information

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 16 Aug 2004

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 16 Aug 2004 Computational omplexity and fundamental limitations to fermioni quantum Monte Carlo simulations arxiv:ond-mat/0408370v1 [ond-mat.stat-meh] 16 Aug 2004 Matthias Troyer, 1 Uwe-Jens Wiese 2 1 Theoretishe

More information

Searching All Approximate Covers and Their Distance using Finite Automata

Searching All Approximate Covers and Their Distance using Finite Automata Searhing All Approximate Covers and Their Distane using Finite Automata Ondřej Guth, Bořivoj Melihar, and Miroslav Balík České vysoké učení tehniké v Praze, Praha, CZ, {gutho1,melihar,alikm}@fel.vut.z

More information

Evaluation of effect of blade internal modes on sensitivity of Advanced LIGO

Evaluation of effect of blade internal modes on sensitivity of Advanced LIGO Evaluation of effet of blade internal modes on sensitivity of Advaned LIGO T0074-00-R Norna A Robertson 5 th Otober 00. Introdution The urrent model used to estimate the isolation ahieved by the quadruple

More information

Singular Event Detection

Singular Event Detection Singular Event Detetion Rafael S. Garía Eletrial Engineering University of Puerto Rio at Mayagüez Rafael.Garia@ee.uprm.edu Faulty Mentor: S. Shankar Sastry Researh Supervisor: Jonathan Sprinkle Graduate

More information

An Adaptive Optimization Approach to Active Cancellation of Repeated Transient Vibration Disturbances

An Adaptive Optimization Approach to Active Cancellation of Repeated Transient Vibration Disturbances An aptive Optimization Approah to Ative Canellation of Repeated Transient Vibration Disturbanes David L. Bowen RH Lyon Corp / Aenteh, 33 Moulton St., Cambridge, MA 138, U.S.A., owen@lyonorp.om J. Gregory

More information

Maximum Likelihood Multipath Estimation in Comparison with Conventional Delay Lock Loops

Maximum Likelihood Multipath Estimation in Comparison with Conventional Delay Lock Loops Maximum Likelihood Multipath Estimation in Comparison with Conventional Delay Lok Loops Mihael Lentmaier and Bernhard Krah, German Aerospae Center (DLR) BIOGRAPY Mihael Lentmaier reeived the Dipl.-Ing.

More information

EE 321 Project Spring 2018

EE 321 Project Spring 2018 EE 21 Projet Spring 2018 This ourse projet is intended to be an individual effort projet. The student is required to omplete the work individually, without help from anyone else. (The student may, however,

More information

On Component Order Edge Reliability and the Existence of Uniformly Most Reliable Unicycles

On Component Order Edge Reliability and the Existence of Uniformly Most Reliable Unicycles Daniel Gross, Lakshmi Iswara, L. William Kazmierzak, Kristi Luttrell, John T. Saoman, Charles Suffel On Component Order Edge Reliability and the Existene of Uniformly Most Reliable Uniyles DANIEL GROSS

More information

Modes are solutions, of Maxwell s equation applied to a specific device.

Modes are solutions, of Maxwell s equation applied to a specific device. Mirowave Integrated Ciruits Prof. Jayanta Mukherjee Department of Eletrial Engineering Indian Institute of Tehnology, Bombay Mod 01, Le 06 Mirowave omponents Welome to another module of this NPTEL mok

More information

Normative and descriptive approaches to multiattribute decision making

Normative and descriptive approaches to multiattribute decision making De. 009, Volume 8, No. (Serial No.78) China-USA Business Review, ISSN 57-54, USA Normative and desriptive approahes to multiattribute deision making Milan Terek (Department of Statistis, University of

More information

Quasi-Monte Carlo Algorithms for unbounded, weighted integration problems

Quasi-Monte Carlo Algorithms for unbounded, weighted integration problems Quasi-Monte Carlo Algorithms for unbounded, weighted integration problems Jürgen Hartinger Reinhold F. Kainhofer Robert F. Tihy Department of Mathematis, Graz University of Tehnology, Steyrergasse 30,

More information

On the Bit Error Probability of Noisy Channel Networks With Intermediate Node Encoding I. INTRODUCTION

On the Bit Error Probability of Noisy Channel Networks With Intermediate Node Encoding I. INTRODUCTION 5188 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 [8] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood estimation from inomplete data via the EM algorithm, J.

More information

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013 Ultrafast Pulses and GVD John O Hara Created: De. 6, 3 Introdution This doument overs the basi onepts of group veloity dispersion (GVD) and ultrafast pulse propagation in an optial fiber. Neessarily, it

More information

Reliability Guaranteed Energy-Aware Frame-Based Task Set Execution Strategy for Hard Real-Time Systems

Reliability Guaranteed Energy-Aware Frame-Based Task Set Execution Strategy for Hard Real-Time Systems Reliability Guaranteed Energy-Aware Frame-Based ask Set Exeution Strategy for Hard Real-ime Systems Zheng Li a, Li Wang a, Shuhui Li a, Shangping Ren a, Gang Quan b a Illinois Institute of ehnology, Chiago,

More information

Analysis of discretization in the direct simulation Monte Carlo

Analysis of discretization in the direct simulation Monte Carlo PHYSICS OF FLUIDS VOLUME 1, UMBER 1 OCTOBER Analysis of disretization in the diret simulation Monte Carlo iolas G. Hadjionstantinou a) Department of Mehanial Engineering, Massahusetts Institute of Tehnology,

More information

A Variational Definition for Limit and Derivative

A Variational Definition for Limit and Derivative Amerian Journal of Applied Mathematis 2016; 4(3): 137-141 http://www.sienepublishinggroup.om/j/ajam doi: 10.11648/j.ajam.20160403.14 ISSN: 2330-0043 (Print); ISSN: 2330-006X (Online) A Variational Definition

More information

Moments and Wavelets in Signal Estimation

Moments and Wavelets in Signal Estimation Moments and Wavelets in Signal Estimation Edward J. Wegman 1 Center for Computational Statistis George Mason University Hung T. Le 2 International usiness Mahines Abstrat: The problem of generalized nonparametri

More information

UPPER-TRUNCATED POWER LAW DISTRIBUTIONS

UPPER-TRUNCATED POWER LAW DISTRIBUTIONS Fratals, Vol. 9, No. (00) 09 World Sientifi Publishing Company UPPER-TRUNCATED POWER LAW DISTRIBUTIONS STEPHEN M. BURROUGHS and SARAH F. TEBBENS College of Marine Siene, University of South Florida, St.

More information

10.5 Unsupervised Bayesian Learning

10.5 Unsupervised Bayesian Learning The Bayes Classifier Maximum-likelihood methods: Li Yu Hongda Mao Joan Wang parameter vetor is a fixed but unknown value Bayes methods: parameter vetor is a random variable with known prior distribution

More information

3 Tidal systems modelling: ASMITA model

3 Tidal systems modelling: ASMITA model 3 Tidal systems modelling: ASMITA model 3.1 Introdution For many pratial appliations, simulation and predition of oastal behaviour (morphologial development of shorefae, beahes and dunes) at a ertain level

More information

Relativistic Dynamics

Relativistic Dynamics Chapter 7 Relativisti Dynamis 7.1 General Priniples of Dynamis 7.2 Relativisti Ation As stated in Setion A.2, all of dynamis is derived from the priniple of least ation. Thus it is our hore to find a suitable

More information

The universal model of error of active power measuring channel

The universal model of error of active power measuring channel 7 th Symposium EKO TC 4 3 rd Symposium EKO TC 9 and 5 th WADC Workshop nstrumentation for the CT Era Sept. 8-2 Kosie Slovakia The universal model of error of ative power measuring hannel Boris Stogny Evgeny

More information

Combined Electric and Magnetic Dipoles for Mesoband Radiation, Part 2

Combined Electric and Magnetic Dipoles for Mesoband Radiation, Part 2 Sensor and Simulation Notes Note 53 3 May 8 Combined Eletri and Magneti Dipoles for Mesoband Radiation, Part Carl E. Baum University of New Mexio Department of Eletrial and Computer Engineering Albuquerque

More information

Exploring the feasibility of on-site earthquake early warning using close-in records of the 2007 Noto Hanto earthquake

Exploring the feasibility of on-site earthquake early warning using close-in records of the 2007 Noto Hanto earthquake Exploring the feasibility of on-site earthquake early warning using lose-in reords of the 2007 Noto Hanto earthquake Yih-Min Wu 1 and Hiroo Kanamori 2 1. Department of Geosienes, National Taiwan University,

More information

Tight bounds for selfish and greedy load balancing

Tight bounds for selfish and greedy load balancing Tight bounds for selfish and greedy load balaning Ioannis Caragiannis Mihele Flammini Christos Kaklamanis Panagiotis Kanellopoulos Lua Mosardelli Deember, 009 Abstrat We study the load balaning problem

More information

Bilinear Formulated Multiple Kernel Learning for Multi-class Classification Problem

Bilinear Formulated Multiple Kernel Learning for Multi-class Classification Problem Bilinear Formulated Multiple Kernel Learning for Multi-lass Classifiation Problem Takumi Kobayashi and Nobuyuki Otsu National Institute of Advaned Industrial Siene and Tehnology, -- Umezono, Tsukuba, Japan

More information

Bending resistance of high performance concrete elements

Bending resistance of high performance concrete elements High Performane Strutures and Materials IV 89 Bending resistane of high performane onrete elements D. Mestrovi 1 & L. Miulini 1 Faulty of Civil Engineering, University of Zagreb, Croatia Faulty of Civil

More information