An Analysis of Laplacian Methods for Value Function Approximation in MDPs
|
|
- Marsha Daniels
- 6 years ago
- Views:
Transcription
1 An Analysis of Laplacian Methods for Value Function Approximation in MDPs Marek Petrik Department of Computer Science University of Massachusetts Amherst, MA 3 petrik@cs.umass.edu Abstract Recently, a method based on Laplacian eigenfunctions was proposed to automatically construct a basis for value function approximation in MDPs. We show that its success may be explained by drawing a connection between the spectrum of the Laplacian and the value function of the MDP. This explanation helps us to identify more precisely the conditions that this method requires to achieve good performance. Based on this, we propose a modification of the Laplacian method for which we derive an analytical bound on the approximation error. Further, we show that the method is related the augmented Krylov methods, commonly used to solve sparse linear systems. Finally, we empirically demonstrate that in basis construction the augmented Krylov methods may significantly outperform the Laplacian methods in terms of both speed and quality. 1 Introduction Markov Decision Processes (MDP) [Puterman, 2] are a widely-used framework for planning under uncertainty. In this paper, we focus on the discounted infinite horizon problem with discount γ such that < γ < 1. We also assume finite state and action spaces. While solving this problem requires only polynomial time, many practical problems are too large to be solved precisely. This motivated the development of approximate methods for solving very large MDPs that are sparse and structured. Value Function Approximation (VFA) is a method for finding approximate solutions for MDPs, which has received a lot of attention [Bertsekas and Tsitsiklis, 1996]. Linear approximation is a popular VFA method, because it is simple to analyze and use. The representation of the value function in linear schemes is a linear combination of basis vectors. The optimal policy is usually calculated using approximate value iteration or approximate linear programming [Bertsekas and Tsitsiklis, 1996]. The choice of the basis plays an important role in solving the problem. Usually, the basis used to represent the space is hand-crafted using human insight about some topological properties of the problem [Sutton and Barto, 1998]. Recently, a new framework for automatically constructing the basis was proposed in [Mahadevan, 2]. This approach is based on analysis of the neighborhood relation among the states. The analysis is inspired by spectral methods, often used in machine-learning tasks. We describe this framework in more detail in Section 2. In the following, we use v to denote the value function, and r to denote the reward vector. The matrix I denotes an identity matrix of an appropriate size. We use n to denote the number of states of the MDP. An important property of many VFA methods is that they are guaranteed to converge to an approximately optimal solution. In methods based on approximate policy iteration the maximal distance of the optimal approximate value function ˆv from the optimal value function v is bounded by [Munos, 23]: lim sup v 2γ ˆv µ k (1 γ) 2 sup v k ṽ k µk, k where v k and ṽ k are true and approximated value at step k given the current policy. The norm µ denotes a weighted quadratic norm with distribution µ. The distribution µ is arbitrary, and µ k depends on µ and the current transition matrix [Munos, 23]. Similar bounds hold for algorithms based on Bellman residual minimization, when v k ṽ k is replaced by r (I γp πk )ṽ k, where P πk is a transition matrix of the current policy. In addition, these bounds hold also for the max norm [Bertsekas and Tsitsiklis, 1996]. In the following, we refer to the value function v k and its approximation ṽ k as v and ṽ respectively, without using the index k. The main focus of the paper is the construction a good basis for algorithms that minimize v ṽ 2 in each iteration. We focus only on the quadratic approximation bound, because the other bounds are related. Notice that 2. The paper is organized as follows. Section 2 describes the spectral methods for VFA in greater detail. The main contribution of the paper is an explanation of the good performance of spectral methods for VFA and its connection to methods for solving sparse linear systems. This is described in Section 3, where we also propose two new alternative algorithms. In Section 4, we show theoretical error bounds of one of the methods, and then in Section 6 we demonstrate that our method may significantly outperform the previously proposed spectral methods.
2 2 Proto-Value Functions In this section, we describe in greater detail the proto-value function methods proposed in [Mahadevan, 2]. These methods use the spectral graph framework to construct a basis for linear VFA that respects the topology of the problem. The states of the MDP for a fixed policy represent nodes of an undirected weighted graph (N, E). This graph may be represented by a symmetric adjacency matrix W, where W xy represents the weight between the nodes. A diagonal matrix of the node degrees is denoted by D and defined as D xx = y N W xy. There are several definitions of the graph Laplacian, with the most common being the normalized Laplacian, defined for W as L = I D 1 2 W D 1 2. The advantage of the normalized Laplacian is that it is symmetric, and thus its eigenvectors are orthogonal. Its use is closely related to the random walk Laplacian L r = I D 1 W and the combinatorial Laplacian L c = D W, which are commonly used to motivate the use of the Laplacian by making a connection to random walks on graphs [Chung, 1997]. A function f on a graph (N, E) is a mapping from each vertex to R. One well-defined measure of smoothness of a function on a graph is its Sobolev norm. The bottom eigenvectors of L c may be seen as a good approximation to smooth functions, that is, functions with low Sobolev norm [Chung, 1997]. The application of the spectral framework to VFA is straightforward. The value function may be seen as a function on the graph of nodes that correspond to states. The edge weight between two nodes is determined by the probability of transiting either way between the corresponding states, given a fixed policy. Usually, the weight is 1 if the transition is possible either way and otherwise. Notice that these weights must be symmetric, unlike the transition probabilities. If we assume the value function is smooth on the induced graph, the spectral graph framework should lead to good results. While the adjacency matrix with edge weights of 1 was usually used before, [Mahadevan, 2] also discusses other schemes. While the approach is interesting, it suffers from some problems. In our opinion, the good performance of this method has not been sufficiently well explained, because the construction of the adjacency matrix is not well motivated. In addition, the construction of the adjacency matrix makes the method very hard to analyze. As we show later, using the actual transition matrix instead of the adjacency matrix leads to a better motivated algorithm, which is easy to derive and analyze. The requirement of the value function being smooth was partially resolved by using diffusion wavelets in [Mahadevan and Maggioni, 2; Maggioni and Mahadevan, 26]. Briefly, diffusion wavelets construct a low-order approximation of the inverse of a matrix. An disadvantage of using the wavelets is the high computational overhead needed to construct the inverse approximation. The advantage is, however, that once the inverse is constructed it may be reused for different rewards as long as the transition matrix is fixed. Thus, we do not compare our approach to diffusion wavelets due to the different goals and computational complexities of the two methods. 3 Analysis In this section we show that if we use the actual transition matrix instead of the random walk Laplacian of the adjacency matrix, the method may be well justified and analyzed. We assume in the following that the transition matrix P and the reward function r are available. It is reasonable to explain the performance using P instead of L r, because in the past applications P was usually very similar to I L r. This is because the transition in these problems were symmetric, and the adjacency matrix was based on a random policy. Notice that the eigenvectors of L r and I L r are the same. Moreover, if λ is an eigenvalue of L r, then 1 λ is an eigenvalue of I L r. 3.1 Spectral Approximation Assuming that P is the transition matrix for a fixed Markov policy, we can express the value function as [Puterman, 2]: v = (I γp ) 1 r = (γp ) i r. (3.1) i= The second equality follows from the Neumann series expansion. In fact, synchronous backups in the modified policy iteration calculate this series adding one term in each iteration. We assume that the transition matrix is diagonalizable. The analysis for non-diagonalizable matrices would be similar, but would have to use Jordan decomposition. Let x 1... x n be the eigenvectors of P with corresponding eigenvalues λ 1... λ n. Moreover, without loss of generality, x j 2 = 1 for all j. Since the matrix is diagonalizable, x 1... x n are linearly independent, and we can decompose the reward as: r = c j x j, j=1 for some c 1... c n. Using P i x j = λ i j x j, we have v = j=1 i= (γλ j ) i c j x j = j=1 1 1 γλ j c j x j = d j x j. j=1 Considering a subset U of eigenvectors x j as a basis will lead to the following bound on approximation error: v ṽ 2 j / U d j. (3.2) Therefore, the value function may be well approximated by considering those x j with greatest d j. Assuming that all c j are equal, then the best choice to minimize the bound (3.2) is to consider the eigenvectors with high λ j. This is identical to taking the low order eigenvectors of the random walk Laplacian, as proposed in the spectral proto-vfa framework, only that the transition matrix is used instead. Using the analysis above, we propose a new algorithm in Subsection Krylov Methods There are also other well-motivated base choices besides the eigenvectors. Another choice is to use some of the vectors in the Neumann series (3.1). We denote these vectors as
3 y i = P i r for all i, ). Calculating the value function by progressively adding all vectors in the series, as done in the modified policy iteration, potentially requires an infinite number of iterations. However, since the linear VFA methods consider any linear combination of the basis vectors, we need at most n linearly independent vectors from the sequence {y i } i=. Then it is preferable to choose those y i with small i, since these are simple to calculate. Interestingly, it can be shown that we just need y... y m 1 to represent any value function, where m is the degree of the minimal polynomial [Ipsen and Meyer, 1998] of (I γp ). Moreover, even taking fewer vectors than that may be a good choice in many cases. To show that y... y m 1 vectors are sufficient to precisely represent any value function, let the minimal polynomial be: p(a) = m i= α ia i. Then let B = 1 m 1 α i= α i+1 A i. By algebraic multiplication, we have BA = I. Having this, the value function may be represented as: v = Br = 1 m 1 α i= α i+1 (I γp ) i r = m 1 i= β i y i, for some β i. A more rigorous derivation may be found for example in [Ipsen and Meyer, 1998; Golub and Loan, 1996]. The space spanned by y... y m 1 is known as Krylov space (or Krylov subspace), denoted as K(P, r). It has been previously used in a variety of numerical methods, such as GM- RES, or Lancoz, and Arnoldi [Golub and Loan, 1996]. It is also common to combine the use of eigenvectors and Krylov space, what is known as an augmented Krylov methods [Saad, 1997]. These methods actually subsume the methods based on simply considering the largest eigenvectors of P. We discuss this method in Subsection Algorithms In this section we propose two algorithms based on the previous analysis for constructing a good basis for VFA. These algorithms deal only with the selection of the basis and they can be arbitrarily incorporated into any algorithm based on approximate policy iteration. The first algorithm, which we refer to as the weighted spectral method, is to form the basis from the eigenvectors of the transition matrix. This is in contrast with the method presented in [Mahadevan, 2], which uses any of the Laplacians. Moreover, we propose to choose the vectors with greatest d j value, in contrast to choosing the ones with largest λ j. The reason is that this leads to minimization of the bound in (3.2). A practical implementation of this algorithm faces some major obstacles. One issue is that there is no standard eigenvector solver that can efficiently calculate the top eigenvectors with regard to d j of a sparse matrix. However, such a method may be developed in the future. In addition, when the transition matrix is not diagonalizable, we need to calculate the Jordan decomposition, which is very time-consuming and unstable. Finally, some of the eigenvectors and eigenvalues Require: P, r, k - number of eigenvectors in the basis, l - total number of vectors Let z 1... z k be the top real eigenvectors of P z k+1 r for i 1... l + k do if i > k + 1 then z i P z i 1 end if for j 1... (i 1) do z i z i z j, z i z j end for if z i then break end if z i 1 z z i i end for Figure 1: Augmented Krylov method for basis construction. may contain complex numbers, what would require a revision of the approximate policy iteration algorithms. If these issues were resolved, this may be a viable alternative to other methods. The second algorithm we propose is to use the vectors from the augmented Krylov method, that is, to combine the vectors in the Krylov space with a few top eigenvectors. A pseudocode of the algorithm is in Figure 1. The algorithm calculates an orthonormal basis of the augmented Krylov space using a modified Gram-Schmidt method. Using the Krylov space eliminates the problems with nondiagonalizable transition matrices and complex eigenvectors. Another reason to combine these two methods is to take advantage of approximation properties of both methods. Intuitively, Krylov vectors capture the short-term behavior, while the eigenvectors capture the long-term behavior. Though, there is no reliable decision rule to determine the right number of augmenting eigenvectors, it is preferable to keep their number relatively low. This is because they are usually more expensive to compute than vectors in the Krylov space. 4 Approximation Bounds In this section, we briefly present a theoretical error bound guaranteed by the augmented Krylov methods. The bound characterizes the worst-case approximation error of the value function, given that the basis is constructed using the current transition matrix and reward vector. We focus on bounding the quadratic norm, what also implies the max norm. We show the bound for r ṽ + γp ṽ 2. This bound applies directly to algorithms that minimize the Bellman residual [Bertsekas and Tsitsiklis, 1996], and it also implies: v ṽ = (I γp ) 1 r (I γp ) 1 (I γp )ṽ 1 1 γ r (I γp )ṽ 2. This follows from the Neumann series expansion of the inverse and from P = 1.
4 In the following, we denote the set of m Krylov vectors as K m and the chosen set of top eigenvectors of P as U. We also use E(c, d, a) to denote an ellipse in the set of complex numbers with center c, focal distance d, and major semi-axis a. The approximation error for a basis constructed for the current policy may be bounded as the following theorem states. Theorem 4.1. Let (I γp ) = XSX 1 be a diagonalizable matrix. Further, let φ = max XT x/ U, x X 1 x 2, 2=1 where T is a diagonal matrix with 1 in place of eigenvalues of eigenvectors in U. The approximation error using the basis K m U is bounded by: { r (I γp )ṽ 2 max κ(x) C m( a1 d 1 ) C m ( c1 d 1, φ C a m( d ) C m ( c d ), where C m is the Chebyshev polynomial of the first kind of degree m, and κ(x) = X 2 X 1 2. The parameters a, c, d and a 1, c 1, d 1 are chosen such that E(c 1, d 1, a 1 ) includes the lower n U eigenvalues of (I γp ), and E(c, d, a) includes all its eigenvalues. The value of φ depends on the angle between the invariant subspace U and the other eigenvectors of P. If they are perpendicular, such as when P is symmetric, then φ =. Proof. To simplify the notation, we denote A = (I γp ). First, we show the standard bound on approximation by a Krylov space [Saad, 23], ignoring the additional eigenvectors. In this case, the objective is to find such w K m that minimizes the following: r Aw 2 = r w(i)a i r 2 = w(i)a i r 2, i=1 i= where w() = 1. Notice that this defines a polynomial in A multiplied by r with the constant factors determined by w. Let P m denote the set of polynomials of degree at most m such that every p P m satisfies p() = 1. The minimization problem may be then expressed as finding a polynomial p P m that minimizes p(a)r 2. This is related to the value of the polynomial on complex numbers as [Golub and Loan, 1996]: p(a) 2 κ(x) max p(λ i), i=1,...,n where λ i are the eigenvalues of A. A more practical bound may be obtained using Chebyshev polynomials, as for example in [Saad, 23], as follows: min max p(λ i) min max p(λ) C m( a d ) p P m i=1,...,n p P m λ E(c,d,a) C m ( c d ), where the ellipse E(c, d, a) covers all eigenvalues of A. In the approximation with eigenvectors, the minimization may be expressed as follows: min v r Aṽ 2 = min w K m min q U r Aw AUq 2 = = min min p(a)r + AUq 2 = p P m q U = min min p(a)(i P U )r + p(a)p U r AUq 2 p P m q U = min p(a)(i P U )r 2. p P m } where P U is the least squares projection matrix to U. Note that p(a)p U r AUy =, since U is an invariant space of A. Moreover, (I P U ) r is perpendicular to U. The theorem then follows from the approximation by Chebyshev polynomials as described above, and the definition of matrixvalued function approximation [Golub and Loan, 1996]. This bound shows that it is important to choose an invariant subspace that corresponds to the top eigenvalues of P. It also shows that U should be chosen to be to the highest degree perpendicular to the remaining eigenvectors. Finally, the bound also implies that the approximation precision increases with lower γ, as this decreases the size of the ellipse to cover the eigenvalues. Experiments In this section we demonstrate the proposed methods on the two-room problem similar to the one used in [Mahadevan, 2; Mahadevan and Maggioni, 2]. This problem is a typical representative of some stochastic planning problems encountered in AI. The MDP we use is a two-room grid with a single-cell doorway in the middle of the wall. The actions are to move in any of the four main direction. We use the policy with an equal probability of taking any action. The size of each room is by cells. The problem size is intentionally small to make explicit calculation of the value function possible. Notice that because of the structure of the problem and the policy, the random walk Laplacian has identical eigenvectors as the transition matrix in the same order. This allows us to evaluate the impact of choosing the eigenvectors in the order proposed by the weighted spectral method. We used the problem with various reward vectors and discount factors, but with the same transition structure. The reward vectors are synthetically constructed to show possible advantages and disadvantages of the methods. They are shown projected onto the grid in Figure 2. Vector Reward 1 represents an arbitrary smooth reward function. Reward 2 and Reward 3 are made perpendicular to the top 4 and the top 19 eigenvectors of the random walk Laplacian respectively. A lower discount rate is generally more favorable to Krylov methods, because the value is closer to the value of the reward received in the state. This can also be seen from Theorem 4.1, because γ shrinks the eigenvalues. Therefore, we use problems that are generally less favorable to Krylov methods, we evaluate the approach using two high discount rates, γ =.9 and γ =.99. In our experiments, we evaluate the Mean Squared Error () of the value approximation with regard to the number of vectors in the basis. The basis and the value function were calculated based on the same policy with equiprobable action choice. We obtained the true value function by explicitly evaluating (I γp ) 1 r. Notice, however, that the true value does not need to be used in the approximation, it is only required to determine the approximation error. We compared the following 4 methods: Laplacian The technique of using the eigenvectors of the random walk Laplacian of the adjacency matrix, as pro-
5 Reward 1 Reward 2 Reward 3 4 Reward Reward Reward 2 2 Y X 2 Y X 2 4 Y X 2 Figure 2: Reward vectors projected onto the grid Reward 1 Laplacian WL Krylov KA Reward 2 Reward Figure 3: Mean squared error of each method, using discount of γ =.9. The axis is in log scale. posed in [Mahadevan, 2]. The results of the normalized Laplacian are not shown because they were practically identical to the random walk Laplacian. WL This is the weighted spectral method, described in Subsection 3.3, which determines the optimal order of the eigenvectors to be used based on the value of d j. Krylov This is the augmented Krylov method with no eigenvectors, as described in Subsection 3.3. KA This is the augmented Krylov method with 3 eigenvectors, as proposed in Figure 1. The number of eigenvectors was chosen before running any experiments on this domain. The results for γ =.9 are in Figure 3 and for γ =.99 are in Figure 4. Reward 3 intentionally violates the smoothness assumption, but it is the easiest one to approximate for all methods except the Laplacian. In the case of Reward 1 and γ =.99, the weighted spectral method and the random walk Laplacian outperform the ordinary Krylov method for the first vectors. However, with additional vectors, both Krylov and augmented Krylov methods significantly outperform them. Our results suggest that Krylov methods may offer superior performance compared to using the eigenvectors of the Laplacian. Since these methods have been found very effective in many sparse linear systems [Saad, 23], they may perform well for basis construction also in other MDP problems. In addition, constructing a Krylov space is typically faster than constructing the invariant space of eigenvectors. Using Matlab on a standard PC, it took 2.99 seconds to calculate the top eigenvectors, while it took only.24 second to calculate the first vectors of the Krylov space for Reward 1. 6 Discussion We presented an alternative explanation of the success of Laplacian methods used for value approximation. This explanation allows us to more precisely determine when the method may work. Moreover, it shows that these methods are closely related to augmented Krylov methods. We demonstrated on a limited problem set that basis constructed from augmented Krylov methods may be superior to one constructed from the eigenvectors. In addition, both the weighted spectral method and the augmented Krylov method do not assume the value function to be smooth. Moreover, calculating vectors in the Krylov space is typically cheaper than calculating the eigenvectors. An approach for state-space compression of Partially Observable MDPs (POMDP) that also uses Krylov space was proposed in [Poupart and Boutilier, 22]. While POMDPs are a generalization of MDPs, the approach is not practical for large MDPs since it implicitly assumes that the number of observations is small. This is not true for MDPs, because the number of observations is equivalent to the number of states. In addition, the objectives of this approach are somewhat different, though the method is similar.
6 Reward 1 Laplacian WL Krylov KA Reward 2 Reward Figure 4: Mean squared error of each method, using discount of γ =.99. The axis is in log scale. One important issue that we did not address is an application to state spaces that are too large to be enumerated, such as factored MDPs. Because the vectors in Krylov space are as large as the whole state space, they cannot be enumerated either. However, the approach may be extended along similar lines as discussed in [Poupart and Boutilier, 22]. In reinforcement learning, the MDP needs to be solved using only an estimation of the transition matrix from sampling. An additional problem is that the basis is not defined for states that were not sampled. This was addressed for eigenvectors for example in [Mahadevan et al., 26]. Thus augmenting the Krylov space by the eigenvectors also has the advantage that these methods may be directly applied. We focused mainly on the VFA for a fixed policy without explicitly considering the control part of the problem. While these methods may be combined arbitrarily, a future challenge is to determine an efficient combination. Our results suggest that exploring the connection between the basis construction in VFA and sparse linear solvers may bring about interesting advances in the future. Acknowledgements The author was supported by the National Science Foundation under Grant No. IIS I also thank Sridhar Mahadevan, Hala Mostafa, Shlomo Zilberstein, and the anonymous reviewers for valuable comments. References [Bertsekas and Tsitsiklis, 1996] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, [Chung, 1997] Fang Chung. Spectral graph theory. American Mathematical Society, [Golub and Loan, 1996] Gene H. Golub and Charles F. Van Loan. Matrix computations. John Hopkins University Press, 3rd edition, [Ipsen and Meyer, 1998] Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov methods. American Mathematical Monthly, (): , [Maggioni and Mahadevan, 26] Mauro Maggioni and Sridhar Mahadevan. Fast direct policy evaluation using multiscale analysis of markov diffusion processes. In Proceedings of the International Conference on Machine Learning, pages ACM Press, 26. [Mahadevan and Maggioni, 2] Sridhar Mahadevan and Mauro Maggioni. Value function approximation with diffusion wavelets and Laplacian eigenfuctions. In Proceedings of Advances in Neural Information Processing Systems, 2. [Mahadevan et al., 26] Sridhar Mahadevan, Mauro Maggioni, Kimberly Ferguson, and Sarah Osentoski. Learning representation and control in continuous Markov decision processes. In Proceedings of the National Conference on Artificial Intelligence, 26. [Mahadevan, 2] Sridhar Mahadevan. Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of the National Conference on Artificial Intelligence, 2. [Munos, 23] Remi Munos. Error bounds for approximate policy iteration. In Proceedings of the International Conference on Machine Learning, pages 6 67, 23. [Poupart and Boutilier, 22] Pascal Poupart and Craig Boutilier. Value-directed compression of POMDPs. In Proceedings of Advances in Neural Information Processing Systems, pages , 22. [Puterman, 2] Martin L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc., 2. [Saad, 199] Yosef Saad. Preconditioned Krylov subspace methods for CFD applications. In W. G. Hasbashi, editor, Solution Techniques for Large Scale CFD Problems, page 141. Wiley, 199. [Saad, 1997] Yousef Saad. Analysis of augmented Krylov subspace methods. SIAM Journal on Matrix Analysis and Applications, 18(2):43 449, April [Saad, 23] Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2nd edition, 23. [Sutton and Barto, 1998] Richard S. Sutton and Andrew Barto. Reinforcement learning. MIT Press, 1998.
Basis Construction from Power Series Expansions of Value Functions
Basis Construction from Power Series Expansions of Value Functions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 3 mahadeva@cs.umass.edu Bo Liu Department of
More informationValue Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 13 mahadeva@cs.umass.edu Mauro
More informationLaplacian Agent Learning: Representation Policy Iteration
Laplacian Agent Learning: Representation Policy Iteration Sridhar Mahadevan Example of a Markov Decision Process a1: $0 Heaven $1 Earth What should the agent do? a2: $100 Hell $-1 V a1 ( Earth ) = f(0,1,1,1,1,...)
More informationFast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes
Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Mauro Maggioni mauro.maggioni@yale.edu Department of Mathematics, Yale University, P.O. Box 88, New Haven, CT,, U.S.A.
More informationFeature Selection by Singular Value Decomposition for Reinforcement Learning
Feature Selection by Singular Value Decomposition for Reinforcement Learning Bahram Behzadian 1 Marek Petrik 1 Abstract Linear value function approximation is a standard approach to solving reinforcement
More informationConstructing Basis Functions from Directed Graphs for Value Function Approximation
Constructing Basis Functions from Directed Graphs for Value Function Approximation Jeff Johns johns@cs.umass.edu Sridhar Mahadevan mahadeva@cs.umass.edu Dept. of Computer Science, Univ. of Massachusetts
More informationRepresentation Policy Iteration
Representation Policy Iteration Sridhar Mahadevan Department of Computer Science University of Massachusetts 14 Governor s Drive Amherst, MA 13 mahadeva@cs.umass.edu Abstract This paper addresses a fundamental
More informationLow-rank Feature Selection for Reinforcement Learning
Low-rank Feature Selection for Reinforcement Learning Bahram Behzadian and Marek Petrik Department of Computer Science University of New Hampshire {bahram, mpetrik}@cs.unh.edu Abstract Linear value function
More informationPrioritized Sweeping Converges to the Optimal Value Function
Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science
More informationMDP Preliminaries. Nan Jiang. February 10, 2019
MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process
More informationProto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
Journal of Machine Learning Research 8 (27) 2169-2231 Submitted 6/6; Revised 3/7; Published 1/7 Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
More informationValue Function Approximation in Reinforcement Learning using the Fourier Basis
Value Function Approximation in Reinforcement Learning using the Fourier Basis George Konidaris Sarah Osentoski Technical Report UM-CS-28-19 Autonomous Learning Laboratory Computer Science Department University
More informationRepresentation Policy Iteration: A Unified Framework for Learning Behavior and Representation
Representation Policy Iteration: A Unified Framework for Learning Behavior and Representation Sridhar Mahadevan Unified Theories of Cognition (Newell, William James Lectures, Harvard) How are we able to
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large
More informationProto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA
More informationAn Adaptive Clustering Method for Model-free Reinforcement Learning
An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at
More informationA Multiscale Framework for Markov Decision Processes using Diffusion Wavelets
A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets Mauro Maggioni Program in Applied Mathematics Department of Mathematics Yale University New Haven, CT 6 mauro.maggioni@yale.edu
More informationAn Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection For Reinforcement Learning
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection For Reinforcement Learning Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
More informationBiasing Approximate Dynamic Programming with a Lower Discount Factor
Biasing Approximate Dynamic Programming with a Lower Discount Factor Marek Petrik, Bruno Scherrer To cite this version: Marek Petrik, Bruno Scherrer. Biasing Approximate Dynamic Programming with a Lower
More informationLearning Representation and Control In Continuous Markov Decision Processes
Learning Representation and Control In Continuous Markov Decision Processes Sridhar Mahadevan Department of Computer Science University of Massachusetts 140 Governor s Drive Amherst, MA 03 mahadeva@cs.umass.edu
More informationCompact Spectral Bases for Value Function Approximation Using Kronecker Factorization
Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Jeff Johns and Sridhar Mahadevan and Chang Wang Computer Science Department University of Massachusetts Amherst Amherst,
More information1 Problem Formulation
Book Review Self-Learning Control of Finite Markov Chains by A. S. Poznyak, K. Najim, and E. Gómez-Ramírez Review by Benjamin Van Roy This book presents a collection of work on algorithms for learning
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes
More informationMATH 1120 (LINEAR ALGEBRA 1), FINAL EXAM FALL 2011 SOLUTIONS TO PRACTICE VERSION
MATH (LINEAR ALGEBRA ) FINAL EXAM FALL SOLUTIONS TO PRACTICE VERSION Problem (a) For each matrix below (i) find a basis for its column space (ii) find a basis for its row space (iii) determine whether
More informationIterative Methods for Solving A x = b
Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http
More informationLinearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu
More informationConvergence of Synchronous Reinforcement Learning. with linear function approximation
Convergence of Synchronous Reinforcement Learning with Linear Function Approximation Artur Merke artur.merke@udo.edu Lehrstuhl Informatik, University of Dortmund, 44227 Dortmund, Germany Ralf Schoknecht
More informationOnline solution of the average cost Kullback-Leibler optimization problem
Online solution of the average cost Kullback-Leibler optimization problem Joris Bierkens Radboud University Nijmegen j.bierkens@science.ru.nl Bert Kappen Radboud University Nijmegen b.kappen@science.ru.nl
More informationA Generalized Reduced Linear Program for Markov Decision Processes
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence A Generalized Reduced Linear Program for Markov Decision Processes Chandrashekar Lakshminarayanan and Shalabh Bhatnagar Department
More informationLearning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst
Learning on Graphs and Manifolds CMPSCI 689 Sridhar Mahadevan U.Mass Amherst Outline Manifold learning is a relatively new area of machine learning (2000-now). Main idea Model the underlying geometry of
More informationBasis Construction and Utilization for Markov Decision Processes Using Graphs
University of Massachusetts Amherst ScholarWorks@UMass Amherst Open Access Dissertations 2-2010 Basis Construction and Utilization for Markov Decision Processes Using Graphs Jeffrey Thomas Johns University
More informationCompact Spectral Bases for Value Function Approximation Using Kronecker Factorization
Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Jeff Johns and Sridhar Mahadevan and Chang Wang Computer Science Department University of Massachusetts Amherst Amherst,
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationApproximate Dynamic Programming
Master MVA: Reinforcement Learning Lecture: 5 Approximate Dynamic Programming Lecturer: Alessandro Lazaric http://researchers.lille.inria.fr/ lazaric/webpage/teaching.html Objectives of the lecture 1.
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationOpen Theoretical Questions in Reinforcement Learning
Open Theoretical Questions in Reinforcement Learning Richard S. Sutton AT&T Labs, Florham Park, NJ 07932, USA, sutton@research.att.com, www.cs.umass.edu/~rich Reinforcement learning (RL) concerns the problem
More informationilstd: Eligibility Traces and Convergence Analysis
ilstd: Eligibility Traces and Convergence Analysis Alborz Geramifard Michael Bowling Martin Zinkevich Richard S. Sutton Department of Computing Science University of Alberta Edmonton, Alberta {alborz,bowling,maz,sutton}@cs.ualberta.ca
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process
More informationArnoldi Methods in SLEPc
Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,
More informationValue Function Approximation in Reinforcement Learning Using the Fourier Basis
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence Value Function Approximation in Reinforcement Learning Using the Fourier Basis George Konidaris,3 MIT CSAIL gdk@csail.mit.edu
More informationLearning Representation & Behavior:
Learning Representation & Behavior: Manifold and Spectral Methods for Markov Decision Processes and Reinforcement Learning Sridhar Mahadevan, U. Mass, Amherst Mauro Maggioni, Yale Univ. June 25, 26 ICML
More informationCMPSCI 791BB: Advanced ML: Laplacian Learning
CMPSCI 791BB: Advanced ML: Laplacian Learning Sridhar Mahadevan Outline! Spectral graph operators! Combinatorial graph Laplacian! Normalized graph Laplacian! Random walks! Machine learning on graphs! Clustering!
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationProcedia Computer Science 00 (2011) 000 6
Procedia Computer Science (211) 6 Procedia Computer Science Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 211-
More informationBias-Variance Error Bounds for Temporal Difference Updates
Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds
More informationOn the Convergence of Optimistic Policy Iteration
Journal of Machine Learning Research 3 (2002) 59 72 Submitted 10/01; Published 7/02 On the Convergence of Optimistic Policy Iteration John N. Tsitsiklis LIDS, Room 35-209 Massachusetts Institute of Technology
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement
More informationMarkov Decision Processes Infinite Horizon Problems
Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld 1 What is a solution to an MDP? MDP Planning Problem: Input: an MDP (S,A,R,T)
More informationWE consider finite-state Markov decision processes
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 54, NO. 7, JULY 2009 1515 Convergence Results for Some Temporal Difference Methods Based on Least Squares Huizhen Yu and Dimitri P. Bertsekas Abstract We consider
More informationLecture 4: Approximate dynamic programming
IEOR 800: Reinforcement learning By Shipra Agrawal Lecture 4: Approximate dynamic programming Deep Q Networks discussed in the last lecture are an instance of approximate dynamic programming. These are
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More information1 Conjugate gradients
Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration
More informationMarkov Chains and Spectral Clustering
Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu
More informationCourse Notes: Week 1
Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive
More information4.8 Arnoldi Iteration, Krylov Subspaces and GMRES
48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationLeast-Squares Temporal Difference Learning based on Extreme Learning Machine
Least-Squares Temporal Difference Learning based on Extreme Learning Machine Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Juan Gómez-Sanchis IDAL, Intelligent
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.
More informationITERATIVE METHODS BASED ON KRYLOV SUBSPACES
ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More informationLinear Algebra review Powers of a diagonalizable matrix Spectral decomposition
Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2018 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing
More informationApproximate Dynamic Programming By Minimizing Distributionally Robust Bounds
Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds Marek Petrik IBM T.J. Watson Research Center, Yorktown, NY, USA Abstract Approximate dynamic programming is a popular method
More informationLINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12,
LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, 2000 74 6 Summary Here we summarize the most important information about theoretical and numerical linear algebra. MORALS OF THE STORY: I. Theoretically
More informationMATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)
MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m
More informationExercise Set 7.2. Skills
Orthogonally diagonalizable matrix Spectral decomposition (or eigenvalue decomposition) Schur decomposition Subdiagonal Upper Hessenburg form Upper Hessenburg decomposition Skills Be able to recognize
More informationMath 307 Learning Goals
Math 307 Learning Goals May 14, 2018 Chapter 1 Linear Equations 1.1 Solving Linear Equations Write a system of linear equations using matrix notation. Use Gaussian elimination to bring a system of linear
More informationOptimizing Memory-Bounded Controllers for Decentralized POMDPs
Optimizing Memory-Bounded Controllers for Decentralized POMDPs Christopher Amato, Daniel S. Bernstein and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003
More informationLecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm
CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction
More informationReview of similarity transformation and Singular Value Decomposition
Review of similarity transformation and Singular Value Decomposition Nasser M Abbasi Applied Mathematics Department, California State University, Fullerton July 8 7 page compiled on June 9, 5 at 9:5pm
More informationLeast-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems
: Bias-Variance Trade-off in Control Problems Christophe Thiery Bruno Scherrer LORIA - INRIA Lorraine - Campus Scientifique - BP 239 5456 Vandœuvre-lès-Nancy CEDEX - FRANCE thierych@loria.fr scherrer@loria.fr
More informationDual Temporal Difference Learning
Dual Temporal Difference Learning Min Yang Dept. of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 Yuxi Li Dept. of Computing Science University of Alberta Edmonton, Alberta Canada
More informationOptimism in the Face of Uncertainty Should be Refutable
Optimism in the Face of Uncertainty Should be Refutable Ronald ORTNER Montanuniversität Leoben Department Mathematik und Informationstechnolgie Franz-Josef-Strasse 18, 8700 Leoben, Austria, Phone number:
More information7. Symmetric Matrices and Quadratic Forms
Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value
More informationLINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS
LINEAR ALGEBRA, -I PARTIAL EXAM SOLUTIONS TO PRACTICE PROBLEMS Problem (a) For each of the two matrices below, (i) determine whether it is diagonalizable, (ii) determine whether it is orthogonally diagonalizable,
More informationAnalyzing Feature Generation for Value-Function Approximation
Ronald Parr Christopher Painter-Wakefield Department of Computer Science, Duke University, Durham, NC 778 USA Lihong Li Michael Littman Department of Computer Science, Rutgers University, Piscataway, NJ
More informationPractice Final Exam Solutions for Calculus II, Math 1502, December 5, 2013
Practice Final Exam Solutions for Calculus II, Math 5, December 5, 3 Name: Section: Name of TA: This test is to be taken without calculators and notes of any sorts. The allowed time is hours and 5 minutes.
More informationAn Empirical Algorithm for Relative Value Iteration for Average-cost MDPs
2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter
More informationThe convergence limit of the temporal difference learning
The convergence limit of the temporal difference learning Ryosuke Nomura the University of Tokyo September 3, 2013 1 Outline Reinforcement Learning Convergence limit Construction of the feature vector
More informationM.A. Botchev. September 5, 2014
Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev
More informationMarkov Chains, Random Walks on Graphs, and the Laplacian
Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer
More informationIn Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.
In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous
More informationPlanning in POMDPs Using Multiplicity Automata
Planning in POMDPs Using Multiplicity Automata Eyal Even-Dar School of Computer Science Tel-Aviv University Tel-Aviv, Israel 69978 evend@post.tau.ac.il Sham M. Kakade Computer and Information Science University
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationControl Theory : Course Summary
Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields
More information5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.
Linear Algebra - Test File - Spring Test # For problems - consider the following system of equations. x + y - z = x + y + 4z = x + y + 6z =.) Solve the system without using your calculator..) Find the
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationSpectral Processing. Misha Kazhdan
Spectral Processing Misha Kazhdan [Taubin, 1995] A Signal Processing Approach to Fair Surface Design [Desbrun, et al., 1999] Implicit Fairing of Arbitrary Meshes [Vallet and Levy, 2008] Spectral Geometry
More informationMath 307 Learning Goals. March 23, 2010
Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent
More informationThe Lanczos and conjugate gradient algorithms
The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works
CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationApproximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.
An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu
More informationMATH 31 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL
MATH 3 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL MAIN TOPICS FOR THE FINAL EXAM:. Vectors. Dot product. Cross product. Geometric applications. 2. Row reduction. Null space, column space, row space, left
More informationTopics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems
Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate
More informationKey words. conjugate gradients, normwise backward error, incremental norm estimation.
Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322
More informationav 1 x 2 + 4y 2 + xy + 4z 2 = 16.
74 85 Eigenanalysis The subject of eigenanalysis seeks to find a coordinate system, in which the solution to an applied problem has a simple expression Therefore, eigenanalysis might be called the method
More informationConjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)
Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps
More information