Linear stochastic approximation driven by slowly varying Markov chains
Systems & Control Letters

Linear stochastic approximation driven by slowly varying Markov chains

Vijay R. Konda, John N. Tsitsiklis

Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Received 5 June 2002; received in revised form 30 December 2002; accepted 23 February 2003

Corresponding author. E-mail addresses: konda@alum.mit.edu (V.R. Konda), jnt@mit.edu (J.N. Tsitsiklis).

Abstract

We study a linear stochastic approximation algorithm that arises in the context of reinforcement learning. The algorithm employs a decreasing step-size, and is driven by Markov noise with time-varying statistics. We show that under suitable conditions, the algorithm can track the changes in the statistics of the Markov noise, as long as these changes are slower than the rate at which the step-size of the algorithm goes to zero. © 2003 Elsevier B.V. All rights reserved.

Keywords: Stochastic approximation; Adaptive algorithms; Reinforcement learning

0. Introduction

The convergence of stochastic approximation algorithms driven by ergodic noise sequences has been extensively studied in the literature [1,5,9,10]. Typical convergence results show that the iterates converge to a stationary point of the vector field obtained by averaging the update direction with respect to the statistics of the driving noise. In some applications, the statistics of the driving noise change with time. In such cases, the point to which the algorithm would converge if the noise were held stationary also changes with time. In this context, a meaningful objective is to have the stochastic approximation algorithm track this changing point closely, after an initial transient period. Algorithms of this type are often called adaptive, as they adapt themselves to the changing environment; for a textbook account of adaptive algorithms, see [1]. The tracking ability of adaptive algorithms has been analyzed in several contexts [6,14].
For example, it is known that the usual constant step-size stochastic approximation algorithms can adapt to changes in the statistics of the driving noise that are slow relative to the step-size of the algorithm. However, the tracking ability of decreasing step-size stochastic approximation algorithms has not been studied before, because of the nature of the assumptions that would be required: for the changes in the statistics of the driving noise to remain slow relative to the decreasing step-size, these changes should become progressively slower. Such an assumption would be too restrictive and unnatural in applications where the noise is exogenously generated.
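The first of these known facts admits a quick numerical illustration. The sketch below is a made-up toy example, not taken from the paper: the drifting target, the noise level, and the step-size are all illustrative choices. It runs a constant step-size scalar recursion against a slowly drifting mean and checks that, after a transient, the estimate stays close to the moving target.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.05      # constant step-size
r = 0.0           # running estimate
errs = []
for k in range(20000):
    target = np.sin(2 * np.pi * k / 50000)   # statistics drift slowly relative to gamma
    y = target + rng.normal(scale=0.1)       # noisy observation with the current mean
    r += gamma * (y - r)                     # constant step-size stochastic approximation
    errs.append(abs(r - target))

# After an initial transient, the estimate tracks the drifting target.
print(max(errs[5000:]) < 0.1)
```

Slowing the drift or enlarging `gamma` (within stability limits) tightens the tracking band, which is exactly the "slow relative to the step-size" condition above.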
In some contexts, however, the statistics of the driving noise depend only on a parameter that is deliberately changed by the user, at a rate which is slower than the natural time scale of the stochastic approximation algorithm. In such cases, it becomes meaningful to study the tracking ability of decreasing step-size stochastic approximation. In this paper, we focus on linear iterations and establish that when the update direction and the statistics of the noise in the update direction depend on a slowly changing parameter, the algorithm can track such changes, in a strong sense to be explained later. To the best of our knowledge, this is the first result on the tracking ability of adaptive algorithms with decreasing step-sizes. Similar results are possible for methods involving nonlinear iterations, under certain stability conditions (see, e.g., [4] for stability conditions in a single time scale setting with a simpler noise model), but this direction is not pursued here.

The rest of the paper is organized as follows. In Section 1, we motivate the linear algorithms considered, within the context of reinforcement learning. In Section 2, we state the main result of this paper. The last three sections are devoted to the proof of this result.

1. Motivation

The motivation for the analysis of the linear iterations considered in this paper comes from simulation-based reinforcement learning methods for Markov decision problems [2,13], such as Temporal Difference (TD) learning [12], that are used to approximate the value function under a given fixed policy. More precisely, consider an ergodic finite-state Markov chain $\{X_k\}$, with transition probabilities $p(y \mid x)$. Let $g(x)$ be a one-stage reward function, and let $\bar g$ be its steady-state expectation.
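For concreteness, the steady-state expectation can be computed directly for a small ergodic chain as $\bar g = \sum_x \pi(x) g(x)$, where $\pi$ is the stationary distribution. A minimal sketch follows; the three-state chain and the reward values are made-up illustrations, not from the paper.

```python
import numpy as np

# Transition matrix with P[x, y] = p(y | x) for a small ergodic chain.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
g = np.array([1.0, 0.0, 2.0])   # one-stage reward g(x)

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

g_bar = float(pi @ g)           # steady-state expectation of the reward
print(round(g_bar, 4))          # → 0.8182 (= 9/11 for this chain)
```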
TD learning methods consider value functions that are linear in a prescribed set of basis functions $\varphi_i(x)$,

$$\hat V(x; r) = \sum_i r_i \varphi_i(x),$$

and adjust the vector $r$ of free parameters $r_i$, to obtain a sequence of approximations to the solution $V$ of the Poisson equation

$$V(x) = g(x) - \bar g + \sum_y p(y \mid x) V(y).$$

The parameters $\alpha_k$ and $r_k$, which denote the estimate of the average reward and the parameters of the approximate value function, are updated recursively as

$$\alpha_{k+1} = \alpha_k + \gamma_k \big( g(X_k) - \alpha_k \big),$$
$$r_{k+1} = r_k + \gamma_k \big( g(X_k) - \alpha_k + r_k' \varphi(X_{k+1}) - r_k' \varphi(X_k) \big) Z_k,$$

where

$$Z_k = \sum_{l=0}^{k} \lambda^{k-l} \varphi(X_l)$$

is a so-called eligibility trace, and $\lambda$ is a constant with $0 \le \lambda < 1$. Observe that the update equation for the vector $(\alpha_k, r_k)$ is of the form

$$r_{k+1} = r_k + \gamma_k \big( h(Y_{k+1}) - G(Y_{k+1}) r_k \big)$$

(with a slight abuse of notation, $r_k$ here denotes the combined vector), where $Y_{k+1} = (X_k, X_{k+1}, Z_k)$ is a Markov chain whose transition probabilities are not affected by the parameters being updated. However, in the context of optimization over a parametric family of Markov chains, both the basis functions and the transition probabilities of the Markov chain $\{X_k\}$ depend on a policy parameter $\theta$ that is constantly changing. Therefore, in this context, the update takes the form

$$r_{k+1} = r_k + \gamma_k \big( h_{\theta_k}(Y_{k+1}) - G_{\theta_k}(Y_{k+1}) r_k \big),$$

where $\theta_k$ denotes the value of $\theta$ at time $k$, and where $Y_{k+1}$ is generated from $Y_k$ using the transition probabilities corresponding to $\theta_k$. If $\theta_k$ is held constant at $\theta$, the algorithm would generally converge to some $\bar r(\theta)$. We would like to prove that even if $\theta_k$ moves slowly, then $r_k$ tracks $\bar r(\theta_k)$, in the sense that $r_k - \bar r(\theta_k)$ goes to zero. A result of this type, as developed in the next section, is necessary for analyzing the convergence properties of actor-critic algorithms, which combine temporal difference learning and slowly changing policies [7,8].

2. Main result

Consider a stochastic process $\{Y_k\}$ taking values in a Polish (complete, separable, metric) space $\mathsf Y$,
endowed with its Borel $\sigma$-field. Let $\{P_\theta(y, dy') : \theta \in \mathbb{R}^n\}$ be a parameterized family of transition kernels on $\mathsf Y$. Consider the following update equations for a vector $r_k \in \mathbb{R}^m$ and a parameter $\theta_k \in \mathbb{R}^n$:

$$r_{k+1} = r_k + \gamma_k \big( h_{\theta_k}(Y_{k+1}) - G_{\theta_k}(Y_{k+1}) r_k + \xi_{k+1} r_k \big),$$
$$\theta_{k+1} = \theta_k + \beta_k H_{k+1}. \tag{1}$$

In the above iteration, $\{h_\theta, G_\theta : \theta \in \mathbb{R}^n\}$ is a parameterized family of $m$-vector-valued and $m \times m$-matrix-valued measurable functions on $\mathsf Y$. Also, $H_k$ is a random process that drives the changes in the parameter $\theta_k$, which in turn affects the linear stochastic approximation updates of $r_k$. We now continue with our assumptions.

Assumption 1. The step-size sequence $\{\gamma_k\}$ is deterministic, non-increasing, and satisfies

$$\sum_k \gamma_k = \infty, \qquad \sum_k \gamma_k^2 < \infty.$$

Let $\mathcal F_k$ be the $\sigma$-field generated by $\{Y_l, H_l, r_l, \theta_l : l \le k\}$.

Assumption 2. For every measurable set $A \subset \mathsf Y$,

$$P(Y_{k+1} \in A \mid \mathcal F_k) = P(Y_{k+1} \in A \mid Y_k, \theta_k) = P_{\theta_k}(Y_k, A).$$

For any measurable function $f$ on $\mathsf Y$, let $P_\theta f$ denote the measurable function $y \mapsto \int P_\theta(y, dy') f(y')$. Also, for any vector $r$, let $|r|$ denote its Euclidean norm.

Assumption 3 (Existence and properties of solutions to the Poisson equation). For each $\theta$, there exist functions $\bar h(\theta) \in \mathbb{R}^m$, $\bar G(\theta) \in \mathbb{R}^{m \times m}$, $\hat h_\theta : \mathsf Y \to \mathbb{R}^m$, and $\hat G_\theta : \mathsf Y \to \mathbb{R}^{m \times m}$ that satisfy the following:

1. For each $y \in \mathsf Y$,
$$\hat h_\theta(y) = h_\theta(y) - \bar h(\theta) + (P_\theta \hat h_\theta)(y),$$
$$\hat G_\theta(y) = G_\theta(y) - \bar G(\theta) + (P_\theta \hat G_\theta)(y).$$
2. For some constant $C$ and for all $\theta$, we have $\max(|\bar h(\theta)|, |\bar G(\theta)|) \le C$.
3. For any $d > 0$, there exists $C_d > 0$ such that $\sup_{k,\theta} E[|f_\theta(Y_k)|^d] \le C_d$, where $f_\theta$ represents any of the functions $\hat h_\theta, h_\theta, \hat G_\theta, G_\theta$.
4. For some constant $C > 0$, and for all $\theta, \bar\theta \in \mathbb{R}^n$,
$$\max\big(|\bar h(\theta) - \bar h(\bar\theta)|, |\bar G(\theta) - \bar G(\bar\theta)|\big) \le C |\theta - \bar\theta|.$$
5. There exists a positive measurable function $C(\cdot)$ on $\mathsf Y$ such that for each $d > 0$, $\sup_k E[C(Y_k)^d] < \infty$, and
$$|(P_\theta \hat f_\theta)(y) - (P_{\bar\theta} \hat f_{\bar\theta})(y)| \le C(y) |\theta - \bar\theta|,$$
where $\hat f$ is any of the functions $\hat h$ and $\hat G$.

Assumption 4 (Slowly changing environment). The random process $\{H_k\}$ satisfies $\sup_k E[|H_k|^d] < \infty$ for all $d > 0$. Furthermore, the sequence $\{\beta_k\}$ is deterministic and satisfies

$$\sum_k \Big(\frac{\beta_k}{\gamma_k}\Big)^d < \infty$$

for some $d > 0$.

Assumption 5. The sequence $\{\xi_k\}$ is an $m \times m$-matrix-valued $\mathcal F_k$-martingale difference with bounded moments, i.e.,

$$E[\xi_{k+1} \mid \mathcal F_k] = 0, \qquad \sup_k E[|\xi_k|^d] < \infty \quad \text{for all } d > 0.$$

Assumption 6 (Uniform positive definiteness).
There exists some $a > 0$ such that for all $r \in \mathbb{R}^m$ and $\theta \in \mathbb{R}^n$:

$$r' \bar G(\theta) r \ge a |r|^2.$$

Our main result is the following.

Theorem 7. If Assumptions 1-6 are satisfied, then

$$\lim_k \big| r_k - \bar G(\theta_k)^{-1} \bar h(\theta_k) \big| = 0, \quad \text{w.p.1.}$$

When $\theta_k$ is held fixed at some value $\theta$ for all $k$, our result states that $r_k$ converges to $\bar G(\theta)^{-1} \bar h(\theta)$, which is a special case of Theorem 17 on p. 239 of [1]. In fact, Assumptions 1 and 2 are the counterparts
of Assumptions A.1 and A.2 of [1], and Assumption 3 is the counterpart of Assumptions A.3 and A.4 in [1]. Several sufficient conditions for this assumption are presented in [1] for the case where the state space $\mathsf Y$ of the process $Y_k$ is a subset of a Euclidean space. When $\mathsf Y$ is Polish, these conditions can be generalized using the techniques of [11]. There are two differences between our assumptions and those in [1]. A minor difference is that [1] considers vector-valued processes $Y_k$, whereas we consider more general processes $Y_k$. Accordingly, the bounds in [1] are stated in terms of $|Y_k|$, whereas our bounds are stated in terms of some positive functions of $Y_k$. The second difference, which is the more significant one, is that $\theta_k$ is changing, albeit slowly. For this reason, we need to use a different proof technique. Our proof combines various techniques used in [1,3,4]. In the next section, we present an overview of the proof and the intuition behind it.

3. Overview of the proof

We note that the sequence $\hat\Delta_k = \bar G(\theta_k) r_k - \bar h(\theta_k)$ satisfies the iteration

$$\hat\Delta_{k+1} = \hat\Delta_k - \gamma_k \bar G(\theta_{k+1}) \hat\Delta_k + \gamma_k \varepsilon^{(1)}_{k+1} + \gamma_k \varepsilon^{(2)}_{k+1},$$

where

$$\varepsilon^{(1)}_{k+1} = \bar G(\theta_{k+1}) \big[ \big( h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k) \big) - \big( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \big) r_k + \xi_{k+1} r_k \big],$$

$$\varepsilon^{(2)}_{k+1} = \frac{1}{\gamma_k} \big[ \big( \bar G(\theta_{k+1}) - \bar G(\theta_k) \big) r_k - \big( \bar h(\theta_{k+1}) - \bar h(\theta_k) \big) \big],$$

as can be verified by some simple algebra. Assumption 3 implies that the vector $h_\theta(Y_{k+1})$ and the matrix $G_\theta(Y_{k+1})$ have expected values $\bar h(\theta)$ and $\bar G(\theta)$, respectively, under the steady-state distribution of the time-homogeneous Markov chain $Y_k$ with transition kernel $P_\theta$. Therefore, we expect that the effect of the error term $\varepsilon^{(1)}_{k+1}$ can be averaged out in the long term. Similarly, since $\theta_k$ is changing very slowly with respect to the step-size $\gamma_k$, we expect that $\varepsilon^{(2)}_{k+1}$ goes to zero. The proof consists of showing that the terms $\varepsilon^{(i)}_k$, $i = 1, 2$, are inconsequential in the limit, and by observing that the sequence $\{\hat\Delta_k\}$ converges to zero when these terms are absent. We formalize this intuition in the next two sections. Note that the error terms are affine in $r_k$, and therefore can be very large if $r_k$ is large.
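To make the tracking statement of Theorem 7 concrete before entering the proof, the following sketch simulates a scalar instance of the iteration of Section 2. Everything specific here is an illustrative assumption rather than the paper's setting: the Markov noise is replaced by i.i.d. noise, $h_\theta(\cdot)$ and $G_\theta(\cdot)$ are noisy samples of $\bar h(\theta) = \theta$ and $\bar G(\theta) = 1$, and the step-sizes are $\gamma_k = k^{-0.6}$ and $\beta_k = k^{-0.9}$, so that $\beta_k/\gamma_k \to 0$ fast enough for Assumption 4.

```python
import numpy as np

rng = np.random.default_rng(1)

r, theta = 0.0, 0.0
gaps = []
for k in range(1, 50001):
    gamma = k ** -0.6                    # sum gamma = inf, sum gamma^2 < inf (Assumption 1)
    beta = k ** -0.9                     # slower time scale: beta/gamma = k**-0.3 -> 0
    h = theta + rng.normal(scale=0.5)    # noisy sample with mean h(theta) = theta
    G = 1.0 + rng.normal(scale=0.5)      # noisy sample with mean G(theta) = 1 > 0
    r += gamma * (h - G * r)             # linear stochastic approximation update
    theta += beta * rng.uniform(-1, 1)   # slowly drifting parameter (bounded H_k)
    gaps.append(abs(r - theta))          # here G(theta)^{-1} h(theta) = theta

# The tracking error decays even though theta never stops moving.
print(np.mean(gaps[:1000]), np.mean(gaps[-1000:]))
```

With these choices the target $\bar G(\theta_k)^{-1} \bar h(\theta_k)$ is simply $\theta_k$, and the late-stage average gap comes out well below the early one, in line with the conclusion of Theorem 7.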
A key step in the proof is to show the boundedness of the iterates $r_k$, and this is the subject of the next section. The main result is then proved in the last section. The approach and techniques used here are inspired by more general techniques developed in [1,4].

4. Proof of boundedness

Note that the difference between two successive iterates at time $k$ is of the order of $\gamma_k$, which goes to zero as $k$ goes to infinity. Therefore, to study the asymptotic behavior of the sequence $\{r_k\}$, we need to focus on a subsequence $\{r_{k_m}\}$, where the sequence of non-negative integers $\{k_m\}$ is defined by

$$k_0 = 0, \qquad k_{m+1} = \min\Big\{ k > k_m : \sum_{l=k_m}^{k-1} \gamma_l \ge T \Big\}.$$

Here, $T$ is a positive constant that will be held fixed throughout the rest of the paper. The sequence $\{k_m\}$ is chosen so that any two successive elements are sufficiently apart, resulting in a non-trivial difference between $r_{k_{m+1}}$ and $r_{k_m}$. To obtain a relation between $r_{k_{m+1}}$ and $r_{k_m}$, we define a sequence $\{\hat r_k\}$ by

$$\hat r_k = \frac{r_k}{\max(1, |r_{k_m}|)} \quad \text{for } k_m \le k \le k_{m+1}.$$

Note that $|\hat r_{k_m}| \le 1$ for all $m$. Furthermore, $\hat r_k$ is $\mathcal F_k$-adapted and satisfies

$$\hat r_{k+1} = \hat r_k + \gamma_k \Big( \frac{\bar h(\theta_k)}{\max(1, |r_{k_m}|)} - \bar G(\theta_k) \hat r_k \Big) + \gamma_k \hat\varepsilon_{k+1}, \qquad k_m \le k < k_{m+1},$$

where, for $k_m \le k < k_{m+1}$, the term

$$\hat\varepsilon_{k+1} = \frac{h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k)}{\max(1, |r_{k_m}|)} - \big( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \big) \hat r_k + \xi_{k+1} \hat r_k,$$
can be viewed as perturbation noise. Similarly, for each $m$, define a sequence $\{\bar r^m_k\}$ as follows:

$$\bar r^m_{k_m} = \hat r_{k_m}, \qquad \bar r^m_{k+1} = \bar r^m_k + \gamma_k \Big( \frac{\bar h(\theta_k)}{\max(1, |r_{k_m}|)} - \bar G(\theta_k) \bar r^m_k \Big), \qquad k_m \le k < k_{m+1}.$$

Note that the iteration satisfied by $\bar r^m_k$ is the same as that of $\hat r_k$, except that the perturbation noise is not present in it. We will show that the perturbation noise is negligible, and that $\hat r_k$ tracks $\bar r^m_k$, in the sense that

$$\lim_m \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| = 0, \quad \text{w.p.1.}$$

The next lemma provides bounds on the perturbation noise. It involves the stopping times $\tau_1(C)$, which are defined as follows: given some constant $C \ge 1$, let

$$\tau_1(C) = \min\{ m : |\hat r_{k_{m+1}}| > C \}.$$

The stopping time $\tau_1(C)$ is the first time the random sequence $\{\hat r_{k_m}\}$ exits a ball of radius $C$. We will often use the simpler notation $\tau_1$, if the value of $C$ is clear from the context. The following lemma derives bounds on the effect of the perturbation noise before time $\tau_1$.

Lemma 8. For any given $C > 0$, there exists a constant $C_1 > 0$ such that for all $m$, we have

$$E\Big[ \max_{k_m \le k < k_{m+1}} \Big| \sum_{l=k_m}^{k} \gamma_l \hat\varepsilon_{l+1} \Big|^2 I\{m \le \tau_1\} \Big] \le C_1 \sum_{l=k_m}^{k_{m+1}} \gamma_l^2.$$

Proof. We only outline the proof, as this result is similar to Proposition 7 in [1, p. 228]. The main difference is that this proposition considers the sum $\sum_{l=0}^{k-1} \gamma_l \hat\varepsilon_{l+1}$, whereas we are interested in the sum $\sum_{l=k_m}^{k} \gamma_l \hat\varepsilon_{l+1}$. The difference in the initial limit of the summation does not affect the proof. However, the difference in the final limit of the summation could affect the proof, because $\hat r_k I\{\tau_1 = m\}$ is not bounded. However, we note that the proof in [1] still goes through, as long as $\hat r_k I\{\tau_1 = m\}$ has bounded moments. To see that $\hat r_k I\{\tau_1 = m\}$ has bounded moments, we observe that it is affine in $\hat r_{k_{\tau_1}} I\{\tau_1 = m\}$, which is bounded, with the coefficients having bounded moments.

The outline of the rest of the proof is as follows. Consider a fixed $m$, and suppress the superscript $m$ to simplify notation. Note that the perturbation noise $\hat\varepsilon_{l+1}$ is of the form

$$F_{\theta_l}(\hat r_l, Y_{l+1}) - \bar F_{\theta_l}(\hat r_l) + \xi_{l+1} \hat r_l,$$

where $\bar F_\theta(r)$ is the steady-state expectation of $F_\theta(r, Y_l)$, and where $Y_l$ is a Markov chain with transition kernel $P_\theta$.
Using Assumption 3, it is easy to see that for each $\theta, r$ there exists a solution $\hat F_\theta(r, y)$ to the Poisson equation:

$$\hat F_\theta(r, y) = F_\theta(r, y) - \bar F_\theta(r) + (P_\theta \hat F_\theta(r, \cdot))(y).$$

The perturbation noise can be expressed in terms of $\hat F$ as follows:

$$\hat\varepsilon_{l+1} = \xi_{l+1} \hat r_l + F_{\theta_l}(\hat r_l, Y_{l+1}) - \bar F_{\theta_l}(\hat r_l)$$
$$= \xi_{l+1} \hat r_l + \hat F_{\theta_l}(\hat r_l, Y_{l+1}) - \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot) \big)(Y_{l+1})$$
$$= \xi_{l+1} \hat r_l + \Big[ \hat F_{\theta_l}(\hat r_l, Y_{l+1}) - \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot) \big)(Y_l) \Big]$$
$$\quad + \Big[ \big( P_{\theta_{l-1}} \hat F_{\theta_{l-1}}(\hat r_{l-1}, \cdot) \big)(Y_l) - \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot) \big)(Y_{l+1}) \Big]$$
$$\quad + \Big[ \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot) \big)(Y_l) - \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_{l-1}, \cdot) \big)(Y_l) \Big]$$
$$\quad + \Big[ \big( P_{\theta_l} \hat F_{\theta_l}(\hat r_{l-1}, \cdot) \big)(Y_l) - \big( P_{\theta_{l-1}} \hat F_{\theta_{l-1}}(\hat r_{l-1}, \cdot) \big)(Y_l) \Big].$$

To prove the lemma, it is sufficient to show that the desired inequality holds when $\hat\varepsilon_{l+1}$ is replaced by each of the terms on the right-hand side of the above equation. Indeed, the sum of the first two terms is a martingale difference with bounded second moment, and the third term is the summand of a telescoping series. The last two terms are of order $O(|\hat r_l - \hat r_{l-1}|)$ and $O(|\theta_l - \theta_{l-1}|)$, respectively. Using these observations, it is easily shown that each of these terms satisfies the desired inequality. □

Lemma 8 indicates that as long as $\hat r_k$ is bounded, the perturbation noise remains negligible. In the next lemma, we prove that the sequence $\{\hat r_k\}$ closely approximates $\{\bar r^m_k\}$.

Lemma 9. $\lim_m \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| = 0$, w.p.1.

Proof. Since $\bar G(\cdot)$ is bounded, there exists a constant $D$ such that

$$|\hat r_{k+1} - \bar r^m_{k+1}| \le D \sum_{l=k_m}^{k} \gamma_l |\hat r_l - \bar r^m_l| + \Big| \sum_{l=k_m}^{k} \gamma_l \hat\varepsilon_{l+1} \Big|$$
for every $k_m \le k < k_{m+1}$. Using the discrete Gronwall inequality,¹ it is easy to see that

$$\max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| \le e^{D \bar T} \max_{k_m \le k < k_{m+1}} \Big| \sum_{l=k_m}^{k} \gamma_l \hat\varepsilon_{l+1} \Big|,$$

where $\bar T = T + \sup_k \gamma_k$.

¹ For a non-negative sequence $\{\varepsilon_k\}$ and a constant $B \ge 0$, let $\{b_k\}$ be a sequence satisfying $b_0 = 0$ and $b_{k+1} \le \varepsilon_k + B \sum_{l=0}^{k} \gamma_l b_l$. Then, for every $k$, we have $b_{k+1} \le (\max_{l \le k} \varepsilon_l) \exp(B \sum_{l=0}^{k} \gamma_l)$.

Therefore, Lemma 8 and the Chebyshev inequality imply that

$$P\Big( \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| \ge \delta, \ m \le \tau_1 \Big) \le \frac{C_1}{\delta^2} \sum_{l=k_m}^{k_{m+1}} \gamma_l^2$$

for some $C_1 > 0$ that depends on the constant $C$ in the definition of the stopping time $\tau_1(C)$. Consider the stopping time $\tau_2(\delta)$ defined by

$$\tau_2(\delta) = \min\Big\{ m : \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| \ge \delta \Big\},$$

which is the first time that the sequence $\{\hat r_k\}$ exits from a tube of radius $\delta$ around the sequence $\{\bar r^m_k\}$. To prove the lemma, we need bounds on $P(\tau_2(\delta) = m)$, whereas we only have bounds on $P(\tau_2(\delta) = m, \ m \le \tau_1)$. Therefore, we need to relate the stopping times $\tau_1$ and $\tau_2$. To do this, note that

$$\sup_m \max_{k_m \le k \le k_{m+1}} |\bar r^m_k| \le \bar C < \infty$$

for some constant $\bar C \ge 1$, because $\bar h(\cdot)$ and $\bar G(\cdot)$ are bounded, and the function $\bar G$ satisfies Assumption 6. At time $m = \tau_1(\bar C + \delta)$, we have $|\hat r_{k_{m+1}}| > \bar C + \delta$ and $|\bar r^m_{k_{m+1}}| \le \bar C$, which implies $|\hat r_{k_{m+1}} - \bar r^m_{k_{m+1}}| > \delta$, i.e.,

$$\tau_2(\delta) \le \tau_1(\bar C + \delta). \tag{2}$$

Therefore,

$$P\Big( \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| \ge \delta \Big) = P(\tau_2(\delta) = m) = P\big( \tau_2(\delta) = m, \ m \le \tau_1(\bar C + \delta) \big) \le \frac{C_1}{\delta^2} \sum_{l=k_m}^{k_{m+1}} \gamma_l^2.$$

The result follows from the summability of the series $\sum_k \gamma_k^2$ and the Borel-Cantelli lemma. □

Now we are ready to prove boundedness, using the following simple results.

Lemma 10. Suppose that $0 \le \alpha < 1$ and that $\{a_m\}$, $\{\delta_m\}$ are non-negative sequences that satisfy $a_{m+1} \le \alpha a_m + \delta_m$.
(1) If $\sup_m \delta_m < \infty$, then $\sup_m a_m < \infty$.
(2) If $\delta_m \to 0$, then $a_m \to 0$.

Lemma 11. If an $m \times m$ matrix $G$ satisfies

$$r' G r \ge a |r|^2 \quad \text{for all } r \in \mathbb{R}^m,$$

then for sufficiently small $\gamma > 0$,

$$|(I - \gamma G) r| \le (1 - \gamma a / 2) |r|.$$

Lemma 12. $\sup_k |r_k| < \infty$, w.p.1.

Proof. Since $\bar h(\cdot)$ is bounded, Assumption 6 and Lemma 11 imply the following. There exists a constant $C$ such that, for sufficiently large $m$, and $k_m \le k < k_{m+1}$,

$$|\bar r^m_{k+1}| \le (1 - \gamma_k a / 2) |\bar r^m_k| + \frac{\gamma_k C}{\max(1, |r_{k_m}|)}.$$

Using the inequality $1 - x \le e^{-x}$, we have

$$|\bar r^m_{k_{m+1}}| \le e^{-(a/2) \sum_{l=k_m}^{k_{m+1}-1} \gamma_l} |\hat r_{k_m}| + \frac{C \sum_{l=k_m}^{k_{m+1}-1} \gamma_l}{\max(1, |r_{k_m}|)}.$$
This, along with Lemma 9, implies

$$\frac{|r_{k_{m+1}}|}{\max(1, |r_{k_m}|)} \le e^{-aT/2} \frac{|r_{k_m}|}{\max(1, |r_{k_m}|)} + \frac{C \bar T}{\max(1, |r_{k_m}|)} + \delta_m,$$

where $\delta_m \to 0$, w.p.1. Multiplying both sides by $\max(1, |r_{k_m}|)$ and using the fact that it is less than $1 + |r_{k_m}|$, we have

$$|r_{k_{m+1}}| \le \big( e^{-aT/2} + \delta_m \big) |r_{k_m}| + C \bar T + \delta_m.$$

Since $e^{-aT/2} < 1$ and $\delta_m \to 0$, w.p.1, it follows from Lemma 10(1) that

$$\sup_m |r_{k_m}| < \infty, \quad \text{w.p.1.}$$

Recall that $\bar r^m_{k_m} = \hat r_{k_m}$ is bounded. Then, using Eq. (2), it follows that

$$\sup_m \max_{k_m \le k \le k_{m+1}} |\bar r^m_k| < \infty, \quad \text{w.p.1.}$$

The boundedness of $r_k$ now follows from the observation

$$\sup_k |r_k| = \sup_m \max_{k_m \le k \le k_{m+1}} \max(1, |r_{k_m}|) |\hat r_k| \le \sup_m \Big\{ (1 + |r_{k_m}|) \Big( \max_{k_m \le k \le k_{m+1}} |\bar r^m_k| + \max_{k_m \le k \le k_{m+1}} |\hat r_k - \bar r^m_k| \Big) \Big\},$$

the boundedness of $r_{k_m}$ and $\bar r^m_k$, and Lemma 9. □

5. Proof of Theorem 7

To prove Theorem 7, we consider the sequence $\hat\Delta_k = \bar G(\theta_k) r_k - \bar h(\theta_k)$. As noted in Section 3, this sequence evolves according to the iteration

$$\hat\Delta_{k+1} = \hat\Delta_k - \gamma_k \bar G(\theta_{k+1}) \hat\Delta_k + \gamma_k \varepsilon^{(1)}_{k+1} + \gamma_k \varepsilon^{(2)}_{k+1},$$

where

$$\varepsilon^{(1)}_{k+1} = \bar G(\theta_{k+1}) \big[ \big( h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k) \big) - \big( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \big) r_k + \xi_{k+1} r_k \big],$$

$$\varepsilon^{(2)}_{k+1} = \frac{1}{\gamma_k} \big[ \big( \bar G(\theta_{k+1}) - \bar G(\theta_k) \big) r_k - \big( \bar h(\theta_{k+1}) - \bar h(\theta_k) \big) \big].$$

Lemma 13. $\sum_k \gamma_k \varepsilon^{(1)}_{k+1}$ converges, w.p.1.

Proof. The proof relies on Assumptions 1, 3, and 5, and is omitted because it is very similar to the proof of Lemma 2 in [2, p. 224]. □

Lemma 14. $\lim_k \varepsilon^{(2)}_k = 0$, w.p.1.

Proof. Using Assumption 3(4) and the update equation for $\theta_k$, we have

$$|\varepsilon^{(2)}_{k+1}| \le C \frac{\beta_k}{\gamma_k} |H_{k+1}| \big( |r_k| + 1 \big).$$

Since $\{H_k\}$ has bounded moments, the second part of Assumption 4 yields

$$\sum_k E\Big[ \Big( \frac{\beta_k}{\gamma_k} \Big)^d |H_{k+1}|^d \Big] < \infty$$

for some $d > 0$. Therefore, $(\beta_k / \gamma_k) |H_{k+1}|$ converges to zero, w.p.1. The result follows from the boundedness of $\{r_k\}$. □

Recall the notation $k_m$ from the previous section. For each $m$, define $\bar\Delta^m_k$, for $k_m \le k \le k_{m+1}$, as follows:

$$\bar\Delta^m_{k+1} = \big( I - \gamma_k \bar G(\theta_{k+1}) \big) \bar\Delta^m_k, \qquad \bar\Delta^m_{k_m} = \hat\Delta_{k_m}.$$

Lemma 15. $\lim_m \max_{k_m \le k \le k_{m+1}} |\hat\Delta_k - \bar\Delta^m_k| = 0$, w.p.1.

Proof. Since $\bar G(\cdot)$ is bounded, there exists some $C$ such that for every $m$ and $k_m \le k < k_{m+1}$,

$$|\hat\Delta_{k+1} - \bar\Delta^m_{k+1}| \le C \sum_{l=k_m}^{k} \gamma_l |\hat\Delta_l - \bar\Delta^m_l| + \Big| \sum_{l=k_m}^{k} \gamma_l \varepsilon^{(1)}_{l+1} \Big| + \sum_{l=k_m}^{k} \gamma_l |\varepsilon^{(2)}_{l+1}|.$$

Using the discrete Gronwall inequality, it can be seen that

$$\max_{k_m \le k \le k_{m+1}} |\hat\Delta_k - \bar\Delta^m_k| \le e^{C \bar T} \max_{k_m \le k < k_{m+1}} \Big| \sum_{l=k_m}^{k} \gamma_l \varepsilon^{(1)}_{l+1} \Big| + e^{C \bar T} \bar T \sup_{k \ge k_m} |\varepsilon^{(2)}_{k+1}|.$$
The result follows from the previous two lemmas. □

We are now ready to complete the proof of Theorem 7. Using Assumption 6 and Lemma 11, we have

$$|\bar\Delta^m_{k_{m+1}}| \le e^{-aT/2} |\hat\Delta_{k_m}|.$$

Therefore, Lemma 15 implies that

$$|\hat\Delta_{k_{m+1}}| \le e^{-aT/2} |\hat\Delta_{k_m}| + \delta_m,$$

where $\delta_m \to 0$, w.p.1. Using Lemma 10(2), it follows that $|\bar G(\theta_{k_m}) r_{k_m} - \bar h(\theta_{k_m})| = |\hat\Delta_{k_m}|$ converges to zero. Using arguments similar to the closing arguments in the proof of Lemma 12, we finally conclude that

$$\lim_k |\bar G(\theta_k) r_k - \bar h(\theta_k)| = 0, \quad \text{w.p.1.} \qquad \square$$

Acknowledgements

This research was partially supported by the National Science Foundation under contract ECS, and by the AFOSR under contract F.

References

[1] A. Benveniste, M. Metivier, P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer, Berlin.
[2] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
[3] V.S. Borkar, Stochastic approximation with two time scales, Systems & Control Lett.
[4] V.S. Borkar, S.P. Meyn, The o.d.e. method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim.
[5] M. Duflo, Random Iterative Models, Springer, New York.
[6] E. Eweda, O. Macchi, Convergence of an adaptive linear estimation algorithm, IEEE Trans. Automat. Control.
[7] V.R. Konda, Actor-critic algorithms, Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
[8] V.R. Konda, J.N. Tsitsiklis, On actor-critic algorithms, SIAM J. Control Optim., to appear.
[9] H.J. Kushner, Asymptotic behavior of stochastic approximation and large deviations, IEEE Trans. Automat. Control.
[10] H.J. Kushner, G.G. Yin, Stochastic Approximation Algorithms and Applications, Springer, New York.
[11] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer, London.
[12] R.S. Sutton, Learning to predict by the methods of temporal differences, Mach. Learning.
[13] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
[14] B. Widrow, J. McCool, M.G. Larimore, C.R. Johnson, Stationary and non-stationary learning characteristics of the LMS adaptive filter, Proc. IEEE.
More informationTHE LUBRICATION APPROXIMATION FOR THIN VISCOUS FILMS: REGULARITY AND LONG TIME BEHAVIOR OF WEAK SOLUTIONS A.L. BERTOZZI AND M. PUGH.
THE LUBRICATION APPROXIMATION FOR THIN VISCOUS FILMS: REGULARITY AND LONG TIME BEHAVIOR OF WEAK SOLUTIONS A.L. BERTOI AND M. PUGH April 1994 Abstract. We consider the fourth order degenerate diusion equation
More informationSimulation-Based Optimization of Markov Reward Processes
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 2, FEBRUARY 2001 191 Simulation-Based Optimization of Markov Reward Processes Peter Marbach John N. Tsitsiklis, Fellow, IEEE Abstract This paper proposes
More informationStability of nonlinear AR(1) time series with delay
Stochastic Processes and their Applications 82 (1999 307 333 www.elsevier.com/locate/spa Stability of nonlinear AR(1 time series with delay Daren B.H. Cline, Huay-min H. Pu Department of Statistics, Texas
More informationError Empirical error. Generalization error. Time (number of iteration)
Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp
More informationSemi-Markov Decision Problems and Performance Sensitivity Analysis
758 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 5, MAY 2003 Semi-Markov Decision Problems Performance Sensitivity Analysis Xi-Ren Cao, Fellow, IEEE Abstract Recent research indicates that Markov
More informationChapter 7 Interconnected Systems and Feedback: Well-Posedness, Stability, and Performance 7. Introduction Feedback control is a powerful approach to o
Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter 7 Interconnected
More informationA converse Lyapunov theorem for discrete-time systems with disturbances
Systems & Control Letters 45 (2002) 49 58 www.elsevier.com/locate/sysconle A converse Lyapunov theorem for discrete-time systems with disturbances Zhong-Ping Jiang a; ; 1, Yuan Wang b; 2 a Department of
More informationDivisor matrices and magic sequences
Discrete Mathematics 250 (2002) 125 135 www.elsevier.com/locate/disc Divisor matrices and magic sequences R.H. Jeurissen Mathematical Institute, University of Nijmegen, Toernooiveld, 6525 ED Nijmegen,
More informationThe best expert versus the smartest algorithm
Theoretical Computer Science 34 004 361 380 www.elsevier.com/locate/tcs The best expert versus the smartest algorithm Peter Chen a, Guoli Ding b; a Department of Computer Science, Louisiana State University,
More informationComments on integral variants of ISS 1
Systems & Control Letters 34 (1998) 93 1 Comments on integral variants of ISS 1 Eduardo D. Sontag Department of Mathematics, Rutgers University, Piscataway, NJ 8854-819, USA Received 2 June 1997; received
More informationPade approximants and noise: rational functions
Journal of Computational and Applied Mathematics 105 (1999) 285 297 Pade approximants and noise: rational functions Jacek Gilewicz a; a; b;1, Maciej Pindor a Centre de Physique Theorique, Unite Propre
More informationLifting to non-integral idempotents
Journal of Pure and Applied Algebra 162 (2001) 359 366 www.elsevier.com/locate/jpaa Lifting to non-integral idempotents Georey R. Robinson School of Mathematics and Statistics, University of Birmingham,
More informationStrassen s law of the iterated logarithm for diusion processes for small time 1
Stochastic Processes and their Applications 74 (998) 9 Strassen s law of the iterated logarithm for diusion processes for small time Lucia Caramellino Dipartimento di Matematica, Universita degli Studi
More informationTemporal difference learning
Temporal difference learning AI & Agents for IET Lecturer: S Luz http://www.scss.tcd.ie/~luzs/t/cs7032/ February 4, 2014 Recall background & assumptions Environment is a finite MDP (i.e. A and S are finite).
More informationOn Finding Optimal Policies for Markovian Decision Processes Using Simulation
On Finding Optimal Policies for Markovian Decision Processes Using Simulation Apostolos N. Burnetas Case Western Reserve University Michael N. Katehakis Rutgers University February 1995 Abstract A simulation
More information1 Introduction The purpose of this paper is to illustrate the typical behavior of learning algorithms using stochastic approximations (SA). In particu
Strong Points of Weak Convergence: A Study Using RPA Gradient Estimation for Automatic Learning Felisa J. Vazquez-Abad * Department of Computer Science and Operations Research University of Montreal, Montreal,
More information2 optimal prices the link is either underloaded or critically loaded; it is never overloaded. For the social welfare maximization problem we show that
1 Pricing in a Large Single Link Loss System Costas A. Courcoubetis a and Martin I. Reiman b a ICS-FORTH and University of Crete Heraklion, Crete, Greece courcou@csi.forth.gr b Bell Labs, Lucent Technologies
More informationApplied Mathematics &Optimization
Appl Math Optim 29: 211-222 (1994) Applied Mathematics &Optimization c 1994 Springer-Verlag New Yor Inc. An Algorithm for Finding the Chebyshev Center of a Convex Polyhedron 1 N.D.Botin and V.L.Turova-Botina
More informationUniversity of California. Berkeley, CA fzhangjun johans lygeros Abstract
Dynamical Systems Revisited: Hybrid Systems with Zeno Executions Jun Zhang, Karl Henrik Johansson y, John Lygeros, and Shankar Sastry Department of Electrical Engineering and Computer Sciences University
More informationThe convergence limit of the temporal difference learning
The convergence limit of the temporal difference learning Ryosuke Nomura the University of Tokyo September 3, 2013 1 Outline Reinforcement Learning Convergence limit Construction of the feature vector
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationConstrained Leja points and the numerical solution of the constrained energy problem
Journal of Computational and Applied Mathematics 131 (2001) 427 444 www.elsevier.nl/locate/cam Constrained Leja points and the numerical solution of the constrained energy problem Dan I. Coroian, Peter
More informationMultiplicative Multifractal Modeling of. Long-Range-Dependent (LRD) Trac in. Computer Communications Networks. Jianbo Gao and Izhak Rubin
Multiplicative Multifractal Modeling of Long-Range-Dependent (LRD) Trac in Computer Communications Networks Jianbo Gao and Izhak Rubin Electrical Engineering Department, University of California, Los Angeles
More informationA Least Squares Q-Learning Algorithm for Optimal Stopping Problems
LIDS REPORT 273 December 2006 Revised: June 2007 A Least Squares Q-Learning Algorithm for Optimal Stopping Problems Huizhen Yu janey.yu@cs.helsinki.fi Dimitri P. Bertsekas dimitrib@mit.edu Abstract We
More informationOff-Policy Actor-Critic
Off-Policy Actor-Critic Ludovic Trottier Laval University July 25 2012 Ludovic Trottier (DAMAS Laboratory) Off-Policy Actor-Critic July 25 2012 1 / 34 Table of Contents 1 Reinforcement Learning Theory
More informationSpurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics
UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports Spurious Chaotic Solutions of Dierential Equations Sigitas Keras DAMTP 994/NA6 September 994 Department of Applied Mathematics and Theoretical Physics
More informationThe average dimension of the hull of cyclic codes
Discrete Applied Mathematics 128 (2003) 275 292 www.elsevier.com/locate/dam The average dimension of the hull of cyclic codes Gintaras Skersys Matematikos ir Informatikos Fakultetas, Vilniaus Universitetas,
More informationExistence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity
Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity Vincent D. Blondel, Julien M. Hendricx and John N. Tsitsilis July 24, 2009 Abstract
More information[4] T. I. Seidman, \\First Come First Serve" is Unstable!," tech. rep., University of Maryland Baltimore County, 1993.
[2] C. J. Chase and P. J. Ramadge, \On real-time scheduling policies for exible manufacturing systems," IEEE Trans. Automat. Control, vol. AC-37, pp. 491{496, April 1992. [3] S. H. Lu and P. R. Kumar,
More informationNote Watson Crick D0L systems with regular triggers
Theoretical Computer Science 259 (2001) 689 698 www.elsevier.com/locate/tcs Note Watson Crick D0L systems with regular triggers Juha Honkala a; ;1, Arto Salomaa b a Department of Mathematics, University
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement
More informationOn the convergence rates of genetic algorithms
Theoretical Computer Science 229 (1999) 23 39 www.elsevier.com/locate/tcs On the convergence rates of genetic algorithms Jun He a;, Lishan Kang b a Department of Computer Science, Northern Jiaotong University,
More informationSome multivariate risk indicators; minimization by using stochastic algorithms
Some multivariate risk indicators; minimization by using stochastic algorithms Véronique Maume-Deschamps, université Lyon 1 - ISFA, Joint work with P. Cénac and C. Prieur. AST&Risk (ANR Project) 1 / 51
More informationSTOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin.
STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER Department of Mathematics University of Wisconsin Madison WI 5376 keisler@math.wisc.edu 1. Introduction The Loeb measure construction
More informationOn reaching head-to-tail ratios for balanced and unbalanced coins
Journal of Statistical Planning and Inference 0 (00) 0 0 www.elsevier.com/locate/jspi On reaching head-to-tail ratios for balanced and unbalanced coins Tamas Lengyel Department of Mathematics, Occidental
More information/97/$10.00 (c) 1997 AACC
Optimal Random Perturbations for Stochastic Approximation using a Simultaneous Perturbation Gradient Approximation 1 PAYMAN SADEGH, and JAMES C. SPALL y y Dept. of Mathematical Modeling, Technical University
More informationWARDROP EQUILIBRIA IN AN INFINITE NETWORK
LE MATEMATICHE Vol. LV (2000) Fasc. I, pp. 1728 WARDROP EQUILIBRIA IN AN INFINITE NETWORK BRUCE CALVERT In a nite network, there is a classical theory of trafc ow, which gives existence of a Wardrop equilibrium
More informationElements of Reinforcement Learning
Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,
More informationConvergence of Simultaneous Perturbation Stochastic Approximation for Nondifferentiable Optimization
1 Convergence of Simultaneous Perturbation Stochastic Approximation for Nondifferentiable Optimization Ying He yhe@engr.colostate.edu Electrical and Computer Engineering Colorado State University Fort
More informationRichard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring,
The Finite Dimensional Normed Linear Space Theorem Richard DiSalvo Dr. Elmer Mathematical Foundations of Economics Fall/Spring, 20-202 The claim that follows, which I have called the nite-dimensional normed
More informationLimiting distribution for subcritical controlled branching processes with random control function
Available online at www.sciencedirect.com Statistics & Probability Letters 67 (2004 277 284 Limiting distribution for subcritical controlled branching processes with random control function M. Gonzalez,
More informationTotal Expected Discounted Reward MDPs: Existence of Optimal Policies
Total Expected Discounted Reward MDPs: Existence of Optimal Policies Eugene A. Feinberg Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony Brook, NY 11794-3600
More informationBias-Variance Error Bounds for Temporal Difference Updates
Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds
More informationOn the fast convergence of random perturbations of the gradient flow.
On the fast convergence of random perturbations of the gradient flow. Wenqing Hu. 1 (Joint work with Chris Junchi Li 2.) 1. Department of Mathematics and Statistics, Missouri S&T. 2. Department of Operations
More informationA Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract
A Finite Element Method for an Ill-Posed Problem W. Lucht Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D-699 Halle, Germany Abstract For an ill-posed problem which has its origin
More informationThe nonsmooth Newton method on Riemannian manifolds
The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and
More informationINTRODUCTION TO MARKOV DECISION PROCESSES
INTRODUCTION TO MARKOV DECISION PROCESSES Balázs Csanád Csáji Research Fellow, The University of Melbourne Signals & Systems Colloquium, 29 April 2010 Department of Electrical and Electronic Engineering,
More informationPROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS
PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS Abstract. We present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix
More informationECON 616: Lecture 1: Time Series Basics
ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters
More informationComments and Corrections
1386 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 59, NO 5, MAY 2014 Comments and Corrections Corrections to Stochastic Barbalat s Lemma and Its Applications Xin Yu and Zhaojing Wu Abstract The proof of
More informationLinear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space
Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................
More informationOn Coarse Geometry and Coarse Embeddability
On Coarse Geometry and Coarse Embeddability Ilmari Kangasniemi August 10, 2016 Master's Thesis University of Helsinki Faculty of Science Department of Mathematics and Statistics Supervised by Erik Elfving
More informationIntroduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be
Wavelet Estimation For Samples With Random Uniform Design T. Tony Cai Department of Statistics, Purdue University Lawrence D. Brown Department of Statistics, University of Pennsylvania Abstract We show
More information