Quasi Stochastic Approximation
American Control Conference, San Francisco, June 2011
Sean P. Meyn, joint work with Darshan Shirodkar and Prashant Mehta
Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
Thanks to NSF & AFOSR
Outline
1. Background: Stochastic Approximation
2. Background: Q-learning
3. Quasi-Stochastic Approximation
4. Conclusions
Background: Stochastic Approximation
Robbins and Monro

Setting: solve the equation h̄(ϑ) = 0, where h̄(ϑ) = E[h(ϑ, ζ)] and ζ is random.

Robbins and Monro [5]: fixed-point iteration with noisy measurements,

    ϑ_{n+1} = ϑ_n + a_n h(ϑ_n, ζ_n),   n ≥ 0,   ϑ_0 ∈ R^d given.

Typical assumptions for convergence:
- Step size a_n = (1 + n)^{−1}.
- The random sequence {ζ_n} is i.i.d., with each ζ_n identical to ζ in distribution.
- Global asymptotic stability of the ODE (d/dt)θ = h̄(θ).

Excellent recent reference: Borkar 2008 [2].
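A minimal Python sketch of the Robbins-Monro recursion; the helper names and the toy model h(ϑ, ζ) = −ϑ + ζ are illustrative choices, not from the talk:

```python
import numpy as np

def robbins_monro(h, zeta_sampler, theta0, n_iter=100_000):
    # Fixed-point iteration with noisy measurements:
    #   theta_{n+1} = theta_n + a_n h(theta_n, zeta_n), with a_n = 1/(1+n).
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = theta + (1.0 / (1.0 + n)) * h(theta, zeta_sampler())
    return theta

# Toy usage: h(theta, zeta) = -theta + zeta with zeta ~ N(2, 1),
# so hbar(theta) = -theta + E[zeta] and the root is theta* = 2.
rng = np.random.default_rng(0)
print(robbins_monro(lambda th, z: -th + z, lambda: rng.normal(2.0, 1.0), 0.0))
```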
Background: Stochastic Approximation
Variance: Linearization of Stochastic Approximation

Assuming convergence, the error ϑ_n − ϑ* evolves approximately as

    ϑ_{n+1} − ϑ* ≈ (ϑ_n − ϑ*) + a_n [ A (ϑ_n − ϑ*) + Z_n ],

with Z_n = h(ϑ*, ζ_n) (zero mean) and A = ∂h̄(ϑ*), the linearization of h̄ at ϑ*.

Central Limit Theorem:  √n (ϑ_n − ϑ*) → N(0, Σ_ϑ) in distribution.

This holds under mild conditions. The asymptotic covariance solves a Lyapunov equation, which requires Re λ(A) < −1/2 for each eigenvalue λ of A:

    (A + ½I) Σ_ϑ + Σ_ϑ (A + ½I)ᵀ + Σ_Z = 0.
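The covariance can be computed numerically from this Lyapunov equation. A sketch using SciPy, where the 2-dimensional A and the noise covariance Σ_Z are assumed values for illustration:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical linearization A = d(hbar)/d(theta) at theta*, with every
# eigenvalue satisfying Re(lambda) < -1/2, and noise covariance Sigma_Z.
A = np.array([[-1.0, 0.3],
              [ 0.0, -2.0]])
Sigma_Z = np.eye(2)

# Solve (A + I/2) Sigma + Sigma (A + I/2)^T + Sigma_Z = 0 for Sigma.
Sigma_theta = solve_continuous_lyapunov(A + 0.5 * np.eye(2), -Sigma_Z)
assert np.linalg.eigvalsh(Sigma_theta).min() > 0   # positive definite
print(Sigma_theta)
```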
Background: Stochastic Approximation
Stochastic Newton-Raphson: a corollary to the CLT

Question: what is the optimal matrix gain Γ_n in

    ϑ_{n+1} = ϑ_n + a_n Γ_n h(ϑ_n, ζ_n),   n ≥ 0,   ϑ_0 ∈ R^d given?

Answer: Stochastic Newton-Raphson, so that Γ_n → −A^{−1}.

Proof: the linearization reduces to standard Monte Carlo.
Background: Stochastic Approximation
Example: Root Finding

    h(ϑ, ζ) = 1 − tan(ϑ) + ζ,   with ζ a normal random variable, N(0, 9).

Then h̄(ϑ) = 1 − tan(ϑ), so h̄(ϑ) = 0 gives ϑ* = π/4.

Estimates of ϑ* using i.i.d. and deterministic sequences for ζ:
- ζ i.i.d. Gaussian
- ζ = Φ^{−1}(·), deterministic

[Figure: sample paths of the two estimates; plot omitted in the transcription.]

Analysis requires theory of quasi-stochastic approximation...
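A sketch of the comparison; the projection onto a compact interval and the particular equidistributed sequence feeding Φ^{−1} are assumptions, since the slide does not specify them:

```python
import numpy as np
from scipy.stats import norm

def estimate(zeta_seq, n_iter=50_000):
    # SA iteration for h(theta, zeta) = 1 - tan(theta) + zeta; root theta* = pi/4.
    # Iterates are projected onto [-1.4, 1.4] to keep tan() bounded; a
    # standard safeguard, added here for numerical stability.
    theta = 0.0
    for n in range(n_iter):
        theta += (1.0 / (1.0 + n)) * (1.0 - np.tan(theta) + next(zeta_seq))
        theta = float(np.clip(theta, -1.4, 1.4))
    return theta

def iid_gaussian(rng, sd=3.0):          # zeta ~ N(0, 9), i.i.d.
    while True:
        yield rng.normal(0.0, sd)

def quasi_gaussian(sd=3.0):
    # Deterministic surrogate for N(0, 9): Phi^{-1} applied to the
    # equidistributed sequence frac(n * golden ratio); one possible
    # reading of "zeta = Phi^{-1}(.), deterministic".
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    n = 1
    while True:
        yield sd * norm.ppf((n * phi) % 1.0)
        n += 1

rng = np.random.default_rng(0)
print(estimate(iid_gaussian(rng)), estimate(quasi_gaussian()), np.pi / 4)
```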
Background: Q-learning
M&M 2009

System: ẋ = f(x, u);  cost: c(x, u).

HJB equation for the discounted-cost optimal control problem:

    min_u { c(x, u) + f(x, u) · ∇h*(x) } = γ h*(x),

where the term in braces defines the Q-function:

    Q(x, u) = c(x, u) + f(x, u) · ∇h*(x).

Fixed-point equation for the Q-function: writing Q̲(x) := min_u Q(x, u) = γ h*(x),

    Q(x, u) = c(x, u) + f(x, u) · ∇h*(x) = c(x, u) + γ^{−1} f(x, u) · ∇Q̲(x).

Parameterized set of approximations, {Q^ϑ(x, u) : ϑ ∈ R^d}.
Bellman error:

    E^ϑ(x, u) = γ (Q^ϑ(x, u) − c(x, u)) − f(x, u) · ∇Q̲^ϑ(x),   where Q̲^ϑ(x) := min_u Q^ϑ(x, u).

Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009.
Background: Q-learning
M&M 2009, continued

Model-free form of the Bellman error: along a trajectory, with x = x(t) and u = u(t),

    E^ϑ(x, u) = γ (Q^ϑ(x, u) − c(x, u)) − (d/dt) Q̲^ϑ(x(t)),

so no model f is needed, only the observed evolution of Q̲^ϑ(x(t)).

Q-learning of M&M: minimize the mean-square Bellman error

    h̄(ϑ) = E[ E^ϑ(x, u)² ],

with the expectation taken in the ergodic steady state ζ = (x, u); the SA algorithm seeks zeros of its gradient.

Choose the input as stable feedback plus a mixture of sinusoids: u(t) = k(x(t)) + ω(t).

Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009.
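A sketch of evaluating the model-free Bellman error from a recorded trajectory, replacing f(x, u) · ∇Q̲^ϑ(x) by a finite difference of Q̲^ϑ(x(t)); the function names and the placeholder Q-functions are illustrative assumptions:

```python
import numpy as np

def bellman_error_model_free(Q, Q_min, xs, us, cs, dt, gamma):
    # Model-free Bellman error along sampled (x, u, c) arrays: the model
    # term f(x,u) . grad Q_min(x) is replaced by a finite-difference
    # estimate of (d/dt) Q_min(x(t)).
    dQ = (Q_min(xs[1:]) - Q_min(xs[:-1])) / dt
    return gamma * (Q(xs[:-1], us[:-1]) - cs[:-1]) - dQ

# Tiny usage with a quadratic placeholder Q^theta (illustrative numbers):
Q = lambda x, u: 0.5 * (x**2 + u**2) + 0.1 * x**2 + 0.2 * x * u
Q_min = lambda x: (0.5 + 0.1 - 0.5 * 0.2**2) * x**2   # min over u of Q above
xs = np.array([0.50, 0.48, 0.47])
us = np.array([-0.50, -0.40, -0.30])
cs = 0.5 * (xs**2 + us**2)
print(bellman_error_model_free(Q, Q_min, xs, us, cs, dt=0.1, gamma=1.0))
```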
Background: Q-learning
Example: Q-learning

    ẋ = −x³ + u,   c(x, u) = ½ (x² + u²).

HJB equation:

    min_u [ c(x, u) + (−x³ + u) ∇h*(x) ] = γ h*(x).

Basis: Q^ϑ(x, u) = c(x, u) + ϑ₁ x² + ϑ₂ xu.

Control: u(t) = A [ sin(t) + sin(πt) + sin(et) ].

[Figure: estimated policies vs. the optimal policy, for low- and high-amplitude inputs; plot omitted in the transcription.]

Analysis requires theory of quasi-stochastic approximation...
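A sketch of the resulting QSA gradient flow on the squared Bellman error for this example; the quadratic basis above is used, and γ, the stabilizing feedback k, the amplitude, and all step sizes are assumed values, not settings from the talk:

```python
import numpy as np

gamma, amp, dt, T = 1.0, 2.0, 1e-3, 200.0   # assumed constants
f = lambda x, u: -x**3 + u                  # the example's dynamics
k = lambda x: -x                            # stabilizing feedback (assumed)

x, t = 0.5, 0.0
theta = np.zeros(2)                         # parameters of Q^theta
for _ in range(int(T / dt)):
    u = k(x) + amp * (np.sin(t) + np.sin(np.pi * t) + np.sin(np.e * t))
    # For Q^theta = c + th1*x^2 + th2*x*u, the minimum over u is
    # (1/2 + th1 - th2^2/2) * x^2, attained at u = -th2 * x.
    qmin_coef = 0.5 + theta[0] - 0.5 * theta[1] ** 2
    err = gamma * (theta[0] * x**2 + theta[1] * x * u) \
          - f(x, u) * 2.0 * qmin_coef * x
    grad = np.array([gamma * x**2 - 2.0 * f(x, u) * x,
                     gamma * x * u + 2.0 * f(x, u) * theta[1] * x])
    theta -= (1.0 / (1.0 + t)) * err * grad * dt  # gradient flow on err^2 / 2
    x += f(x, u) * dt                             # Euler step for the state
    t += dt

print("theta =", theta, "; estimated policy u(x) = -theta[1] * x")
```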
Quasi-Stochastic Approximation

Continuous-time, deterministic version of SA:

    (d/dt) ϑ(t) = a(t) h(ϑ(t), ζ(t)).

Assumptions:
- Ergodicity: ζ satisfies  h̄(θ) = lim_{T→∞} (1/T) ∫₀ᵀ h(θ, ζ(t)) dt,  for all θ ∈ R^d.
- Decreasing gain: a(t) → 0, with ∫₀^∞ a(t) dt = ∞ and ∫₀^∞ a(t)² dt < ∞; usually a(t) = (1 + t)^{−1}.
- Stable ODE: h̄ is globally Lipschitz, and (d/dt) ϑ = h̄(ϑ) is globally asymptotically stable, with a globally Lipschitz Lyapunov function.
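A minimal Euler-discretized sketch of the QSA ODE; the toy model h(θ, ζ) = −θ + ζ with sinusoidal ζ is an illustration, chosen so the ergodic limit is h̄(θ) = −θ and ϑ* = 0:

```python
import numpy as np

def qsa(h, zeta, theta0, T=1000.0, dt=1e-2):
    # Euler integration of  (d/dt) theta = a(t) h(theta, zeta(t)),
    # with gain a(t) = 1/(1+t).
    theta = np.asarray(theta0, dtype=float)
    t = 0.0
    for _ in range(int(T / dt)):
        theta = theta + (1.0 / (1.0 + t)) * h(theta, zeta(t)) * dt
        t += dt
    return theta

# zeta(t) = sqrt(2) sin(t) has zero time-average, so theta(t) -> 0.
print(qsa(lambda th, z: -th + z, lambda t: np.sqrt(2.0) * np.sin(t), 5.0))
```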
Quasi-Stochastic Approximation: Stability & Convergence

Under the ergodicity, decreasing-gain, and stable-ODE assumptions above:

Theorem: ϑ(t) → ϑ* for any initial condition.
Quasi-Stochastic Approximation: Variance

Take a(t) = 1/(1 + t), and assume:
- The model is linear: h(θ, ζ) = Aθ + ζ, and each eigenvalue λ of A satisfies Re(λ) < −1.
- ζ has zero mean: lim_{T→∞} (1/T) ∫₀ᵀ ζ(t) dt = 0, at rate 1/T.

Under these assumptions the rate of convergence is t^{−1}, and not t^{−1/2}:

Theorem: For some constant σ̄ < ∞,

    lim sup_{t→∞} t ‖ϑ(t) − ϑ*‖ ≤ σ̄.
Quasi-Stochastic Approximation: Polyak and Juditsky

Polyak and Juditsky [4, 2] obtain the optimal variance for SA using a very simple approach: high gain, and averaging.

Deterministic P&J: choose a high gain, a(t) = 1/(1 + t)^δ with δ ∈ (0, 1):

    (d/dt) γ(t) = (1 + t)^{−δ} h(γ(t), ζ(t)).

The output of this algorithm is then averaged:

    (d/dt) ϑ(t) = (1 + t)^{−1} ( −ϑ(t) + γ(t) ).

Linear convergence under stability alone:
1. There is a finite constant σ̄ satisfying  lim sup_{t→∞} t ‖ϑ(t) − ϑ*‖ ≤ σ̄.
2. Question: is σ̄ minimal?
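A sketch of the deterministic P&J scheme, Euler-discretized on the same toy model as before; δ and all other constants are illustrative:

```python
import numpy as np

def qsa_pj(h, zeta, theta0, delta=0.6, T=1000.0, dt=1e-2):
    # High-gain estimate gam(t) driven by a(t) = 1/(1+t)^delta,
    # followed by the running average theta(t).
    gam = np.asarray(theta0, dtype=float)
    theta = gam.copy()
    t = 0.0
    for _ in range(int(T / dt)):
        gam = gam + h(gam, zeta(t)) / (1.0 + t) ** delta * dt
        theta = theta + (-theta + gam) / (1.0 + t) * dt
        t += dt
    return theta

print(qsa_pj(lambda th, z: -th + z, lambda t: np.sqrt(2.0) * np.sin(t), 5.0))
```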
Conclusions

Summary: QSA is the most natural approach to approximate dynamic programming for deterministic systems, such as Q-learning [3].
- Stability is established using standard techniques.
- Linear convergence is obtained under mild stability assumptions.

Current research:
1. Open question: can we extend the approach of Polyak and Juditsky to obtain optimal convergence? First we must answer: what is the optimal value of σ̄?
2. Concentration on applications: further development of Q-learning, and applications to nonlinear filtering.
References

[1] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Cambridge, MA, 1996.

[2] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency and Cambridge University Press (jointly), Delhi, India and Cambridge, UK, 2008.

[3] P. G. Mehta and S. P. Meyn. Q-learning and Pontryagin's minimum principle. In Proc. of the 48th IEEE Conf. on Decision and Control, held jointly with the 28th Chinese Control Conference, Dec. 2009.

[4] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, 1992.

[5] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.

[6] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.