Data-Efficient Control Policy Search using Residual Dynamics Learning
Matteo Saveriano 1, Yuchao Yin 1, Pietro Falco 1 and Dongheui Lee 1,2

Abstract: In this work, we propose a model-based and data-efficient approach for reinforcement learning. The main idea of our algorithm is to combine simulated and real rollouts to efficiently find an optimal control policy. While performing rollouts on the robot, we exploit sensory data to learn a probabilistic model of the residual difference between the measured state and the state predicted by a simplified model. The simplified model can be any dynamical system, from a very accurate system to a simple, linear one. The residual difference is learned with Gaussian processes. Hence, we assume that the difference between the real and the simplified model is Gaussian distributed, which is less strict than assuming that the real system is Gaussian distributed. The combination of the partial model and the learned residuals is exploited to predict the real system behavior and to search for an optimal policy. Simulations and experiments show that our approach significantly reduces the number of rollouts needed to find an optimal control policy for the real system.

Fig. 1. Overview of the algorithm. A policy π^a is learned in simulation on the approximate model (states x^a) and improved (Sec. II-E, II-F) into π^n with parameters θ^n, which is applied to the robot (real policy π^r, states x^r); the residual Δx between real and approximate states is used for learning the residual dynamics (Sec. II-B, II-C).

I. INTRODUCTION

Robots that learn novel tasks by self-practice have the chance to be exploited in a wide range of situations, including industrial and social applications. Reinforcement Learning (RL) gives the agent the possibility to autonomously learn a task by trial and error [1]. In particular, the agent performs a number of trials (rollouts) and uses sensory data to improve the execution. In robotics and control applications, RL presents two main disadvantages [2].
First, the state and action spaces are continuous-valued and high-dimensional, and existing approaches in RL to search continuous spaces are not sample-efficient. A common strategy [3]-[6] to reduce the search space to a discrete space consists in parameterizing the control policy with a discrete number of learnable parameters. Typical policy parameterizations adopt Gaussian mixture models [3], hidden Markov models [7], [8], or stable dynamical systems [5]. Second, it is extremely time consuming to perform a big number of trials (rollouts) on real devices. The rollouts can be reduced with a proper initialization of the policy, for example using kinesthetic teaching approaches [9]. However, reducing the rollouts needed to find an optimal policy is still an open problem.

RL algorithms can be classified into two categories, namely model-free and model-based approaches. Model-free methods [4]-[6] directly learn the policy on the data samples obtained after each rollout. Model-free approaches do not require any model approximation and are applicable in many situations. The problem of model-free approaches is that they may require many rollouts to find the optimal policy [4]. Model-based methods, instead, explicitly learn a model of the system to control from sensory data and use this learned model to search for the optimal policy. The comparison in [10] shows that model-based approaches are efficient compared with model-free methods. However, model-based methods strongly suffer from the so-called model bias problem. Indeed, they assume that the learned model is sufficiently accurate to represent the real system [11]-[13]. Learning an accurate model of the system is not always possible. This is the case, for instance, when the training data is too sparse.

1 Human-centered Assistive Robotics, Technical University of Munich, Munich, Germany {matteo.saveriano, yuchao.yin, pietro.falco, dhlee}@tum.de
2 Institute of Robotics and Mechatronics, German Aerospace Center (DLR), Germany
It is clear that searching for a control policy using inaccurate models might lead to poor results. The Probabilistic Inference for Learning Control (PILCO) algorithm in [10] alleviates the model-bias problem by including uncertainty in the learned model. In particular, PILCO leverages Gaussian Processes (GPs) [14] to learn a probabilistic model of the system which represents model uncertainties as Gaussian noise. The GP model allows to consider uncertainties in the long-term state predictions, and to explicitly consider eventual model errors in the policy evaluation. The comparison performed in [10] shows that PILCO outperforms state-of-the-art algorithms in terms of performed rollouts on the real robot in pendulum and cart-pole swing-up tasks. Despite the good performance, PILCO needs to learn a reliable model from sensory data to find an optimal policy. In order to explore the state space as much as possible and to rapidly learn a reliable model, PILCO applies a random policy at the first stage of the learning. In general, starting from a randomly initialized policy can be dangerous for real systems. The robot, in fact, can exhibit unstable behaviors in the first stages of the learning process, with the consequent risk of damage.

The work in [15] combines the benefits of model-free and model-based RL. The authors propose to model the system to control as a dynamical system and to find an initial control policy by using optimal control. The learned policy is then applied to the real system and improved using a model-free approach. The policy learned on the model represents a good starting point for policy search, significantly reducing the number of performed rollouts. The approach in [16] also uses optimal control to find an initial policy, but it uses sensory data to fit a locally linear model of the system.

In this paper, we propose a model-based RL approach that exploits an approximate, analytical model of the robot to rapidly learn an optimal control policy. The proposed PI-REM (Policy Improvement with REsidual Model learning) assumes that the system is modeled as a non-linear dynamical system. In real scenarios, it is not trivial to analytically derive the entire model, but it is reasonable to assume that part of the model is easy to derive. For instance, in a robotic application, the model of the robot in free motion has a well-known structure [17], but it is hard to model the sensor noise or the friction coefficient [18]. We refer to the known part of the model as the approximate model. Given the approximate model, we learn an optimal control policy in simulation. We exploit PILCO [10] in this study, but other choices are possible. Directly applying the policy learned in simulation on a real robot does not guarantee to fulfill the task. Hence, we exploit sensory data to learn the residual difference between the real and the approximate model using Gaussian Processes (GPs) [14].
The superposition of the approximate model and the learned residual term represents a good estimation of the real dynamics and it is adopted for model-based policy search. An overview of PI-REM is shown in Fig. 1. In contrast to PILCO, our approach exploits the approximate model to be more data efficient and to converge faster to the optimal policy. Compared to the work in [15], PI-REM explicitly learns the residual difference to improve the model and speed up the policy search. Indeed, the approximate model is globally valid and we use sensory data only to estimate local variations of that model. In contrast to [16], we do not constrain the system dynamics to be locally linear.

The rest of the paper is organized as follows. Section II describes our approach for model-based policy search. Experiments and comparisons are reported in Sec. III. Section IV states the conclusion and proposes future extensions.

II. POLICY IMPROVEMENT WITH RESIDUAL MODEL LEARNING

A. Preliminaries

The goal of a reinforcement learning algorithm is to find a control policy u = π(x) that minimizes the expected return

  J(π_θ) = Σ_{t=0}^{N} E[c(x_t)]    (1)

where E[·] is the operator that computes the expected value of a random variable, and c(x_t) is the cost of being in state x_t at time t. The vector θ represents a set of values used to parameterize the continuous-valued policy function π(x, θ).

In this work, we consider that the system to control is modeled as a non-linear dynamical system in the form

  x^r_{t+1} = f^r(x^r_t, u^r_t)    (2)

where x^r ∈ R^d is the continuous-valued state vector, u^r ∈ R^f is the control input vector, and f^r ∈ R^d is a continuous and continuously differentiable function. The superscript r indicates that the dynamical system (2) is the real (exact) model of the system. The difference equation (2) can model a variety of physical systems, including robotic platforms.

B. Additive Unknown Dynamics

In general, it is not trivial to derive the exact model in (2).
For example, although it is relatively simple to compute the dynamical model of a robotic manipulator in free motion, it is complicated to model eventual external forces. In many robotics applications, we can assume that the function f^r(·,·) in (2) consists of two terms: (i) a deterministic and known term, and (ii) an additive unknown term. Under this assumption, the dynamical system in (2) can be rewritten as

  x^r_{t+1} = f^a(x^r_t, u^r_t) + f̃(x^r_t, u^r_t) + w    (3)

where the function f^a(·,·) is assumed known and deterministic, f̃(·,·) + w is an unknown and stochastic term, and w ∈ R^d is an additive Gaussian noise. The dynamical system

  x^a_{t+1} = f^a(x^a_t, u^a_t)    (4)

represents the known dynamics in (3), and we call it the approximate model. Note that x^a_t and x^r_t represent the same state vector in different points of the state space R^d. The approximate model is known a priori. Hence, a control policy u^a = π^a(x^a) for (3) can be learned in simulation by using a reinforcement learning or an optimal control approach. In this work, we used the PILCO algorithm [10] to learn π^a, but other choices are possible. It is worth noting that searching for the policy π^a requires only simulations, i.e., no experiments on the robot are performed at this stage.

The learned policy π^a represents a good starting point to search for a control policy π^r for the real robot. This idea of initializing a control policy from simulations has been effectively applied in [15], where a linear system is used as approximate model. Apart from initializing a policy from simulation, this work exploits real data from the robot to explicitly learn the unknown dynamics f̃(·,·) + w in (3). In other words, when the learned policy π^a is applied to the real robot, the measured states are used either to improve the current policy or to learn an estimate of the unknown dynamics. This significantly reduces the number of rollouts to perform on the real robot [10].
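To make the decomposition in (3)-(4) concrete, here is a minimal numerical sketch with a hypothetical 1-D system (all constants are made up; the unmodeled term plays the role of f̃ and the sampled noise that of w):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_a(x, u):
    # Known, deterministic approximate model, cf. Eq. (4)
    return 0.9 * x + 0.1 * u

def f_tilde(x, u):
    # Unknown residual dynamics, e.g. an unmodeled elastic force
    return 0.05 * np.sin(x)

def f_r(x, u):
    # Real system, cf. Eq. (3): approximate part + residual + Gaussian noise w
    return f_a(x, u) + f_tilde(x, u) + rng.normal(0.0, 1e-3)

# Rolling out the same input sequence on both systems exposes the state
# residual Delta x_t = x^r_t - x^a_t that the GP is later trained to predict.
x_r = x_a = 1.0
for _ in range(3):
    x_r, x_a = f_r(x_r, 0.5), f_a(x_a, 0.5)
delta_x = x_r - x_a
print(delta_x)
```

Since w is small, Δx_t is dominated by the accumulated effect of f̃; this is exactly the signal that Sec. II-C models with a GP.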
C. Learning the Unknown Dynamics using Gaussian Processes

The unknown dynamics f̃(·,·) + w in (3) is learned from sensory data using Gaussian Processes (GPs) [14]. The goal of the learning process is to obtain an estimate f̂ of f̃ + w such that f^a(·,·) + f̂(·,·) ≈ f^a(·,·) + f̃(·,·) + w. To learn the unknown dynamics, we exploit the real x^r_t and approximate x^a_t states, as well as the real u^r_t and approximate u^a_t control inputs. For convenience, let us define Δ_t = (Δx_t, Δu_t), where Δx_t = x^r_t − x^a_t and Δu_t = u^r_t − u^a_t. We also define x̃^r_t = (x^r_t, u^r_t) and x̃^a_t = (x^a_t, u^a_t). Using these definitions, it is straightforward to show that x̃^r_t = x̃^a_t + Δ_t and to rewrite the real dynamics in (3) as

  f^r(x̃^r_t) = f^r(x̃^a_t + Δ_t) = f^a(x̃^a_t + Δ_t) + f̃(x̃^a_t + Δ_t) + w    (5)

Recalling that the zero-order Taylor series expansion of a function is f(a + Δa) = f(a) + r_a, we can write

  f^r(x̃^a_t + Δ_t) = f^a(x̃^a_t) + r^a_t + f̃(x̃^a_t) + r̃_t + w    (6)

where r^a_t and r̃_t are the residual errors generated by stopping the Taylor series expansion at the zero order. Considering (6), the difference between the real (3) and approximate (4) systems can be written as

  Δx_{t+1} = f^r(x̃^a_t + Δ_t) − f^a(x̃^a_t)
           = f^a(x̃^a_t) + f̃(x̃^a_t) + r_t + w − f^a(x̃^a_t)
           = f̃(x̃^a_t) + r_t + w = f̂(x̃^a_t, Δ_t)    (7)

where we pose r_t = r^a_t + r̃_t. GPs assume that the data are generated by a set of underlying functions whose joint probability distribution is Gaussian [14]. Having assumed a Gaussian noise term w, we can conclude that the stochastic process (7) can be effectively represented as a GP.

Equation (7) shows that f̂ maps x̃^a_t and Δ_t into the next state difference Δx_{t+1}. Hence, the training inputs of our GP are the tuples {x̃^a_t, Δ_t}_{t=0}^{N−1}, while the training outputs are {y_t = Δx_{t+1} − Δx_t}_{t=0}^{N−1}. The training data are obtained as follows. At the n-th rollout, the current policy π^n is applied to the robot and to the approximate model. As already mentioned, for the first rollout we set π^1 = π^a. After the trial, we have the two data sets X^a = {x^a_t, u^a_t}_{t=0}^{N} and X^r = {x^r_t, u^r_t}_{t=0}^{N}. Hence, we simply have to calculate the element-wise difference X^r − X^a to obtain the tuples X^d = {Δx_t, Δu_t}_{t=0}^{N}, which contain the rest of the training data for the Gaussian process.

The state difference predicted with a GP, Δx_{t+1}, is Gaussian distributed, i.e., p(Δx_{t+1} | x̃^a_t, Δ_t) = N(Δx_{t+1} | μ_{t+1}, Σ_{t+1}). Given N training inputs X̃ = [{x̃^a_1, Δ_1}, ..., {x̃^a_N, Δ_N}], training outputs y = [y_1, ..., y_N]^T, and a query point x̃* = [x̃^{aT}_t, Δ_t^T]^T, the predicted one-step mean μ_{t+1} and variance Σ_{t+1} are

  μ_{t+1} = Δx_t + K_{x̃*X̃}(K_{X̃X̃} + σ_n^2 I)^{-1} y
  Σ_{t+1} = k(x̃*, x̃*) − K_{x̃*X̃}(K_{X̃X̃} + σ_n^2 I)^{-1} K_{X̃x̃*}    (8)

where K_{x̃*X̃} = k(x̃*, X̃), K_{X̃x̃*} = K_{x̃*X̃}^T, k(·,·) is a covariance function, and the generic element (i, j) of the matrix K_{X̃X̃} is given by k(x̃_i, x̃_j). In our approach, k(·,·) is the squared exponential covariance function

  k(x̃_i, x̃_j) = σ_k^2 exp(−(1/2)(x̃_i − x̃_j)^T Λ^{-1} (x̃_i − x̃_j))    (9)

where Λ = diag(l_1^2, ..., l_D^2). The tunable parameters Λ, σ_k^2 and σ_n^2 are learned from the training data by using evidence maximization [14]. The standard GP formulation works for scalar outputs. For systems with a multidimensional state, we consider one GP for each dimension, assuming that the dimensions are independent of each other.

D. Policy Parameterization and Cost Function

The control of a robotic device requires a continuous-valued control policy. It is clear that searching for a control policy in a continuous space is unfeasible in terms of computational cost. A common strategy [2] to solve this issue consists in assuming that the policy π is parameterized by θ ∈ R^p, i.e., π(x, θ). In this work, we assume that the policy is the mean of a GP in the form

  π(x, θ) = Σ_{i=1}^{D} k(x, c_i) [(K_{CC} + σ_π^2 I)^{-1} y_π]_i    (10)

where x is the current state and C = [c_1, ..., c_D] are the centers of the Gaussian basis function k(·,·), defined as in (9). The vector y_π plays the role of the GP training targets, while K_{CC} is defined as in (8). The tunable parameters of the policy (10) are θ = [C, Λ, y_π]. As suggested in [10], we set the variance σ_π^2 = 0.01 in our experiments.
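A minimal numpy sketch of the one-step GP prediction in (8) with the squared exponential kernel (9). For brevity it uses an isotropic length scale and fixed hyperparameters instead of evidence maximization, and it regresses the targets directly rather than adding the Δx_t offset of (8); names and constants are illustrative:

```python
import numpy as np

def sq_exp_kernel(A, B, ell=1.0, sig_k=1.0):
    # Squared exponential covariance, cf. Eq. (9), isotropic length scale
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sig_k ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(X, y, x_star, sig_n=1e-2):
    # GP posterior mean and variance at a query point, cf. Eq. (8)
    K = sq_exp_kernel(X, X) + sig_n ** 2 * np.eye(len(X))
    k_s = sq_exp_kernel(x_star[None, :], X)          # K_{x*, X}
    mean = k_s @ np.linalg.solve(K, y)
    var = sq_exp_kernel(x_star[None, :], x_star[None, :]) \
          - k_s @ np.linalg.solve(K, k_s.T)
    return float(mean), float(var)

# Train on a toy scalar residual (stand-in for one output dimension of Eq. 7)
X = np.linspace(-2.0, 2.0, 20)[:, None]
y = 0.3 * np.sin(2.0 * X[:, 0])
mean, var = gp_predict(X, y, np.array([0.5]))
print(mean, var)
```

For a d-dimensional state, one such GP is fitted per output dimension, as described above.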
As already mentioned, the optimal policy minimizes the expected return J(π_θ) in (1). In this work, we adopt a saturating cost function c(x^r_t) with spread σ_c^2

  c(x^r_t) = 1 − exp(−(1/(2σ_c^2)) ||x^r_t − x_g||^2) ∈ [0, 1]    (11)

where x_g denotes the goal (target) state.(1)

E. Policy Evaluation

The learned GP model is used to compute the long-term predictions p(Δx_1 | π), ..., p(Δx_N | π) starting from p(Δx_0). The long-term predictions, in fact, are needed to compute the expected costs and to minimize the expected return in (1). Computing the long-term predictions from the one-step predictions in (8) requires the predictive distribution

  p(y_t) = ∫∫ p(f̂(ξ_t) | ξ_t) p(ξ_t) df̂ dξ_t    (12)

where ξ_t = [x̃^{aT}_t, Δ_t^T]^T and y_t = Δx_{t+1} − Δx_t. Solving the integral in (12) is analytically intractable. Hence, we assume a Gaussian distribution p(Δ_t) = N(Δ_t | μ_Δ, Σ_Δ). The mean μ_Δ and covariance Σ_Δ are obtained by applying the moment matching algorithm in [10]. The predictive distribution p(Δx_{t+1} | π) is also Gaussian, with mean and covariance

  μ_{t+1} = μ_t + μ_Δ
  Σ_{t+1} = Σ_t + Σ_Δ + cov(Δx_t, Δ_t) + cov(Δ_t, Δx_t)    (13)

where the covariance cov(·,·) is computed as in [19]. Once the long-term predictions are computed, the estimated real state can be calculated as x^r_t = x^a_t + Δx_t, where x^a_t is the deterministic state of the approximate model (see Sec. II-A). Given that x^r_t = x^a_t + Δx_t, the expected real long-term cost can be expressed as

  J(π_θ) = Σ_{t=0}^{N} E_{x^r_t}[c(x^r_t)] = Σ_{t=0}^{N} E_{Δx_t}[c̃(Δx_t)]    (14)

Indeed, by substituting Δx_t = x^r_t − x^a_t into (11) we obtain

  c(x^r_t) = 1 − exp(−(1/(2σ_c^2)) ||x^r_t − x_g||^2)
           = 1 − exp(−(1/(2σ_c^2)) ||Δx_t + x^a_t − x_g||^2)
           = 1 − exp(−(1/(2σ_c^2)) ||Δx_t − (x_g − x^a_t)||^2)
           = 1 − exp(−(1/(2σ_c^2)) ||Δx_t − x̃_{g,t}||^2) = c̃(Δx_t)

The vector x^a_t is the approximate state computed by applying the initial policy π^a and it is deterministic. Hence, we can precalculate all the x̃_{g,t} = x_g − x^a_t for t = 1, ..., N. Note that by minimizing c̃(Δx_t) we force the real robot to follow the same trajectory as the approximate model. Recalling that the predictive distribution p(Δx_t | π) = N(Δx_t | μ_t, Σ_t), we can rewrite the real expected costs in (14) as

  E_{Δx_t}[c̃(Δx_t)] = ∫ c̃(Δx_t) N(Δx_t | μ_t, Σ_t) dΔx_t    (15)

where μ_t and Σ_t are defined as in (13). The expected value E_{Δx_t}[c̃(Δx_t)] in (15) can be computed analytically and it is differentiable, which allows the adoption of gradient-based approaches for policy search.

F. Policy Search

We leverage the gradient ∂J(π_θ)/∂θ to search for the policy parameters θ that minimize the expected long-term cost J(π_θ). In our formulation, the expected costs in (14) can be written in the form (15). Given that the expected costs have the form (15), and assuming that the moment matching algorithm is used for long-term predictions, the gradient ∂J(π_θ)/∂θ can be computed analytically. The computation of the gradient ∂J(π_θ)/∂θ is detailed in [10]. The result of the policy improvement is a set of policy parameters θ^n, where n indicates the n-th iteration. The control policy π^n = π(x, θ^n) drives the estimated real state x^a_t + Δx_t towards the desired goal. The new policy π^n is applied to the real robot to obtain new training data and to refine the residual model.

(1) In [19], the author presents the benefits of the saturating cost (11) compared to a quadratic cost.
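The Gaussian expectation in (15) of the saturating cost (11) is available in closed form (cf. [19]); below is a numpy sketch with illustrative numbers, checked against Monte Carlo sampling:

```python
import numpy as np

def sat_cost(x, x_g, sigma_c):
    # Saturating cost, cf. Eq. (11)
    d = x - x_g
    return 1.0 - np.exp(-0.5 * (d @ d) / sigma_c ** 2)

def expected_sat_cost(mu, Sigma, x_g, sigma_c):
    # E[c(x)] for x ~ N(mu, Sigma), computed analytically (cf. [19]):
    # 1 - |I + Sigma/sigma_c^2|^{-1/2} exp(-0.5 d^T (Sigma + sigma_c^2 I)^{-1} d)
    D = len(mu)
    d = mu - x_g
    S1 = np.eye(D) + Sigma / sigma_c ** 2
    quad = d @ np.linalg.solve(Sigma + sigma_c ** 2 * np.eye(D), d)
    return 1.0 - np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(S1))

mu = np.array([0.3, -0.1])
Sigma = 0.05 * np.eye(2)
x_g = np.zeros(2)
ec = expected_sat_cost(mu, Sigma, x_g, sigma_c=0.5)
print(ec)
```

Because ec is a smooth function of μ and Σ, its gradients with respect to the policy parameters can be propagated analytically, which is what makes the gradient-based search of Sec. II-F possible.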
The process is repeated until the task is learned. PI-REM is summarized in Algorithm 1.

Algorithm 1 Policy Improvement with REsidual Model learning (PI-REM)
1: Learn a policy π^a for the approximate model (4)
2: π^n = π^a
3: while task learning is not completed do
4:   Apply π^n to the approximate model and the real robot
5:   Calculate the new training set X^d = X^r − X^a
6:   Calculate the new target set x̃_{g,t} = x_g − x^a_t
7:   Learn the GP model of the dynamics (7)
8:   while J(π_θ) not minimized do
9:     Policy evaluation using (13) and (15)
10:    Policy improvement (Sec. II-F and [10])
11:  end while
12:  return θ
13: end while
14: return π

III. EXPERIMENTS

Experiments in this section show the effectiveness of our approach in learning control policies for simulated and real physical systems. The proposed approach is compared with the PILCO algorithm in [10], considering the number of rollouts performed on the real system and the total time duration of the rollouts (real experience). Both PI-REM and PILCO are implemented in Matlab®. In all the experiments, we use the saturating cost function in (11). To consider bounded control inputs, the control policy in (10) is squashed into the interval [−u_max, u_max] by using the function σ(x) = u_max (9 sin(x) + sin(3x))/8.

Fig. 2. (a) The pendulum interacting with an elastic environment. (b) The cart-pole system connected to a spring.

A. Simulation Results

We tested PI-REM in two different tasks, namely a pendulum swing-up and a cart-pole swing-up (see Fig. 2), and compared the results with PILCO [10]. For a fair comparison, we use the same parameters for PI-REM and PILCO.

1) Pendulum swing-up: The task consists in balancing the pendulum in the inverted position θ_g = π rad, starting from the stable position θ = 0 rad. As shown in Fig. 2(a), the pendulum interacts with an elastic environment, modeled as a spring, when it reaches the vertical position θ = π rad. The interaction generates an extra force f_k on the pendulum, which is neglected in our approximate model. Hence, the approximate model is the standard pendulum model

  ((1/4) m l^2 + I) θ̈_t + b θ̇_t + (1/2) m l g sin θ_t = u_t

where the mass m = 1 kg, the length l = 1 m, I = (1/12) m l^2 is the moment of inertia of the pendulum around its midpoint, b = 0.1 N m s/rad is the friction coefficient, g = 9.81 m/s^2 is the gravity acceleration, and u_t is the control torque. The state of the pendulum is defined as x = [θ̇, θ]^T, while the goal to reach is x_g = [0, π]^T. The external force f_k depends on the stiffness of the spring. We tested three different stiffness values, namely 100, 200 and 500 N/m. Results are reported in Tab. I. Note that the time of real experience in Tab. I is a multiple of the number of rollouts. This is because we keep the duration of the task fixed for each rollout.

TABLE I. RESULTS FOR THE PENDULUM SWING-UP (stiffness [N/m]; real rollouts [#]; real experience [s], for PI-REM and PILCO [10]).

Stiffness 100 N/m: For a stiffness equal to 100 N/m, u_max = 5 Nm is sufficient to fulfill the task. The total duration of the task is T = 4 s, while the sampling time is dt = 0.1 s. Results in Tab. I show that PI-REM needs only 2 iterations and 8 s of real experience to learn the control policy. For comparison, PILCO needs 6 iterations and 24 s of real experience. The improvement mostly depends on the reduced state prediction error of PI-REM. As shown in Fig. 3(a), the approximate model accurately represents the system until the pendulum touches the spring. This lets PI-REM properly estimate the state after 2 iterations (see Fig. 3(b)). PILCO, instead, spends 5 iterations to learn a proper model of the system. As shown in Fig. 4, after 2 iterations the pendulum is able to reach the goal x_g = [0, π]^T. The learned control policy is bounded, u ∈ [−5, 5] Nm, and it drives the pendulum to the goal position in less than 1 s (see Fig. 4(a)). This results in an average expected return J(π_θ) ≈ 12 (see Fig. 4(b)). PILCO needs more rollouts (6 instead of 2) to find the optimal policy. Since we used the same policy parameterization and cost function in both approaches, PI-REM and PILCO learn almost the same control policy, which generates a similar behavior of the pendulum.
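The pendulum model above is easy to simulate; the sketch below integrates it with a simple Euler scheme (the integrator used in the paper is not specified) and models the elastic contact as a torsional spring active beyond θ = π, which is an assumption on the contact geometry. It also includes the squashing function σ(x) = u_max(9 sin x + sin 3x)/8 used to bound the control inputs:

```python
import numpy as np

m, l, b, g = 1.0, 1.0, 0.1, 9.81
I = m * l ** 2 / 12.0                 # inertia about the midpoint
k = 100.0                             # spring stiffness (the 100 N/m case)

def squash(x, u_max):
    # Bounds the policy output to [-u_max, u_max]
    return u_max * (9.0 * np.sin(x) + np.sin(3.0 * x)) / 8.0

def theta_ddot(theta, theta_dot, u, with_spring=True):
    # Pendulum dynamics; the contact torque is an assumed model of f_k
    torque = u - b * theta_dot - 0.5 * m * l * g * np.sin(theta)
    if with_spring and theta > np.pi:
        torque -= k * l ** 2 * (theta - np.pi)
    return torque / (0.25 * m * l ** 2 + I)

def step(state, u, dt=0.1, with_spring=True):
    theta, theta_dot = state          # explicit Euler integration step
    acc = theta_ddot(theta, theta_dot, u, with_spring)
    return np.array([theta + dt * theta_dot, theta_dot + dt * acc])

s = np.array([0.0, 0.0])              # hanging rest position
for _ in range(40):                   # T = 4 s at dt = 0.1 s, zero torque
    s = step(s, 0.0)
print(s)
```

With zero torque the hanging position is an equilibrium, so the state stays at [0, 0]; the approximate model used by PI-REM is obtained by simply setting with_spring=False.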
Stiffness 200 N/m: For a stiffness equal to 200 N/m, we have to reduce the sampling time to 0.05 s to avoid numerical instabilities in the simulation of the real model. Moreover, we increase the maximum control input to 15 Nm. In this case, our approach needs 3 iterations and 12 s of real experience to learn the control policy (see Tab. I).

Stiffness 500 N/m: For a stiffness equal to 500 N/m, we have to further reduce the sampling time to 0.0125 s. To speed up the learning, we reduce the prediction time to 2 s. Moreover, we increase the maximum control input to 25 Nm. In this case, our approach needs 2 iterations and 4 s of real experience to learn the control policy (see Tab. I).

Fig. 3. Pendulum model learning results (stiffness 100 N/m). (a) State predicted and measured after one iteration of PI-REM. (b) State prediction root mean square errors of PI-REM and PILCO for different iterations.

Fig. 4. Results for the pendulum swing-up (stiffness 100 N/m). (a) Pendulum angle and control policy obtained with PI-REM after 2 iterations. The black solid line represents the goal. (b) Evolution of the expected return J(π_θ) for different rollouts (mean and std over 5 executions).

2) Cart-pole swing-up: The task consists in swinging up the pendulum on the cart and balancing it in the inverted position θ_g = π rad, starting from the stable position θ = 0 rad. The control input is the horizontal force u_t that makes the cart move to the left or to the right. As shown in Fig. 2(b), the cart is connected to a wall through a spring. The additive force f_k of the spring affects the motion of the cart. The mass of the cart is m_c = 0.5 kg, the mass of the pendulum is m_p = 0.5 kg, and the length of the pendulum is l = 0.5 m. The state of the cart-pole system is x = [p, ṗ, θ, θ̇]^T, where p is the position of the cart, ṗ is the velocity of the cart, and θ and θ̇ are respectively the angle and the angular velocity of the pendulum. The goal state is x_g = [0, 0, π, 0]^T.
The cart-pole system is more complicated than the single pendulum, since the state has four dimensions instead of two. Also in this case, the approximate model does not consider the extra force f_k. The approximate model of the cart-pole pendulum is then

  (m_c + m_p) p̈_t + (1/2) m_p l θ̈_t cos θ_t − (1/2) m_p l θ̇_t^2 sin θ_t = u_t − b ṗ_t
  2 l θ̈_t + 3 p̈_t cos θ_t + 3 g sin θ_t = 0

We tested three different stiffness values, namely 250, 500 and 1200 N/m. Results are reported in Tab. II. Recall that the time of real experience in Tab. II is a multiple of the number of rollouts.

TABLE II. RESULTS FOR THE CART-POLE SWING-UP (stiffness [N/m]; real rollouts [#]; real experience [s], for PI-REM and PILCO [10]).

Stiffness 250 N/m: For a stiffness equal to 250 N/m, u_max = 10 N is sufficient to fulfill the task. The total duration of the task is T = 4 s, while the sampling time is dt = 0.1 s. Results in Tab. II show that our approach needs only 2 iterations and 8 s of real experience to learn the control policy. For comparison, PILCO needs 5 iterations and 20 s of real experience.

Stiffness 500 N/m: For a stiffness equal to 500 N/m, we have to increase the maximum control input to 15 N in order to fulfill the task. In this case, PI-REM needs 3 iterations and 12 s of real experience to learn the policy (see Tab. II).

Stiffness 1200 N/m: For a stiffness equal to 1200 N/m, we have to reduce the sampling time to 0.05 s to avoid numerical instabilities in the simulation of the real model. We also reduce the duration of the task to T = 2 s to speed up the learning process. In this case, our approach needs 15 iterations and 30 s of real experience to learn the control policy (see Tab. II). As shown in Fig. 5, after 15 iterations the cart-pole system reaches the goal x_g = [0, 0, π, 0]^T. The learned control policy is bounded, u ∈ [−15, 15] N, and it drives the cart-pole system to the goal position (see Fig. 5(a) to 5(c)). This results in an average expected return J(π_θ) ≈ 2 (see Fig. 5(d)). PILCO needs more rollouts (23 instead of 15) to find the optimal policy. Since we used the same policy parameterization and cost function in both approaches, PI-REM and PILCO learn almost the same control policy, which generates a similar behavior of the cart-pole system.

Fig. 5. Results for the cart-pole swing-up (stiffness 1200 N/m). (a), (b) State of the cart-pole system and (c) learned control policy after 15 iterations of PI-REM. The black solid line represents the goal. (d) Evolution of the expected return J(π_θ) for different rollouts (mean and std over 5 executions).

In the considered simulations, the proposed PI-REM performs better than PILCO. In terms of rollouts, our approach takes from 25% to 67% fewer iterations than PILCO. As a consequence, the time of real experience is reduced by 25% to 67% (from a minimum of 2 s to a maximum of 16 s) with respect to PILCO.

B. Robot Experiment

PI-REM is applied to control a qbmove Maker variable stiffness actuator (VSA) [20]. As shown in Fig. 6, the arm consists of four VSAs. The first joint of the arm is actuated, while the other three are not controlled. The task consists in swinging up the robotic arm by controlling the actuated joint. As approximate model, we use the mass-spring system

  m θ̈_t + k (θ_t − u_t) = 0    (16)

where m = 0.78 kg is the mass of the unactuated joints, and k = 14 Nm/rad is the maximum stiffness of the qbmove Maker [20]. The variable θ_t represents the link position, while u_t is the commanded motor position. It is worth noticing that the approximate model in (16) neglects the length, inertia and friction of the arm in Fig. 6. Moreover, the model in (16) represents a VSA only in the region where the behavior of the spring is linear.

Fig. 6. The VSA arm used in the robotic experiment (one actuated joint, three unactuated joints).

The goal is to learn a control policy u_t that drives the model from the initial state x_0 = [θ_0, θ̇_0]^T = [π/2, 0]^T to the goal x_g = [−π/2, 0]^T in T = 5 s. The maximum input position is u_max = 1.7 rad. Joint velocities are computed from the measured joint angles using a constant-velocity Kalman filter. The robot is controlled at 50 Hz via a Simulink® interface. Results in Tab. III show that our approach needs 3 iterations and 15 s of real experience to learn the task. For comparison, PILCO needs 5 iterations and 25 s of real experience to learn the same task. Hence, PI-REM shows a performance improvement of 40%. Figure 7 shows the learned policy, the cost evolution and the state of the robot after 3 iterations of PI-REM. Note that, for a better visualization, we show only the first two seconds of the task. As shown in Fig. 7(a), the robot effectively reaches the goal after approximately 0.5 s. This results in an expected return J(π_θ) ≈ 9 (see Fig. 7(d)). PILCO needs 5 rollouts instead of 3 to find the optimal policy. As shown in Fig. 7(d), the expected return of PILCO is also J(π_θ) ≈ 9. This indicates that PILCO and PI-REM learn a similar control policy that generates a similar behavior of the VSA pendulum.
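Putting the pieces together, the outer loop of Algorithm 1 can be sketched on a deliberately tiny, hypothetical 1-D system. To keep the example short, the GP of Sec. II-C is replaced by ridge regression on RBF features and the gradient-based search of Sec. II-F by a grid search; every constant is made up:

```python
import numpy as np

def f_a(x, u):                       # known approximate model, cf. Eq. (4)
    return 0.8 * x + 0.5 * u

def f_r(x, u):                       # "real" system: f_a plus an unmodeled term
    return f_a(x, u) + 0.4 * np.sin(2.0 * x)

centers = np.linspace(-2.0, 2.0, 15)

def features(x):                     # RBF features standing in for the GP
    return np.exp(-0.5 * ((np.atleast_1d(x)[:, None] - centers) / 0.3) ** 2)

def rollout(f, theta, x0=1.5, N=25):
    xs, x = [x0], x0
    for _ in range(N):
        x = f(x, -theta * x)         # linear state-feedback policy u = -theta*x
        xs.append(x)
    return np.array(xs)

def cost(xs):                        # saturating cost towards the goal x_g = 0
    return float(np.sum(1.0 - np.exp(-0.5 * xs ** 2 / 0.25)))

theta, grid = 0.5, np.linspace(0.0, 2.0, 81)
for _ in range(4):                   # outer loop of Algorithm 1
    xr = rollout(f_r, theta)         # rollout on the real system
    us = -theta * xr[:-1]
    Phi = features(xr[:-1])
    targets = xr[1:] - f_a(xr[:-1], us)   # residual targets, cf. Eq. (7)
    w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(len(centers)),
                        Phi.T @ targets)
    def f_hat(x, u, w=w):            # approximate model + learned residual
        return f_a(x, u) + float(features(x)[0] @ w)
    # policy improvement on the composed model (grid search stand-in)
    theta = min(grid, key=lambda th: cost(rollout(f_hat, th)))
print(cost(rollout(f_r, theta)))
```

The composed model f_hat is refitted after every real rollout, so the policy search never needs more real data than the rollouts themselves; this mirrors the role of the residual GP in PI-REM.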
Fig. 7. Results for the VSA pendulum swing-up. (a) State of the VSA pendulum, (b) learned control policy, and (c) snapshots of the task execution after 3 iterations of PI-REM. The black solid line in (a) represents the goal. (d) Evolution of the expected return J(π_θ) for different rollouts.

Performed simulations and experiments show that learning a black-box model of the residual, instead of a black-box model of the entire system, can significantly improve the performance of the policy search algorithm.

TABLE III. RESULTS FOR THE VSA PENDULUM SWING-UP.

            Real rollouts [#]   Real experience [s]
PI-REM             3                  15
PILCO [10]         5                  25

IV. CONCLUSION AND FUTURE WORK

In this work, we proposed PI-REM, a model-based reinforcement learning approach that exploits a simplified model to find an optimal control policy for a given task. The proposed approach works in two steps. First, an optimal policy is found for the simplified model using standard policy search algorithms. Second, sensory data from the real robot are exploited to learn a probabilistic model that represents the difference between the real system and its simplified model. Differently from most model-based RL approaches, our method does not learn a black-box model of the entire system. PI-REM learns, instead, a black-box model of the differences between the real system and an approximate model, i.e., the model residual. As a consequence, our approach is suitable in applications where the system model is inaccurate or only partially available. The approach has been compared with PILCO, showing improvements in terms of performed rollouts and time of real experience. Our future research will focus on testing the proposed approach in more complicated tasks with a high-dimensional search space. An important open question is how inaccurate the model can be in order to maintain enhanced performance.
We expect that with a completely wrong model, the performance of the algorithm becomes similar or slightly worse than classical black-box approaches sch as. However, formally qantifying the tolerated ncertainty still remains an open theoretical problem. Another potential limitation of or approach is that we assme additive model residals. Evalating the generality of this assmption more precisely will be a possible ftre research topic. ACKNOWLEDGEMENTS This work has been partially spported by the Technical University of Mnich, International Gradate School of Science and Engineering, and by the Marie Crie Action LEACON, EU project REFERENCES [1] R. Stton and A. Barto, Reinforcement learning: an introdction, ser. A Bradford book. MIT Press, [2] J. Kober, D. Bagnell, and J. Peters, Reinforcement learning in robotics: a srvey, IJRR, vol. 32, no. 11, pp , 213. [3] S. Calinon, P. Kormshev, and D. Caldwell, Compliant skills acqisition and mlti-optima policy search with EM-based reinforcement learning, RAS, vol. 61, no. 4, pp , 213. [4] J. Kober and J. Peters, Policy search for motor primitives in robotics, in NIPS, 29, pp [5] E. Theodoro, J. Bchli, and S. Schaal, A generalized path integral control approach to reinforcement learning, Jornal of Machine Learning Research, vol. 11, pp , 21. [6] J. Peters, K. Melling, and Y. Altn, Relative entropy policy search, in Conference on Articial Intelligence AAAI, 21, pp [7] D. Lee, H. Knori, and Y. Nakamra, Association of whole body motion from tool knowledge for hmanoid robots, in International Conference on Intelligent Robots and Systems, 28, pp [8] H. Knori, D. Lee, and Y. Nakamra, Associating and reshaping of whole body motions for object maniplation, in International Conference on Intelligent Robots and Systems, 29, pp [9] M. Saveriano, S. An, and D. Lee, Incremental kinesthetic teaching of end-effector and nll-space motion primitives, in International Conference on Robotics and Atomation, 215, pp [1] M. P. Deisenroth, D. 
Fox, and C. E. Rasmussen, Gaussian processes for data-efficient learning in robotics and control, TPAMI, vol. 37, no. 2, 2015.
[11] J. G. Schneider, Exploiting model uncertainty estimates for safe dynamic control learning, in Advances in Neural Information Processing Systems 9. MIT Press, 1996.
[12] C. G. Atkeson and S. Schaal, Robot learning from demonstration, in International Conference on Machine Learning, 1997.
[13] C. G. Atkeson and J. C. Santamaria, A comparison of direct and model-based reinforcement learning, in International Conference on Robotics and Automation, 1997.
[14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[15] F. Farshidian, M. Neunert, and J. Buchli, Learning of closed-loop motion control, in IROS, 2014.
[16] S. Levine and P. Abbeel, Learning neural network policies with guided policy search under unknown dynamics, in Advances in Neural Information Processing Systems, 2014.
[17] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo, Robotics: Modelling, Planning and Control. Springer, 2009.
[18] G. De Maria, P. Falco, C. Natale, and S. Pirozzi, Integrated force/tactile sensing: the enabling technology for slipping detection and avoidance, in ICRA, 2015.
[19] M. Deisenroth, Efficient Reinforcement Learning using Gaussian Processes. KIT Scientific Publishing, 2010.
[20] M. G. Catalano, G. Grioli, M. Garabini, F. Bonomo, M. Mancini, N. Tsagarakis, and A. Bicchi, VSA-CubeBot: a modular variable stiffness platform for multiple degrees of freedom robots, in International Conference on Robotics and Automation, 2011.