Data-Efficient Control Policy Search using Residual Dynamics Learning


Matteo Saveriano 1, Yuchao Yin 1, Pietro Falco 1 and Dongheui Lee 1,2

1 Human-centered Assistive Robotics, Technical University of Munich, Munich, Germany {matteo.saveriano, yuchao.yin, pietro.falco, dhlee}@tum.de
2 Institute of Robotics and Mechatronics, German Aerospace Center (DLR), Germany

Abstract: In this work, we propose a model-based and data-efficient approach for reinforcement learning. The main idea of our algorithm is to combine simulated and real rollouts to efficiently find an optimal control policy. While performing rollouts on the robot, we exploit sensory data to learn a probabilistic model of the residual difference between the measured state and the state predicted by a simplified model. The simplified model can be any dynamical system, from a very accurate system to a simple, linear one. The residual difference is learned with Gaussian processes. Hence, we assume that the difference between the real and the simplified model is Gaussian distributed, which is less strict than assuming that the real system is Gaussian distributed. The combination of the partial model and the learned residuals is exploited to predict the real system behavior and to search for an optimal policy. Simulations and experiments show that our approach significantly reduces the number of rollouts needed to find an optimal control policy for the real system.

I. INTRODUCTION

Fig. 1. Overview of the proposed algorithm: the policy is improved using simulated experience from the approximate model (Sec. II-E, II-F) and real experience from the robot, which is also used to learn the residual dynamics (Sec. II-B, II-C).

Robots that learn novel tasks by self-practice have the chance to be exploited in a wide range of situations, including industrial and social applications. Reinforcement Learning (RL) gives the agent the possibility to autonomously learn a task by trial and error [1]. In particular, the agent performs a number of trials (rollouts) and uses sensory data to improve the execution. In robotics and control applications, RL presents two main disadvantages [2]. First, the state and action spaces are continuous-valued and high dimensional, and existing approaches in RL to search continuous spaces are not sample-efficient. A common strategy [3]-[6] to reduce the search space to a discrete space consists in parameterizing the control policy with a discrete number of learnable parameters. Typical policy parameterizations adopt Gaussian mixture models [3], hidden Markov models [7], [8], or stable dynamical systems [5]. Second, it is extremely time consuming to perform a big number of trials (rollouts) on real devices. The rollouts can be reduced with a proper initialization of the policy, for example using kinesthetic teaching approaches [9]. However, reducing the rollouts needed to find an optimal policy is still an open problem.

RL algorithms can be classified into two categories, namely model-free and model-based approaches. Model-free methods [4]-[6] directly learn the policy on the data samples obtained after each rollout. Model-free approaches do not require any model approximation and are applicable in many situations. The problem of model-free approaches is that they may require many rollouts to find the optimal policy [4]. Model-based methods, instead, explicitly learn a model of the system to control from sensory data and use this learned model to search for the optimal policy. The comparison in [10] shows that model-based approaches are efficient compared with model-free methods.
However, model-based methods strongly suffer from the so-called model bias problem. Indeed, they assume that the learned model is sufficiently accurate to represent the real system [11]-[13]. Learning an accurate model of the system is not always possible. This is the case, for instance, when the training data are too sparse. It is clear that searching for a control policy using inaccurate models might lead to poor results. The Probabilistic Inference for Learning Control (PILCO) algorithm in [10] alleviates the model-bias problem by including uncertainty in the learned model. In particular, PILCO leverages Gaussian Processes (GP) [14] to learn a probabilistic model of the system which represents model uncertainties as Gaussian noise. The GP model allows to consider uncertainties in the long-term state predictions, and to explicitly consider eventual model errors in the policy evaluation. The comparison performed in [10] shows that PILCO outperforms state-of-the-art algorithms in terms of performed rollouts on the real robot in pendulum and cart-pole swing-up tasks. Despite the good performance, PILCO needs to learn a reliable model from sensory data to find an optimal policy. In order to explore the state space as much as possible and to rapidly learn a reliable model, PILCO applies a random policy at the first stage of the learning.

In general, starting from a randomly initialized policy can be dangerous for real systems. The robot, in fact, can exhibit unstable behaviors in the first stages of the learning process, with the consequent risk of damage. The work in [15] combines the benefits of model-free and model-based RL. The authors propose to model the system to control as a dynamical system and to find an initial control policy by using optimal control. The learned policy is then applied to the real system and improved using a model-free approach. The policy learned on the model represents a good starting point for policy search, significantly reducing the number of performed rollouts. The approach in [16] also uses optimal control to find an initial policy, but it uses sensory data to fit a locally linear model of the system.

In this paper, we propose a model-based RL approach that exploits an approximate, analytical model of the robot to rapidly learn an optimal control policy. The proposed Policy Improvement with REsidual Model learning (PI-REM) assumes that the system is modeled as a non-linear dynamical system. In real scenarios, it is not trivial to analytically derive the entire model, but it is reasonable to assume that part of the model is easy to derive. For instance, in a robotic application, the model of the robot in free motion has a well-known structure [17], but it is hard to model the sensor noise or the friction coefficient [18]. We refer to the known part of the model as the approximate model. Given the approximate model, we learn an optimal control policy in simulation. We exploit PILCO in this study, but other choices are possible. Directly applying the policy learned in simulation on a real robot does not guarantee to fulfill the task. Hence, we exploit sensory data to learn the residual difference between the real and the approximate model using Gaussian Processes (GP) [14]. The superposition of the approximate model and the learned residual term represents a good estimation of the real dynamics and it is adopted for model-based policy search. An overview of PI-REM is shown in Fig. 1.

In contrast to PILCO, our approach exploits the approximate model to be more data efficient and to converge faster to the optimal policy. Compared to the work in [15], PI-REM explicitly learns the residual difference to improve the model and speed up the policy search. Indeed, the approximate model is globally valid and we use sensory data only to estimate local variations of that model. In contrast to [16], we do not constrain the system dynamics to be locally linear.

The rest of the paper is organized as follows. Section II describes our approach for model-based policy search. Experiments and comparisons are reported in Sec. III. Section IV states the conclusion and proposes future extensions.

II. POLICY IMPROVEMENT WITH RESIDUAL MODEL LEARNING

A. Preliminaries

The goal of a reinforcement learning algorithm is to find a control policy $u = \pi(x)$ that minimizes the expected return

$$J^\pi(\theta) = \sum_{t=0}^{N} \mathbb{E}[c(x_t)] \qquad (1)$$

where $\mathbb{E}[\cdot]$ is the operator that computes the expected value of a random variable, and $c(x_t)$ is the cost of being in state $x_t$ at time $t$. The vector $\theta$ represents a set of values used to parameterize the continuous-valued policy function $\pi(x, \theta)$.

In this work, we consider that the system to control is modeled as a non-linear dynamical system in the form

$$x^r_{t+1} = f^r(x^r_t, u^r_t) \qquad (2)$$

where $x^r \in \mathbb{R}^D$ is the continuous-valued state vector, $u^r \in \mathbb{R}^F$ is the control input vector, and $f^r$ is a continuous and continuously differentiable function. The superscript $r$ indicates that the dynamical system (2) is the real (exact) model of the system. The difference equation (2) can model a variety of physical systems, including robotic platforms.
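As a minimal illustration of (1), the expected return can be estimated by averaging the summed cost over sampled trajectories. The sketch below is not part of the paper; `rollout` and `cost` are hypothetical stand-ins for a trajectory sampler and the cost $c(x)$.

```python
import numpy as np

def expected_return(rollout, cost, n_samples=10):
    """Monte-Carlo estimate of the expected return (1): J = sum_t E[c(x_t)].

    rollout() is a hypothetical stand-in that samples one state trajectory
    x_0..x_N under the current policy; cost(x) evaluates c(x).
    """
    returns = [sum(cost(x) for x in rollout()) for _ in range(n_samples)]
    return float(np.mean(returns))
```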
B. Additive Unknown Dynamics

In general, it is not trivial to derive the exact model in (2). For example, although it is relatively simple to compute the dynamical model of a robotic manipulator in free motion, it is complicated to model eventual external forces. In many robotics applications, we can assume that the function $f^r(\cdot,\cdot)$ in (2) consists of two terms: (i) a deterministic and known term, and (ii) an additive unknown term. Under this assumption, the dynamical system in (2) can be rewritten as

$$x^r_{t+1} = f^a(x^r_t, u^r_t) + \tilde{f}(x^r_t, u^r_t) + w \qquad (3)$$

where the function $f^a(\cdot,\cdot)$ is assumed known and deterministic, $\tilde{f}(\cdot,\cdot) + w$ is an unknown and stochastic term, and $w \in \mathbb{R}^D$ is an additive Gaussian noise. The dynamical system

$$x^a_{t+1} = f^a(x^a_t, u^a_t) \qquad (4)$$

represents the known dynamics in (3), and we call it the approximate model. Note that $x^a_t$ and $x^r_t$ represent the same state vector in different points of the state space $\mathbb{R}^D$.

The approximate model is known a priori. Hence, a control policy $u^a = \pi^a(x^a)$ for (4) can be learned in simulation by using a reinforcement learning or an optimal control approach. In this work, we used the PILCO algorithm [10] to learn $\pi^a$, but other choices are possible. It is worth noting that searching for the policy $\pi^a$ requires only simulations, i.e., no experiments on the robot are performed at this stage. The learned policy $\pi^a$ represents a good starting point to search for a control policy $\pi^r$ for the real robot. This idea of initializing a control policy from simulations has been effectively applied in [15], where a linear system is used as the approximate model. Apart from initializing a policy from simulation, this work exploits real data from the robot to explicitly learn the unknown dynamics $\tilde{f}(\cdot,\cdot) + w$ in (3). In other words, when the learned policy $\pi^a$ is applied to the real robot, the measured states are used either to improve the current policy or to learn an estimate of the unknown dynamics. This significantly reduces the number of rollouts to perform on the real robot [10].
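A minimal sketch of the decomposition (3)-(4), assuming toy placeholder dynamics: `f_approx` plays the role of the known term $f^a$, `f_unknown` the role of the unknown residual $\tilde{f}$, and both are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_approx(x, u):
    """Known, deterministic part f^a(x, u); a toy linear model for illustration."""
    return 0.9 * x + 0.1 * u

def f_unknown(x, u):
    """Unknown additive term f~(x, u); an arbitrary nonlinearity standing in
    for unmodeled effects such as friction or contact forces."""
    return 0.05 * np.sin(x)

def step_real(x, u, noise_std=0.01):
    """One step of the 'real' system (3): f^a + f~ + Gaussian noise w."""
    w = rng.normal(0.0, noise_std, size=np.shape(x))
    return f_approx(x, u) + f_unknown(x, u) + w

def step_approx(x, u):
    """One step of the approximate model (4): the known part only."""
    return f_approx(x, u)
```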

C. Learning Unknown Dynamics using Gaussian Processes

The unknown dynamics $\tilde{f}(\cdot,\cdot) + w$ in (3) is learned from sensory data using Gaussian Processes (GP) [14]. The goal of the learning process is to obtain an estimate $\hat{f}$ of $\tilde{f} + w$ such that $f^a(\cdot,\cdot) + \hat{f}(\cdot,\cdot) \approx f^a(\cdot,\cdot) + \tilde{f}(\cdot,\cdot) + w$. To learn the unknown dynamics, we exploit the real ($x^r_t$) and approximate ($x^a_t$) states, as well as the real ($u^r_t$) and approximate ($u^a_t$) control inputs. For convenience, let us define $\delta_t = (\Delta x_t, \Delta u_t)$, where $\Delta x_t = x^r_t - x^a_t$ and $\Delta u_t = u^r_t - u^a_t$. We also define $\tilde{x}^r_t = (x^r_t, u^r_t)$ and $\tilde{x}^a_t = (x^a_t, u^a_t)$. Using these definitions, it is straightforward to show that $\tilde{x}^r_t = \tilde{x}^a_t + \delta_t$ and to rewrite the real dynamics in (3) as

$$f^r(\tilde{x}^r_t) = f^r(\tilde{x}^a_t + \delta_t) = f^a(\tilde{x}^a_t + \delta_t) + \tilde{f}(\tilde{x}^a_t + \delta_t) + w \qquad (5)$$

Recalling that the zero-order Taylor series expansion of a function is $f(a + \delta a) = f(a) + r(\delta a)$, we can write

$$f^r(\tilde{x}^a_t + \delta_t) = f^a(\tilde{x}^a_t) + r^a_t + \tilde{f}(\tilde{x}^a_t) + \tilde{r}_t + w \qquad (6)$$

where $r^a_t$ and $\tilde{r}_t$ are the residual errors generated by stopping the Taylor series expansion at the zero-order. Considering (6), the difference between the real (3) and approximate (4) systems can be written as

$$\Delta x_{t+1} = f^r(\tilde{x}^a_t + \delta_t) - f^a(\tilde{x}^a_t) = \tilde{f}(\tilde{x}^a_t) + r_t + w = \hat{f}(\tilde{x}^a_t, \delta_t) \qquad (7)$$

where we pose $r_t = r^a_t + \tilde{r}_t$. GP assume that the data are generated by a set of underlying functions, whose joint probability distribution is Gaussian [14]. Having assumed a Gaussian noise term $w$, we can conclude that the stochastic process (7) can be effectively represented as a GP.

Equation (7) shows that $\hat{f}$ maps $\tilde{x}^a_t$ and $\delta_t$ into the next state difference $\Delta x_{t+1}$. Hence, the tuples $\{(\tilde{x}^a_t, \delta_t)\}_{t=0}^{N-1}$ are the training inputs of our GP, while the training outputs are $\{y_t = \Delta x_{t+1} - \Delta x_t\}_{t=0}^{N-1}$. The training data are obtained as follows. At the $n$-th rollout, the current policy $\pi_n$ is applied to the robot and to the approximate model. As already mentioned, for the first rollout we set $\pi_1 = \pi^a$. After the trial, we have the two data sets $X^a = \{x^a_t, u^a_t\}_{t=0}^{N}$ and $X^r = \{x^r_t, u^r_t\}_{t=0}^{N}$. Hence, we have to simply calculate the element-wise difference $X^r - X^a$ to obtain the tuples $X^d = \{\Delta x_t, \Delta u_t\}_{t=0}^{N}$, which contain the rest of the training data for the Gaussian process.

The state predicted with a GP is Gaussian distributed, i.e., $p(\Delta x_{t+1} \mid \tilde{x}^a_t, \delta_t) = \mathcal{N}(\Delta x_{t+1} \mid \mu_{t+1}, \Sigma_{t+1})$. Given $N$ training inputs $\tilde{X} = [(\tilde{x}^a_1, \delta_1), \ldots, (\tilde{x}^a_N, \delta_N)]$, training outputs $y = [y_1, \ldots, y_N]^T$, and a query point $x_* = [\tilde{x}^{a\,T}_t\ \delta^T_t]^T$, the predicted one-step mean $\mu_{t+1}$ and variance $\Sigma_{t+1}$ are

$$\mu_{t+1} = \Delta x_t + K_{*X}(K_{XX} + \sigma_n^2 I)^{-1} y$$
$$\Sigma_{t+1} = k(x_*, x_*) - K_{*X}(K_{XX} + \sigma_n^2 I)^{-1} K_{X*} \qquad (8)$$

where $K_{*X} = k(x_*, \tilde{X})$, $K_{X*} = K_{*X}^T$, $k(\cdot,\cdot)$ is a covariance function, and the generic element $(i,j)$ of the matrix $K_{XX}$ is given by $K^{ij}_{XX} = k(\tilde{x}_i, \tilde{x}_j)$. In our approach, $k(\cdot,\cdot)$ is the squared exponential covariance function

$$k(x_i, x_j) = \sigma_k^2 \exp\left(-\tfrac{1}{2}(x_i - x_j)^T \Lambda^{-1} (x_i - x_j)\right) \qquad (9)$$

where $\Lambda = \mathrm{diag}(l_1^2, \ldots, l_D^2)$. The tunable parameters $\Lambda$, $\sigma_k^2$ and $\sigma_n^2$ are learned from the training data by using evidence maximization [14]. The standard GP formulation works for scalar outputs. For systems with a multidimensional state, we consider one GP for each dimension, assuming that the dimensions are independent of each other.
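The one-step prediction (8) with the squared-exponential kernel (9) can be transcribed almost literally. The sketch below fixes the hyperparameters instead of learning them by evidence maximization, uses one independent GP per output dimension as in the paper, and all variable names are illustrative.

```python
import numpy as np

def se_kernel(A, B, lam, sigma_k2):
    """Squared-exponential covariance (9); lam holds the squared length-scales."""
    diff = A[:, None, :] - B[None, :, :]                    # (n, m, D)
    d2 = np.einsum('nmd,d,nmd->nm', diff, 1.0 / lam, diff)  # Mahalanobis distance
    return sigma_k2 * np.exp(-0.5 * d2)

def gp_predict(X, y, x_star, lam, sigma_k2, sigma_n2):
    """GP posterior mean and variance (8) at x_star for one output dimension."""
    K = se_kernel(X, X, lam, sigma_k2) + sigma_n2 * np.eye(len(X))
    k_star = se_kernel(x_star[None, :], X, lam, sigma_k2)[0]
    mean = k_star @ np.linalg.solve(K, y)
    var = sigma_k2 - k_star @ np.linalg.solve(K, k_star)    # k(x*,x*) = sigma_k2
    return mean, var

def predict_residual_step(X, Y, x_star, dx_t, lam, sigma_k2, sigma_n2):
    """mu_{t+1} = dx_t + GP mean (8), one independent GP per output dimension.

    X: training inputs (x~^a_t, delta_t); Y[:, j]: j-th training output y_t.
    """
    means, variances = [], []
    for j in range(Y.shape[1]):
        m, v = gp_predict(X, Y[:, j], x_star, lam, sigma_k2, sigma_n2)
        means.append(m)
        variances.append(v)
    return dx_t + np.array(means), np.array(variances)
```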
D. Policy Parameterization and Cost Function

The control of a robotic device requires a continuous-valued control policy. It is clear that searching for a control policy in a continuous space is unfeasible in terms of computational cost. A common strategy [2] to solve this issue consists in assuming that the policy $\pi$ is parameterized by $\theta \in \mathbb{R}^p$, i.e., $\pi(x, \theta)$. In this work, we assume that the policy is the mean of a GP in the form

$$\pi(x, \theta) = \sum_{i=1}^{D} k(x, c_i)\left[(K_{CC} + \sigma_\pi^2 I)^{-1} y_\pi\right]_i \qquad (10)$$

where $x$ is the current state and $C = [c_1, \ldots, c_D]$ are the centers of the Gaussian basis function $k(\cdot,\cdot)$, defined as in (9). The vector $y_\pi$ plays the role of the GP training targets, while $K_{CC}$ is defined as in (8). The tunable parameters of the policy (10) are $\theta = [C, \Lambda, y_\pi]$. As suggested in [10], we set the variance $\sigma_\pi^2 = 0.01$ in our experiments.

As already mentioned, the optimal policy minimizes the expected return $J^\pi(\theta)$ in (1). In this work, we adopt a saturating cost function $c(x^r_t)$ with spread $\sigma_c^2$

$$c(x^r_t) = 1 - \exp\left(-\tfrac{1}{2\sigma_c^2}\|x^r_t - x_g\|^2\right) \in [0, 1] \qquad (11)$$

where $x_g$ denotes the goal (target) state. In [19], the author presents the benefits of the saturating cost (11) compared to a quadratic cost.
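The saturating cost (11) transcribes directly; the value of the spread below is an arbitrary placeholder.

```python
import numpy as np

def saturating_cost(x, x_g, sigma_c=0.25):
    """Saturating cost (11): c(x) = 1 - exp(-||x - x_g||^2 / (2 sigma_c^2)).

    Bounded in [0, 1]: close to 0 near the goal x_g and saturating to 1
    far from it, which is what makes it preferable to a quadratic cost [19].
    """
    d2 = float(np.sum((np.asarray(x) - np.asarray(x_g)) ** 2))
    return 1.0 - np.exp(-0.5 * d2 / sigma_c ** 2)
```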

E. Policy Evaluation

The learned GP model is used to compute the long-term predictions $p(\Delta x_1 \mid \pi), \ldots, p(\Delta x_N \mid \pi)$ starting from $p(\Delta x_0)$. The long-term predictions, in fact, are needed to compute the expected costs and to minimize the expected return in (1). Computing the long-term predictions from the one-step predictions in (8) requires the predictive distribution

$$p(y_t) = \iint p(\hat{f}(\xi_t) \mid \xi_t)\, p(\xi_t)\, d\hat{f}\, d\xi_t \qquad (12)$$

where $\xi_t = [\tilde{x}^{a\,T}_t\ \delta^T_t]^T$ and $y_t = \Delta x_{t+1} - \Delta x_t$. Solving the integral in (12) is analytically intractable. Hence, we assume a Gaussian distribution $p(y_t) = \mathcal{N}(y_t \mid \mu_\Delta, \Sigma_\Delta)$. The mean $\mu_\Delta$ and covariance $\Sigma_\Delta$ are obtained by applying the moment matching algorithm in [10]. The predictive distribution $p(\Delta x_{t+1} \mid \pi)$ is also Gaussian, with mean and covariance

$$\mu_{t+1} = \mu_t + \mu_\Delta, \qquad \Sigma_{t+1} = \Sigma_t + \Sigma_\Delta + \mathrm{cov}(\Delta x_t, y_t) + \mathrm{cov}(y_t, \Delta x_t) \qquad (13)$$

where the covariance $\mathrm{cov}(\cdot,\cdot)$ is computed as in [19]. Once the long-term predictions are computed, the estimated real state can be calculated as $x^r_t = x^a_t + \Delta x_t$, where $x^a_t$ is the deterministic state of the approximate model (see Sec. II-A). Given that $x^r_t = x^a_t + \Delta x_t$, the expected real long-term cost can be expressed as

$$J^\pi(\theta) = \sum_{t=0}^{N} \mathbb{E}_{x^r_t}[c(x^r_t)] = \sum_{t=0}^{N} \mathbb{E}_{\Delta x_t}[\tilde{c}(\Delta x_t)] \qquad (14)$$

Indeed, by substituting $\Delta x_t = x^r_t - x^a_t$ into (11) we obtain

$$c(x^r_t) = 1 - \exp\left(-\tfrac{1}{2\sigma_c^2}\|\Delta x_t + x^a_t - x_g\|^2\right) = 1 - \exp\left(-\tfrac{1}{2\sigma_c^2}\|\Delta x_t - \bar{x}_{g,t}\|^2\right) = \tilde{c}(\Delta x_t)$$

with $\bar{x}_{g,t} = x_g - x^a_t$. The vector $x^a_t$ is the approximate state computed by applying the initial policy $\pi^a$ and it is deterministic. Hence, we can precalculate all the $\bar{x}_{g,t}$ for $t = 1, \ldots, N$. Note that by minimizing $\tilde{c}(\Delta x_t)$ we force the real robot to follow the same trajectory of the approximate model. Recalling that the predictive distribution $p(\Delta x_t \mid \pi) = \mathcal{N}(\Delta x_t \mid \mu_t, \Sigma_t)$, we can rewrite the real expected costs in (1) as

$$\mathbb{E}_{\Delta x_t}[\tilde{c}(\Delta x_t)] = \int \tilde{c}(\Delta x_t)\, \mathcal{N}(\Delta x_t \mid \mu_t, \Sigma_t)\, d\Delta x_t \qquad (15)$$

where $\mu_t$ and $\Sigma_t$ are defined as in (13). The expected value $\mathbb{E}_{\Delta x_t}[\tilde{c}(\Delta x_t)]$ in (15) can be computed analytically and it is differentiable, which allows the adoption of gradient-based approaches for policy search.

F. Policy Search

We leverage the gradient $\partial J^\pi(\theta)/\partial\theta$ to search for the policy parameters $\theta$ that minimize the expected long-term cost $J^\pi(\theta)$. In our formulation, the expected costs in (1) can be written in the form (15). Given that the expected costs have the form (15), and assuming that the moment matching algorithm is used for long-term predictions, the gradient $\partial J^\pi(\theta)/\partial\theta$ can be computed analytically. The computation of the gradient is detailed in [10]. The result of the policy improvement is a set of policy parameters $\theta_n$, where $n$ indicates the $n$-th iteration. The control policy $\pi_n = \pi(x, \theta_n)$ drives the estimated real state $x^a_t + \Delta x_t$ towards the desired goal. The new policy $\pi_n$ is applied to the real robot to obtain new training data and to refine the residual model. The process is repeated until the task is learned. PI-REM is summarized in Algorithm 1.

Algorithm 1 Policy Improvement with REsidual Model learning (PI-REM)
1: Learn a policy π^a for the approximate model (4)
2: π_n = π^a
3: while task learning is not completed do
4:   Apply π_n to the approximate model and the real robot
5:   Calculate the new training set X^d = X^r - X^a
6:   Calculate the new target set x̄_{g,t} = x_g - x^a_t
7:   Learn the GP model of the residual dynamics (7)
8:   while J^π(θ) not minimized do
9:     Policy evaluation using (13) and (15)
10:    Policy improvement (Sec. II-F and [10])
11:  end while
12:  return θ*
13: end while
14: return π*

III. EXPERIMENTS

Fig. 2. (a) The pendulum interacting with an elastic environment. (b) The cart-pole system connected to a spring.

Experiments in this section show the effectiveness of our approach in learning control policies for simulated and real physical systems. The proposed approach is compared with the PILCO algorithm in [10], considering the number of rollouts performed on the real system and the total time duration of the rollouts (real experience). Both PI-REM and PILCO are implemented in Matlab®. In all the experiments, we use the saturating cost function in (11). To consider bounded control inputs, the control policy in (10) is squashed into the interval $[-u_{max}, u_{max}]$ by using the function $\sigma(x) = u_{max}\,(9\sin(x) + \sin(3x))/8$.
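A direct transcription of the squashing function, which maps any real input into $[-u_{max}, u_{max}]$:

```python
import numpy as np

def squash(x, u_max):
    """Bound a policy output: sigma(x) = u_max * (9 sin(x) + sin(3x)) / 8."""
    return u_max * (9.0 * np.sin(x) + np.sin(3.0 * x)) / 8.0
```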
A. Simulation Results

We tested PI-REM in two different tasks, namely a pendulum swing-up and a cart-pole swing-up (see Fig. 2), and compared the results with PILCO [10]. For a fair comparison, we use the same parameters for PI-REM and PILCO.

1) Pendulum swing-up: The task consists in balancing the pendulum in the inverted position $\theta_g = \pi$ rad, starting from the stable position $\theta_0 = 0$ rad. As shown in Fig. 2(a), the pendulum interacts with an elastic environment (modeled as a spring) when it reaches the vertical position $\theta = \pi$ rad. The interaction generates an extra force $f_k$ on the pendulum, which is neglected in our approximate model. Hence, the approximate model is the standard pendulum model

$$\left(\tfrac{1}{4}ml^2 + I\right)\ddot{\theta}_t + b\dot{\theta}_t + \tfrac{1}{2}mlg\sin\theta_t = u_t$$

where the mass $m = 1$ kg, the length $l = 1$ m, $I = \tfrac{1}{12}ml^2$ is the moment of inertia of a pendulum around the midpoint, $b = 0.1$ s·Nm/rad is the friction coefficient, $g = 9.81$ m/s² is the gravity acceleration, and $u_t$ is the control torque. The state of the pendulum is defined as $x = [\dot{\theta}, \theta]^T$, while the goal to reach is $x_g = [0, \pi]^T$. The external force $f_k$ depends on the stiffness of the spring. We tested three different stiffness values, namely 100, 200 and 500 N/m. Results are reported in Tab. I. Note that the time of real experience in Tab. I is a multiple of the number of rollouts. This is because we keep the duration of the task fixed for each rollout.

TABLE I
RESULTS FOR THE PENDULUM SWING-UP.

Stiffness [N/m]   Method       Real rollouts [#]   Real experience [s]
100               PI-REM               2                    8
100               PILCO [10]           6                   24
200               PI-REM               3                   12
500               PI-REM               2                    4

Stiffness 100 N/m: For a stiffness equal to 100 N/m, $u_{max} = 5$ Nm is sufficient to fulfill the task. The total duration of the task is $T = 4$ s, while the sampling time is $dt = 0.1$ s. Results in Tab. I show that PI-REM needs only 2 iterations and 8 s of real experience to learn the control policy. For comparison, PILCO needs 6 iterations and 24 s of real experience. The improvement mostly depends on the reduced state prediction error of PI-REM. As shown in Fig. 3(a), the approximate model accurately represents the system until the pendulum touches the spring. This lets PI-REM properly estimate the state after 2 iterations (see Fig. 3(b)). PILCO, instead, spends 5 iterations to learn a proper model of the system. As shown in Fig. 4, after 2 iterations the pendulum is able to reach the goal $x_g = [0, \pi]^T$. The learned control policy is bounded ($u \in [-5, 5]$ Nm), and it drives the pendulum to the goal position in less than 1 s (see Fig. 4(a)). This results in an average expected return $J^\pi(\theta) \approx 12$ (see Fig. 4(b)). PILCO needs more rollouts (6 instead of 2) to find the optimal policy. Since we used the same policy parameterization and cost function in both approaches, PI-REM and PILCO learn almost the same control policy, which generates a similar behavior of the pendulum.

Stiffness 200 N/m: For a stiffness equal to 200 N/m, we have to reduce the sampling time to 0.05 s to avoid numerical instabilities in the simulation of the real model. Moreover, we increase the maximum control input to 15 Nm. In this case, our approach needs 3 iterations and 12 s of real experience to learn the control policy (see Tab. I).

Stiffness 500 N/m: For a stiffness equal to 500 N/m, we have to further reduce the sampling time to 0.0125 s. To speed up the learning, we reduce the prediction time to 2 s. Moreover, we increase the maximum control input to 25 Nm. In this case, our approach needs 2 iterations and 4 s of real experience to learn the control policy (see Tab. I).

Fig. 3. Pendulum model learning results (stiffness 100 N/m). (a) State (predicted and real) after one iteration of PI-REM. (b) State prediction root mean square errors of PI-REM and PILCO for different iterations (rollouts).

Fig. 4. Results for the pendulum swing-up (stiffness 100 N/m). (a) Pendulum angle and control policy obtained with PI-REM after 2 iterations. The black solid line represents the goal. (b) Evolution of the expected return $J^\pi(\theta)$ for different rollouts (mean and std over 5 executions).
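Under the stated parameters, the approximate pendulum model can be stepped with a simple explicit Euler scheme. The sketch below is illustrative (the paper does not prescribe an integrator) and deliberately omits the spring force $f_k$, exactly like the approximate model.

```python
import numpy as np

m, l, g, b = 1.0, 1.0, 9.81, 0.1       # mass, length, gravity, friction
I = m * l**2 / 12.0                     # inertia about the midpoint

def pendulum_approx_step(x, u, dt=0.1):
    """Euler step of (1/4 m l^2 + I) th'' + b th' + 1/2 m l g sin(th) = u.

    State x = [theta_dot, theta]; the contact force f_k is not modeled.
    """
    theta_dot, theta = x
    theta_ddot = (u - b * theta_dot - 0.5 * m * l * g * np.sin(theta)) \
                 / (0.25 * m * l**2 + I)
    return np.array([theta_dot + dt * theta_ddot, theta + dt * theta_dot])
```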
2) Cart-pole swing-up: The task consists in swinging up the pendulum on the cart and balancing it in the inverted position $\theta_g = \pi$ rad, starting from the stable position $\theta_0 = 0$ rad. The control input is the horizontal force $u_t$ that makes the cart move to the left or to the right. As shown in Fig. 2(b), the cart is connected to a wall through a spring. The additive force $f_k$ of the spring affects the motion of the cart. The mass of the cart is $m_c = 0.5$ kg, the mass of the pendulum is $m_p = 0.5$ kg, and the length of the pendulum is $l = 0.5$ m. The state of the cart-pole system is $x = [p, \dot{p}, \theta, \dot{\theta}]^T$, where $p$ is the position of the cart, $\dot{p}$ is the velocity of the cart, and $\theta$ and $\dot{\theta}$ are respectively the angle and the angular velocity of the pendulum. The goal state is $x_g = [0, 0, \pi, 0]^T$. The cart-pole system is more complicated than the single pendulum, since the state has four dimensions instead of two. Also in this case, the approximate model does not consider the extra force $f_k$. The approximate model of the cart-pole pendulum is then

$$(m_c + m_p)\ddot{p}_t + \tfrac{1}{2}m_p l \ddot{\theta}_t \cos\theta_t - \tfrac{1}{2}m_p l \dot{\theta}_t^2 \sin\theta_t = u_t - b\dot{p}_t$$
$$2l\ddot{\theta}_t + 3\ddot{p}_t\cos\theta_t + 3g\sin\theta_t = 0$$

We tested three different stiffness values, namely 25, 50 and 120 N/m. Results are reported in Tab. II. Recall that the time of real experience in Tab. II is a multiple of the number of rollouts.

TABLE II
RESULTS FOR THE CART POLE SWING-UP.

Stiffness [N/m]   Method       Real rollouts [#]   Real experience [s]
25                PI-REM               2                    8
25                PILCO [10]           5                   20
50                PI-REM               3                   12
120               PI-REM              15                   30
120               PILCO [10]          23                   46
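Because the two equations above are coupled in the accelerations, each simulation step amounts to solving a 2-by-2 linear system. A sketch under the stated parameters (the cart friction coefficient b is an assumed value, and $f_k$ is again omitted from the approximate model):

```python
import numpy as np

m_c, m_p, l, g, b = 0.5, 0.5, 0.5, 9.81, 0.1   # b is an assumed value

def cartpole_approx_accel(x, u):
    """Solve the coupled approximate cart-pole dynamics for (p'', theta'').

    x = [p, p_dot, theta, theta_dot]; the spring force f_k is neglected.
    """
    _, p_dot, theta, theta_dot = x
    A = np.array([[m_c + m_p,           0.5 * m_p * l * np.cos(theta)],
                  [3.0 * np.cos(theta), 2.0 * l]])
    rhs = np.array([u - b * p_dot + 0.5 * m_p * l * theta_dot**2 * np.sin(theta),
                    -3.0 * g * np.sin(theta)])
    return np.linalg.solve(A, rhs)              # [p_ddot, theta_ddot]
```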

Stiffness 25 N/m: For a stiffness equal to 25 N/m, $u_{max} = 10$ N is sufficient to fulfill the task. The total duration of the task is $T = 4$ s, while the sampling time is $dt = 0.1$ s. Results in Tab. II show that our approach needs only 2 iterations and 8 s of real experience to learn the control policy. For comparison, PILCO needs 5 iterations and 20 s of real experience.

Stiffness 50 N/m: For a stiffness equal to 50 N/m, we have to increase the maximum control input to 15 N in order to fulfill the task. In this case, PI-REM needs 3 iterations and 12 s of real experience to learn the policy (see Tab. II).

Stiffness 120 N/m: For a stiffness equal to 120 N/m, we have to reduce the sampling time to 0.05 s to avoid numerical instabilities in the simulation of the real model. We also reduce the duration of the task to $T = 2$ s to speed up the learning process. In this case, our approach needs 15 iterations and 30 s of real experience to learn the control policy (see Tab. II). As shown in Fig. 5, after 15 iterations the cart-pole system reaches the goal $x_g = [0, 0, \pi, 0]^T$. The learned control policy is bounded ($u \in [-15, 15]$ N), and it drives the cart-pole system to the goal position in less than 4 s (see Fig. 5(a) to 5(c)). This results in an average expected return $J^\pi(\theta) \approx 2$ (see Fig. 5(d)). PILCO needs more rollouts (23 instead of 15) to find the optimal policy. Since we used the same policy parameterization and cost function in both approaches, PI-REM and PILCO learn almost the same control policy, which generates a similar behavior of the cart-pole system.

Fig. 5. Results for the cart-pole swing-up (stiffness 120 N/m). (a) and (b) State of the cart-pole system (cart position, cart velocity, pendulum angle, pendulum angular velocity) and (c) learned control policy after 15 iterations of PI-REM. The black solid line represents the goal. (d) Evolution of the expected return $J^\pi(\theta)$ for different rollouts (mean and std over 5 executions).

In the considered simulations, the proposed PI-REM performs better than PILCO. In terms of rollouts, our approach takes from 25% to 67% fewer iterations than PILCO. As a consequence, the time of real experience is reduced by 25% to 67% (from a minimum of 2 s to a maximum of 16 s) with respect to PILCO.

B. Robot Experiment

PI-REM is applied to control a qbmove Maker variable stiffness actuator (VSA) [20]. As shown in Fig. 6, the arm consists of four VSA. The first joint of the arm is actuated, while the other three are not controlled. The task consists in swinging up the robotic arm by controlling the actuated joint. As approximate model, we use the mass-spring system

$$m\ddot{\theta}_t + k(\theta_t - u_t) = 0 \qquad (16)$$

where $m = 0.78$ kg is the mass of the unactuated joints, and $k = 14$ Nm/rad is the maximum stiffness of the qbmove Maker [20]. The variable $\theta_t$ represents the link position, while $u_t$ is the commanded motor position. It is worth noticing that the approximate model in (16) neglects the length, inertia and friction of the arm in Fig. 6. Moreover, the model in (16) represents a VSA only in the region where the behavior of the spring is linear.

Fig. 6. The VSA arm used in the robotic experiment (one actuated joint, three unactuated joints).
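The mass-spring model (16) can be stepped in the same spirit; the integrator and the time step below are illustrative choices, not details from the paper.

```python
import numpy as np

m, k = 0.78, 14.0       # link mass [kg] and maximum stiffness [Nm/rad]

def vsa_approx_step(x, u, dt=0.02):
    """Euler step of m th'' + k (th - u) = 0, i.e. Eq. (16).

    x = [theta, theta_dot]; u is the commanded motor position [rad].
    """
    theta, theta_dot = x
    theta_ddot = -k * (theta - u) / m
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])
```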
The goal is to learn a control policy $u_t$ that drives the model from the initial state $x_0 = [\theta_0, \dot{\theta}_0]^T = [\pi/2, 0]^T$ to the goal $x_g = [-\pi/2, 0]^T$ in $T = 5$ s. The maximum input position is $u_{max} = 1.7$ rad. Joint velocities are computed from the measured joint angles using a constant-velocity Kalman filter. The robot is controlled at 50 Hz via a Simulink® interface. Results in Tab. III show that our approach needs 3 iterations and 15 s of real experience to learn the task. For comparison, PILCO needs 5 iterations and 25 s of real experience to learn the same task. Hence, PI-REM shows an improvement of the performance of 40%.

Figure 7 shows the learned policy, the cost evolution and the state of the robot after 3 iterations of PI-REM. Note that, for a better visualization, we show only the first two seconds of the task. As shown in Fig. 7(a), the robot effectively reaches the goal after approximately 0.5 s. This results in an expected return $J^\pi(\theta) \approx 9$ (see Fig. 7(d)). PILCO needs 5 rollouts instead of 3 to find the optimal policy. As shown in Fig. 7(d), the expected return of PILCO is also $J^\pi(\theta) \approx 9$. This indicates that PI-REM and PILCO learn a similar control policy that generates a similar behavior of the VSA pendulum.
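A constant-velocity Kalman filter for a single joint might look as follows; the process and measurement noise values are placeholders, not the ones used in the experiment.

```python
import numpy as np

def cv_kalman(theta_meas, dt, q=1e-3, r=1e-4):
    """Estimate [angle, velocity] from angle measurements with a
    constant-velocity model: theta_{t+1} = theta_t + dt * omega_t."""
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity transition
    H = np.array([[1.0, 0.0]])                 # only the angle is measured
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.array([theta_meas[0], 0.0]), np.eye(2)
    estimates = []
    for z in theta_meas:
        x, P = F @ x, F @ P @ F.T + Q                      # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                     # Kalman gain
        x = x + K @ (np.atleast_1d(z) - H @ x)             # correct
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)                 # columns: angle, velocity
```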

Fig. 7. Results for the VSA pendulum swing-up. (a) State of the VSA pendulum (joint angle and joint angular velocity), (b) learned control policy, and (c) snapshots of the task execution after 3 iterations of PI-REM. The black solid line in (a) represents the goal. (d) Evolution of the expected return $J^\pi(\theta)$ for different rollouts.

The performed simulations and experiments show that learning a black-box model of the residual, instead of a black-box model of the entire system, can significantly improve the performance of the policy search algorithm.

TABLE III
RESULTS FOR THE VSA PENDULUM SWING-UP.

Method       Real rollouts [#]   Real experience [s]
PI-REM               3                   15
PILCO [10]           5                   25

IV. CONCLUSION AND FUTURE WORK

In this work, we proposed PI-REM, a model-based reinforcement learning approach that exploits a simplified model to find an optimal control policy for a given task. The proposed approach works in two steps. First, an optimal policy is found for the simplified model using standard policy search algorithms. Second, sensory data from the real robot are exploited to learn a probabilistic model that represents the difference between the real system and its simplified model. Differently from most model-based RL approaches, our method does not learn a black-box model of the entire system. PI-REM learns, instead, a black-box model of the differences between the real system and an approximated model, i.e., the model residual. As a consequence, our approach is suitable in applications where the system model is inaccurate or only partially available. The approach has been compared with PILCO, showing improvements in terms of performed rollouts and time of real experience.

Our future research will focus on testing the proposed approach in more complicated tasks with a high-dimensional search space. An important open question is how inaccurate the model can be in order to maintain enhanced performance. We expect that with a completely wrong model, the performance of the algorithm becomes similar or slightly worse than classical black-box approaches such as PILCO. However, formally quantifying the tolerated uncertainty still remains an open theoretical problem. Another potential limitation of our approach is that we assume additive model residuals. Evaluating the generality of this assumption more precisely will be a possible future research topic.

ACKNOWLEDGEMENTS

This work has been partially supported by the Technical University of Munich, International Graduate School of Science and Engineering, and by the Marie Curie Action LEACON (EU project).

REFERENCES

[1] R. Sutton and A. Barto, Reinforcement learning: an introduction, ser. A Bradford book. MIT Press, 1998.
[2] J. Kober, D. Bagnell, and J. Peters, "Reinforcement learning in robotics: a survey," IJRR, vol. 32, no. 11, 2013.
[3] S. Calinon, P. Kormushev, and D. Caldwell, "Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning," RAS, vol. 61, no. 4, 2013.
[4] J. Kober and J. Peters, "Policy search for motor primitives in robotics," in NIPS, 2009.
[5] E. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," Journal of Machine Learning Research, vol. 11, 2010.
[6] J. Peters, K. Mülling, and Y. Altün, "Relative entropy policy search," in Conference on Artificial Intelligence (AAAI), 2010.
[7] D. Lee, H. Kunori, and Y. Nakamura, "Association of whole body motion from tool knowledge for humanoid robots," in International Conference on Intelligent Robots and Systems, 2008.
[8] H. Kunori, D. Lee, and Y. Nakamura, "Associating and reshaping of whole body motions for object manipulation," in International Conference on Intelligent Robots and Systems, 2009.
[9] M. Saveriano, S. An, and D. Lee, "Incremental kinesthetic teaching of end-effector and null-space motion primitives," in International Conference on Robotics and Automation, 2015.
[10] M. P. Deisenroth, D. Fox, and C. E. Rasmussen, "Gaussian processes for data-efficient learning in robotics and control," TPAMI, vol. 37, no. 2, 2015.
[11] J. G. Schneider, "Exploiting model uncertainty estimates for safe dynamic control learning," in Neural Information Processing Systems 9. The MIT Press, 1996.
[12] C. G. Atkeson and S. Schaal, "Robot learning from demonstration," in International Conference on Machine Learning, 1997.
[13] C. G. Atkeson and J. C. Santamaria, "A comparison of direct and model-based reinforcement learning," in International Conference on Robotics and Automation, 1997.
[14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[15] F. Farshidian, M. Neunert, and J. Buchli, "Learning of closed-loop motion control," in IROS, 2014.
[16] S. Levine and P. Abbeel, "Learning neural network policies with guided policy search under unknown dynamics," in Advances in Neural Information Processing Systems, 2014.
[17] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo, Robotics - Modelling, Planning and Control. Springer, 2009.
[18] G. De Maria, P. Falco, C. Natale, and S. Pirozzi, "Integrated force/tactile sensing: The enabling technology for slipping detection and avoidance," in ICRA, 2015.
[19] M. Deisenroth, Efficient Reinforcement Learning using Gaussian Processes. KIT Scientific Publishing, 2010.
[20] M. G. Catalano, G. Grioli, M. Garabini, F. Bonomo, M. Mancini, N. Tsagarakis, and A. Bicchi, "VSA-CubeBot: A modular variable stiffness platform for multiple degrees of freedom robots," in International Conference on Robotics and Automation, 2011.


More information

Sensitivity Analysis in Bayesian Networks: From Single to Multiple Parameters

Sensitivity Analysis in Bayesian Networks: From Single to Multiple Parameters Sensitivity Analysis in Bayesian Networks: From Single to Mltiple Parameters Hei Chan and Adnan Darwiche Compter Science Department University of California, Los Angeles Los Angeles, CA 90095 {hei,darwiche}@cs.cla.ed

More information

Outline. Model Predictive Control: Current Status and Future Challenges. Separation of the control problem. Separation of the control problem

Outline. Model Predictive Control: Current Status and Future Challenges. Separation of the control problem. Separation of the control problem Otline Model Predictive Control: Crrent Stats and Ftre Challenges James B. Rawlings Department of Chemical and Biological Engineering University of Wisconsin Madison UCLA Control Symposim May, 6 Overview

More information

An Introduction to Geostatistics

An Introduction to Geostatistics An Introdction to Geostatistics András Bárdossy Universität Stttgart Institt für Wasser- nd Umweltsystemmodellierng Lehrsthl für Hydrologie nd Geohydrologie Prof. Dr. rer. nat. Dr.-Ing. András Bárdossy

More information

A Survey of the Implementation of Numerical Schemes for Linear Advection Equation

A Survey of the Implementation of Numerical Schemes for Linear Advection Equation Advances in Pre Mathematics, 4, 4, 467-479 Pblished Online Agst 4 in SciRes. http://www.scirp.org/jornal/apm http://dx.doi.org/.436/apm.4.485 A Srvey of the Implementation of Nmerical Schemes for Linear

More information

COMPARISON OF MODEL INTEGRATION APPROACHES FOR FLEXIBLE AIRCRAFT FLIGHT DYNAMICS MODELLING

COMPARISON OF MODEL INTEGRATION APPROACHES FOR FLEXIBLE AIRCRAFT FLIGHT DYNAMICS MODELLING COMPARISON OF MODEL INTEGRATION APPROACHES FOR FLEXIBLE AIRCRAFT FLIGHT DYNAMICS MODELLING Christian Reschke and Gertjan Looye DLR German Aerospace Center, Institte of Robotics and Mechatronics D-82234

More information

Designing of Virtual Experiments for the Physics Class

Designing of Virtual Experiments for the Physics Class Designing of Virtal Experiments for the Physics Class Marin Oprea, Cristina Miron Faclty of Physics, University of Bcharest, Bcharest-Magrele, Romania E-mail: opreamarin2007@yahoo.com Abstract Physics

More information

Lecture: Corporate Income Tax

Lecture: Corporate Income Tax Lectre: Corporate Income Tax Ltz Krschwitz & Andreas Löffler Disconted Cash Flow, Section 2.1, Otline 2.1 Unlevered firms Similar companies Notation 2.1.1 Valation eqation 2.1.2 Weak atoregressive cash

More information

Ted Pedersen. Southern Methodist University. large sample assumptions implicit in traditional goodness

Ted Pedersen. Southern Methodist University. large sample assumptions implicit in traditional goodness Appears in the Proceedings of the Soth-Central SAS Users Grop Conference (SCSUG-96), Astin, TX, Oct 27-29, 1996 Fishing for Exactness Ted Pedersen Department of Compter Science & Engineering Sothern Methodist

More information

A Macroscopic Traffic Data Assimilation Framework Based on Fourier-Galerkin Method and Minimax Estimation

A Macroscopic Traffic Data Assimilation Framework Based on Fourier-Galerkin Method and Minimax Estimation A Macroscopic Traffic Data Assimilation Framework Based on Forier-Galerkin Method and Minima Estimation Tigran T. Tchrakian and Sergiy Zhk Abstract In this paper, we propose a new framework for macroscopic

More information

Robust Tracking and Regulation Control of Uncertain Piecewise Linear Hybrid Systems

Robust Tracking and Regulation Control of Uncertain Piecewise Linear Hybrid Systems ISIS Tech. Rept. - 2003-005 Robst Tracking and Reglation Control of Uncertain Piecewise Linear Hybrid Systems Hai Lin Panos J. Antsaklis Department of Electrical Engineering, University of Notre Dame,

More information

Performance analysis of GTS allocation in Beacon Enabled IEEE

Performance analysis of GTS allocation in Beacon Enabled IEEE 1 Performance analysis of GTS allocation in Beacon Enabled IEEE 8.15.4 Pangn Park, Carlo Fischione, Karl Henrik Johansson Abstract Time-critical applications for wireless sensor networks (WSNs) are an

More information

A Regulator for Continuous Sedimentation in Ideal Clarifier-Thickener Units

A Regulator for Continuous Sedimentation in Ideal Clarifier-Thickener Units A Reglator for Continos Sedimentation in Ideal Clarifier-Thickener Units STEFAN DIEHL Centre for Mathematical Sciences, Lnd University, P.O. Box, SE- Lnd, Sweden e-mail: diehl@maths.lth.se) Abstract. The

More information

Reducing Conservatism in Flutterometer Predictions Using Volterra Modeling with Modal Parameter Estimation

Reducing Conservatism in Flutterometer Predictions Using Volterra Modeling with Modal Parameter Estimation JOURNAL OF AIRCRAFT Vol. 42, No. 4, Jly Agst 2005 Redcing Conservatism in Fltterometer Predictions Using Volterra Modeling with Modal Parameter Estimation Rick Lind and Joao Pedro Mortaga University of

More information

Imprecise Continuous-Time Markov Chains

Imprecise Continuous-Time Markov Chains Imprecise Continos-Time Markov Chains Thomas Krak *, Jasper De Bock, and Arno Siebes t.e.krak@.nl, a.p.j.m.siebes@.nl Utrecht University, Department of Information and Compting Sciences, Princetonplein

More information

Gradient Projection Anti-windup Scheme on Constrained Planar LTI Systems. Justin Teo and Jonathan P. How

Gradient Projection Anti-windup Scheme on Constrained Planar LTI Systems. Justin Teo and Jonathan P. How 1 Gradient Projection Anti-windp Scheme on Constrained Planar LTI Systems Jstin Teo and Jonathan P. How Technical Report ACL1 1 Aerospace Controls Laboratory Department of Aeronatics and Astronatics Massachsetts

More information

Move Blocking Strategies in Receding Horizon Control

Move Blocking Strategies in Receding Horizon Control Move Blocking Strategies in Receding Horizon Control Raphael Cagienard, Pascal Grieder, Eric C. Kerrigan and Manfred Morari Abstract In order to deal with the comptational brden of optimal control, it

More information

IMPROVED ANALYSIS OF BOLTED SHEAR CONNECTION UNDER ECCENTRIC LOADS

IMPROVED ANALYSIS OF BOLTED SHEAR CONNECTION UNDER ECCENTRIC LOADS Jornal of Marine Science and Technology, Vol. 5, No. 4, pp. 373-38 (17) 373 DOI: 1.6119/JMST-17-3-1 IMPROVED ANALYSIS OF BOLTED SHEAR ONNETION UNDER EENTRI LOADS Dng-Mya Le 1, heng-yen Liao, hien-hien

More information

Integrated Design of a Planar Maglev System for Micro Positioning

Integrated Design of a Planar Maglev System for Micro Positioning 005 American Control Conference Jne 8-0, 005. Portland, OR, USA hc06.6 Integrated Design of a Planar Maglev System for Micro Positioning Mei-Yng Chen, Chia-Feng sai, Hsan-Han Hang and Li-Chen F.Department

More information

Uncertainties of measurement

Uncertainties of measurement Uncertainties of measrement Laboratory tas A temperatre sensor is connected as a voltage divider according to the schematic diagram on Fig.. The temperatre sensor is a thermistor type B5764K [] with nominal

More information

Nonlinear parametric optimization using cylindrical algebraic decomposition

Nonlinear parametric optimization using cylindrical algebraic decomposition Proceedings of the 44th IEEE Conference on Decision and Control, and the Eropean Control Conference 2005 Seville, Spain, December 12-15, 2005 TC08.5 Nonlinear parametric optimization sing cylindrical algebraic

More information

Propagation of measurement uncertainty in spatial characterisation of recreational fishing catch rates using logistic transform indicator kriging

Propagation of measurement uncertainty in spatial characterisation of recreational fishing catch rates using logistic transform indicator kriging st International Congress on Modelling and Simlation, Gold Coast, Astralia, 9 Nov to 4 Dec 05 www.mssan.org.a/modsim05 Propagation of measrement ncertainty in spatial characterisation of recreational fishing

More information

1 Introduction. r + _

1 Introduction. r + _ A method and an algorithm for obtaining the Stable Oscillator Regimes Parameters of the Nonlinear Sstems, with two time constants and Rela with Dela and Hsteresis NUŢU VASILE, MOLDOVEANU CRISTIAN-EMIL,

More information

APPLICATION OF MICROTREMOR MEASUREMENTS TO EARTHQUAKE ENGINEERING

APPLICATION OF MICROTREMOR MEASUREMENTS TO EARTHQUAKE ENGINEERING 170 APPLICATION OF MICROTREMOR MEASUREMENTS TO EARTHQUAKE ENGINEERING I. M. Parton* and P. W. Taylor* INTRODUCTION Two previos papers pblished in this Blletin (Refs 1, 2) described the methods developed

More information

Home Range Formation in Wolves Due to Scent Marking

Home Range Formation in Wolves Due to Scent Marking Blletin of Mathematical Biology () 64, 61 84 doi:1.16/blm.1.73 Available online at http://www.idealibrary.com on Home Range Formation in Wolves De to Scent Marking BRIAN K. BRISCOE, MARK A. LEWIS AND STEPHEN

More information

Estimating models of inverse systems

Estimating models of inverse systems Estimating models of inverse systems Ylva Jng and Martin Enqvist Linköping University Post Print N.B.: When citing this work, cite the original article. Original Pblication: Ylva Jng and Martin Enqvist,

More information

Assignment Fall 2014

Assignment Fall 2014 Assignment 5.086 Fall 04 De: Wednesday, 0 December at 5 PM. Upload yor soltion to corse website as a zip file YOURNAME_ASSIGNMENT_5 which incldes the script for each qestion as well as all Matlab fnctions

More information