arxiv: v1 [cs.sy] 20 Sep 2017

Size: px
Start display at page:

Download "arxiv: v1 [cs.sy] 20 Sep 2017"

Transcription

1 On the Design o LQR Kernels or Eicient Controller Learning Alonso Marco 1, Philipp Hennig 1, Stean Schaal 1, and Sebastian Trimpe 1 arxiv:179.79v1 [cs.sy] Sep 17 Abstract Finding optimal eedback controllers or nonlinear dynamic systems rom data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerul ramework or direct controller tuning rom experimental trials. For selecting the next query point and inding the global optimum, BO relies on a probabilistic description o the latent objective unction, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a irstorder system, we construct two kernels that speciically leverage the structure o the well-known Linear Quadratic Regulator (LQR), yet retain the lexibility o Bayesian nonparametric learning. Simulations o uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning perormance. I. INTRODUCTION A core problem o learning control is to determine optimal eedback controllers or (partially) unknown nonlinear systems rom experimental data. Reinorcement learning (RL) [1], [] is a promising ramework or this, yet oten requires perorming many experiments on the physical system to even ind suitable controllers, which limits the applicability o such techniques. Thereore, a lot o research eort has been invested into data eiciency o RL aiming at learning controllers rom as ew experiments as possible. Recently, Bayesian optimization (BO) has been proposed or RL as a promising approach in this direction. BO employs a probabilistic description o the latent objective unction (typically a Gaussian process (GP)), which allows or selecting next control experiments in a principled manner, e.g., to maximize inormation gain [3] or perorm sae exploration []. While BO provides a promising ramework or learning controllers in airly general settings, the ull power o Bayesian learning is oten not exploited. A key advantage o Bayesian methods is that they allow or combining prior problem knowledge with learning rom data in a principled manner. In case o GP models, this concerns speciically the choice o the kernel, which captures the covariance between unction values at dierent inputs and is thus the core component to model prior knowledge about the unction shape. By choosing standard kernels, however, naive BO approaches do oten not exploit this opportunity to improve learning perormance. 1 Max Planck Institute or Intelligent Systems, 77 Tübingen, Germany. {amarco, phennig, sschaal, strimpe}@tuebingen.mpg.de Computational Learning and Motor Control Lab, University o Southern Caliornia, Los Angeles, USA. This work was supported in part by the Max Planck Society, the Max Planck ETH Center or Learning Systems, National Science Foundation grants IIS-159, IIS-11713, EECS-95, the Oice o Naval Research, and the Okawa Foundation. In this paper, we show how structural knowledge about the optimal control problem at hand can be leveraged or designing speciic kernels in order to improve data eiciency in learning control. For a irst-order nonlinear quadratic optimal control problem, we propose two LQR kernels that leverage the structure o the amous Linear Quadratic Regulator (LQR) problem given in orm o an approximate linear model o the true nonlinear dynamics. The proposed kernels leverage the structure o the LQR problem, while retaining the lexibility o nonparametric GP regression. Contributions: In detail, this paper makes the ollowing contributions: 1) We discuss how the structure o the well-known LQR problem can be leveraged or eicient learning o controllers or nonlinear systems. ) This discussion leads to the proposal o two new kernels or GP regression in the context o learning control: the parametric and nonparametric LQR kernels. 3) The improved learning perormance achieved with these kernels over a standard kernel is demonstrated through numerical simulations. Related work: BO or learning controllers has recently been considered, or example in [3] [7], in dierent lavors. While [3] and [5] ocus on data eiciency by maximizing the inormation gain in each iteration, [] and [] propose methods or sae exploration. Dierent BO algorithms are compared in [7]. These works all consider tuning o stateeedback controllers with a quadratic cost (possibly saturated) similar to the setting herein, but using standard kernels. The design o customized kernels or GP regression has been considered beore in the context o control and robotics or related problems. A kernel or bipedal locomotion, which captures typical gait characteristics, is proposed in []. In [9], an impedance-based model is incorporated as prior knowledge or improved predictions in human-robot interaction. For the problem o maximizing power generation in photovoltaic plants, the authors in [1] incorporate explicit basis unctions about known power curves in the kernel. In [5], a kernel is designed to model inormation rom simulation and physical experiments, in order to leverage both sources o inormation or RL. None o the above reerences considers the problem o incorporating the structure o the LQR problem or improving data-eiciency in learning control with BO. Outline: We introduce the considered learning control problem in Sec. II, and summarize BO or RL in Sec. III, together with the necessary background on GPs. While the BO ramework or learning control is introduced or general multivariate systems, we ocus on the special case o a scalar Accepted inal version. To appear in 5th IEEE Conerence on Decision and Control. c 17 IEEE. Personal use o this material is permitted. Permission rom IEEE must be obtained or all other uses, in any current or uture media, including reprinting/republishing this material or advertising or promotional purposes, creating new collective works, or resale or redistribution to servers or lists, or reuse o any copyrighted component o this work in other works.

2 problem thereater and develop the LQR kernels in Sec. IV. Numerical results in Sec. V illustrate the improved learning perormance o the proposed kernels over a standard kernel. The paper concludes with remarks in Sec. VI. Notation: A Gaussian random variable x with mean m and variance V is denoted by x N (m, V ). The expected value o x is denoted by E[x], while V[x 1, x ] denotes the covariance o x 1 and x. We also use V[x 1 ] := V[x 1, x 1 ]. II. LEARNING CONTROL PROBLEM We consider regulation o the nonlinear stochastic system x t+1 = g(x t, u t, v t ) (1) with state x t R nx, control input u t R nu, and random noise v t N (, V ). We assume that the state x t can be measured or otherwise estimated. The system dynamics g( ) are unknown. The control objective is to ind a state-eedback controller u t = F x t () with controller gain F R nu nx such that the quadratic cost 1 [ T 1 J = lim T T E ] x T t Qx t + u T t Ru t t= with symmetric weights Q and R > is minimized. A quadratic cost is a very common choice in optimal control [11] expressing the designer s preerence in the undamental trade-o between control perormance and control eort. While the solution o the above problem is standard when the dynamics (1) are known and linear [11], here, we ace the problem o optimal control under unknown and nonlinear dynamics. To address this problem, we ollow a direct RL approach [1], where the optimal controller is learned by directly evaluating the perormance o candidate controllers F i on the real system (1) without learning an (intermediate) dynamics model. We employ Bayesian optimization (BO) as a popular approach or sequential global optimization, which leverages the inormation rom previous trials in order to propose the next candidate controller F i+1 so as to eventually ind the optimum o (3). Because evaluations on the system are expensive, we seek to ind the optimal controller with as ew evaluations as possible. To this end, we address herein how to leverage inormation rom a linear model that approximates the true system (1). III. BAYESIAN OPTIMIZATION FOR LEARNING CONTROL We briely introduce necessary background material on Gaussian processes (GPs) and Bayesian optimization (BO), beore describing reinorcement learning with BO. A. Gaussian process regression Let θ denote the ree parameters o a eedback controller; that is, θ = vec(f ) in (), or any other parameterization. The dependence o the cost unction J (3) on θ is unknown a priori because o the lack o knowledge o the system (1). We (3) use GP regression to approximate the unction J : Θ R, θ Θ, rom data (i.e., noisy unction evaluations) J i = J(θ i ) + ε i, ε i N (, σ ). () Obtaining one data point J i corresponds to perorming a closed-loop experiment on the true system (1) with controller θ i, recording the state-input trajectory, and computing the cost rom these data (in case o (3) a inite approximation is used with a suiciently long horizon T ). A GP can be deined as a probability distribution over the space o unctions J : Θ R whose restriction to any inite number o unction values is jointly Gaussian [13, p. 13]. A GP is speciied through its prior mean unction µ : Θ R and covariance unction k : Θ Θ R. We write J(θ) GP(µ(θ), k(θ, θ ) µ(θ) = E[J(θ)] k(θ, θ ) = V[J(θ), J(θ )]. with Hence, µ is the expected unction value, and k captures the covariance between any two unction values and is used to model uncertainty. The latter is also called kernel and must satisy the ollowing property to give rise to a valid covariance unction: Deinition 1 (according to [13]): Let k : Θ Θ R be a unction, and K N R N N be the symmetric matrix whose entries are computed as K ij = k(θ i, θ j ) rom the collection {θ 1, θ,..., θ N }, θ i Θ. The unction k is called a positive semideinite kernel i K is positive semideinite or any inite collection {θ 1, θ,..., θ N }. The matrix K N is called the Gram matrix. A kernel typically has its own parameters, called hyperparameters, such as length scales or output variance. By choosing the prior mean and the kernel, the user can speciy prior knowledge about the unction such as expected shape, length scales, and smoothness properties. In GP regression, learning a unction amounts to predicting the (normal) distribution o unction values J(θ ) at arbitrary inputs θ Θ based on previous evaluations. Given N data points D N = {θ i, J i } N i=1, the posterior mean and variance o J(θ ) can be stated in closed orm as E[J(θ ) D N ] = µ(θ ) k T N(θ )(K N +σ I) 1 Ĵ N (5) V[J(θ ) D N ] = k(θ, θ ) k T N(θ )(K N +σ I) 1 k N (θ ) () where k N (θ ) := [k(θ, θ 1 ),..., k(θ, θ N )] T and ĴN := [J 1 µ(θ 1 ),..., J N µ(θ N )] T. In addition to computing the posterior distribution, the data D N can also be used to optimize hyperparameters in order to improve the model it. A popular approach is maximizing the marginal likelihood o the data, p(ĵn θ 1,..., θ N ), which is given in logarithmic orm as [13, p. 19] Ĵ 1 N T (K N +σ I) 1 Ĵ N 1 K N +σ I N log π. (7)

3 Algorithm 1 Learning control with Bayesian optimization 1: Speciy objective unction J (e.g., (3) with inite horizon T ) : Speciy GP prior (mean µ, kernel k) 3: Initialize: controller θ ; data set D = : while (not terminated) do e.g., ixed number o experiments 5: perorm closed-loop experiment with controller θ i : compute J i rom experimental data {x t, u t} T 1 t= 7: add evaluation {θ i, J i } to data set D i : update GP posterior (5), () 9: [optional:] optimize hyperparameters (e.g., maximize (7)) 1: compute next controller θ i+1 via () 11: end while 1: determine best guess θ bg or the controller parameters, e.g., minimum o posterior mean (5) 13: return θ bg B. Bayesian optimization Bayesian optimization [1] denotes a class o global optimization methods, where a probabilistic description p(j(θ)) o the latent objective unction J is used or data-eicient optimization. Here, the probabilistic description is given by the GP J GP(µ, k). Given p(j(θ)), a utility U[p(J(θ))] is deined, which is maximized in each iteration to ind the next evaluation point θ i+1 = arg max U[p(J(θ))]. () θ Dierent BO algorithms primarily distinguish in how the selection o the next evaluation () is ormulated. Popular BO algorithms include expected improvement (EI) [15], probability o improvement (PI) [1], GP upper conidence bound (GP-UCB) [17], and entropy search (ES) [1]. The method developed herein is agnostic to the type o BO algorithm used. For the examples in Sec. V, we employ EI, which uses U[p(J(θ))] = E[min{, J low J(θ)}] (9) or the optimization in (), where J low represents the lowest unction rom current data set D N. EI thus selects the next query point where the expected improvement upon J low (under the GP o J) is maximal. C. Reinorcement learning with Bayesian optimization Algorithm 1 summarizes direct RL using Bayesian optimization. This ramework has been successully applied in experimental settings to learn eedback controllers or the control problem o Sec. II and related scenarios. Berkenkamp et al. [] optimize a state-eedback controller () or quadrotor trajectory tracking using a sae BO algorithm [19], which builds on GP-UCB. Marco et al. [3] propose to parametrize the eedback gain F as LQR policy [], whose optimal weights are learned using ES, which maximizes the inormation gain rom each experiment. This algorithm has been extended in [5] to include inormation rom dierent sources (such as simulation and physical experiments). Both methods [3], [5] were successully applied to learn pole balancing controllers. Calandra et al. [7] compare PI, EI, and GP-UCB or learning the parameters o a discrete event controller or a walking robot. IV. CONSTRUCTING LQR KERNELS In this section, we present two kernels that are speciically designed or the control problem o Sec. II and incorporate approximate model knowledge in orm o a linear approximation to (1). Because or a linear system, the solution to the optimal control problem o Sec. II is the well-known LQR, we term these kernels LQR kernels. The ollowing derivations are presented or a irst-order system (1), where all variables are scalars (n x = n u = 1). Small letters are used in place o the capital ones to emphasize scalar variables (e.g., v instead o V in (1) and instead o F in ()). We consider minimization o the cost (3), which is rewritten as 1 [ T 1 J = lim T T E t= qx t + ru t ]. (1) We start by considering a learning example with a standard kernel choice or the GP, which motivates why a speciically designed kernel can be desirable. A. Problems with standard kernel The most common kernel choice in GP regression is arguably the squared exponential (SE) kernel [13] ( k SE (θ, θ ) = σse exp (θ θ ) ) l (11) with signal variance σse and length-scale l as its hyperparameters. Let us consider the problem o learning the cost unction J (1) via GP regression with SE kernel or the ollowing linear example: Example 1: Let (1) be given by x t+1 =.9x t + u t + v t with v t N (, 1). Figure 1 shows the prior GP and the posterior GP ater obtaining our data points (i.e., ater our evaluations o controllers i ). A ew issues are apparent rom the posterior: (i) the kernel has problems with the dierent length scales o the unction (J is steep around =.1, but rather lat in the center region); (ii) the GP does not generalize well to regions where no data has been seen (e.g., around = 1., where the posterior mean resorts to the prior); and (iii) the overall it is not very good. Clearly, the it will improve with more data, but, or eicient and ast learning o controllers, we are particularly interested in good its rom just ew data points. Hence, we seek to improve the itting properties o the GP by exploiting knowledge about the system (1) in terms o an approximate linear model. B. Incorporating prior knowledge In no practical situation, one has a perect model o the system to be controlled. At the same time, it is oten possible to obtain a rough model, e.g., rom irst principles modeling or some system identiication procedure. Here, we assume that an uncertain linear model x t+1 = ax t + bu t + v t (1) a [a min, a max ], b [b min, b max ] (13)

4 J() J() Posterior GP (Ex. 1) 1 Prior GP Fig. 1. Prior and posterior GPs o Example 1 using the squared exponential kernel k SE (hyperparameters: σse = 5, l =.). The thick line represents the GP mean, and the light blue area the GP variance (+/- two standard deviations). The true unction is shown in the bottom plot in dashed gray, and data points in orange. is available as an approximation to (1), e.g., rom linearization o a irst principles model with (possibly) some uncertainty about the physical parameters. In the ollowing, we will consider controller gains such that the system (1) is guaranteed stable or all parameters (13). That is, we consider F with F := { R a + b < 1 a [a min, a max ], b [b min, b max ]}. (1) This restriction makes sense, or example, in saety critical applications, where one wants to avoid the risk o trying an unstable controller based on the system knowledge available (i.e., (1), (13)). Moreover, the restriction to F will ensure that subsequent calculations are well-deined. I a and b were known, the unctional dependence o the cost J on the controller gain in () or the linear system (1) could be derived using standard control theory. Fact 1: Consider the system (1) with known parameters a and b, and let a + b < 1. Then, the cost (1) is given by 1 q + r J = v 1 (a + b) =: φ (a,b)(). (15) Proo: The controlled process x t+1 = (a + b)x t + v t is stable by assumption and thus converges to a stationary process with zero mean and variance E[x t ] = P, where P is the unique positive solution to P P (a + b) = v, [1, Theorem 3.1]. For stationary x t, (1) resolves into J = E[qx t + ru t ] = (q + r ) E[x t ] = (q + r )P. Equation (15) then ollows. 1 In the notation φ (a,b) (), we omit the parametric dependence on v, q and r since v is a multiplicative constant, which does not play a role in the later optimization, and we assume that q and r are ixed. I we are uncertain about a and b, a collection o possible costs (15) emerge rom all the possible combinations o a and b within their ranges (13). We assume this collection o costs to be explained by a Gaussian process J LQR. While any arbitrary choice o a and b in (1) is only an approximation to (1), it yields a cost (15) that contains useul structural inormation, which can be leveraged or aster learning controllers rom data. In the next sections, we show how this prior knowledge can be exploited to construct LQR kernels. C. Parametric LQR kernel A reasonable choice or J LQR () is J LQR () = w φ (ā, b) (), w N ( w, σ w) (1) where ā := (a min + a max )/ and b := (b min + b max )/ are the midpoints o the uncertainty intervals (13). Equation (1) is a standard parametric model with a single eature φ (ā, b) () and Gaussian prior. It is well-known (see e.g. [13]) that J LQR () is a GP, J LQR () GP (, k plqr (, ) ) (17) where we have assumed w =, with kernel k plqr (, ) := σ wφ (ā, b) ()φ (ā, b) ( ) = σ p v (q + r )(q + r ) (1 (ā + b) )(1 (ā + b ) ) (1) and hyperparameters σ p := σ w, ā, and b. We reer to (1) as the parametric LQR kernel because it captures the cost unction or the linear system (ā, b) with quadratic cost, and thus the structure o the LQR problem. To illustrate the perormance o the parametric LQR kernel k plqr, we revisit Example 1 using this kernel instead o the SE kernel. The top two graphs o Fig. show the prior and posterior GP or the same data points as in Fig. 1 when using k plqr with hyperparameters ā =.9 and b = 1. We see that the posterior it rom only our data points is almost perect and much better than the one in Fig. 1. This is, o course, not a big surprise because the hyperparameters match the true underlying system o Example 1 perectly. The itting perormance deteriorates, however, i the hyperparameters are o. This can be seen in the bottom graph o Fig., which shows the posterior GP with the same hyperparameters, but or the system Example : x t+1 =.x t +.9u t +v t with v t N (, 1). To improve the it in this situation, one can employ hyperparameter optimization (see Sec. III-A) to ind improved parameters ā and b that better explain the data. The simulation results in Sec. V will show that this can be a viable approach. An alternative is to design more lexible and expressive kernels, which allow or itting more general models. This we discuss next. D. Nonparametric LQR kernel The kernel (1) captures the structure o the cost unction or the LQR problem with one speciic linear model (ā, b). A straightorward way to increase the lexibility o the kernel

5 J() Posterior GP (Ex. 1) J() J() Posterior GP (Ex. ) Prior GP Fig.. GP it using the parametric LQR kernel k plqr (hyperparameters: σp = 1, ā =.9, b = 1). The color code is the same as in Fig. 1. The hyperparameters are exact or Example 1, while they are o by about 1% or Example. in order to it more general problems is to use m basis unctions (or eatures) φ (ai,b i) corresponding to m models (a 1, b 1 ), (a, b ),..., (a m, b m ), J LQR () = [ φ (a1,b 1)() φ (a,b )() φ ] (am,b }{{ m)() w } =:Φ T () = Φ T ()w (19) with w R m, w N ( w, Σ w ). The derivation o the corresponding kernel is analogous to (1) (see [13, Sec..7]) and yields k plqr,m (, ) = Φ T () Σ w Φ( ). () Same as (1), the model (19) represents a parametric model or the LQR cost J LQR. That is, its lexibility is essentially limited to the number o explicit eatures φ (ai,b i). Employing powerul kernel techniques [], the parametric model can be turned into a nonparametric one, which includes an ininite number o eatures while retaining inite computational complexity. The key idea is to consider the kernel () in the limit o ininitely many eatures corresponding to models a [a min, a max ] and b [b min, b max ]. The derivation ollows ideas similar to how the standard SE kernel can be derived rom basic eatures, [13, p. ]. Consider the partitions o [a min, a max ] and [b min, b max ] into m equidistant intervals, and let {a i } 1:m and {b i } 1:m be the lower (or upper) interval limits. Consider the model (19) with eature vector Φ T () = [ φ (a1,b 1)() φ (ai,b j)() φ (am,b m)() ] which includes all combinations φ (ai,b j) or i, j {1,..., m}, and the parametric prior w = and Σ w = σ n (amax amin)(bmax bmin) m I or some σ n R. The kernel () then becomes k plqr,m (, ) = σ n (a max a min )(b max b min ) m j=1 i=1 m m φ (ai,b j)() φ (ai,b j)( ). (1) Since φ (ai,b j) is continuous on F, the inite sum (1) converges to the Riemann integral in the limits as m. We can thus deine the nonparametric LQR kernel k nlqr (, ) = lim k plqr,m(, ) m = σ n bmax amax b min a min φ (a,b) ()φ (a,b) ( ) da db () or, F with the signal variance σn and the integration boundaries a min, a max, b min, and b max as hyperparameters. While the kernel in () represents the structure o the cost unction (15) or an ininite number o models (all a [a min, a max ] and b [b min, b max ]), its computation is inite consisting o solving the integral in (). By contrast, the computational complexity o the parametric kernels () and (1) grows with the number o eatures m. We prove that the kernel () is indeed a valid covariance unction. Proposition 1: k nlqr (, ) is positive semideinite or all, F. Proo: Take any collection {θ 1, θ,..., θ N } and any z R N. Let K N be the Gram matrix or the kernel k nlqr. Then N ( N ) z T K N z = z j z i k nlqr (θ i, θ j ) j=1 i=1 N N b max a max z j z i j=1 i=1 b min b max a max = σ n = σ n = σ n b min a j=1 min b max a max b min a min a min φ (a,b) (θ i )φ (a,b) (θ j ) da db N N z j φ (a,b) (θ j ) z i φ (a,b) (θ i ) da db i=1 ( N z i φ (a,b) (θ i )) da db i=1 which completes the proo. The above derivation corresponds to the kernel trick [13], [], which is a core idea o kernel methods and behind many powerul learning algorithms. In essence, the kernel trick means to write a learning algorithm solely in terms o inner products o eatures and replacing those by a kernel. In particular, this allows or considering an ininite number o eatures, while retaining inite computation. Figure 3 shows the prior and posterior GP or the nonparametric LQR kernel (). Because the kernel is more lexible All computations or the GP regression, i.e., equations (5), (), and (7), are expressed solely in terms o the kernel k (and the mean µ).

6 J() Posterior GP (Ex. 1) J() J() Posterior GP (Ex. ) Prior GP Fig. 3. GP it using the nonparametric LQR kernel k nlqr (hyperparameters: σ n =, a min =., a max = 1., b min =.9, and b max = 1.1). Colors are the same as in Fig. 1. Both examples are itted well. than (1), it can it the cost unctions or the two dierent models o Example 1 and Example well. E. A combined kernel System (1) is an approximation to the true system (1). It is thus not desirable to ully commit to the model or solving the optimal control problem. This would mean to directly minimize (15) (which would result in the wellknown LQR solution). On the other hand, the linear problem contains inormation about the structure o the optimization problem, which shall be useul also or optimization o the true nonlinear system (1), as long as we believe (1) to be a reasonable approximation thereo. In other words, we can expect the true cost unction J or the nonlinear problem to bear some similarity to (15). We model the cost unction (3) or the nonlinear system (1) as being composed o a part that stems rom the approximation as LQR problem and an error term, J() = J LQR () + J (). (3) The term J () captures the error in the model that stems rom the act that the true problem is nonlinear. We model it as a standard GP, e.g., employing the SE kernel (11): J () GP(, k SE (, )). We can model J LQR () as a GP (17) using either the parametric LQR kernel (1) or the nonparametric () LQR kernel. Then, since the sum o two independent Gaussians is also Gaussian, it ollows rom (3) that also J is a GP (see [13, Sec..7]), with J() GP (, k LQR (, ) + k SE (, ) ). where k LQR can be replaced by (1) or (). By choosing the hyperparameters o the kernels, the designer can express how much he or she trusts the LQR versus the SE model. For example, σ SE = means to ully rely on the LQR kernel. V. SIMULATIONS In this section, we show statistical comparisons o the LQR kernels proposed in Sec. IV against the commonly used SE kernel, in two dierent settings. In the irst setting, we evaluate the perormance o each kernel in the context o GP regression. Speciically, we quantiy the mismatch between the GP posterior mean, computed rom a set o random evaluations, and the underlying cost unction. In the second setting, we evaluate each kernel in the context o BO by comparing the learned minimum to the true global minimum. The GP regression and BO experiments are presented in Sec. V-B and Sec. V-C, respectively, considering a linear system (1). In addition, we also evaluate the BO setting or a nonlinear system in Sec. V-D. A. Experimental choices For the simulations in Sec. V-B and Sec. V-C, we consider the true system (1) to be linear (i.e., (1)) with uncertain parameters a [., 1.] and b [.9, 1.1]. We consider the optimal control problem as in Sec. II with q = r = v = 1. Feedback controllers () are considered to be in the range (1), F = [ 1.,.1]. For each controller, the corresponding ininite-horizon LQR cost J() is given by (15). In practice, only initehorizon simulations can be realized. Thereore, the outcome o an experiment is noisy, as modeled in (), with σ =.5. In each simulation, a dierent linear model (a, b) is obtained by uniormly sampling the ranges above, which yields a dierent underlying cost unction (15). Each simulation is repeated or our dierent kernels: SE kernel: As described in (11). LQR kernel I: Parametric LQR kernel (1) with ixed parameters (ā, b) (midpoints o uncertainty intervals). LQR kernel II: Parametric LQR kernel (1), with (ā, b) optimized rom evaluations by maximizing (7). LQR kernel III: Nonparametric LQR kernel (). The parametric LQR kernel (1) is constructed taking the middle points o the uncertainty intervals, i.e., ā =.9 and b = 1. For the nonparametric LQR kernel (), the intervals serve as integration domains. The length scale o the SE kernel is computed as one ith o the input domain, i.e., l =.37. The signal variance o the parametric and nonparametric LQR kernel, i.e., σp and σn, are normalized such that k plqr (, ) = k nlqr (, ) = 1, where =. is the midpoint o F. Since the variance o these two kernels grow ast toward the corners o the domain, we set or the SE kernel σse = 5, or a air comparison. B. GP regression setting For this statistical comparison, we run 1 simulations. In each simulation, we compute the GP posterior conditioned

7 SE kernel In LQR kernel I In LQR kernel II In LQR kernel III In RMSE 3 1 SE kernel In 3 LQR kernel I In 3 LQR kernel II In 3 LQR kernel III In Regret Fig.. Histograms o the root mean squared error (RMSE) between the true cost unction and the GP posterior mean conditioned on two evaluations. The histograms are computed out o 1 simulations. Fig. 5. Histogram o the regret incurred by stopping BO ater three evaluations. The system considered is the linear system (1). The histogram is computed over 1 BO runs. on two evaluations randomly sampled rom the underlying cost unction. We assess the quality o the regression by computing the root mean squared error (RMSE) between the true cost unction and the GP posterior mean, both computed on a grid o 1 points over F. Fig. shows the histograms o the RMSE obtained with the dierent kernels. The LQR kernels clearly outperorm the SE kernel in these experiments because they contain structural knowledge about the true cost, which contributes to a better GP it, even with only two data points. The nonparametric kernel makes good predictions because it inherently contains inormation about all possible unctions within their uncertainty ranges o (a, b). However, poorly speciied integration bounds will decrease its perormance. The RMSE statistical analysis conirms this when the integration intervals on (a, b) are a 5% larger than their uncertainties. The parametric kernel with ixed hyperparameters (ā, b) has a signiicant number o outliers since, in many cases, the data is queried rom a cost unction whose sampled (a, b) are ar away rom the ones o the kernel. The parametric kernel with hyperparameter optimization also leads to a better it than SE, but has most outliers (about 75% o the simulations) since hyperparameter optimization is not reliable with just two data points. We have repeated these experiments with N = 1, 5, and 1 evaluations. Table I shows the averaged RMSE over 1 simulations or each kernel, and the corresponding standard TABLE I RMSE AVERAGED OVER 1 SIMULATIONS. N SE ker. SE ker. (*) LQR ker. I LQR ker. II LQR ker. III 1.7 (.7).39 (.1) 1.1 (.95) 1.9 (1.9) 1.31 (.71).9 (.5). (.93) 1. (.9) 1.9 (1.7) 1. (.7) (1.) 1.31 (.95) 1.1 (1.5).5 (.7) 1. (.7) (.99).9 (.).9 (.9). (.) 1.11 (.7) deviation (in parentheses). The the outliers (i.e., any RMSE above 5) were excluded rom these computations. In general, we see that the LQR kernel optimized rom data perorms better than the others or more than evaluations. For a air comparison, we also include the SE kernel with hyperparameter optimization in the table (marked with an asterisk). Because it perorms similar to the SE kernel, and it does not improve upon the LQR kernels, we leave it out o the discussion or the rest o the paper. Remark: The hyperparameters (a, b) o the parametric LQR kernel are optimized rom data by maximizing the marginal likelihood (7). In a sense, optimizing the hyperparameters (a, b) o the LQR kernel rom data can be considered similar to doing system identiication on the linear system (a, b) using (7) as perormance metric. C. BO setting In this section, we evaluate the perormance o each kernel in the context o BO. For each BO run, the irst evaluation is decided randomly within the range o controllers F. Subsequent evaluations are acquired using the expected improvement (EI) method (), (9). We stop the exploration ater three evaluations and compute the instantaneous regret (i.e., the absolute error between the true minimum and the minimum o the GP posterior mean) as the outcome o each experiment. Fig. 5 shows the histogram o the regret or each kernel over 1 BO runs. The LQR kernels consistently outperorm the SE kernel. The nonparametric kernel shows some outliers because in some cases the GP posterior mean grows large toward negative values, and so does its minimum. This is a wrong prediction o the underlying cost, deined positive. However, this minor issue can easily be detected, and the optimization procedure continued with a randomly sampled controller.

8 TABLE II REGRET AVERAGED OVER 1 BO RUNS FOR A NONLINEAR SYSTEM N SE kernel LQR kernel I LQR kernel II LQR kernel III 1.3 (.33).3 (.1).3 (.).35 (.1) 3.9 (.39).33 (.3).31 (.19).3 (.1).35 (.7).3 (.).3 (.1).3 (.17) 5.3 (.1).31 (.3).3 (.).3 (.1) D. BO setting or a nonlinear system In this section, we use the same BO setting as in Sec. V-C, but consider now a nonlinear system (1), namely x t+1 = ã sin (x t ) + bu t + v t () with v t N (, 1) and uncertain parameters ã [.9, 1.1], b [.9, 1.1]. We control this system rom x = 1 to zero, using the same controller structure as the one described in Sec. V-A. In this case, the considered range o controllers is F = [ 1.57,.7], which corresponds to (1), reduced by a %. The LQR kernels are built up using the linearized version o () around the zero equilibrium point, i.e., x t+1 = ãx t + bu t +v t. Table II shows the regret average and standard deviation. As can be seen, the LQR kernels perorm better than the SE kernel. VI. CONCLUDING REMARKS In this paper, we discussed how prior knowledge about the structure o an optimal control problem can be leveraged or data-eicient learning control. Speciically, or a nonlinear quadratic optimal control problem, we showed how an uncertain linear model approximating the true nonlinear dynamics can be exploited in a Bayesian setting. This led to the proposal o two novel kernels, a parametric and a nonparametric version o an LQR kernel, which incorporate the structure o the well-known LQR problem as prior knowledge. Numerical simulations herein demonstrate improved data eiciency over standard kernels, i.e., good controllers are learned rom ewer experiments. We hope that the discussion and analysis herein also motivate urther development o kernels tailored or other learning control problems. Approaching the nonlinear quadratic optimal control problem presented herein with pure model-based methods can lead to superior perormance or very accurate models compared to the proposed data-based approach. However, this paper shares the motivation o [3], which proposes a databased approach when only poor models are available, and extends it by incorporating the LQR structure into the kernel. The results herein are preliminary in the sense that the LQR kernels are developed or a irst-order system. While this is the natural irst step, uture work will concern the extension o the ideas and derivations to multivariate systems in order to develop this into a powerul ramework or learning control in practice. Moreover, we plan to validate the beneit o the new kernels in experiments on more realistic nonlinear examples or physical hardware such as those in [3] and [5], settings that pose challenging issues, like imperect states measurement, among others. REFERENCES [1] R. S. Sutton and A. G. Barto, Reinorcement learning: An introduction. MIT press, 199. [] J. Kober, J. A. Bagnell, and J. Peters, Reinorcement learning in robotics: A survey, The International Journal o Robotics Research, vol. 3, no. 11, pp , 13. [3] A. Marco, P. Hennig, J. Bohg, S. Schaal, and S. Trimpe, Automatic LQR tuning based on Gaussian process global optimization, in IEEE International Conerence on Robotics and Automation, 1, pp [] F. Berkenkamp, A. P. Schoellig, and A. Krause, Sae controller optimization or quadrotors with Gaussian processes, in IEEE International Conerence on Robotics and Automation, 1, pp [5] A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, S. Schaal, and S. Trimpe, Virtual vs. real: Trading o simulations and physical experiments in reinorcement learning with Bayesian optimization, in IEEE International Conerence on Robotics and Automation, 17, pp [] J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischo, H. Markert, and M. Toussaint, Sae exploration or active learning with gaussian processes, in Machine Learning and Knowledge Discovery in Databases: European Conerence, 15, pp [7] R. Calandra, A. Seyarth, J. Peters, and M. P. Deisenroth, Bayesian optimization or learning gaits under uncertainty, Annals o Mathematics and Artiicial Intelligence, pp. 1 19, 15. [] R. Antonova, A. Rai, and C. G. Atkeson, Sample eicient optimization or learning controllers or bipedal locomotion, in IEEE International Conerence on Humanoid Robots, 1, pp.. [9] J. R. Medina, S. Endo, and S. Hirche, Impedance-based Gaussian processes or predicting human behavior during physical interaction, in IEEE International Conerence on Robotics and Automation, 1, pp [1] H. Abdelrahman, F. Berkenkamp, J. Poland, and A. Krause, Bayesian optimization or maximum power point tracking in photovoltaic power plants, in European Control Conerence, 1, pp [11] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Mineola, New York: Dover Publications, 7. [1] S. Schaal and C. Atkeson, Learning control in robotics, IEEE Robotics Automation Magazine, vol. 17, no., pp. 9, Jun. 1. [13] C. Rasmussen and C. Williams, Gaussian Processes or Machine Learning. MIT Press,. [1] J. Mockus, Bayesian approach to global optimization: theory and applications, ser. Mathematics and its applications. Kluwer Academic Publishers, 199, vol. 37. [15] D. R. Jones, M. Schonlau, and W. J. Welch, Eicient global optimization o expensive black-box unctions, Journal o Global Optimization, vol. 13, no., pp. 55 9, 199. [1] H. J. Kushner, A new method o locating the maximum point o an arbitrary multipeak curve in the presence o noise, Journal o Basic Engineering, vol., no. 1, pp. 97 1, 19. [17] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, in International Conerence on Machine Learning, 1, pp [1] P. Hennig and C. J. Schuler, Entropy search or inormation-eicient global optimization, The Journal o Machine Learning Research, vol. 13, no. 1, pp , 1. [19] Y. Sui, A. Gotovos, J. W. Burdick, and A. Krause, Sae exploration or optimization with gaussian processes, in International Conerence on Machine Learning, 15, pp [] S. Trimpe, A. Millane, S. Doessegger, and R. D Andrea, A sel-tuning LQR approach demonstrated on an inverted pendulum, in IFAC World Congress, Cape Town, South Arica, Aug. 1, pp [1] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Mineola, New York: Dover Publications, 5. [] B. Schölkop and A. J. Smola, Learning with kernels. MIT Press,.

Safe Controller Optimization for Quadrotors with Gaussian Processes

Safe Controller Optimization for Quadrotors with Gaussian Processes Safe Controller Optimization for Quadrotors with Gaussian Processes Felix Berkenkamp, Angela P. Schoellig, and Andreas Krause Abstract One of the most fundamental problems when designing controllers for

More information

Gaussian Process Regression Models for Predicting Stock Trends

Gaussian Process Regression Models for Predicting Stock Trends Gaussian Process Regression Models or Predicting Stock Trends M. Todd Farrell Andrew Correa December 5, 7 Introduction Historical stock price data is a massive amount o time-series data with little-to-no

More information

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares Scattered Data Approximation o Noisy Data via Iterated Moving Least Squares Gregory E. Fasshauer and Jack G. Zhang Abstract. In this paper we ocus on two methods or multivariate approximation problems

More information

Talk on Bayesian Optimization

Talk on Bayesian Optimization Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

The achievable limits of operational modal analysis. * Siu-Kui Au 1)

The achievable limits of operational modal analysis. * Siu-Kui Au 1) The achievable limits o operational modal analysis * Siu-Kui Au 1) 1) Center or Engineering Dynamics and Institute or Risk and Uncertainty, University o Liverpool, Liverpool L69 3GH, United Kingdom 1)

More information

Quantifying mismatch in Bayesian optimization

Quantifying mismatch in Bayesian optimization Quantifying mismatch in Bayesian optimization Eric Schulz University College London e.schulz@cs.ucl.ac.uk Maarten Speekenbrink University College London m.speekenbrink@ucl.ac.uk José Miguel Hernández-Lobato

More information

2.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics. References - geostatistics. References geostatistics (cntd.

2.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics. References - geostatistics. References geostatistics (cntd. .6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics Spline interpolation was originally developed or image processing. In GIS, it is mainly used in visualization o spatial

More information

Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics

Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics Felix Berkenkamp, Andreas Krause, and Angela P. Schoellig Learning & Adaptive Systems Group, Department of

More information

Least-Squares Spectral Analysis Theory Summary

Least-Squares Spectral Analysis Theory Summary Least-Squares Spectral Analysis Theory Summary Reerence: Mtamakaya, J. D. (2012). Assessment o Atmospheric Pressure Loading on the International GNSS REPRO1 Solutions Periodic Signatures. Ph.D. dissertation,

More information

Model-Based Reinforcement Learning with Continuous States and Actions

Model-Based Reinforcement Learning with Continuous States and Actions Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters: Model-Based Reinforcement Learning with Continuous States and Actions in Proceedings of the 16th European Symposium on Artificial Neural Networks

More information

Predictive Variance Reduction Search

Predictive Variance Reduction Search Predictive Variance Reduction Search Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, Svetha Venkatesh Centre of Pattern Recognition and Data Analytics (PRaDA), Deakin University Email: v.nguyen@deakin.edu.au

More information

On High-Rate Cryptographic Compression Functions

On High-Rate Cryptographic Compression Functions On High-Rate Cryptographic Compression Functions Richard Ostertág and Martin Stanek Department o Computer Science Faculty o Mathematics, Physics and Inormatics Comenius University Mlynská dolina, 842 48

More information

Robust Residual Selection for Fault Detection

Robust Residual Selection for Fault Detection Robust Residual Selection or Fault Detection Hamed Khorasgani*, Daniel E Jung**, Gautam Biswas*, Erik Frisk**, and Mattias Krysander** Abstract A number o residual generation methods have been developed

More information

Supplementary material for Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values

Supplementary material for Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values Supplementary material or Continuous-action planning or discounted ininite-horizon nonlinear optimal control with Lipschitz values List o main notations x, X, u, U state, state space, action, action space,

More information

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points Roberto s Notes on Dierential Calculus Chapter 8: Graphical analysis Section 1 Extreme points What you need to know already: How to solve basic algebraic and trigonometric equations. All basic techniques

More information

DETC A GENERALIZED MAX-MIN SAMPLE FOR RELIABILITY ASSESSMENT WITH DEPENDENT VARIABLES

DETC A GENERALIZED MAX-MIN SAMPLE FOR RELIABILITY ASSESSMENT WITH DEPENDENT VARIABLES Proceedings o the ASME International Design Engineering Technical Conerences & Computers and Inormation in Engineering Conerence IDETC/CIE August 7-,, Bualo, USA DETC- A GENERALIZED MAX-MIN SAMPLE FOR

More information

Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems

Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems Ali Baheri Department of Mechanical Engineering University of North Carolina at Charlotte

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Analysis Scheme in the Ensemble Kalman Filter

Analysis Scheme in the Ensemble Kalman Filter JUNE 1998 BURGERS ET AL. 1719 Analysis Scheme in the Ensemble Kalman Filter GERRIT BURGERS Royal Netherlands Meteorological Institute, De Bilt, the Netherlands PETER JAN VAN LEEUWEN Institute or Marine

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Event-triggered Model Predictive Control with Machine Learning for Compensation of Model Uncertainties

Event-triggered Model Predictive Control with Machine Learning for Compensation of Model Uncertainties Event-triggered Model Predictive Control with Machine Learning or Compensation o Model Uncertainties Jaehyun Yoo, Adam Molin, Matin Jaarian, Hasan Esen, Dimos V. Dimarogonas, and Karl H. Johansson Abstract

More information

Percentile Policies for Inventory Problems with Partially Observed Markovian Demands

Percentile Policies for Inventory Problems with Partially Observed Markovian Demands Proceedings o the International Conerence on Industrial Engineering and Operations Management Percentile Policies or Inventory Problems with Partially Observed Markovian Demands Farzaneh Mansouriard Department

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Linear Quadratic Regulator (LQR) I

Linear Quadratic Regulator (LQR) I Optimal Control, Guidance and Estimation Lecture Linear Quadratic Regulator (LQR) I Pro. Radhakant Padhi Dept. o Aerospace Engineering Indian Institute o Science - Bangalore Generic Optimal Control Problem

More information

An Ensemble Kalman Smoother for Nonlinear Dynamics

An Ensemble Kalman Smoother for Nonlinear Dynamics 1852 MONTHLY WEATHER REVIEW VOLUME 128 An Ensemble Kalman Smoother or Nonlinear Dynamics GEIR EVENSEN Nansen Environmental and Remote Sensing Center, Bergen, Norway PETER JAN VAN LEEUWEN Institute or Marine

More information

In many diverse fields physical data is collected or analysed as Fourier components.

In many diverse fields physical data is collected or analysed as Fourier components. 1. Fourier Methods In many diverse ields physical data is collected or analysed as Fourier components. In this section we briely discuss the mathematics o Fourier series and Fourier transorms. 1. Fourier

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration

Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration Emile Contal David Buffoni Alexandre Robicquet Nicolas Vayatis CMLA, ENS Cachan, France September 25, 2013 Motivating

More information

Chapter 6 Reliability-based design and code developments

Chapter 6 Reliability-based design and code developments Chapter 6 Reliability-based design and code developments 6. General Reliability technology has become a powerul tool or the design engineer and is widely employed in practice. Structural reliability analysis

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Fisher Consistency of Multicategory Support Vector Machines

Fisher Consistency of Multicategory Support Vector Machines Fisher Consistency o Multicategory Support Vector Machines Yueng Liu Department o Statistics and Operations Research Carolina Center or Genome Sciences University o North Carolina Chapel Hill, NC 7599-360

More information

ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS

ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS I. Thirunavukkarasu 1, V. I. George 1, G. Saravana Kumar 1 and A. Ramakalyan 2 1 Department o Instrumentation

More information

Linear Quadratic Regulator (LQR) Design I

Linear Quadratic Regulator (LQR) Design I Lecture 7 Linear Quadratic Regulator LQR) Design I Dr. Radhakant Padhi Asst. Proessor Dept. o Aerospace Engineering Indian Institute o Science - Bangalore LQR Design: Problem Objective o drive the state

More information

THE use of radio frequency channels assigned to primary. Traffic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks

THE use of radio frequency channels assigned to primary. Traffic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks EEE JOURNAL ON SELECTED AREAS N COMMUNCATONS, VOL. X, NO. X, X 01X 1 Traic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks Chun-Hao Liu, Jason A. Tran, Student Member, EEE, Przemysław Pawełczak,

More information

On the Efficiency of Maximum-Likelihood Estimators of Misspecified Models

On the Efficiency of Maximum-Likelihood Estimators of Misspecified Models 217 25th European Signal Processing Conerence EUSIPCO On the Eiciency o Maximum-ikelihood Estimators o Misspeciied Models Mahamadou amine Diong Eric Chaumette and François Vincent University o Toulouse

More information

Equidistant Polarizing Transforms

Equidistant Polarizing Transforms DRAFT 1 Equidistant Polarizing Transorms Sinan Kahraman Abstract arxiv:1708.0133v1 [cs.it] 3 Aug 017 This paper presents a non-binary polar coding scheme that can reach the equidistant distant spectrum

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION

OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION Xu Bei, Yeo Jun Yoon and Ali Abur Teas A&M University College Station, Teas, U.S.A. abur@ee.tamu.edu Abstract This paper presents

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian

More information

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms * Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms

More information

Path Integral Stochastic Optimal Control for Reinforcement Learning

Path Integral Stochastic Optimal Control for Reinforcement Learning Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute

More information

In: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,

More information

The Deutsch-Jozsa Problem: De-quantization and entanglement

The Deutsch-Jozsa Problem: De-quantization and entanglement The Deutsch-Jozsa Problem: De-quantization and entanglement Alastair A. Abbott Department o Computer Science University o Auckland, New Zealand May 31, 009 Abstract The Deustch-Jozsa problem is one o the

More information

2. ETA EVALUATIONS USING WEBER FUNCTIONS. Introduction

2. ETA EVALUATIONS USING WEBER FUNCTIONS. Introduction . ETA EVALUATIONS USING WEBER FUNCTIONS Introduction So ar we have seen some o the methods or providing eta evaluations that appear in the literature and we have seen some o the interesting properties

More information

Improvement of Sparse Computation Application in Power System Short Circuit Study

Improvement of Sparse Computation Application in Power System Short Circuit Study Volume 44, Number 1, 2003 3 Improvement o Sparse Computation Application in Power System Short Circuit Study A. MEGA *, M. BELKACEMI * and J.M. KAUFFMANN ** * Research Laboratory LEB, L2ES Department o

More information

NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS

NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS REVSTAT Statistical Journal Volume 16, Number 2, April 2018, 167 185 NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS Authors: Frank P.A. Coolen Department

More information

Today. Introduction to optimization Definition and motivation 1-dimensional methods. Multi-dimensional methods. General strategies, value-only methods

Today. Introduction to optimization Definition and motivation 1-dimensional methods. Multi-dimensional methods. General strategies, value-only methods Optimization Last time Root inding: deinition, motivation Algorithms: Bisection, alse position, secant, Newton-Raphson Convergence & tradeos Eample applications o Newton s method Root inding in > 1 dimension

More information

A Self-Tuning LQR Approach Demonstrated on an Inverted Pendulum

A Self-Tuning LQR Approach Demonstrated on an Inverted Pendulum A Self-Tuning LQR Approach Demonstrated on an Inverted Pendulum Sebastian Trimpe Alexander Millane Simon Doessegger Raffaello D Andrea Max Planck Institute for Intelligent Systems, Autonomous Motion Department,

More information

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Joseph Hall, Carl Edward Rasmussen and Jan Maciejowski Abstract The contribution described in this paper is an algorithm

More information

Probabilistic Model of Error in Fixed-Point Arithmetic Gaussian Pyramid

Probabilistic Model of Error in Fixed-Point Arithmetic Gaussian Pyramid Probabilistic Model o Error in Fixed-Point Arithmetic Gaussian Pyramid Antoine Méler John A. Ruiz-Hernandez James L. Crowley INRIA Grenoble - Rhône-Alpes 655 avenue de l Europe 38 334 Saint Ismier Cedex

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Optimisation séquentielle et application au design

Optimisation séquentielle et application au design Optimisation séquentielle et application au design d expériences Nicolas Vayatis Séminaire Aristote, Ecole Polytechnique - 23 octobre 2014 Joint work with Emile Contal (computer scientist, PhD student)

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint Biplots in Practice MICHAEL GREENACRE Proessor o Statistics at the Pompeu Fabra University Chapter 6 Oprint Principal Component Analysis Biplots First published: September 010 ISBN: 978-84-93846-8-6 Supporting

More information

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does Geosciences 567: CHAPTER (RR/GZ) CHAPTER : INTRODUCTION Inverse Theory: What It Is and What It Does Inverse theory, at least as I choose to deine it, is the ine art o estimating model parameters rom data

More information

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge 0 E ttenuator Pair Ratio or vs requency NEEN-ERLN 700 Option-E 0-0 k Ultra-precision apacitance/loss ridge ttenuator Ratio Pair Uncertainty o in ppm or ll Usable Pairs o Taps 0 0 0. 0. 0. 07/08/0 E E E

More information

STAT 801: Mathematical Statistics. Hypothesis Testing

STAT 801: Mathematical Statistics. Hypothesis Testing STAT 801: Mathematical Statistics Hypothesis Testing Hypothesis testing: a statistical problem where you must choose, on the basis o data X, between two alternatives. We ormalize this as the problem o

More information

A Simple Explanation of the Sobolev Gradient Method

A Simple Explanation of the Sobolev Gradient Method A Simple Explanation o the Sobolev Gradient Method R. J. Renka July 3, 2006 Abstract We have observed that the term Sobolev gradient is used more oten than it is understood. Also, the term is oten used

More information

SIO 211B, Rudnick. We start with a definition of the Fourier transform! ĝ f of a time series! ( )

SIO 211B, Rudnick. We start with a definition of the Fourier transform! ĝ f of a time series! ( ) SIO B, Rudnick! XVIII.Wavelets The goal o a wavelet transorm is a description o a time series that is both requency and time selective. The wavelet transorm can be contrasted with the well-known and very

More information

Expectation Propagation in Dynamical Systems

Expectation Propagation in Dynamical Systems Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex

More information

Feedback Linearization

Feedback Linearization Feedback Linearization Peter Al Hokayem and Eduardo Gallestey May 14, 2015 1 Introduction Consider a class o single-input-single-output (SISO) nonlinear systems o the orm ẋ = (x) + g(x)u (1) y = h(x) (2)

More information

Figure 1: Bayesian network for problem 1. P (A = t) = 0.3 (1) P (C = t) = 0.6 (2) Table 1: CPTs for problem 1. (c) P (E B) C P (D = t) f 0.9 t 0.

Figure 1: Bayesian network for problem 1. P (A = t) = 0.3 (1) P (C = t) = 0.6 (2) Table 1: CPTs for problem 1. (c) P (E B) C P (D = t) f 0.9 t 0. Probabilistic Artiicial Intelligence Problem Set 3 Oct 27, 2017 1. Variable elimination In this exercise you will use variable elimination to perorm inerence on a bayesian network. Consider the network

More information

Documents de Travail du Centre d Economie de la Sorbonne

Documents de Travail du Centre d Economie de la Sorbonne Documents de Travail du Centre d Economie de la Sorbonne On equilibrium payos in wage bargaining with discount rates varying in time Ahmet OZKARDAS, Agnieszka RUSINOWSKA 2014.11 Maison des Sciences Économiques,

More information

Multi-Robot Transfer Learning: A Dynamical System Perspective

Multi-Robot Transfer Learning: A Dynamical System Perspective 2017 IEEE/RSJ International Conerence on Intelligent Robots and Systems (IROS) September 24 28, 2017, Vancouver, BC, Canada Multi-Robot Transer Learning: A Dynamical System Perspective Mohamed K. Helwa

More information

Fluctuationlessness Theorem and its Application to Boundary Value Problems of ODEs

Fluctuationlessness Theorem and its Application to Boundary Value Problems of ODEs Fluctuationlessness Theorem and its Application to Boundary Value Problems o ODEs NEJLA ALTAY İstanbul Technical University Inormatics Institute Maslak, 34469, İstanbul TÜRKİYE TURKEY) nejla@be.itu.edu.tr

More information

Lecture 8 Optimization

Lecture 8 Optimization 4/9/015 Lecture 8 Optimization EE 4386/5301 Computational Methods in EE Spring 015 Optimization 1 Outline Introduction 1D Optimization Parabolic interpolation Golden section search Newton s method Multidimensional

More information

Stochastic Game Approach for Replay Attack Detection

Stochastic Game Approach for Replay Attack Detection Stochastic Game Approach or Replay Attack Detection Fei Miao Miroslav Pajic George J. Pappas. Abstract The existing tradeo between control system perormance and the detection rate or replay attacks highlights

More information

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Robust feedback linearization

Robust feedback linearization Robust eedback linearization Hervé Guillard Henri Bourlès Laboratoire d Automatique des Arts et Métiers CNAM/ENSAM 21 rue Pinel 75013 Paris France {herveguillardhenribourles}@parisensamr Keywords: Nonlinear

More information

Special types of Riemann sums

Special types of Riemann sums Roberto s Notes on Subject Chapter 4: Deinite integrals and the FTC Section 3 Special types o Riemann sums What you need to know already: What a Riemann sum is. What you can learn here: The key types o

More information

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective Numerical Solution o Ordinary Dierential Equations in Fluctuationlessness Theorem Perspective NEJLA ALTAY Bahçeşehir University Faculty o Arts and Sciences Beşiktaş, İstanbul TÜRKİYE TURKEY METİN DEMİRALP

More information

Neutron inverse kinetics via Gaussian Processes

Neutron inverse kinetics via Gaussian Processes Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques

More information

What the No Free Lunch Theorems Really Mean; How to Improve Search Algorithms

What the No Free Lunch Theorems Really Mean; How to Improve Search Algorithms What the No Free Lunch Theorems Really Mean; How to Improve Search Algorithms David H. Wolpert SFI WORKING PAPER: 2012-10-017 SFI Working Papers contain accounts o scientiic work o the author(s) and do

More information

Demonstration of Emulator-Based Bayesian Calibration of Safety Analysis Codes: Theory and Formulation

Demonstration of Emulator-Based Bayesian Calibration of Safety Analysis Codes: Theory and Formulation Demonstration o Emulator-Based Bayesian Calibration o Saety Analysis Codes: Theory and Formulation The MIT Faculty has made this article openly available. Please share how this access beneits you. Your

More information

Massachusetts Institute of Technology

Massachusetts Institute of Technology Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are

More information

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Practical Bayesian Optimization of Machine Learning. Learning Algorithms Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that

More information

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

Solving Multi-Mode Time-Cost-Quality Trade-off Problem in Uncertainty Condition Using a Novel Genetic Algorithm

Solving Multi-Mode Time-Cost-Quality Trade-off Problem in Uncertainty Condition Using a Novel Genetic Algorithm International Journal o Management and Fuzzy Systems 2017; 3(3): 32-40 http://www.sciencepublishinggroup.com/j/ijms doi: 10.11648/j.ijms.20170303.11 Solving Multi-Mode Time-Cost-Quality Trade-o Problem

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Optimal Control. McGill COMP 765 Oct 3 rd, 2017

Optimal Control. McGill COMP 765 Oct 3 rd, 2017 Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps

More information

A parametric approach to Bayesian optimization with pairwise comparisons

A parametric approach to Bayesian optimization with pairwise comparisons A parametric approach to Bayesian optimization with pairwise comparisons Marco Co Eindhoven University of Technology m.g.h.co@tue.nl Bert de Vries Eindhoven University of Technology and GN Hearing bdevries@ieee.org

More information

arxiv: v1 [cs.sy] 15 Nov 2018

arxiv: v1 [cs.sy] 15 Nov 2018 Reduced Order Model Predictive Control For Setpoint Tracking Joseph Lorenzetti, Benoit Landry, Sumeet Singh, Marco Pavone arxiv:1811.659v1 cs.sy 15 Nov 218 Abstract Despite the success o model predictive

More information

Feedback linearization control of systems with singularities: a ball-beam revisit

Feedback linearization control of systems with singularities: a ball-beam revisit Chapter 1 Feedback linearization control o systems with singularities: a ball-beam revisit Fu Zhang The Mathworks, Inc zhang@mathworks.com Benito Fernndez-Rodriguez Department o Mechanical Engineering,

More information

Power Spectral Analysis of Elementary Cellular Automata

Power Spectral Analysis of Elementary Cellular Automata Power Spectral Analysis o Elementary Cellular Automata Shigeru Ninagawa Division o Inormation and Computer Science, Kanazawa Institute o Technology, 7- Ohgigaoka, Nonoichi, Ishikawa 92-850, Japan Spectral

More information

Distributed Optimization Methods for Wide-Area Damping Control of Power System Oscillations

Distributed Optimization Methods for Wide-Area Damping Control of Power System Oscillations Preprints o the 9th World Congress The International Federation o Automatic Control Cape Town, South Arica. August 4-9, 4 Distributed Optimization Methods or Wide-Area Damping Control o Power System Oscillations

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Nonparametric Regression With Gaussian Processes

Nonparametric Regression With Gaussian Processes Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators.

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators. A Systematic Approach to Frequency Compensation o the Voltage Loop in oost PFC Pre- regulators. Claudio Adragna, STMicroelectronics, Italy Abstract Venable s -actor method is a systematic procedure that

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

EE C128 / ME C134 Feedback Control Systems

EE C128 / ME C134 Feedback Control Systems EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of

More information

A Brief Survey on Semi-supervised Learning with Graph Regularization

A Brief Survey on Semi-supervised Learning with Graph Regularization 000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 A

More information

Strong Lyapunov Functions for Systems Satisfying the Conditions of La Salle

Strong Lyapunov Functions for Systems Satisfying the Conditions of La Salle 06 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 6, JUNE 004 Strong Lyapunov Functions or Systems Satisying the Conditions o La Salle Frédéric Mazenc and Dragan Ne sić Abstract We present a construction

More information

A UNIFIED FRAMEWORK FOR MULTICHANNEL FAST QRD-LS ADAPTIVE FILTERS BASED ON BACKWARD PREDICTION ERRORS

A UNIFIED FRAMEWORK FOR MULTICHANNEL FAST QRD-LS ADAPTIVE FILTERS BASED ON BACKWARD PREDICTION ERRORS A UNIFIED FRAMEWORK FOR MULTICHANNEL FAST QRD-LS ADAPTIVE FILTERS BASED ON BACKWARD PREDICTION ERRORS César A Medina S,José A Apolinário Jr y, and Marcio G Siqueira IME Department o Electrical Engineering

More information