Lecture Notes: (Stochastic) Optimal Control


Marc Toussaint
Machine Learning & Robotics group, TU Berlin
Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany

July 2010

Disclaimer: These notes are not meant to be a complete or comprehensive survey on Stochastic Optimal Control. This is more of a personal script which I use to keep an overview over control methods and their derivations. One point of these notes is to fix a consistent notation and provide a coherent overview for these specific methods.

Contents

1 Notation
2 Stochastic optimal control (discrete time)
  2.1 The Linear-Quadratic-Gaussian (LQG) case
  2.2 Message passing in LQG
  2.3 Special case: kinematic control
  2.4 Special case: multiple kinematic task variables
  2.5 Special case: pseudo-dynamic process
  2.6 Special case: dynamic process
  2.7 Special case: multiple dynamic/kinematic task variables
3 The optimization view on classical control laws
  3.1 General quadratic loss function and constraints
  3.2 Ridge regression
  3.3 Motion rate control (pseudo-inverse kinematics)
  3.4 Regularized inverse kinematics (singularity robust motion rate control)
  3.5 Multiple regularized task variables
  3.6 Multiple prioritized task variables (prioritized inverse kinematics)
  3.7 Optimal dynamic control (incl. operational space control)

1 Notation

    x                          system state (can be q or (q, q̇))
    q ∈ R^n                    robot posture (vector of joint angles)
    y ∈ R^d                    a task variable (e.g., endeffector position)
    φ : q ↦ y                  differentiable kinematic function
    J(q) = ∂φ/∂q ∈ R^{d×n}     task Jacobian in posture q

We define a Gaussian over x with mean a and covariance matrix A as the function

    N(x | a, A) = |2πA|^{-1/2} exp{ -(1/2) (x-a)ᵀ A⁻¹ (x-a) }    (1)

with the property N(x | a, A) = N(a | x, A). We also define the canonical representation

    N[x | a, A] = exp{ -(1/2) aᵀ A⁻¹ a } |2πA⁻¹|^{-1/2} exp{ -(1/2) xᵀ A x + xᵀ a }    (2)

with the properties N[x | a, A] = N(x | A⁻¹a, A⁻¹) and N(x | a, A) = N[x | A⁻¹a, A⁻¹].
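A minimal numerical sketch (Python/numpy, not part of the derivations) of the correspondence between the moment parameterization (1) and the canonical parameterization (2); the helper names to_canonical and to_moments are ad hoc:

    import numpy as np

    def to_canonical(a, A):
        # moment parameters (a, A) -> canonical parameters (A^-1 a, A^-1)
        P = np.linalg.inv(A)
        return P @ a, P

    def to_moments(b, B):
        # canonical parameters (b, B) -> moment parameters (B^-1 b, B^-1)
        A = np.linalg.inv(B)
        return A @ b, A

    # check the stated property N[x | a, A] = N(x | A^-1 a, A^-1) on random parameters
    rng = np.random.default_rng(0)
    a = rng.normal(size=3)
    L = rng.normal(size=(3, 3))
    A = L @ L.T + np.eye(3)          # a random s.p.d. covariance matrix
    b, B = to_canonical(a, A)
    a2, A2 = to_moments(b, B)
    assert np.allclose(a, a2) and np.allclose(A, A2)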

The product of two Gaussians can be expressed as

    N[x | a, A] N[x | b, B] = N[x | a + b, A + B] N(A⁻¹a | B⁻¹b, A⁻¹ + B⁻¹),    (3)
    N(x | a, A) N(x | b, B) = N[x | A⁻¹a + B⁻¹b, A⁻¹ + B⁻¹] N(a | b, A + B),    (4)
    N(x | a, A) N[x | b, B] = N[x | A⁻¹a + b, A⁻¹ + B] N(a | B⁻¹b, A + B⁻¹).    (5)

Linear transformations in x imply the following identities,

    N(Fx + f | a, A) = (1/|F|) N(x | F⁻¹(a - f), F⁻¹ A F⁻ᵀ)    (6)
                     = (1/|F|) N[x | Fᵀ A⁻¹ (a - f), Fᵀ A⁻¹ F],    (7)
    N[Fx + f | a, A] = (1/|F|) N[x | Fᵀ (a - A f), Fᵀ A F].    (8)

The joint Gaussian of two linearly dependent Gaussian variables reads

    N(x | a, A) N(y | b + Fx, B) = N( (x, y) | (a, b + Fa), [[A, A Fᵀ], [F A, B + F A Fᵀ]] ).    (9)

(See the lecture notes on Gaussian identities for more identities.)

Let us collect some matrix identities which we will need throughout. The Woodbury identity

    (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ = W⁻¹ Jᵀ (J W⁻¹ Jᵀ + C)⁻¹    (10)

holds for any positive definite C and W. Further we have the identity

    I_n - (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ J = (Jᵀ C⁻¹ J + W)⁻¹ W.

We define the pseudo-inverse of J w.r.t. W as

    J♯_W = W⁻¹ Jᵀ (J W⁻¹ Jᵀ)⁻¹    (11)

and the similar quantity

    J♯_C = (Jᵀ C⁻¹ J)⁻¹ Jᵀ C⁻¹.    (12)

2 Stochastic optimal control (discrete time)

We assume a framework that is basically the same as for Markov Decision Processes, but with a slight change in notation: instead of maximizing rewards we minimize costs; instead of an action a_t we refer to the control u_t; instead of V(x) we denote the optimal value function by J_t(x). Cost and dynamics are generally non-stationary. Consider a discrete time stochastic controlled system

    x_{t+1} = f_t(x_t, u_t) + ξ_t,    ξ_t ∼ N(0, Q_t)    (13)

with the state x_t ∈ R^n, the control signal u_t ∈ R^m, and Gaussian noise ξ of covariance Q. An alternative notation for the same process is

    P(x_{t+1} | u_t, x_t) = N(x_{t+1} | f_t(x_t, u_t), Q_t).    (14)

For a given state-control sequence x_{0:T}, u_{0:T} we define the cost

    C(x_{0:T}, u_{0:T}) = Σ_{t=0}^T c_t(x_t, u_t).    (15)

Unlike the reward function in stationary MDPs, this cost function is typically not stationary. For instance, the cost might focus on the final state x_T relative to a goal state. In consequence, optimal policies are also non-stationary. Just as in the MDP case, the value function obeys the Bellman optimality equation

    J_t(x) = min_u [ c_t(x, u) + Σ_{x'} P(x' | u, x) J_{t+1}(x') ].    (16)

There are two versions of stochastic optimal control problems: The open-loop control problem is to find a control sequence u_{0:T} that minimizes the expected cost. The closed-loop (feedback) control problem is to find a control policy π*_t : x_t ↦ u_t (that exploits the true state observation in each time step and maps it to a feedback control signal) that minimizes the expected cost.
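To make the matrix identities concrete, here is a small numerical sanity check of the Woodbury identity (10) and the pseudo-inverse (11); a throwaway sketch, not part of the derivations:

    import numpy as np

    def random_spd(rng, k):
        # random symmetric positive definite k x k matrix
        L = rng.normal(size=(k, k))
        return L @ L.T + k * np.eye(k)

    rng = np.random.default_rng(1)
    n, d = 5, 2
    J = rng.normal(size=(d, n))
    W, C = random_spd(rng, n), random_spd(rng, d)
    inv = np.linalg.inv

    lhs = inv(J.T @ inv(C) @ J + W) @ J.T @ inv(C)     # (J' C^-1 J + W)^-1 J' C^-1
    rhs = inv(W) @ J.T @ inv(J @ inv(W) @ J.T + C)     # W^-1 J' (J W^-1 J' + C)^-1
    assert np.allclose(lhs, rhs)                        # Woodbury identity (10)

    J_sharp_W = inv(W) @ J.T @ inv(J @ inv(W) @ J.T)    # pseudo-inverse (11)
    assert np.allclose(J @ J_sharp_W, np.eye(d))        # it is a right inverse of J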

2.1 The Linear-Quadratic-Gaussian (LQG) case

Consider a linear control process with Gaussian noise,

    P(x_{t+1} | x_t, u_t) = N(x_{t+1} | A_t x_t + a_t + B_t u_t, Q_t),    (17)

and quadratic costs,

    c_t(x_t, u_t) = xᵀ_t R_t x_t - 2 rᵀ_t x_t + uᵀ_t H_t u_t.    (18)

The general LQG case is specified by the matrices and vectors A_{0:T}, a_{0:T}, B_{0:T}, Q_{0:T}, R_{0:T}, r_{0:T}, H_{0:T}. With a proper choice of R_t and r_t this corresponds to the problem of tracking an arbitrary desired trajectory x*_t, where the cost is quadratic in (x_t - x*_t).

The LQG case allows us to derive an exact backward recursion, called the Riccati equation, for the computation of the value function. The value function will always be a quadratic form of the state. Let us assume that we know the value function J_{t+1}(x) at time t+1 and that it has the form

    J_{t+1}(x) = xᵀ V_{t+1} x - 2 vᵀ_{t+1} x.    (19)

Then

    J_t(x) = min_u [ xᵀ R_t x - 2 rᵀ_t x + uᵀ H_t u + ∫_y N(y | A_t x + a_t + B_t u, Q_t) (yᵀ V_{t+1} y - 2 vᵀ_{t+1} y) dy ].    (20)

CONVENTION: For the remainder of this section we drop the subscript t for A, a, B, Q, R, r, H; wherever it is missing we refer to time t.

The expectation of a quadratic form under a Gaussian is

    E_{N(y|a,A)}{ yᵀ V y - 2 vᵀ y } = aᵀ V a - 2 vᵀ a + tr(V A).    (21)

So we have

    J_t(x) = min_u [ xᵀ R x - 2 rᵀ x + uᵀ H u + (Ax + a + Bu)ᵀ V_{t+1} (Ax + a + Bu) - 2 vᵀ_{t+1} (Ax + a + Bu) + tr(V_{t+1} Q) ]    (22)
           = min_u [ xᵀ (R + Aᵀ V_{t+1} A) x - 2 (r + Aᵀ(v_{t+1} - V_{t+1} a))ᵀ x + uᵀ (H + Bᵀ V_{t+1} B) u
                     + 2 uᵀ Bᵀ (V_{t+1}(Ax + a) - v_{t+1}) + aᵀ V_{t+1} a - 2 vᵀ_{t+1} a + tr(V_{t+1} Q) ].

Minimizing w.r.t. u by setting the gradient to zero we have

    0 = 2 (H + Bᵀ V_{t+1} B) u + 2 Bᵀ (V_{t+1}(Ax + a) - v_{t+1})    (23)
    u*_t(x) = -(H + Bᵀ V_{t+1} B)⁻¹ Bᵀ (V_{t+1}(Ax + a) - v_{t+1}),    (24)

and plugging this back in,

    J_t(x) = xᵀ (R + Aᵀ V_{t+1} A) x - 2 (r + Aᵀ(v_{t+1} - V_{t+1} a))ᵀ x
             - (V_{t+1}(Ax + a) - v_{t+1})ᵀ B (H + Bᵀ V_{t+1} B)⁻¹ Bᵀ (V_{t+1}(Ax + a) - v_{t+1})
             + aᵀ V_{t+1} a - 2 vᵀ_{t+1} a + tr(V_{t+1} Q),    (25)

which is again a quadratic form,

    J_t(x) = xᵀ V_t x - 2 vᵀ_t x + terms independent of x,
    V_t = R + Aᵀ V_{t+1} A - K V_{t+1} A    (26)
    v_t = r + Aᵀ (v_{t+1} - V_{t+1} a) - K (v_{t+1} - V_{t+1} a)    (27)
    K := Aᵀ V_{t+1} B (H + Bᵀ V_{t+1} B)⁻¹ Bᵀ = Aᵀ V_{t+1} [ V_{t+1} + (B H⁻¹ Bᵀ)⁻¹ ]⁻¹,

where the second form of K holds whenever B H⁻¹ Bᵀ is invertible. The equation for V_t is called the Riccati equation. Initialized with V_T = R_T and v_T = r_T, this gives a backward recursion to compute the value function J_t at each time step. Equation (24) also gives the optimal control policy. Note that the optimal control and path are independent of the process noise Q.
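The backward recursion (26), (27) and the control law (24) translate directly into code. The following Python/numpy sketch is a plain transcription of the equations above, assuming all time-indexed quantities are given as lists; it is not an optimized implementation:

    import numpy as np

    def lqg_backward(A, a, B, Q, R, r, H):
        # Backward Riccati recursion (26), (27); R, r have length T+1.
        # Q is accepted but unused: the optimal control is independent of the
        # process noise, cf. the remark above.
        T = len(R) - 1
        V, v = [None] * (T + 1), [None] * (T + 1)
        V[T], v[T] = R[T], r[T]
        inv = np.linalg.inv
        for t in range(T - 1, -1, -1):
            K = A[t].T @ V[t+1] @ B[t] @ inv(H[t] + B[t].T @ V[t+1] @ B[t]) @ B[t].T
            V[t] = R[t] + A[t].T @ V[t+1] @ A[t] - K @ V[t+1] @ A[t]
            w = v[t+1] - V[t+1] @ a[t]
            v[t] = r[t] + A[t].T @ w - K @ w
        def u_star(t, x):
            # optimal control (24) at time t and state x
            return -inv(H[t] + B[t].T @ V[t+1] @ B[t]) @ B[t].T @ (
                V[t+1] @ (A[t] @ x + a[t]) - v[t+1])
        return V, v, u_star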

2.2 Message passing in LQG

We may translate costs to probabilities by introducing a binary random variable z_t dependent on x_t and u_t,

    P(z_t = 1 | u_t, x_t) = exp{ -c_t(x_t, u_t) }.    (28)

In the LQG case we can simplify this to

    P(z_t = 1 | x_t) ∝ N[x_t | r_t, R_t],    (29)
    P(u_t) = N(u_t | 0, H⁻¹).    (30)

What is the posterior over the state trajectory x_{0:T}, conditioned on that we permanently observe z_t = 1? Since we can integrate out the control u_t, this is a simple Markov process with continuous state and Gaussian observation,

    P(x_{t+1} | x_t) = ∫_u N(x_{t+1} | A x_t + a + B u, Q) N(u | 0, H⁻¹) du    (31)
                     = N(x_{t+1} | A x_t + a, Q + B H⁻¹ Bᵀ).    (32)

Inference is a standard forward-backward process, just as in Kalman smoothing. The messages read

    μ_{x_{t-1}→x_t}(x_t) = N(x_t | s_t, S_t),
        s_t = a_{t-1} + A_{t-1} (S⁻¹_{t-1} + R_{t-1})⁻¹ (S⁻¹_{t-1} s_{t-1} + r_{t-1})
        S_t = Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1} + A_{t-1} (S⁻¹_{t-1} + R_{t-1})⁻¹ Aᵀ_{t-1}    (33)

    μ_{x_{t+1}→x_t}(x_t) = N(x_t | v_t, V_t),
        v_t = -A⁻¹_t a_t + A⁻¹_t (V⁻¹_{t+1} + R_{t+1})⁻¹ (V⁻¹_{t+1} v_{t+1} + r_{t+1})
        V_t = A⁻¹_t [ Q_t + B_t H⁻¹_t Bᵀ_t + (V⁻¹_{t+1} + R_{t+1})⁻¹ ] A⁻ᵀ_t    (34)

    μ_{z_t→x_t}(x_t) = N[x_t | r_t, R_t].    (35)

The potentials (v_t, V_t) which define the backward message can also be expressed in a different way: let us define

    V̄_t = V⁻¹_t + R_t    (36)
    v̄_t = V⁻¹_t v_t + r_t,    (37)

which corresponds to a backward message (in canonical representation) which has the cost message already absorbed. Using a special case of the Woodbury identity,

    (A⁻¹ + B⁻¹)⁻¹ = A - A (A + B)⁻¹ A,    (38)

the backward messages can be rewritten as

    V⁻¹_t = Aᵀ [ V̄⁻¹_{t+1} + Q + B H⁻¹ Bᵀ ]⁻¹ A = Aᵀ V̄_{t+1} A - K̄ V̄_{t+1} A,
        K̄ := Aᵀ V̄_{t+1} [ V̄_{t+1} + (Q + B H⁻¹ Bᵀ)⁻¹ ]⁻¹    (39)
    V⁻¹_t v_t = Aᵀ (v̄_{t+1} - V̄_{t+1} a_t) - K̄ (v̄_{t+1} - V̄_{t+1} a_t)    (40)
    V̄_t = R_t + (Aᵀ - K̄) V̄_{t+1} A    (41)
    v̄_t = r_t + (Aᵀ - K̄) (v̄_{t+1} - V̄_{t+1} a_t).    (42)

They correspond exactly to the Riccati equations (26), (27), except for the dependence on Q, which here interacts directly with the control cost metric H. Yet another way to write them is

    V̄_t = R_t + Aᵀ [ I - V̄_{t+1} (V̄_{t+1} + (Q + B H⁻¹ Bᵀ)⁻¹)⁻¹ ] V̄_{t+1} A    (43)
    v̄_t = r_t + Aᵀ [ I - V̄_{t+1} (V̄_{t+1} + (Q + B H⁻¹ Bᵀ)⁻¹)⁻¹ ] (v̄_{t+1} - V̄_{t+1} a_t).    (44)

Proof. Since all factors are pairwise we can use the expression (??) for the messages. We have

    μ_{x_{t-1}→x_t}(x_t) = ∫ dx_{t-1} P(x_t | x_{t-1}) μ_{x_{t-2}→x_{t-1}}(x_{t-1}) μ_{z_{t-1}→x_{t-1}}(x_{t-1})
        = ∫ dx_{t-1} N(x_t | A_{t-1} x_{t-1} + a_{t-1}, Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1}) N(x_{t-1} | s_{t-1}, S_{t-1}) N[x_{t-1} | r_{t-1}, R_{t-1}].

Using the product rule (5) on the last two terms gives a Gaussian N(s_{t-1} | R⁻¹_{t-1} r_{t-1}, S_{t-1} + R⁻¹_{t-1}), independent of x_t, which we can subsume in the normalization. What remains is

    μ_{x_{t-1}→x_t}(x_t) ∝ ∫ dx_{t-1} N(x_t | A_{t-1} x_{t-1} + a_{t-1}, Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1}) N[x_{t-1} | S⁻¹_{t-1} s_{t-1} + r_{t-1}, S⁻¹_{t-1} + R_{t-1}]
        = ∫ dx_{t-1} N(x_t | A_{t-1} x_{t-1} + a_{t-1}, Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1}) N(x_{t-1} | (S⁻¹_{t-1} + R_{t-1})⁻¹ (S⁻¹_{t-1} s_{t-1} + r_{t-1}), (S⁻¹_{t-1} + R_{t-1})⁻¹)
        = N(x_t | A_{t-1} (S⁻¹_{t-1} + R_{t-1})⁻¹ (S⁻¹_{t-1} s_{t-1} + r_{t-1}) + a_{t-1}, Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1} + A_{t-1} (S⁻¹_{t-1} + R_{t-1})⁻¹ Aᵀ_{t-1}),

which gives the messages as in (33). For comparison we also give the canonical representation. Let S̄_{t-1} = S⁻¹_{t-1} + R_{t-1} and s̄_{t-1} = S⁻¹_{t-1} s_{t-1} + r_{t-1}; then

    S_t = Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1} + A_{t-1} S̄⁻¹_{t-1} Aᵀ_{t-1}
        = A_{t-1} { A⁻¹_{t-1} (Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1}) A⁻ᵀ_{t-1} + S̄⁻¹_{t-1} } Aᵀ_{t-1}
    S⁻¹_t = A⁻ᵀ_{t-1} { S̄_{t-1} - S̄_{t-1} [ S̄_{t-1} + Aᵀ_{t-1} (Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1})⁻¹ A_{t-1} ]⁻¹ S̄_{t-1} } A⁻¹_{t-1}
    s_t = a_{t-1} + A_{t-1} S̄⁻¹_{t-1} s̄_{t-1}
    S⁻¹_t s_t = A⁻ᵀ_{t-1} (s̄_{t-1} + S̄_{t-1} A⁻¹_{t-1} a_{t-1}) - A⁻ᵀ_{t-1} S̄_{t-1} [ S̄_{t-1} + Aᵀ_{t-1} (Q_{t-1} + B_{t-1} H⁻¹_{t-1} Bᵀ_{t-1})⁻¹ A_{t-1} ]⁻¹ (s̄_{t-1} + S̄_{t-1} A⁻¹_{t-1} a_{t-1}).

We repeat the derivation for μ_{x_{t+1}→x_t}(x_t):

    μ_{x_{t+1}→x_t}(x_t) = ∫ dx_{t+1} P(x_{t+1} | x_t) μ_{x_{t+2}→x_{t+1}}(x_{t+1}) μ_{z_{t+1}→x_{t+1}}(x_{t+1})
        = ∫ dx_{t+1} N(x_{t+1} | A_t x_t + a_t, Q_t + B_t H⁻¹_t Bᵀ_t) N(x_{t+1} | v_{t+1}, V_{t+1}) N[x_{t+1} | r_{t+1}, R_{t+1}]
        ∝ ∫ dx_{t+1} N(x_{t+1} | A_t x_t + a_t, Q_t + B_t H⁻¹_t Bᵀ_t) N[x_{t+1} | V⁻¹_{t+1} v_{t+1} + r_{t+1}, V⁻¹_{t+1} + R_{t+1}]
        = N(A_t x_t + a_t | (V⁻¹_{t+1} + R_{t+1})⁻¹ (V⁻¹_{t+1} v_{t+1} + r_{t+1}), Q_t + B_t H⁻¹_t Bᵀ_t + (V⁻¹_{t+1} + R_{t+1})⁻¹)
        = N(x_t | -A⁻¹_t a_t + A⁻¹_t (V⁻¹_{t+1} + R_{t+1})⁻¹ (V⁻¹_{t+1} v_{t+1} + r_{t+1}), A⁻¹_t [ Q_t + B_t H⁻¹_t Bᵀ_t + (V⁻¹_{t+1} + R_{t+1})⁻¹ ] A⁻ᵀ_t).

For this backward message it is instructive to derive the canonical representation. With V̄_{t+1} = V⁻¹_{t+1} + R_{t+1} and v̄_{t+1} = V⁻¹_{t+1} v_{t+1} + r_{t+1},

    V⁻¹_t = Aᵀ_t [ V̄⁻¹_{t+1} + Q_t + B_t H⁻¹_t Bᵀ_t ]⁻¹ A_t
          = Aᵀ_t { V̄_{t+1} - V̄_{t+1} [ V̄_{t+1} + (Q_t + B_t H⁻¹_t Bᵀ_t)⁻¹ ]⁻¹ V̄_{t+1} } A_t
    V⁻¹_t v_t = Aᵀ_t (v̄_{t+1} - V̄_{t+1} a_t) - Aᵀ_t V̄_{t+1} [ V̄_{t+1} + (Q_t + B_t H⁻¹_t Bᵀ_t)⁻¹ ]⁻¹ (v̄_{t+1} - V̄_{t+1} a_t).
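For illustration, the forward sweep (33) in code; a minimal sketch assuming the same list-of-matrices convention as above, and assuming the trajectory starts with the prior N(x_0 | s0, S0):

    import numpy as np

    def forward_messages(A, a, B, Q, H, R, r, s0, S0, T):
        # mu_{x_{t-1} -> x_t}(x_t) = N(x_t | s[t], S[t]) as in (33)
        inv = np.linalg.inv
        s, S = [s0], [S0]
        for t in range(1, T + 1):
            P = inv(inv(S[t-1]) + R[t-1])      # (S^-1 + R)^-1, task message absorbed
            s.append(a[t-1] + A[t-1] @ P @ (inv(S[t-1]) @ s[t-1] + r[t-1]))
            S.append(Q[t-1] + B[t-1] @ inv(H[t-1]) @ B[t-1].T + A[t-1] @ P @ A[t-1].T)
        return s, S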

2.3 Special case: kinematic control

We assume a kinematic control problem (we write q instead of x). The process is simply

    q_{t+1} = q_t + u_t + ξ.    (45)

This means we have

    A_t = B_t = I,    a_t = 0.    (46)

2.4 Special case: multiple kinematic task variables

Let φ_i : q ↦ y_i be a kinematic mapping to a task variable y_i and J_i(q) its Jacobian. We assume we are given targets y*_{i,0:T} in the task space and (time-dependent) error metrics C⁻¹_i with which we want to follow the task targets. We have

    c_t(q_t, u_t) = uᵀ_t H u_t + Σ_i ‖y*_{i,t} - φ_i(q_t)‖²_{C⁻¹_i}    (47)
                  ≈ uᵀ_t H u_t + Σ_i ‖y*_{i,t} - φ_i(q̂_t) + J_i q̂_t - J_i q_t‖²_{C⁻¹_i},    J_i = J_i(q̂_t)    (48)
                  = uᵀ_t H u_t + Σ_i [ qᵀ_t Jᵀ_i C⁻¹_i J_i q_t - 2 (y*_{i,t} - φ_i(q̂_t) + J_i q̂_t)ᵀ C⁻¹_i J_i q_t ] + const,

where q̂_t is the point of linearization. Hence this cost is of the LQG form (18) with

    R_t = Σ_i Jᵀ_i C⁻¹_i J_i,    r_t = Σ_i Jᵀ_i C⁻¹_i (y*_{i,t} - φ_i(q̂_t) + J_i q̂_t).    (49)

Note: the product of a forward message with a task message corresponds to the classical optimal control for multiple regularized task variables (92),

    N(q_t | s_t, S_t) N[q_t | r_t, R_t] ∝ N(q_t | b, B)    (50)
    B⁻¹ = R_t + S⁻¹_t,    B⁻¹ b = r_t + S⁻¹_t s_t.    (51)
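The accumulation (49) is easy to code up. A sketch, where phis and jacs are stand-in callables for the kinematic maps φ_i and Jacobians J_i, and C_invs are the precision matrices C⁻¹_i:

    import numpy as np

    def task_potentials(q_hat, targets, phis, jacs, C_invs):
        # R_t = sum_i J_i' C_i^-1 J_i
        # r_t = sum_i J_i' C_i^-1 (y*_i - phi_i(q_hat) + J_i q_hat)
        n = q_hat.shape[0]
        R = np.zeros((n, n))
        r = np.zeros(n)
        for y_star, phi_i, jac_i, Ci in zip(targets, phis, jacs, C_invs):
            J = jac_i(q_hat)
            R += J.T @ Ci @ J
            r += J.T @ Ci @ (y_star - phi_i(q_hat) + J @ q_hat)
        return R, r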

2.5 Special case: pseudo-dynamic process

We replace x by the state pair (q_t, q̇_t) and assume u_t corresponds directly to accelerations:

    P(q_{t+1} | q_t, q̇_t) = N(q_{t+1} | q_t + τ q̇_t, W⁻¹),    P(q̇_{t+1} | q̇_t, u_t) = N(q̇_{t+1} | q̇_t + τ u_t, Q)    (52)

    (q_{t+1}, q̇_{t+1}) = A (q_t, q̇_t) + B u_t + a + ξ,    ⟨ξξᵀ⟩ = diag(W⁻¹, Q)    (53)

    A = [[I, τI], [0, I]],    B = [[0], [τI]],    a = 0.    (54)

Partition the value function of the combined state as V_{t+1} = [[V¹, V²], [V³, V⁴]] and v_{t+1} = (v¹, v²). The following equations might be computationally more efficient than the general Riccati recursion, but I am not sure:

    u*_t(x) = -(H + Bᵀ V_{t+1} B)⁻¹ Bᵀ (V_{t+1}(Ax + a) - v_{t+1})    (55)

    V_t = R + [[V¹, τV¹ + V²], [τV¹ + V³, τ²V¹ + τV² + τV³ + V⁴]]
            - τ² [[V²], [τV² + V⁴]] (H + τ²V⁴)⁻¹ [V³, τV³ + V⁴]    (56)

    v_t = r + (v¹, τv¹ + v²) - τ² [[V²], [τV² + V⁴]] (H + τ²V⁴)⁻¹ v²    (57)

    u*_t(x) = -τ (H + τ²V⁴)⁻¹ ( [V³, τV³ + V⁴] x - v² ).    (58)

2.6 Special case: dynamic process

We replace x by (q_t, q̇_t), now with dynamics q̈ = M⁻¹(u - F):

    P(q_{t+1} | q_t, q̇_t) = N(q_{t+1} | q_t + τ q̇_t, W⁻¹),    P(q̇_{t+1} | q̇_t, u_t) = N(q̇_{t+1} | q̇_t + τ M⁻¹(u_t - F), Q)    (59)

    (q_{t+1}, q̇_{t+1}) = A (q_t, q̇_t) + B u_t + a + ξ,    ⟨ξξᵀ⟩ = diag(W⁻¹, Q)    (60)

    A = [[I, τI], [0, I]],    B = [[0], [τM⁻¹]],    a = (0, -τM⁻¹F).    (61)

With the same partitioning of V_{t+1} and v_{t+1} as above, and the abbreviation (w¹, w²) := v_{t+1} - V_{t+1} a = (v¹ + τV²M⁻¹F, v² + τV⁴M⁻¹F), the following equations might again be computationally more efficient than the general Riccati recursion, but I am not sure:

    u*_t(x) = -(H + Bᵀ V_{t+1} B)⁻¹ Bᵀ (V_{t+1}(Ax + a) - v_{t+1})    (62)

    V_t = R + [[V¹, τV¹ + V²], [τV¹ + V³, τ²V¹ + τV² + τV³ + V⁴]]
            - τ² [[V²], [τV² + V⁴]] M⁻¹ (H + τ²M⁻¹V⁴M⁻¹)⁻¹ M⁻¹ [V³, τV³ + V⁴]    (63)

    v_t = r + (w¹, τw¹ + w²) - τ² [[V²], [τV² + V⁴]] M⁻¹ (H + τ²M⁻¹V⁴M⁻¹)⁻¹ M⁻¹ w²    (64)

    u*_t(x) = -τ (H + τ²M⁻¹V⁴M⁻¹)⁻¹ M⁻¹ ( [V³, τV³ + V⁴] x - τV⁴M⁻¹F - v² ).    (65)

2.7 Special case: multiple dynamic/kinematic task variables

As before we have access to kinematic functions φ_i(q) and Jacobians J_i(q). We are given task targets x*_{i,0:T} and ẋ*_{i,0:T} and want to follow them with (time-dependent) precisions ϱ_{i,t} and ν_{i,t}. We have

    c(q_t, q̇_t, u_t) = Σ_i ϱ_{i,t} ‖x*_{i,t} - φ_i(q_t)‖² + Σ_i ν_{i,t} ‖ẋ*_{i,t} - J_i q̇_t‖² + uᵀ_t H_t u_t    (66)
                     ≈ Σ_i ϱ_{i,t} [ qᵀ_t Jᵀ_i J_i q_t - 2 (x*_{i,t} - φ_i(q̂_t) + J_i q̂_t)ᵀ J_i q_t ]
                       + Σ_i ν_{i,t} [ q̇ᵀ_t Jᵀ_i J_i q̇_t - 2 (ẋ*_{i,t})ᵀ J_i q̇_t ] + uᵀ_t H_t u_t + const,

where J_i = J_i(q̂_t) and q̂_t is the point of linearization, so that

    R_t = [[Σ_i ϱ_{i,t} Jᵀ_i J_i, 0], [0, Σ_i ν_{i,t} Jᵀ_i J_i]]    (67)

    r_t = ( Σ_i ϱ_{i,t} Jᵀ_i (x*_{i,t} - φ_i(q̂_t) + J_i q̂_t),  Σ_i ν_{i,t} Jᵀ_i ẋ*_{i,t} ).    (68)
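Constructing the block matrices (54) and (61) in code makes the reduction to the general LQG case explicit; this sketch produces inputs that can be fed directly into the generic backward recursion sketched in section 2.1:

    import numpy as np

    def pseudo_dynamic_blocks(n, tau):
        # A, B, a of (54) for the combined state (q, qdot)
        I = np.eye(n)
        A = np.block([[I, tau * I], [np.zeros((n, n)), I]])
        B = np.vstack([np.zeros((n, n)), tau * I])
        a = np.zeros(2 * n)
        return A, B, a

    def dynamic_blocks(M, F, tau):
        # A, B, a of (61) with mass matrix M and force vector F
        n = M.shape[0]
        I, Minv = np.eye(n), np.linalg.inv(M)
        A = np.block([[I, tau * I], [np.zeros((n, n)), I]])
        B = np.vstack([np.zeros((n, n)), tau * Minv])
        a = np.concatenate([np.zeros(n), -tau * Minv @ F])
        return A, B, a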

3 The optimization view on classical control laws

In this section we first review classical control laws as the minimization of a basic loss function. Since this loss function has a Bayesian interpretation, it will be straight-forward to also develop a Bayesian view on these control laws. The Bayesian inference approach can then be generalized to what we actually aim for: motion planning in temporal probabilistic models.

3.1 General quadratic loss function and constraints

Let y ∈ R^d and q ∈ R^n. Given y, consider the problem of finding q that minimizes

    L = ‖y - Jq‖²_{C⁻¹} + ‖q‖²_W - 2 hᵀ W q,    (69)

where ‖q‖²_W = qᵀWq denotes a squared norm, and C and W are symmetric positive definite matrices. This loss function can be interpreted as follows: the first term measures how well a constraint y = Jq is fulfilled relative to a covariance matrix C, the second term measures the magnitude of q with metric W, the third term measures the scalar product between q and h w.r.t. W. The solution can easily be found by taking the derivative

    ∂L/∂q = 2 qᵀ W - 2 (y - Jq)ᵀ C⁻¹ J - 2 hᵀ W
    ∂L/∂q = 0  ⟹  q = (Jᵀ C⁻¹ J + W)⁻¹ (Jᵀ C⁻¹ y + W h).

Using the Woodbury and related identities and definitions as given in section 1, we can rewrite the solution in several forms:

    q = (Jᵀ C⁻¹ J + W)⁻¹ (Jᵀ C⁻¹ y + W h)    (70)
      = (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ y + [I_n - (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ J] h    (71)
      = (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ (y - Jh) + h    (72)
      = W⁻¹ Jᵀ (J W⁻¹ Jᵀ + C)⁻¹ y + [I_n - W⁻¹ Jᵀ (J W⁻¹ Jᵀ + C)⁻¹ J] h    (73)
      = W⁻¹ Jᵀ (J W⁻¹ Jᵀ + C)⁻¹ (y - Jh) + h.    (74)

This also allows us to properly derive the following limits:

    C → 0:      q = J♯_W y + (I_n - J♯_W J) h = J♯_W (y - Jh) + h    (75)
    W → 0:      q = J♯_C y + (I_n - J♯_C J) h = J♯_C (y - Jh) + h
    W = λ I_n:  q = Jᵀ (J Jᵀ + λC)⁻¹ y + [I_n - Jᵀ (J Jᵀ + λC)⁻¹ J] h    (76)
    C = σ I_d:  q = (Jᵀ J + σW)⁻¹ Jᵀ y + [I_n - (Jᵀ J + σW)⁻¹ Jᵀ J] h.

These limits can be interpreted as follows. C → 0: we need to fulfill the constraint y = Jq exactly. C = σ I_d: we use a standard squared error measure for y - Jq. W → 0: we do not care about the norm ‖q‖_W (i.e., no regularization); but interestingly, the cost term -2hᵀWq has a nullspace effect also in this limit. W = λ I_n: we use a standard ridge as regularizer.

The first of these limits is perhaps the most important. It corresponds to a hard constraint; that is, (75) is the solution to

    argmin_q ‖q‖²_W - 2 hᵀ W q    such that  y = Jq.    (77)

The loss function (69) has many applications, as we discuss in the following.
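A quick numerical check, with random positive definite C and W, that the equivalent forms of the solution indeed coincide (shown here for (70), (72), (74); a throwaway sketch):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 6, 3
    J, y, h = rng.normal(size=(d, n)), rng.normal(size=d), rng.normal(size=n)
    Lc, Lw = rng.normal(size=(d, d)), rng.normal(size=(n, n))
    C, W = Lc @ Lc.T + np.eye(d), Lw @ Lw.T + np.eye(n)   # random s.p.d. matrices
    inv = np.linalg.inv

    q70 = inv(J.T @ inv(C) @ J + W) @ (J.T @ inv(C) @ y + W @ h)
    q72 = inv(J.T @ inv(C) @ J + W) @ J.T @ inv(C) @ (y - J @ h) + h
    q74 = inv(W) @ J.T @ inv(J @ inv(W) @ J.T + C) @ (y - J @ h) + h
    assert np.allclose(q70, q72) and np.allclose(q70, q74)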

3.2 Ridge regression

Let us first give an off-topic example from machine learning: In ridge regression, when we have d samples of n-dimensional inputs and 1D outputs, we have the minimization problem

    L = ‖y - Xβ‖² + λ ‖β‖²

with an input data matrix X ∈ R^{d×n}, an output data vector y ∈ R^d, and a regressor β ∈ R^n. The first term measures the standard squared error (with uniform output covariance C = I_d), the second is a regularizer (or stabilizer) as introduced by Tikhonov. The special form λ‖β‖² of the regularizer is called ridge. The solution is given by equation (76) when replacing the notation according to q → β, y → y, J → X, C → I_d, W → λI_n, h → 0:

    β = Xᵀ (X Xᵀ + λ I_d)⁻¹ y.

In the Bayesian interpretation of ridge regression, the ridge λ‖β‖² defines a prior ∝ exp{ -(λ/2) ‖β‖² } over the regressor β, and the above equation gives the MAP β. Since ridge regression has a Bayesian interpretation, standard motion rate control, as discussed shortly, will also have a Bayesian interpretation.

3.3 Motion rate control (pseudo-inverse kinematics)

Consider a robot with n DoFs and a d-dimensional task space with d < n (e.g., an endeffector state). The current joint state is q ∈ R^n. In a given state we can compute the end-effector Jacobian J, and we are given a joint space potential H(q). We would like to compute joint velocities q̇ which fulfill the task constraint ẏ = J q̇ while minimizing the absolute joint velocity ‖q̇‖_W and following the negative gradient h = -W⁻¹ (∂H/∂q)ᵀ. In summary, the problem and its solution are

    (problem)   q̇ = argmin_{q̇} ‖q̇‖²_W + 2 (∂H/∂q) q̇    such that  J q̇ = ẏ    (78)
    (solution)  q̇ = J♯_W ẏ - (I_n - J♯_W J) W⁻¹ (∂H/∂q)ᵀ.    (79)

The solution was taken from (75) by replacing the notation according to q → q̇, y → ẏ. Note that we have derived pseudo-inverse kinematics from a basic constrained quadratic optimization problem.

Let us repeat this briefly for the case when time is discretized. We can formulate the problem as

    q_t = argmin_{q_t} ‖q_t - q_{t-1}‖²_W - 2 hᵀ W q_t    such that  φ(q_t) = y*_t.    (80)

Generally, the constraint φ(q_t) = y*_t is non-linear. We linearize it at q_{t-1} and get the simpler problem and its solution

    (problem)   q_t = argmin_{q_t} ‖q_t - q_{t-1}‖²_W - 2 hᵀ W q_t    such that  J (q_t - q_{t-1}) = y*_t - φ(q_{t-1})    (81)
    (solution)  q_t = q_{t-1} + J♯_W [y*_t - φ(q_{t-1})] + (I_n - J♯_W J) h.    (82)

The solution was taken from (75) by replacing the notation according to q → (q_t - q_{t-1}), y → (y*_t - φ(q_{t-1})).

3.4 Regularized inverse kinematics (singularity robust motion rate control)

Under some conditions motion rate control is infeasible, for instance when the arm cannot be further stretched to reach a desired endeffector position. In this case the computation of the pseudo-inverse J♯_W becomes singular. Classical control developed the singularity robust pseudo-inverse (Nakamura & Hanafusa, 1986), which can be interpreted as regularizing the computation of the pseudo-inverse, or as relaxing the hard task constraint. In our framework this corresponds to not taking the limit C → 0. This regularized inverse kinematics is given as

    (problem)   q̇ = argmin_{q̇} ‖q̇‖²_W + ‖ẏ - J q̇‖²_{C⁻¹} - 2 hᵀ W q̇    (83)
    (solution)  q̇ = Ĵ♯_{W,C} ẏ + (I_n - Ĵ♯_{W,C} J) h,    (84)
                Ĵ♯_{W,C} := (Jᵀ C⁻¹ J + W)⁻¹ Jᵀ C⁻¹ = W⁻¹ Jᵀ (J W⁻¹ Jᵀ + C)⁻¹.    (85)

The solution was taken from (71) by replacing the notation according to q → q̇, y → ẏ. Note that Ĵ♯_{W,C} is a regularization of J♯_W (defined in (11)). Equations (71-74) give many interesting alternatives to write this control law. The linearized (φ̂ is the linearization of φ at q_{t-1}, as above) time-discretized version is

    (problem)   q_t = argmin_{q_t} ‖q_t - q_{t-1}‖²_W + ‖y*_t - φ̂(q_t)‖²_{C⁻¹} - 2 hᵀ W q_t    (86)
    (solution)  q_t = q_{t-1} + Ĵ♯_{W,C} [y*_t - φ(q_{t-1})] + (I_n - Ĵ♯_{W,C} J) h.    (87)
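One discretized motion rate control step (82) in code; a minimal sketch in which phi and jac are stand-ins for the task map and its Jacobian:

    import numpy as np

    def mrc_step(q_prev, y_target, phi, jac, W, h):
        # q_t = q_{t-1} + J#_W (y*_t - phi(q_{t-1})) + (I - J#_W J) h
        J = jac(q_prev)
        Winv = np.linalg.inv(W)
        J_sharp = Winv @ J.T @ np.linalg.inv(J @ Winv @ J.T)   # pseudo-inverse (11)
        n = q_prev.shape[0]
        return (q_prev + J_sharp @ (y_target - phi(q_prev))
                + (np.eye(n) - J_sharp @ J) @ h)

Replacing J_sharp by the regularized Ĵ♯_{W,C} of (85) turns the same step into the singularity robust version (87).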

3.5 Multiple regularized task variables

Assume we have m task variables y_1, .., y_m, where the i-th variable y_i ∈ R^{d_i} is d_i-dimensional. Also assume that we regularize w.r.t. each task variable, that is, we have different error metrics C⁻¹_i in each task space. We want to follow all of the tasks and express this as the optimization problem and its solution

    (problem)   q̇ = argmin_{q̇} ‖q̇‖²_W + Σ_i ‖ẏ_i - J_i q̇‖²_{C⁻¹_i} - 2 hᵀ W q̇    (88)
    (solution)  q̇ = (Σ_i Jᵀ_i C⁻¹_i J_i + W)⁻¹ [ Σ_i Jᵀ_i C⁻¹_i ẏ_i + W h ].    (89)

The solution was taken from (70) in the following way: we can collect all task variables into one big task vector

    y = (y_1, .., y_m),    J = (J_1; ..; J_m),    C = diag(C_1, .., C_m)    (90)
    Jᵀ C⁻¹ J = Σ_i Jᵀ_i C⁻¹_i J_i,    Jᵀ C⁻¹ ẏ = Σ_i Jᵀ_i C⁻¹_i ẏ_i.

And the linearized time-discretized version:

    (problem)   q_t = argmin_{q_t} ‖q_t - q_{t-1}‖²_W + Σ_i ‖y*_{i,t} - φ̂_i(q_t)‖²_{C⁻¹_i} - 2 hᵀ W q_t    (91)
    (solution)  q_t = q_{t-1} + (Σ_i Jᵀ_i C⁻¹_i J_i + W)⁻¹ [ Σ_i Jᵀ_i C⁻¹_i (y*_{i,t} - φ_i(q_{t-1})) + W h ].    (92)

3.6 Multiple prioritized task variables (prioritized inverse kinematics)

The case of multiple task variables is classically not addressed by regularizing all of them, but by imposing a hierarchy on them (Nakamura et al., 1987; Baerlocher & Boulic, 2004). Let us first explain the classical prioritized inverse kinematics: the control law is based on standard motion rate control, but iteratively projects each desired task rate ẏ_i into the remaining nullspace of all higher-level control signals. Initializing the nullspace projection with N_0 = I and q̇_0 = 0, the control law is defined by iterating, for i = 1, .., m,

    Ĵ_i = J_i N_{i-1},    q̇_i = q̇_{i-1} + Ĵ♯_i (ẏ_i - J_i q̇_{i-1}),    N_i = N_{i-1} - Ĵ♯_i Ĵ_i.    (93)

We call Ĵ_i a nullspace Jacobian; it has the property that Ĵ♯_i projects to changes in q that do not change the control variables y_{1,..,i-1} with higher priority. An additional nullspace movement h in the remaining nullspace of all control signals can also be included when defining the final control law as

    q̇ = q̇_m + N_m h.    (94)

In effect, the first task rate ẏ_1 is guaranteed to be fulfilled exactly. The second, ẏ_2, is guaranteed to be fulfilled as well as possible given that ẏ_1 must be fulfilled, et cetera.

This hierarchical projection of tasks can also be derived by starting with the regularized task variables as in problem (88) and then iteratively taking the limit C_i → 0, starting with i = 1 up to i = m. More formally, the iterative limit corresponds to C_i = ε^{m-i} I_{d_i} and ε → 0. For m = 2 task variables one can prove the equivalence between prioritized inverse kinematics and this hierarchical limit of the MAP motion exactly (by directly applying the Woodbury identity). For m > 2 we could not find an elegant proof, but we numerically confirmed this limit for up to m = 4.

Non-zero task variances C_i can again be interpreted as regularizers. Note that without regularizers the standard prioritized inverse kinematics is numerically brittle: handling many control signals (e.g., the over-determined case Σ_i d_i > n) is problematic, since the nullspace-projected Jacobians will become singular (with rank < d_i). For non-zero regularizations C_i the computations in equation (92) are numerically robust.
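The iteration (93), (94) as code. This sketch uses the Moore-Penrose pseudo-inverse, i.e. the special case W = I of J♯_W; as noted above, the unregularized iteration is numerically brittle when the projected Jacobians lose rank:

    import numpy as np

    def prioritized_ik(ydot_targets, jacobians, n, h):
        N = np.eye(n)                       # nullspace projection N_0 = I
        qdot = np.zeros(n)                  # qdot_0 = 0
        for ydot_i, J_i in zip(ydot_targets, jacobians):
            J_hat = J_i @ N                               # nullspace Jacobian (93)
            J_hat_sharp = np.linalg.pinv(J_hat)           # SVD-based pseudo-inverse
            qdot = qdot + J_hat_sharp @ (ydot_i - J_i @ qdot)
            N = N - J_hat_sharp @ J_hat
        return qdot + N @ h                 # final control law (94)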

3.7 Optimal dynamic control (incl. operational space control)

Consider a robot with dynamics q̈ = M⁻¹(u - F), where M is some generalized mass matrix, F subsumes external (also Coriolis and gravitational) forces, and u is the n-dimensional torque control signal. We want to compute a control signal u which generates an acceleration q̈ such that a general task constraint ÿ = J q̈ + J̇ q̇ remains fulfilled, while also minimizing the absolute norm ‖u‖_H of the control. The problem and its solution can be written as

    (problem)   u = argmin_u ‖u‖²_H - 2 hᵀ H u    such that  ÿ - J̇ q̇ - J M⁻¹ (u - F) = 0    (95)
    (solution)  u = T♯_H (ÿ - J̇ q̇ + T F) + (I_n - T♯_H T) h,    with T = J M⁻¹.    (96)

The solution was taken from (75) by replacing the notation according to q → u, y → (ÿ - J̇ q̇ + J M⁻¹ F), J → J M⁻¹. For h = 0 this solution is identical to Theorem 1 in (Peters et al., 2005). Peters et al. discuss in detail stability issues and important special cases of this control scheme. A common special case is H = M⁻¹, which is called operational space control.

References

Baerlocher, P., & Boulic, R. (2004). An inverse kinematic architecture enforcing an arbitrary number of strict priority levels. The Visual Computer.

Nakamura, Y., & Hanafusa, H. (1986). Inverse kinematic solutions with singularity robustness for robot manipulator control. Journal of Dynamic Systems, Measurement and Control, 108.

Nakamura, Y., Hanafusa, H., & Yoshikawa, T. (1987). Task-priority based redundancy control of robot manipulators. Int. Journal of Robotics Research, 6.

Peters, J., Mistry, M., Udwadia, F. E., Cory, R., Nakanishi, J., & Schaal, S. (2005). A unifying framework for the control of robotics systems. IEEE Int. Conf. on Intelligent Robots and Systems (IROS 2005).
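To close, the control law (96) as a small numerical sketch under the conventions of section 3.7 (all quantities at the current state are inputs); passing H = inv(M) gives operational space control:

    import numpy as np

    def dynamic_control(M, F, J, Jdot, qdot, y_ddot_target, H, h):
        # u = T#_H (yddot* - Jdot qdot + T F) + (I - T#_H T) h,  T = J M^-1
        inv = np.linalg.inv
        T = J @ inv(M)
        T_sharp = inv(H) @ T.T @ inv(T @ inv(H) @ T.T)    # T#_H, the C -> 0 limit
        n = M.shape[0]
        u = T_sharp @ (y_ddot_target - Jdot @ qdot + T @ F)
        return u + (np.eye(n) - T_sharp @ T) @ h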
