Optimal path planning using Cross-Entropy method


F. Celeste, F. Dambreville
CEP/Dept. of Geomatics Imagery Perception
94114 Arcueil, France
{francis.celeste, frederic.dambreville}@etca.fr

J.-P. Le Cadre
IRISA/CNRS, Campus de Beaulieu
35042 Rennes, France
lecadre@irisa.fr

Abstract - This paper addresses the problem of optimizing the navigation of an intelligent mobile in a real-world environment described by a map. The map is composed of features representing natural landmarks in the environment. The vehicle is equipped with a sensor which provides range and bearing measurements from the landmarks observed during execution. These measurements are correlated with the map to estimate the mobile's position. The optimal trajectory must be designed so as to control a measure of performance of the filtering algorithm used for the localization task. As the mobile state and the measurements are random, a well-suited measure is a functional of the approximate Posterior Cramér-Rao Bound. A natural way to formalize optimal path planning is to use this measure of performance within a (constrained) Markov Decision Process framework. However, due to the characteristics of this functional, Dynamic Programming methods are generally irrelevant. To face that, we investigate a learning approach based on the Cross-Entropy method.

Keywords: Markov Decision Process, planning, estimation, Posterior Cramér-Rao Bound, Cross-Entropy method.

1 Introduction

In this paper we are concerned with the task of finding a plan for a vehicle moving around in its environment, that is to say, reaching a given goal position from an initial position. In many applications, it is crucial to be able to estimate accurately the state of the mobile during the execution of the plan, so it seems necessary to couple the planning and execution stages. One way to achieve that aim is to propose trajectories along which the performance of the localization algorithm can be guaranteed. This problem has been well studied, and in most approaches the environment is discretized and described as a graph whose nodes correspond to particular areas and whose edges are actions to move from one place to another. Some of these previous contributions address the problem within a Markov Decision Process (MDP) framework. Such approaches also use a graph representation of the mobile's state space and provide a theoretical framework to solve the problem in some cases. In the present paper, we also use the constrained MDP framework, as in [1]. Our optimality criterion is based on the Posterior Cramér-Rao Bound. However, the nature of the objective function for path planning makes it impossible to perform a complete optimization of the MDP with classical means; indeed, we will show that the reward at one stage of our MDP depends on the whole history of the trajectory. To solve the problem, the Cross-Entropy method, originally used for rare-event simulation, turned out to be a valuable tool.

The paper is organized in five parts. In the second section we introduce the problem in detail. Section three deals with the Posterior Cramér-Rao bound and its properties; we also derive its formulation for our problem and the criterion for the optimization. In section four, we give a short introduction to the Cross-Entropy method and show how to apply it to our needs. Finally, in section five the results of a first example are discussed.

2 Problem statement

Let X_k, A_k and Z_k respectively denote the state of the vehicle, the action and the vector of observations at time k. We consider that the state vector X_k is the position of the vehicle along the x-axis and the y-axis, so that X_k = (x_k, y_k).
The action vector A_k = (a_k^x, a_k^y) is restricted to the set {−d, 0, +d} × {−d, 0, +d}, with d ∈ R+ a given constant. The motion of the state {X_k} is governed by a dynamic model and an initial uncertainty:

  X_0 ∼ π_0 = N(X̄_0, P_0),
  X_{k+1} = f(X_k, A_k) + w_k,   w_k ∼ N(0, Q_k),      (1)

where {w_k} is a white noise sequence, the symbol ∼ means "distributed according to" and N(·,·) denotes the normal distribution. The known map M is composed of N_f features (m_i)_{1 ≤ i ≤ N_f} with respective positions (x_i, y_i) ∈ R². At time k, due to the sensor capabilities, only a few landmarks can be observed, so the observation process depends on the number of visible features. Moreover, we do not consider data association and non-detection problems for the moment; that is to say, each observation is made from one landmark represented in the map.
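For illustration, the motion model (1) can be simulated directly. The following is a minimal Python sketch, assuming the transition function is the pure translation f(X_k, A_k) = X_k + A_k (consistent with the grid moves used later); the values of d, P_0 and Q_k are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1.0  # elementary displacement (placeholder value)
# action set {-d, 0, +d} x {-d, 0, +d}; removing (0, 0) leaves the
# 8 elementary grid moves used in the MDP formulation below
ACTIONS = [np.array([ax, ay]) for ax in (-d, 0.0, d) for ay in (-d, 0.0, d)
           if (ax, ay) != (0.0, 0.0)]

def step(X, A, Q):
    """One draw of X_{k+1} = f(X_k, A_k) + w_k with w_k ~ N(0, Q),
    under the assumed transition f(X, A) = X + A."""
    return X + A + rng.multivariate_normal(np.zeros(2), Q)

X0_bar, P0 = np.zeros(2), 0.01 * np.eye(2)  # initial uncertainty X_0 ~ N(X0_bar, P0)
Q = 0.01 * np.eye(2)                        # process noise covariance Q_k
X = rng.multivariate_normal(X0_bar, P0)     # sample the initial state
for A in ACTIONS[:2]:                       # apply two arbitrary actions
    X = step(X, A, Q)
print(X)
```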

If we denote by I_v(k) the indexes of the landmarks visible at time k, the measurement vector received at time k is Z_k = {z_k(j)}_{j ∈ I_v(k)}, with, for j ∈ I_v(k):

  z_k^r(j) = sqrt((x_k − x_j)² + (y_k − y_j)²) + γ_k^r(j),
  z_k^β(j) = arctan((y_j − y_k)/(x_j − x_k)) − θ_k + γ_k^β(j),      (2)

where θ_k is the global mobile orientation, and {γ_k^r(j)} and {γ_k^β(j)} are mutually independent white Gaussian noises. Nevertheless, a more complex modeling must be considered if we want to take into account correlated errors in the map (e.g. Markov Random Fields) or in the observations (e.g. colored noise or biases).

2.1 A Markov Decision Process model with constraints

As in [1], we first formulate the problem within the Markov Decision Process framework. Generally, a Markov Decision Process is defined by its state and action sets, its state transition probabilities and its reward functions. In our case, the state and action spaces are discrete and finite. Indeed, the map is discretized in N_s = N_x × N_y locations, and one action is defined as a move between two neighboring points (figure 1), where N_x and N_y are the numbers of grid points along both axes. At each point, the mobile can choose one action among N_a. So we can model the process with:

- S = {1, …, N_s}, the state space;
- A = {1, …, N_a}, the action space;
- T_ss'(a) = Pr(s_{k+1} = s' | s_k = s, a_k = a), the transition function;
- R_ss'(a), the reward or cost function associated with the transition from state s to state s' under action a.

Figure 1: MDP grid with states (red crosses); trajectories with (solid line) and without (dashed line) [−π/2, π/2] constrained heading changes.

Finding the optimal path between two states s_i and s_f is equivalent to determining the best sequence of actions or decisions V_a = (a_0, …, a_K) allowing to connect them while optimizing a (global) criterion. In our application, we consider only 8 possible actions from each state (figure 1). Each of them can be selected, except at points located on the border of the map and at places where obstacles are found. Moreover, we take into account operational constraints on the mobile dynamics between two consecutive epochs, as in [1]. This can be done by defining an authorized transition matrix δ(a_k, a_{k+1}) which indicates which actions can be chosen at time k+1 according to the choice made at time k. For example, if only heading changes within [−π/2, π/2] are allowed, such a matrix can be expressed as below, the actions being numbered clockwise from 1 ("go up and right") to 8 ("go up"):

        1 1 1 0 0 0 1 1
        1 1 1 1 0 0 0 1
        1 1 1 1 1 0 0 0
  δ =   0 1 1 1 1 1 0 0
        0 0 1 1 1 1 1 0
        0 0 0 1 1 1 1 1
        1 0 0 0 1 1 1 1
        1 1 0 0 0 1 1 1

The above δ matrix indicates, for instance, that if action 1 is chosen at time k, only actions 1, 2, 3, 7 and 8 can be selected at time k+1. Moreover, the mobile orientation θ_k is restricted to values directly linked with the orientations of the elementary displacements.
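The matrix δ above can be generated programmatically from the heading-change rule rather than written by hand. The sketch below (with 0-based action indices 0-7 instead of the paper's 1-8) is one possible construction; max_turn encodes the [−π/2, π/2] bound.

```python
import numpy as np

N_A = 8  # eight elementary moves, 45 degrees apart, numbered clockwise

def authorized_matrix(max_turn=np.pi / 2):
    """delta[a, a_next] = 1 iff the heading change from action a to a_next
    stays within [-max_turn, +max_turn]."""
    delta = np.zeros((N_A, N_A), dtype=int)
    for a in range(N_A):
        for a_next in range(N_A):
            offset = (a_next - a) % N_A                    # clockwise index offset
            turn = min(offset, N_A - offset) * (2 * np.pi / N_A)
            if turn <= max_turn + 1e-12:
                delta[a, a_next] = 1
    return delta

delta = authorized_matrix()
# with 0-based indices, action 0 allows {6, 7, 0, 1, 2}: the analogue of
# "action 1 allows actions 1, 2, 3, 7 and 8" in the text
print(np.nonzero(delta[0])[0])   # -> [0 1 2 6 7]
```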

In the Markov Decision Process context, when the reward is completely known, an optimal policy can be computed using Dynamic Programming or similar algorithms such as Value Iteration and Policy Iteration [8]. These are based on the principle of optimality due to Bellman. However, the applicability of this principle implies some basic assumptions on the cost functional, which do not hold in our context, where the criterion of optimality depends on the Posterior Cramér-Rao Bound (PCRB) matrix introduced in the next section. Indeed, as shown in [2], a cost functional must satisfy the Matrix Dynamic Programming Property to guarantee the principle of optimality; for the determinant functional used in our work, this property is not verified [2].

3 Posterior Cramér-Rao Bound

3.1 Definition

In this section, we briefly recall the properties of the Cramér-Rao bound for estimation problems. Let X̂(Z) be an unbiased estimator of an unknown random vector X ∈ R^d, based on random observations Z. The PCRB is given by the inverse of the Fisher Information Matrix (FIM) F [9] and measures the minimum mean square error of X̂(Z):

  E{(X̂(Z) − X)(X̂(Z) − X)^T} ⪰ F^{−1}(X, Z),      (3)

where A ⪰ B means that (A − B) is positive semidefinite, and

  F = E[−Δ_X^X log p_{x,z}(X, Z)],

Δ_X^X being the second-order partial derivative operator and p_{x,z} the joint probability density of (X, Z). For the filtering case, it can be shown [4] that

  E{(X̂_k − X_k)(X̂_k − X_k)^T | V_a^k} ⪰ J_k^{−1}(V_a^k),      (4)

with X_k the state at time k, X̂_k = X̂_k(Z(1:k), V_a^k) the estimate based on Z(1:k), all the measurements observed until time k, and V_a^k = {a_1, …, a_k} the sequence of actions chosen until time k.

3.2 Tichavsky PCRB recursion

A remarkable contribution of Tichavsky et al. [4] was to introduce a Riccati-like recursion for computing J_k:

  J_{k+1} = D_k^{22} − D_k^{21} (J_k + D_k^{11})^{−1} D_k^{12},      (5)

where

  D_k^{11} = E{−Δ_{X_k}^{X_k} log p(X_{k+1} | X_k)},
  D_k^{12} = E{−Δ_{X_k}^{X_{k+1}} log p(X_{k+1} | X_k)},
  D_k^{21} = [D_k^{12}]^T,
  D_k^{22} = E{−Δ_{X_{k+1}}^{X_{k+1}} log p(X_{k+1} | X_k)} + E{−Δ_{X_{k+1}}^{X_{k+1}} log p(Z_{k+1} | X_{k+1})}.

This relation is valid under the assumptions that the mentioned second-order derivatives, expectations and matrix inverses exist. The initial information matrix J_0 can be calculated from π_0. The measurements contribute only through the D_k^{22} term. The dynamics being linear, the following equations are easily derived:

  J_0 = P_0^{−1},
  D_k^{11} = Q_k^{−1},   D_k^{12} = −Q_k^{−1},   D_k^{21} = −[Q_k^{−1}]^T,   D_k^{22} = Q_k^{−1} + J_{k+1}(Z).

The observation contribution J_{k+1}(Z) is given by the following expression:

  J_{k+1}(Z) = E_X { Σ_{j ∈ I_v(X_{k+1})} H(X_{k+1}, j)^T R_{k+1}^{−1} H(X_{k+1}, j) },      (6)

where R_{k+1} is the covariance matrix of the combined visible observations, which we assume only depends on the current position; actually, (6) is a lower bound on the true contribution, which may be reasonably accurate. For our observation model, H(X_k, j), the Jacobian of the observation of landmark j with respect to the state, is as follows:

  H(X_k, j) = ( ∂z_k^r(j)/∂x_k   ∂z_k^r(j)/∂y_k )   =   (  (x_k − x_j)/d_j      (y_k − y_j)/d_j   )
              ( ∂z_k^β(j)/∂x_k   ∂z_k^β(j)/∂y_k )       ( −(y_k − y_j)/d_j²     (x_k − x_j)/d_j²  ),      (7)

where d_j = sqrt((x_k − x_j)² + (y_k − y_j)²). Obviously, there is no explicit expression for J_{k+1}(Z) because the observation model is nonlinear. Consequently, it is necessary to resort to Monte Carlo simulation to approximate it:

- draw N_c samples (X_0^i)_{1 ≤ i ≤ N_c} according to π_0;
- until l = k: draw (X_{l+1}^i)_{1 ≤ i ≤ N_c} from (X_l^i)_{1 ≤ i ≤ N_c} according to the prediction equation (1);
- for each X_{k+1}^i, compute I_v(X_{k+1}^i), then H(X_{k+1}^i, j) for j ∈ I_v(X_{k+1}^i), and R_{k+1}.

The approximation can then be obtained:

  J_{k+1}(Z) ≈ (1/N_c) Σ_{i=1}^{N_c} Σ_{j ∈ I_v(X_{k+1}^i)} H(X_{k+1}^i, j)^T R_{k+1}^{−1} H(X_{k+1}^i, j).      (8)

3.3 A criterion for path planning

We are interested in finding one or more paths connecting two points which minimize a functional φ of the PCRB along the trajectory [1]. We consider two functionals which depend on the determinants of the history of approximated PCRBs {J_0^{−1}, …, J_K^{−1}}; the determinant is linked to the volume of the error ellipse of the position estimate (a code sketch of the whole computation is given at the end of this section):

- minimizing a weighted mean volume along the trajectory: φ_1(J_{0:K}) = Σ_{k=0}^{K} w_k det(J_k^{−1});
- minimizing the final error of estimation: φ_2(J_{0:K}) = det(J_K^{−1}).

In figure 2, 90% error ellipses are computed and drawn along two trajectories of the same length. The size of the ellipses decreases rapidly for the trajectory which goes toward the area with landmarks. Since a classical optimization method is irrelevant here, we propose to investigate a learning approach based on the innovative Cross-Entropy (CE) method developed by Rubinstein et al. [5].
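Summarizing sections 3.2 and 3.3, a possible implementation of the recursion (5) with the Monte Carlo approximation (8), together with both criteria, is sketched below, specialized to the linear dynamics X_{k+1} = X_k + A_k + w_k assumed here; the visibility test visible(X, m), the inverse measurement covariance R_inv and the sample size n_mc are user-supplied assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def H_jac(X, m):
    """Jacobian H(X, j) of the range/bearing observation of feature m = (x_j, y_j)."""
    dx, dy = X[0] - m[0], X[1] - m[1]
    d = np.hypot(dx, dy)
    return np.array([[dx / d,      dy / d],
                     [-dy / d**2,  dx / d**2]])

def J_obs(samples, features, visible, R_inv):
    """Monte Carlo approximation (8) of the observation term J_{k+1}(Z)."""
    J = np.zeros((2, 2))
    for X in samples:
        for m in features:
            if visible(X, m):
                H = H_jac(X, m)
                J += H.T @ R_inv @ H
    return J / len(samples)

def pcrb_sequence(actions, X0_bar, P0, Q, features, visible, R_inv, n_mc=1000):
    """J_0, ..., J_K along one action sequence, for the linear dynamics
    X_{k+1} = X_k + A_k + w_k, so that D11 = Q^-1, D12 = D21 = -Q^-1 and
    D22 = Q^-1 + J_{k+1}(Z)."""
    Qi = np.linalg.inv(Q)
    samples = rng.multivariate_normal(X0_bar, P0, size=n_mc)
    J = [np.linalg.inv(P0)]
    for A in actions:
        # propagate the sample cloud through the prediction equation (1)
        samples = samples + A + rng.multivariate_normal(np.zeros(2), Q, size=n_mc)
        Jz = J_obs(samples, features, visible, R_inv)
        # Riccati-like recursion (5)
        J.append(Qi + Jz - Qi @ np.linalg.inv(J[-1] + Qi) @ Qi)
    return J

def phi1(J_seq, w=None):
    """Weighted mean error volume, phi_1 = sum_k w_k det(J_k^{-1})."""
    w = np.ones(len(J_seq)) if w is None else w
    return float(sum(wk / np.linalg.det(Jk) for wk, Jk in zip(w, J_seq)))

def phi2(J_seq):
    """Final error volume, phi_2 = det(J_K^{-1})."""
    return 1.0 / float(np.linalg.det(J_seq[-1]))
```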

Figure 2: example of 90% error ellipses for two trajectories of the same length (features are in black (o)).

4 Cross-Entropy algorithm

In this section we briefly introduce the Cross-Entropy algorithm. The reader interested in the CE method and its convergence properties should refer to [5, 6].

4.1 A short introduction

The Cross-Entropy method was first used to estimate the probability of rare events, i.e. events with very small probability. It was then adapted for optimization, the underlying idea being that sampling around the optimum of a function is a rare event.

Simulation of rare events. Let X be a random variable on a space 𝒳, p_x its probability density function (pdf) and φ a function on 𝒳. Suppose we are concerned with estimating l(γ), the probability of the event F_γ = {x ∈ 𝒳 | φ(x) ≥ γ}, with γ ∈ R+. F_γ is a rare event if l(γ) is very small. An unbiased estimate can be obtained via crude Monte Carlo simulation: given a random sample (x_1, …, x_N) drawn from p_x, this estimate is

  l̂(γ) = (1/N) Σ_{i=1}^{N} I[φ(x_i) ≥ γ].

For a rare event, the variance of l̂(γ) is very high and N must be increased considerably to improve the estimation. The estimate can be improved with variance minimization techniques such as importance sampling, where the random sample is drawn from a more appropriate probability density function q. It can easily be shown that the optimal pdf is q*(x) = I[φ(x) ≥ γ] p_x(x)/l(γ). Nevertheless, q* depends on l(γ), which is precisely what needs to be estimated. To get round this difficulty, it is convenient to choose q within a family of pdfs {π(·, λ), λ ∈ Λ}. The idea is to find the optimal parameter λ* such that D(q*, π(·, λ)) is minimized, where D is the Kullback-Leibler pseudo-distance:

  D(p, q) = E_p[ln(p/q)] = ∫_𝒳 p(x) ln p(x) dx − ∫_𝒳 p(x) ln q(x) dx.

Minimizing D(q*, π(·, λ)) is equivalent to maximizing ∫_𝒳 q*(x) ln π(x, λ) dx, which implies

  λ* = argmax_{λ ∈ Λ} E_{p_x}( I[φ(X) ≥ γ] ln π(X, λ) ).      (9)

The computation of the expectation in (9) must also be done using importance sampling. So we need a change of measure q; drawing one sample (x_1, …, x_N) from q, we estimate λ* as follows:

  λ̂* = argmax_{λ ∈ Λ} (1/N) Σ_{i=1}^{N} I[φ(x_i) ≥ γ] (p_x(x_i)/q(x_i)) ln π(x_i, λ).      (10)

The family {π(·, λ), λ ∈ Λ} must be chosen so that equation (10) can be solved easily; Natural Exponential Families (NEF) [5], that is pdfs of the form f_λ(x) = c(λ) e^{λ^T t(x)} h(x), are especially well adapted. However, q is not known in equation (10). The CE algorithm tries to overcome this difficulty by adaptively constructing sequences of parameters (γ_t, t ≥ 1) and (λ_t, t ≥ 1) such that:

- F_{γ_1} is not a rare event for p_x;
- F_{γ_{t+1}} is not a rare event for π(·, λ_t);
- lim_{t→∞} γ_t = γ.

More precisely, given ρ ∈ ]0, 1[:

1. choose λ_0 such that π(·, λ_0) = p_x;
2. draw (x_1, …, x_N) from π(·, λ_t);
3. sort (φ(x_1), …, φ(x_N)) in increasing order and evaluate the (1 − ρ) quantile γ_t;
4. compute λ_{t+1} as

  λ_{t+1} = argmax_{λ ∈ Λ} Σ_{i=1}^{N} I[φ(x_i) ≥ γ_t] (p_x(x_i)/π(x_i, λ_t)) ln π(x_i, λ);

5. if γ_t < γ, set t = t + 1 and go to step 2; else estimate the probability of F_γ with

  l̂(γ) = (1/N) Σ_{i=1}^{N} I[φ(x_i) ≥ γ] p_x(x_i)/π(x_i, λ_t).

This is the main CE algorithm, but other versions can be found in [5].

4.2 Application to optimization

The CE method was then adapted to solve optimization problems. Consider the optimization problem

  φ(x*) = γ* = max_{x ∈ 𝒳} φ(x).      (11)

The principle of CE for optimization is to translate problem (11) into an associated stochastic problem, and then to solve it adaptively as the simulation of a rare event: if γ* is the optimum of φ, F_{γ*} is generally a rare event. The main idea is to define a family {π(·, λ), λ ∈ Λ} and to iterate the CE algorithm long enough that γ_t approaches γ*, so as to draw samples around the optimum.
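As a self-contained illustration of the scheme above, here is a minimal CE optimization loop for an independent-Gaussian family (a NEF, for which the argmax of step 4 reduces to refitting the mean and standard deviation on the elite samples); the objective and all tuning constants are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def cross_entropy_max(phi, dim, n=1000, rho=0.1, iters=50):
    """Maximize phi: draw from pi(., lambda_t), keep the samples above the
    (1 - rho) quantile gamma_t, refit the Gaussian parameters on them."""
    mu, sigma = np.zeros(dim), 10.0 * np.ones(dim)
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=(n, dim))   # sample from pi(., lambda_t)
        scores = np.apply_along_axis(phi, 1, x)
        gamma_t = np.quantile(scores, 1.0 - rho)   # elite threshold gamma_t
        elite = x[scores >= gamma_t]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-12
    return mu

# toy objective with maximum at (3, -5): the loop concentrates around it
best = cross_entropy_max(lambda v: -np.sum((v - np.array([3.0, -5.0]))**2), dim=2)
print(np.round(best, 2))
```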

Moreover, one has to notice that, unlike local random search algorithms such as simulated annealing, which rely on a local neighborhood hypothesis, the CE method tries to solve the optimization problem globally. To apply the CE approach, we need to:

- define a selection rate ρ;
- choose a well-suited family of pdfs {π(·, λ), λ ∈ Λ};
- define a criterion to stop the iterations.

Once these elements are given, the algorithm for the optimization proceeds as follows:

1. initialize λ_t = λ_0;
2. generate a sample of size N, (x_i^t)_{1 ≤ i ≤ N}, from π(·, λ_t), compute (φ(x_i^t))_{1 ≤ i ≤ N} and order the values from smallest to largest; estimate γ_t as the (1 − ρ) sample percentile;
3. update λ_t with

  λ_{t+1} = argmax_{λ} (1/N) Σ_{i=1}^{N} I[φ(x_i^t) ≥ γ_t] ln π(x_i^t, λ);      (12)

4. repeat from step 2 until convergence;
5. once convergence is reached, at t = t*, an optimal value for φ can be obtained by drawing from π(·, λ_{t*}).

4.3 Application to the path planning task

In this part we deal with the application of the CE method to our task. First of all, it is necessary to define the random mechanism (π(·, λ), λ ∈ Λ) used to generate samples of trajectories. More precisely, we want to generate samples which:

- start from s_i and end at s_f;
- respect the authorized transition matrix δ(a_k, a_{k+1});
- have a total length less than T_max.

One way to achieve that is to use a probability matrix P_sa = (p_sa), with s ∈ {1, …, N_s} and a ∈ {1, …, N_a}, defining the probability of choosing action a in state s. P_sa is an N_s × N_a matrix (in our case N_a = 8):

  P_sa = ( p_{1,1}    …   p_{1,7}    p_{1,8}
           …
           p_{N_s,1}  …   p_{N_s,7}  p_{N_s,8} ),      (13)

where the s-th row P_s(·) is a one-dimensional discrete probability such that P_s(i) = p_si, i = 1, …, 8, with Σ_{i=1}^{8} p_si = 1. So, to solve our problem, we are concerned with the optimization of the N_s × N_a parameters (p_sa) using the CE algorithm.

Trajectory generation. At each iteration of the CE algorithm, we want to generate from P_sa N trajectories of length T ≤ T_max, starting from s_i, ending at s_f, and drawn in accordance with the authorized transition matrix δ. Let us first introduce

  A(s̃, ã) = {a | s_k = s̃, δ(a_{k−1} = ã, a_k = a) = 1},

the set of admissible actions at time k in state s_k = s̃ knowing that ã was chosen at time k − 1, and P̃_s(·), the normalized restriction of P_s(·) to A(s̃, ã):

  p̃_sa = p_sa / Σ_{a' ∈ A(s̃, ã)} p_sa'   if a ∈ A(s̃, ã),   p̃_sa = 0 otherwise.

We proceed as follows for one iteration of the CE (a Python rendition is sketched after the table):

Table 1: Path generation principle

  j = 1
  while (j ≤ N)
    k = 0; set s_0 = s_i; generate one action a_0^j according to P_{s_0} and apply it;
    set k = k + 1 and T = 1
    until s_k = s_f do
      - compute A(s_k, a_{k−1});
      - if A(s_k, a_{k−1}) ≠ ∅, generate one action a_k ∈ A(s_k, a_{k−1}) according to P̃_{s_k}
        and apply it; else stop and reject the draw (trajectory j is restarted);
      - set k = k + 1 and T = T + 1; if T > T_max, stop and reject the draw
    return x(j) = (s_i, a_0^j, s_1^j, a_1^j, …, s_{k−1}^j, a_{k−1}^j, s_f)
    j = j + 1
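A possible Python rendition of Table 1 is given below; the helper next_state(s, a) (the grid successor of state s under action a, or None when the move leaves the map or hits an obstacle) is an assumed convention, and rejected draws are simply redrawn.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_trajectory(P_sa, delta, next_state, s_i, s_f, T_max):
    """One draw of the Table 1 mechanism: returns the list of (state, action)
    pairs of an admissible trajectory from s_i to s_f, or None when the draw
    dead-ends or exceeds T_max (the caller then simply redraws)."""
    s, a_prev, pairs = s_i, None, []
    for _ in range(T_max):
        probs = P_sa[s].copy()
        if a_prev is not None:
            probs *= delta[a_prev]            # restriction to A(s, a_prev)
        valid = [a for a in range(len(probs))
                 if probs[a] > 0 and next_state(s, a) is not None]
        if not valid:
            return None                       # empty A(s, a): reject the draw
        p = np.array([probs[a] for a in valid])
        a = valid[rng.choice(len(valid), p=p / p.sum())]   # draw from normalized P_s
        pairs.append((s, a))
        s, a_prev = next_state(s, a), a
        if s == s_f:
            return pairs
    return None                               # longer than T_max: reject

def sample_paths(N, P_sa, delta, next_state, s_i, s_f, T_max):
    """Redraw until N admissible trajectories are collected."""
    paths = []
    while len(paths) < N:
        t = draw_trajectory(P_sa, delta, next_state, s_i, s_f, T_max)
        if t is not None:
            paths.append(t)
    return paths
```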

Updating the P_sa matrix. At each iteration t of the Cross-Entropy algorithm, given the N trajectories (x(j))_{1 ≤ j ≤ N} (starting from s_i, ending at s_f and complying with the matrix δ) generated via the algorithm described in Table 1, we can evaluate each one by computing its PCRB sequence and applying our functional φ_ν (ν ∈ {1, 2}). Since we minimize, the elite samples at iteration t are the ρN trajectories with the smallest values of φ_ν, and γ_t is the corresponding sample quantile. One can then update P_sa by solving (12). Let x(j) = (s_i, a_0^j, s_1^j, a_1^j, …, s_f) be one trajectory; we have

  ln π(x(j), P_sa) = Σ_{s=1}^{N_s} Σ_{a=1}^{8} I[{x(j) ∈ χ_sa}] ln p_sa,      (14)

where I is the indicator function and {x(j) ∈ χ_sa} means that the trajectory contains a visit to state s in which action a is taken. Since, for each s, the row P_s(·) is a discrete probability, (12) must be solved under the condition that the rows of P_sa sum up to 1. This leads to solving the following problem, with Lagrange multipliers (μ_s)_{1 ≤ s ≤ N_s}:

  max_{P_sa} min_{μ_1,…,μ_{N_s}} (1/N) Σ_{j=1}^{N} I[φ_ν(x(j)) ≤ γ_t] ln π(x(j), P_sa) + Σ_{s=1}^{N_s} μ_s ( Σ_{a=1}^{8} p_sa − 1 ),      (15)

with ln π(x(j), P_sa) given in (14). After differentiation with respect to each parameter p_sa (s ∈ {1, …, N_s}, a ∈ {1, …, 8}), we can derive the following equation:

  Σ_{j=1}^{N} I[φ_ν(x(j)) ≤ γ_t] I[{x(j) ∈ χ_sa}] = −μ_s p_sa.

After summing over a = 1, …, 8 and applying the condition Σ_{a=1}^{8} p_sa = 1, we can obtain μ_s and the final updating formula:

  μ_s = −Σ_{j=1}^{N} I[φ_ν(x(j)) ≤ γ_t] I[{x(j) ∈ χ_s}],      (16)

  p_sa = ( Σ_{j=1}^{N} I[φ_ν(x(j)) ≤ γ_t] I[{x(j) ∈ χ_sa}] ) / ( Σ_{j=1}^{N} I[φ_ν(x(j)) ≤ γ_t] I[{x(j) ∈ χ_s}] ),      (17)

where {x(j) ∈ χ_s} means that the trajectory contains a visit to state s. For the first iteration, P_s(·) is a uniform probability density function for every s.
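The updating formula (17) translates directly into code. In the sketch below, trajectories are lists of (state, action) pairs as produced by the generation routine above, scores holds the corresponding φ_ν values, and rows of states never visited by an elite trajectory keep their previous value (one common convention; the paper starts from uniform rows).

```python
import numpy as np

def update_P(P_prev, trajectories, scores, gamma_t):
    """CE update of P_sa, equation (17): for each state s, p_sa becomes the
    fraction of elite trajectories that visit s and take action a there."""
    counts = np.zeros_like(P_prev)
    elite = [t for t, phi in zip(trajectories, scores) if phi <= gamma_t]
    for traj in elite:
        for s, a in set(traj):      # indicator I[x(j) in chi_sa]: each (s, a)
            counts[s, a] += 1       # pair is counted once per trajectory
    P = P_prev.copy()
    visited = counts.sum(axis=1) > 0
    P[visited] = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
    return P
```

With γ_t set to the ρ-quantile of scores, iterating path generation, PCRB evaluation and this update implements the optimization loop of section 4.2 for the path planning task.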

5 Simulation results

The algorithm has not been extensively tested yet, and only one simple scenario is presented in this paper. In this example, the map contains six point features m_0, …, m_5 (figure 3), whose coordinates (x_j, y_j) are listed in Table 2. The state space is discretized into a regular grid of N_s = N_x × N_y states, and the initial and terminal states are shown in figure 3. The mobile can only apply [−π/2, π/2] heading controls at each time step; thus the δ matrix is exactly the one defined in section 2.1. Only trajectories with length T ≤ 30 are admissible.

Table 2: the features of the map (coordinates (x_j, y_j) of m_0, …, m_5).

Figure 3: the maximum likelihood optimal solution after the first iteration of the CE; starting and terminal states (∗) and map features (•).

For the observation model, a landmark m_j is visible at time k provided that

  r_min ≤ z_k^r(j) ≤ r_max   and   |z_k^β(j)| ≤ θ_max,

for given bounds r_min, r_max and θ_max. The noise standard deviations σ_r on the range and σ_β on the bearing are the same for all features and are time-independent: σ_r = 5·10⁻³ and σ_β = 0.5 deg. For the mobile dynamic model, the elementary displacement d is constant and the noise covariances are defined by P_0 = σ² I_2 and Q_k = σ² I_2, where I_2 is the identity matrix of size 2.

The computation of the PCRB matrices was performed with N_c Monte Carlo samples to estimate the observation term J_k(Z). For the optimization step, the Cross-Entropy algorithm was implemented with N = 5000 admissible trajectories per iteration and a selection rate ρ = 0.01; that is to say, the 50 best samples are used for updating the p_sa probabilities. Figure 4 shows the optimal trajectory found by the algorithm at the final iteration for the performance function φ_1.

Figure 4: the most likely optimal trajectory found by the algorithm for φ_1.

5.1 Analysis

As expected (figure 4), the mobile is guided toward the area with landmarks in order to improve its localization performance. Moreover, it operates so as to keep the landmarks visible as long as the maneuver constraints (δ) and the time constraint (T_max) allow it. We can also notice that the algorithm converges rapidly to a solution. To illustrate this, figure 5 presents the evolution of the parameter γ_t and of the minimum value of φ_ν (equivalently, the maximum of −φ_ν) at each iteration of the CE algorithm.

Figure 5: evolution of γ_t (solid line) and of the minimum value of the functional (dashed line).

When we look closely, after convergence, at the densities P_s(·) for all states s of the optimal trajectory, we can notice that some of them are not Dirac probability laws. The most likely optimal trajectory is composed of 30 states, and a few of them have an associated P_s that is not equivalent to a Dirac probability law; Table 3 shows the probability density functions for these states (see figure 6). In these states, the algorithm converges toward a multimodal density: two actions can be chosen, but with different probabilities. We can notice that this behavior is concentrated on states where maneuvers can be performed to increase the time during which the landmarks are observed.

Figure 6: states (squares) whose pdf differs from a Dirac law after convergence.

Table 3: probability density functions of the states of figure 6.

6 Conclusions and perspectives

In this paper, we presented a framework to solve a path planning task for a mobile vehicle localizing itself on a known map. The problem was discretized, and a Markov Decision Process with constraints on the mobile maneuvers was used. Our main goal was to find the optimal trajectory according to a measure of the capability of estimating accurately the state of the mobile during the execution. Functionals of the Posterior Cramér-Rao Bound were used as the criterion of performance. The main contribution of the paper is the use of the Cross-Entropy algorithm to solve the optimization step, since Dynamic Programming cannot be applied. This approach was tested on a simple first example and seems to be relevant. Future work will first concentrate on the complete implementation of the algorithm and on applications to more examples. More analysis of the resulting probability matrices has to be made. We will also investigate a continuous approach and try to approximate the computation of the observation contribution to the PCRB, which is time-consuming. The tuning of the Cross-Entropy method for our specific task was not studied; some experiments have to be carried out, based on the devices given in [5]. Finally, we want to consider more complex maps, such as those used in Geographical Information Systems, and to take into account measurement models with data association and non-detection problems.

References

[1] S. Paris, J.-P. Le Cadre, "Planning for Terrain-Aided Navigation", Proc. Fusion 2002, Annapolis (USA), pp. 1007-1014, July 2002.

[2] J.-P. Le Cadre and O. Trémois, "The Matrix Dynamic Programming Property and its Implications", SIAM Journal on Matrix Analysis, vol. 18, no. 2, April 1997.

[3] R. Enns and D. Morrell, "Terrain-Aided Navigation Using the Viterbi Algorithm", Journal of Guidance, Control, and Dynamics, vol. 18, no. 6, November-December 1995.

[4] P. Tichavsky, C.H. Muravchik and A. Nehorai, "Posterior Cramér-Rao Bounds for Discrete-Time Nonlinear Filtering", IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1386-1396, May 1998.

[5] R. Rubinstein and D.P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, Information Science & Statistics, Springer, 2004.

[6] P.-T. de Boer, D.P. Kroese, S. Mannor and R.Y. Rubinstein, "A Tutorial on the Cross-Entropy Method", http://www.cs.utwente.nl/~ptdeboer/ce/

[7] S. Mannor, R. Rubinstein and Y. Gat, "The Cross Entropy Method for Fast Policy Search", Proc. of the 20th International Conference on Machine Learning, 2003.

[8] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, A Bradford Book, 2000.

[9] H.L. Van Trees, Detection, Estimation and Modulation Theory, New York: Wiley, 1968.
