Continuous Learning Method for a Continuous Dynamical Control in a Partially Observable Universe


Frédéric Dambreville
Délégation Générale pour l'Armement, DGA/CTA/DT/GIP
16 Bis, Avenue Prieur de la Côte d'Or, Arcueil, F-94114, France
Web:

Abstract - In this paper, we are interested in the optimal dynamical control of sensors based on partial and noisy observations. These problems are related to the POMDP family. In this case, however, we manipulate continuous-valued controls and continuous-valued decisions. While dynamic programming methods rely on a discretization of the problem, we deal here directly with the continuous data. Moreover, our purpose is to address the full range of past observations. Our approach is to model the POMDP strategies by means of Dynamic Bayesian Networks. A method based on the Cross-Entropy is implemented for optimizing the parameters of such DBNs with respect to the POMDP problem. In this particular work, the Dynamic Bayesian Networks are built from semi-continuous probabilistic laws, so as to ensure the manipulation of continuous data.

Keywords: Optimization, Dynamical control, Cross-entropy method, Resource allocation, Tracking

1 Introduction

When planning the surveillance of an area, there are different degrees of difficulty in the optimization of the sensors' allocations. There are well-known and efficient models for such planning when the observation is not involved in the optimization process [1, 2, 3, 4, 5]. The planning is much more difficult when it becomes dynamical and involves some partial observation. In order to better understand the difficulties here, let us consider a very simple example. Assume that we have to catch a target moving within a field. This target is not entirely predictable and moves according to a random model. Now, assume there is a hill in the center of the field. It is hypothesized that moving in the field is easy, while climbing the hill is difficult. On the other hand, observing the target from the field is difficult, while one has a full observation of the situation from the top of the hill. Then, what should our strategy be? Do we manage our investigation in the field only: we move fast, but with poor observations. Or do we climb the hill first, in order to have a better knowledge of the target: we lose time at first by climbing the hill, but then we have a good perception of the target. And how will we use such knowledge efficiently? How do we evaluate the information gained from the hill? Such choices are the fundamental difficulties of planning with partial observation.

Mathematically, the problem is particularly complex. A quite classical model for such problems is the theory of Partially Observable Markov Decision Processes. Some hypotheses are made here. First, the law of evolution of the universe (e.g. the target move) is Markovian. Secondly, the criterion to be optimized is sufficiently simple: it is additive over time. It is well known [10, 11] that the problem is then solved by a dynamic programming approach. But the solution is in many cases tedious or even practically intractable. Such problems are then forced to be simple. Moreover, it is necessary in this approach to discretize the problem. Another approach is to approximate the strategy by reinforcement learning [12]. Although this method needs to bound the range of past observations, there has been interesting progress by way of a hierarchical approach.
In previous works [6] we proposed a new approach which shares some common points with the reinforcement learning method. The purpose is to describe the possible control policies by a wide family of probabilistic laws (typically a family of Dynamic Bayesian Networks), and then to learn an optimal law within this family by a simulation-based optimization algorithm. As described in the previous papers, this approach makes very few hypotheses about the problem: no Markovian hypothesis, no additivity hypothesis, and no restriction on the range of past observations. But of course there are limitations: the limitations of our policy models. Since the complexity of these models is necessarily bounded, the optimal answer to the control problem is restricted. It has been shown, however, that complex policy models are not necessary to reach a good policy. Moreover, it is possible to implement hierarchical models [7], which allow more complex answers for the control. In this paper, we will apply such a technique to a problem of detection-investigation.

While many POMDP problems manipulate discretized quantities, we apply our method directly to continuous parameters. Here, the optimized policy will be a probabilistic law, which takes a continuous observation (typically, the noisy positions of the target) and produces a continuous decision (typically, the direction and speed of two patrollers). More precisely, the optimal strategic tree will be approximated by means of semi-continuous Hidden Markov Models. We will describe the setting of these laws and how to relearn them during the simulation process.

The next section introduces some formalism and gives a quick description of control problems with partial observation. Our planning method is then introduced. It is based on the direct approximation of the optimal decision tree by means of an approximating structure; typically a Hidden Markov Model is used for such a structure. A particular structure of HMM for manipulating continuous data is introduced; it has been implemented. The third section explains how to optimize the parameters of this HMM, in order to approximate the optimal decision tree for planning with partial observation. In particular, the cross-entropy method is described and applied. The fourth section explains the simulated application on which our model is applied, and presents some results.

2 Decision in a partially observable universe

This section is dedicated to the theoretical description of control with partial observation. A practical example of experimentation, a simulation, is detailed in section 4. One should keep in mind that we intend here to solve a control problem manipulating continuous parameters (decisions and observations). Let us now introduce the formal problem. It is assumed that a subject is acting in a given world with a given purpose or mission. The goal is to optimize the accomplishment of this mission. The subject will receive observations from the world, and will produce actions on it.

The world. The world is characterized by a hidden state x, which evolves with time. As an assumption, the time t is discretized from step 1 to the maximal step T. The temporal evolution of the hidden state is denoted by the vector x = x_1, ..., x_t, ..., x_T. During the evolution of the world, the subject will make some decisions d which will impact the evolution of the world. He is also able to make some partial and noisy observations, denoted y. The world is thus characterized by a law of evolution involving the decision, the hidden state and the observation. It is hypothesized that this law, denoted P, is probabilistic: the hidden state x_t and observation y_t are obtained from the conditional law P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}), which depends on the past states, observations and decisions. Moreover, it is assumed that d_t is generated by the subject after the observation y_t. There is no Markovian hypothesis about the law P, but it is assumed that the laws P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}) can be simulated very quickly. Notice that this lack of assumption makes the use of a method based on dynamic programming impossible. The law of x, y | d is represented graphically by figure 1.

[Figure 1: The world. Hidden state x; observations y_1, y_2, ..., y_{t+1}; decisions d_1, d_2, ..., d_{t+1}.]

In this description, output arrows are for the values produced by the world, i.e. in this case the observations. The input arrows are for the values consumed by the world, i.e. the decisions of the subject.
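To fix ideas, here is a minimal sketch (in Python, with names of our own choosing) of the only interface the method assumes from the world: a routine that quickly samples P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}) given the full histories. The placeholder dynamics inside are purely illustrative and are not the scenario of section 4.

```python
import numpy as np

class World:
    """Hypothetical world interface: the only requirement of the method is fast
    sampling of P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}); no Markov
    assumption is needed, since the full histories are passed in."""

    def __init__(self, rng=None):
        self.rng = rng or np.random.default_rng()

    def sample_step(self, x_past, y_past, d_past):
        """Draw (x_t, y_t) given the histories of states, observations and decisions.
        Placeholder dynamics: a random-walk hidden state, pushed by the last decision,
        and a noisy observation of it."""
        x_prev = x_past[-1] if x_past else np.zeros(2)
        d_prev = d_past[-1] if d_past else np.zeros(2)
        x_t = x_prev + 0.1 * d_prev + self.rng.normal(scale=0.5, size=2)
        y_t = x_t + self.rng.normal(scale=1.0, size=2)  # partial and noisy observation
        return x_t, y_t


def rollout(world, policy, T):
    """Simulate one trajectory (d, y, x) of length T; `policy` maps y_{1:t} to d_t."""
    xs, ys, ds = [], [], []
    for _ in range(T):
        x_t, y_t = world.sample_step(xs, ys, ds)
        xs.append(x_t)
        ys.append(y_t)
        ds.append(policy(ys))  # the decision d_t is taken after observing y_t
    return ds, ys, xs
```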
The variables are placed chronologically: y_t appears before d_t because the decision d_t occurs after the observation y_t. In this paper, the observation y_t and the decision d_t are continuous values. From now on, we will use the notation P(x, y | d) for the full law of the world:

P(x, y | d) = ∏_{t=1..T} P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}).

Evaluation and optimal planning. The previous paragraphs have built a model of the world, of the actions and of the observations. We now give a characterization of the mission to be accomplished. The mission is limited in time; let T be this maximum time. In all generality, the mission is evaluated by a function V(d, y, x) defined on the trajectories of d, y, x. Typically, the function V could be used to compute the time needed for the accomplishment of the mission. The purpose is to construct an optimal decision tree d(obs), depending on the observations obs, in order to maximize the mean evaluation. This is a dynamic optimization problem, since the actions depend on the previous observations. The related program consists in the optimization of the decision tree (d_t(y_{1:t}))_{t=1..T} as follows:

d^* ∈ argmax_d ∫_y ∫_x P( x, y | (d_t(y_{1:t}))_{t=1..T} ) V( (d_t(y_{1:t}))_{t=1..T}, y, x ) dx dy.    (1)

This optimization is schematized in figure 2. It is related to the family of Partially Observable Markov Decision Problems, although there is no Markovian hypothesis about the world here. In the figure, the double arrows characterize the variables to be optimized. More precisely, these arrows describe the flow of information between the observations and the actions. The cells, which carry the indefinite memory, make decisions and transmit all the received and generated information (including the actions).

[Figure 2: POMDP planning. Hidden state x; observations y_1, ..., y_{t+1}; decisions d_1, ..., d_{t+1}; indefinite-memory cells linking them.]

This architecture illustrates that planning with observation is an indefinite-memory problem: the decision depends on the whole past observations. When the evaluation function V is additive, it is known that there is a finite-memory construction of the solution, by means of the dynamic programming paradigm. However, notice that this finite memory is a probabilistic posterior of the world, resulting from the past observations. As such, it is too huge to be manipulated properly. An alternative method to dynamic programming is proposed subsequently. It relies on the optimization of a probabilistic template for the control policy.

Direct approximation of the decision tree. In an optimization problem like (1), the value to be optimized, d, is a deterministic object. In this precise case, d is a decision tree, that is, a function which maps any sequence of observations y_{1:t} to a decision d_t. It is possible, however, to take a probabilistic viewpoint. Then the problem is equivalent to finding π(d | y), a probabilistic law of actions conditional on the past observations, which maximizes the mean evaluation:

V(π) = ∫_d ∫_y ∫_x [ ∏_{t=1..T} π(d_t | d_{1:t-1}, y_{1:t}) ] P(x, y | d) V(d, y, x) dx dy dd.

Notice that this problem can still be schematized by figure 2, but the double arrows now describe the DBN (Dynamic Bayesian Network) structure of the law π. While the memories remain indefinite, the schematized law is quite general. Actually, there will not be a great difference with the deterministic case for an optimal solution: when the solution d is unique, the optimal law π is a Dirac on d. But things change when approximating π. Now, why use a probability to approximate the optimal strategy? The main point is that probabilistic models seem more suitable for approximation. The second point is that we are sure to approximate continuously: indeed, π ↦ V(π) is continuous. There is a third point. When the indefinite memories of figure 2 are replaced by finite memories, denoted by m as in figure 3, a natural approximation of the law π is obtained. The approximated law is a Hidden Markov Model. As will be seen, HMMs are very practical for optimization.

[Figure 3: Finite-memory planning approximation. Hidden state x; observations y_t; decisions d_t; finite memories m_1, ..., m_{t+1}.]

[Figure 4: A typical Hidden Markov Model, linking observations y_t, memories m_t and decisions d_t.]

Policy approximation by a HMM. Define for any time t a variable m_t ∈ M, called the memory at time t. Notice that m_t is intended to describe a finite memory. Nevertheless, M is not necessarily a finite set; for example, M could contain continuous or semi-continuous values. In the most general case, a HMM for the decision policy will take the form:

h(d | y) = ∫_{m ∈ M^T} h(d, m | y) dm,   where   h(d, m | y) = ∏_{t=1..T} h(d_t | m_t) h(m_t | y_t, m_{t-1}).

This general model for an HMM is schematized in figure 4. This formalism is the most general, and it hides many possible, more or less intricate, HMM settings. We will not discuss here the detailed structure of the HMM (see next paragraph), but instead the general principle of the approximation of π by such an HMM.
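Because both the world law P and an HMM policy h can be sampled, the mean evaluation of a candidate policy can be estimated by plain Monte Carlo simulation. The sketch below is our own illustrative code: the hypothetical callables sample_memory and sample_decision stand for h(m_t | y_t, m_{t-1}) and h(d_t | m_t), and the world simulator is assumed to expose the interface sketched earlier.

```python
def rollout_hmm_policy(world, sample_memory, sample_decision, m0, T):
    """One trajectory under the HMM policy
    h(d, m | y) = prod_t h(d_t | m_t) h(m_t | y_t, m_{t-1})."""
    xs, ys, ds, ms = [], [], [], []
    m_prev = m0
    for _ in range(T):
        x_t, y_t = world.sample_step(xs, ys, ds)
        m_t = sample_memory(y_t, m_prev)   # draw m_t ~ h(. | y_t, m_{t-1})
        d_t = sample_decision(m_t)         # draw d_t ~ h(. | m_t)
        xs.append(x_t); ys.append(y_t); ds.append(d_t); ms.append(m_t)
        m_prev = m_t
    return ds, ys, xs, ms


def estimate_mean_evaluation(world, sample_memory, sample_decision, m0, T, V, N=1000):
    """Monte Carlo estimate of the mean evaluation of the policy: average V over N runs."""
    total = 0.0
    for _ in range(N):
        d, y, x, _ = rollout_hmm_policy(world, sample_memory, sample_decision, m0, T)
        total += V(d, y, x)
    return total / N
```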
The approach developed in [6][7] is quite general and can be split into two points:

- Define a family of HMMs, H, to be used as policy approximations,
- Optimize the parameters of such HMMs in order to maximize the mean evaluation: find h^* ∈ argmax_{h ∈ H} V(h).

In practice, a good choice of H implies that h^* is a good approximation of π.

A detailed description of the HMM family. It is recalled that our purpose is to investigate a continuous control problem. Thus, any HMM h ∈ H should take as input continuous data (the observation y) and output a continuous decision d. The choice here is to manipulate both a discrete memory and a continuous memory.

Let us call m^D the discrete memory of h, and assume m^D_t takes its values within the set {0, ..., 2^L - 1}. Let m^C denote the continuous memory of h, and assume that m^C_t is a vector of IR^K. In addition, we define a continuous temporary memory, denoted µ^D, such that µ^D_t is a vector of IR^L. The idea is to derive the continuous data m^C_t and µ^D_t from the previous memories m^C_{t-1}, m^D_{t-1} and the observation y_t by means of Gaussian laws; these laws will be optimized. The discrete memory m^D_t is obtained by discretizing the temporary memory µ^D_t; this process is fixed and cannot be optimized. The decision d_t is obtained from the memories m^C_t, m^D_t by means of a Gaussian law; this law will be optimized. Let N(Σ, µ) denote a multivariate Gaussian vector with variance matrix Σ and mean vector µ. The whole process can be detailed as follows:

- m^C_t = N( Σ^C[m^D_{t-1}], A^C[m^D_{t-1}] (1, m^C_{t-1}, y_t) ), where the matrices Σ^C[m] and A^C[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^C[m] is of dimension K × K while A^C[m] is of dimension K × (1 + K + dim y_t),

- µ^D_t = N( Σ^D[m^D_{t-1}], A^D[m^D_{t-1}] (1, m^C_{t-1}, y_t) ), where the matrices Σ^D[m] and A^D[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^D[m] is of dimension L × L while A^D[m] is of dimension L × (1 + K + dim y_t),

- m^D_t is the Boolean vector of dimension L which indicates in which hypercorner of IR^L the vector µ^D_t is placed. More precisely: m^D_t = ∑_{k=0}^{L-1} b_{kt} 2^k, where b_{kt} = 1 when µ^D_{kt} ≥ 0 and b_{kt} = 0 otherwise,

- d_t = N( Σ^dec[m^D_t], A^dec[m^D_t] (1, m^C_t) ), where the matrices Σ^dec[m] and A^dec[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^dec[m] is of dimension dim d_t × dim d_t while A^dec[m] is of dimension dim d_t × (1 + K).

Figure 5 gives an illustration of the Markovian transition; the double arrows mean that the parameters have to be optimized. From now on, the set H will refer to these semi-continuous HMMs. Why not use purely continuous HMMs? Continuous HMMs, particularly Gaussian ones, are too weak structures. A semi-continuous scheme is necessary to achieve a sufficient abstraction. It remains now to explain how to optimize the choice of h among the family H. The following section explains a cross-entropic method for optimizing such a choice.

[Figure 5: Semi-continuous HMM transition, from (m^C_{t-1}, m^D_{t-1}, y_t) to (m^C_t, µ^D_t, m^D_t, d_t).]

3 Cross-entropic optimization

The reader interested in CE methods should refer to the tutorial on the CE method [8]. CE algorithms were first dedicated to estimating the probability of rare events. A slight change of the basic algorithm made it also suitable for optimization. In their recent article [13], Homem-de-Mello and Rubinstein have given some results about global convergence. In order to ensure such convergence, some refinements are introduced, particularly about the selective rate. This presentation is restricted to the basic CE optimization method. The improvements of the CE algorithm proposed in [13] have not been implemented, but the algorithm has been seen to work properly. For this reason, this paper does not deal with the choice of the selective rate.

3.1 General CE algorithm for the optimization

The Cross-Entropy algorithm repeats until convergence the three successive phases (a minimal sketch of these phases on a toy problem is given below):

1. Generate samples of random data according to a parameterized random mechanism,
2. Select the best samples according to an evaluation criterion,
3. Update the parameters of the random mechanism, on the basis of the selected samples.
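As a minimal sketch of these three phases, assume a toy objective over IR^n and an independent Gaussian random mechanism; the objective and all names below are ours, purely illustrative, and not the paper's setting.

```python
import numpy as np

def cross_entropy_maximize(f, dim, n_samples=200, rho=0.1, n_iter=50, seed=0):
    """Basic CE optimization: sample, select the rho*N best, refit the sampler."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), 5.0 * np.ones(dim)   # initial Gaussian mechanism
    n_elite = max(1, int(rho * n_samples))
    for _ in range(n_iter):
        # Phase 1: generate samples from the parameterized random mechanism.
        x = rng.normal(mu, sigma, size=(n_samples, dim))
        # Phase 2: select the best samples according to the evaluation criterion.
        scores = np.array([f(xi) for xi in x])
        elite = x[np.argsort(scores)[-n_elite:]]
        # Phase 3: update the parameters from the selected samples; for a Gaussian
        # family this maximum-likelihood refit is the cross-entropy minimization.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy usage: maximize f(x) = -||x - 3||^2; the returned mu should approach (3, ..., 3).
best = cross_entropy_maximize(lambda x: -np.sum((x - 3.0) ** 2), dim=4)
```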
In the particular case of CE, the update in phase 3 is obtained by minimizing the Kullback-Leibler distance, or cross-entropy, between the updated random mechanism and the selected samples. The next paragraphs describe on a theoretical example how such a method can be used in an optimization problem.

Formalism. Let a function x ↦ f(x) be given; this function is easily computable. The value f(x) has to be maximized, by optimizing the choice of x ∈ X. The function f will be the evaluation criterion. Now let a family of probabilistic laws, (P_σ)_{σ ∈ Σ}, applying to the variable x, be given. The family P is the parameterized random mechanism. The variable x is the random data. Let ρ ∈ ]0, 1[ be a selective rate. The CE algorithm for (x, f, P) follows the synopsis:

1. Initialize σ ∈ Σ,
2. Generate N samples x_n according to P_σ,
3. Select the ρN best samples according to the evaluation criterion f,
4. Update σ as a minimizer of the cross-entropy with the selected samples:

   σ ∈ argmax_{σ ∈ Σ} ∑_{n selected} ln P_σ(x_n),

5. Repeat from step 2 until convergence.

This algorithm requires f to be easily computable.

Interpretation. The CE algorithm tightens the law P_σ around the maximizer of f. Then, when the probabilistic family P is well suited to the maximization of f, it becomes equivalent to find a maximizer of f or to optimize the parameter σ by means of the CE algorithm. The problem is to find a good family... Another issue is the criterion for deciding convergence. Some answers are given in [13]. It is outside the scope of this paper to investigate these questions precisely. Our criterion was to stop after a given threshold of successive unsuccessful tries, and this very simple method has worked fine on our problem.

3.2 Application

The cross-entropy method, together with the probabilistic modelling of the policy, is now applied in order to approximate the optimal strategy for planning with partial observation. Our objective is to tune the semi-continuous HMM h ∈ H in order to have the best approximation of the optimal planning strategy π:

π ≃ h^* ∈ argmax_{h ∈ H} V(h).

Define P[h], the complete probabilistic law of the system Universe/Planner, by:

P[h](d, y, x, m) = P(x, y | d) h(d, m | y).

Notice here that the memory is composite, i.e. m = (m^D, m^C, µ^D), with one discrete and two continuous components. The approximated planning reduces to solving:

h^* ∈ argmax_{h ∈ H} ∫_d ∫_y ∫_x ∫_m P[h](d, y, x, m) V(d, y, x) dx dy dd dm.

Optimizing h means tuning the parameter h ∈ H in order to tighten the probability P[h] around optimal values of V. This is exactly solved by the Cross-Entropy optimization method. However, it is required that the evaluation function V be easily computable. Typically, the definition of V may be recursive, e.g.:

V(d, y, x) = {v_t(d_t, y_t, x_t,}_{t=T}^{2} v_1(d_1, y_1, x_1) {)}_{t=2}^{T}.

(Braces used with subscripts only have a grammatical meaning here: the symbols inside the braces are duplicated and concatenated according to the subscript. For example, {f_k(}_{k=1}^{3} x {)}_{k=3}^{1} means f_1(f_2(f_3(x))), and {x_k}_{k=t}^{T} means x_t x_{t+1} ... x_T.)

Let the selective rate ρ be a positive number such that ρ < 1. The cross-entropy optimization method follows the synopsis:

1. Initialize h, for example a flat h,
2. Generate N samples θ_n = (d_n, y_n, x_n, m_n) according to the law P[h],
3. Choose the ρN best samples θ_n according to the evaluation V(d, y, x). Denote S the set of the selected samples,
4. Update h as the minimizer of the cross-entropy with the selected samples:

   h ∈ argmax_{h ∈ H} ∑_{n ∈ S} ln P[h](θ_n),    (2)

5. Reiterate from step 2 until convergence.

In this case, the maximization (2) is not difficult. In particular, the Markovian property is widely used: ln P[h] decomposes into a sum and, subsequently, the optimization splits into several elementary independent problems, one for each parameter block and each discrete memory value (see the sketch below).
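The following sketch (our own illustrative code, not taken from the paper, with hypothetical helper names) shows one such elementary problem: for a fixed discrete memory value m, a conditional Gaussian block of the HMM is refitted on the selected samples by ordinary least squares for its matrix A[m] and by the empirical covariance of the residuals for Σ[m]. The closed-form update stated next follows exactly this pattern.

```python
import numpy as np

def refit_conditional_gaussian(inputs, outputs):
    """Maximum-likelihood update of a conditional Gaussian output = A @ input + noise,
    noise ~ N(0, Sigma), from selected samples restricted to one memory value m.
    Rows of `inputs` are regressors such as (1, m^C_{t-1}, y_t); rows of `outputs`
    are the corresponding targets (m^C_t, mu^D_t or d_t)."""
    # Least-squares solve of inputs @ A' ~ outputs, i.e. the normal equations
    # (sum u u')^{-1} (sum u o') written in a numerically stable way.
    At, *_ = np.linalg.lstsq(inputs, outputs, rcond=None)
    A = At.T
    residuals = outputs - inputs @ At
    Sigma = residuals.T @ residuals / len(outputs)
    return A, Sigma


def update_block(memories, inputs, outputs, n_memory_values):
    """Hypothetical driver: one independent (A[m], Sigma[m]) update per memory value."""
    params = {}
    for m in range(n_memory_values):
        idx = memories == m
        if idx.any():
            params[m] = refit_conditional_gaussian(inputs[idx], outputs[idx])
    return params
```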
Finally, the maximization (2) is solved in closed form. For the continuous memory (v' denotes the transpose of the vector v):

A^C[m] = [ ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} m^C_{n,t} (1, m^C_{n,t-1}, y_{n,t})' ] [ ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} (1, m^C_{n,t-1}, y_{n,t}) (1, m^C_{n,t-1}, y_{n,t})' ]^{-1}

and

Σ^C[m] = ( ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} Γ_{nt} Γ_{nt}' ) / card{ (n, t) : n ∈ S, m^D_{n,t-1} = m },

where Γ_{nt} = m^C_{n,t} - A^C[m] (1, m^C_{n,t-1}, y_{n,t}).

For the temporary memory:

A^D[m] = [ ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} µ^D_{n,t} (1, m^C_{n,t-1}, y_{n,t})' ] [ ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} (1, m^C_{n,t-1}, y_{n,t}) (1, m^C_{n,t-1}, y_{n,t})' ]^{-1}

and

Σ^D[m] = ( ∑_{n ∈ S} ∑_{t : m^D_{n,t-1} = m} Γ_{nt} Γ_{nt}' ) / card{ (n, t) : n ∈ S, m^D_{n,t-1} = m },

where Γ_{nt} = µ^D_{n,t} - A^D[m] (1, m^C_{n,t-1}, y_{n,t}).

For the decision:

A^dec[m] = [ ∑_{n ∈ S} ∑_{t : m^D_{n,t} = m} d_{n,t} (1, m^C_{n,t})' ] [ ∑_{n ∈ S} ∑_{t : m^D_{n,t} = m} (1, m^C_{n,t}) (1, m^C_{n,t})' ]^{-1}

and

Σ^dec[m] = ( ∑_{n ∈ S} ∑_{t : m^D_{n,t} = m} Γ_{nt} Γ_{nt}' ) / card{ (n, t) : n ∈ S, m^D_{n,t} = m },

where Γ_{nt} = d_{n,t} - A^dec[m] (1, m^C_{n,t}).

4 Implementation

The algorithm has been applied to a target detection and interception problem.

4.1 The experiment

A target R is moving in the continuous space [-20, 20] × [-20, 20]. It is initially located in the area [0, 20] × [-20, 20], with a known distribution (actually a uniform distribution). R is tracked by two mobiles, B1 and B2, controlled by the subject. B1 and B2 are initially located at the coordinates (-20, 0). The mobiles receive the relative position of each other, with additive noise (Gaussian noise with variance 1). Each mobile receives only one piece of information about the target: it knows the direction of the target spot but has no information about its distance. Moreover, this direction information is noisy. The noise varies with the (Euclidean) distance d between the patrol and the target: in this simulation, the angular noise is a uniform random variable on the set [-(d/(d+1))(π/2), (d/(d+1))(π/2)]. Each mobile receives this information as a spot drawn according to the noisy distribution (in particular, there is a very large variance on the distance). Thus, the dimension of the continuous information y_t is 8, since y_t contains two spot positions and two mobile relative positions.

B1 and B2 are able to move, according to a directive of the subject. The directive is a direction and a move intensity (a speed) for each mobile. The maximal mobile speed is 2, starting from 0; and the mobiles cannot escape from the space. Thus, the dimension of the continuous decision d_t is 4 (moves are truncated). Moreover, the patrol moves are noised additively (Gaussian noise with variance 1). The target moves according to the following directives (unless other test directives are given):

- It cannot escape from the space, unless it reaches the escape line {-20} × [-20, 20],
- The target speed is characterized by its relative move from step t to step t+1. This relative move is chosen as a uniform random variable on the set [-4, 0] × [-2, 2]. The move is truncated if a constraint is reached.

As a consequence, the target is moving downward so as to reach the escape line. It moves twice as fast as the patrols. The purpose of the mission is to get as close as possible to the target (at least once and by at least one patrol), before it escapes. More precisely, the evaluation V of a sample is given by:

V = max_{t before escape} max{ 1 / d(B1_t, R_t)^2 , 1 / d(B2_t, R_t)^2 }.

Thus, we are just optimizing the expected maximal inverted (squared) distance, which results in strategies with close target contacts. The total number of turns is T = 100.

Results. Owing to the conference deadline schedule, our tests have been limited. More results should be available later at this address:

In the tests described subsequently, the processes have been run for one hour on a 2 GHz PC (almost all the processor time was used). This was sufficient to reach a good convergence, since most of the gains are obtained at the beginning of the process (convergence is almost done after about ten minutes). In this version of the paper, we are interested in 3 different tests (test 1 is the simplest).

Test 1. In this case, the target does not move, and is initially located at position (20, 0).
After optimization of the strategy, the obtained mean reward is 1982, which means that a patrol will typically contact the target at a distance of about 1/√1982 ≈ 0.02.

Test 2. Again the target does not move, but it is initially located randomly in the space [0, 20] × [-20, 20] with a uniform distribution. After optimization of the strategy, the obtained mean reward is 17, which means that a patrol will typically contact the target at a distance of about 0.24.

Test 3. In this case, the full location and motion hypotheses are made about the target. Notice that the period of possible contact is quite reduced (because of escape) in comparison with the previous tests. After optimization of the strategy, the obtained mean reward is 16, which means that a patrol will typically contact the target at a distance of about 0.25.

It fortunately appears that our optimized policies are capable of good contact with the target. Such results are promising, but more tests should be done, and comparisons with other methods (for example a Q-learning approach on a discretized problem) are needed. More intricate examples should be investigated too.

These tests are planned for the near future.

5 Conclusion

In this paper, we proposed a method for approximating the optimal planning in a partially observable control problem. The planning involves an optimization of continuous decisions with regard to a sequence of continuous past observations. The method relies on a modelling of the policies by means of a family of semi-continuous probabilistic laws. The cross-entropy method is applied to find the optimal law. This method has been implemented for solving a problem of detection-investigation, where two mobiles have to catch a target while receiving a radial observation of this target. The tests on this scenario are promising. More tests are being done.

References

[1] S.S. Brown, Optimal Search for a Moving Target in Discrete Time and Space. Operations Research 28.
[2] J. de Guenin, Optimum Distribution of Effort: an Extension of the Koopman Basic Theory. Operations Research 9, pp 1-7.
[3] B.O. Koopman, Search and Screening: General Principles with Historical Applications. MORS Heritage Series, Alexandria, VA.
[4] L.D. Stone, Theory of Optimal Search, 2nd ed. Operations Research Society of America, Arlington, VA.
[5] A.R. Washburn, Search for a Moving Target: The FAB Algorithm. Operations Research 31.
[6] Frédéric Dambreville, Learning a Machine for the Decision in a Partially Observable Markov Universe, ISDA 2004, Budapest, Hungary, August 26-28.
[7] Frédéric Dambreville, Learning a Machine for the Decision in a Partially Observable Markov Universe. Submitted to European Journal of Operational Research.
[8] De Boer, Kroese, Mannor and Rubinstein, A Tutorial on the Cross-Entropy Method.
[9] Richard Bellman, Dynamic Programming, Princeton University Press, Princeton, New Jersey.
[10] Edward J. Sondik, The Optimal Control of Partially Observable Markov Processes, PhD thesis, Stanford University, Stanford, California, 1971.
[11] Anthony Rocco Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes, PhD thesis, Brown University, Providence, Rhode Island.
[12] B. Bakker, J. Schmidhuber, Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization, in Proceedings of the 8th Conference on Intelligent Autonomous Systems, Amsterdam, The Netherlands.
[13] T. Homem-de-Mello, R.Y. Rubinstein, Rare Event Estimation for Static Models via Cross-Entropy and Importance Sampling. tito/list.htm
[14] Kevin Murphy and Mark Paskin, Linear Time Inference in Hierarchical HMMs, Proceedings of Neural Information Processing Systems.


More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Dialogue as a Decision Making Process

Dialogue as a Decision Making Process Dialogue as a Decision Making Process Nicholas Roy Challenges of Autonomy in the Real World Wide range of sensors Noisy sensors World dynamics Adaptability Incomplete information Robustness under uncertainty

More information

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose ON SCALABLE CODING OF HIDDEN MARKOV SOURCES Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa Barbara, CA, 93106

More information

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study

More information

Markov chain optimisation for energy systems (MC-ES)

Markov chain optimisation for energy systems (MC-ES) Markov chain optimisation for energy systems (MC-ES) John Moriarty Queen Mary University of London 19th August 2016 Approaches to the stochastic optimisation of power systems are not mature at research

More information

Markov localization uses an explicit, discrete representation for the probability of all position in the state space.

Markov localization uses an explicit, discrete representation for the probability of all position in the state space. Markov Kalman Filter Localization Markov localization localization starting from any unknown position recovers from ambiguous situation. However, to update the probability of all positions within the whole

More information

Final Exam, Fall 2002

Final Exam, Fall 2002 15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning

Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning Pascal Poupart (University of Waterloo) INFORMS 2009 1 Outline Dynamic Pricing as a POMDP Symbolic Perseus

More information