Continuous Learning Method for a Continuous Dynamical Control in a Partially Observable Universe
Frédéric Dambreville
Délégation Générale pour l'Armement, DGA/CTA/DT/GIP
16 Bis, Avenue Prieur de la Côte d'Or, Arcueil, F-94114, France

Abstract - In this paper, we are interested in the optimal dynamical control of sensors based on partial and noisy observations. Such problems are related to the POMDP family. Here, however, we manipulate continuous-valued controls and continuous-valued decisions. While the dynamic programming method relies on a discretization of the problem, we deal here directly with the continuous data. Moreover, our purpose is to address the full range of past observations. Our approach is to model the POMDP strategies by means of Dynamic Bayesian Networks. A method based on the Cross-Entropy is implemented for optimizing the parameters of such a DBN relative to the POMDP problem. In this particular work, the Dynamic Bayesian Networks are built from semi-continuous probabilistic laws, so as to ensure the manipulation of continuous data.

Keywords: Optimization, Dynamical control, Cross-Entropy method, Resource allocation, Tracking

1 Introduction

When planning the surveillance of an area, the optimization of the sensor allocations presents different degrees of difficulty. There are well-known and efficient models for such planning when the observation is not involved in the optimization process [1, 2, 3, 4, 5]. The planning is much more difficult when it becomes dynamical and involves partial observation. In order to better understand the difficulties, let us consider a very simple example. Assume that we have to catch a target moving within a field. This target is not entirely predictable and moves according to a random model. Now, assume there is a hill in the center of the field. It is hypothesized that moving in the field is easy, while climbing the hill is difficult.
On the other hand, observing the target from the field is difficult, while one has a full observation of the situation from the top of the hill. Then, what should our strategy be? Do we conduct our investigation in the field only: we move fast, but with poor observations. Or do we first climb the hill, in order to have a better knowledge of the target: we lose time climbing, but then we have a good perception of the target. And how will we use such knowledge efficiently? How do we evaluate the information gained from the hill? Such choices are the fundamental difficulties of planning with partial observation. Mathematically, the problem is particularly complex. A quite classical model for such problems is the theory of Partially Observable Markov Decision Processes. Some hypotheses are made here. First, the law of evolution of the universe (e.g. the target move) is Markovian. Secondly, the criterion to be optimized is sufficiently simple: it is additive over time. It is well known [10, 11] that the problem is then solved by a dynamic programming approach. But the solution is in many cases tedious or even practically intractable. Such problems are then forced to be simple. Moreover, this approach requires a discretization of the problem. Another approach is to approximate the strategy by reinforcement learning [12]. Although this method requires bounding the range of past observations, there is interesting progress by way of a hierarchical approach. In previous works [6] we proposed a new approach which shares some common points with the reinforcement learning method. The purpose is to describe the possible control policies by a wide family of probabilistic laws (typically a family of Dynamic Bayesian Networks), and then to learn an optimal law within this family by a simulation-based optimization algorithm.
As described in the previous papers, this approach makes very few hypotheses about the problem: no Markovian hypothesis, no additivity hypothesis, and no restriction on the range of past observations. But of course there are limitations: the limitations of our policy models. Since the complexity of these models is necessarily bounded, the optimal answer to the control problem is restricted. It has been shown, however, that complex policy models are not necessary to reach a good policy. Moreover, it is possible to implement hierarchical models [7], which allow more complex answers for the control. In this paper, we will apply such a technique to a problem of detection-investigation. While many POMDP
problems manipulate discretized quantities, we apply our method directly to continuous parameters. Here, the optimized policy will be a probabilistic law which takes a continuous observation (typically, the noisy positions of the target) and produces a continuous decision (typically, the direction and speed of two patrols). More precisely, the optimal strategic tree will be approximated by means of semi-continuous Hidden Markov Models. We will describe the setting of these laws and how to relearn them during the simulation process. The next section introduces some formalism and gives a quick description of control problems with partial observation. Our planning method is then introduced. It is based on the direct approximation of the optimal decision tree by means of an approximating structure; typically, a Hidden Markov Model is used as such a structure. A particular structure of HMM for manipulating continuous data is introduced; it has been implemented. The third section explains how to optimize the parameters of this HMM in order to approximate the optimal decision tree for planning with partial observation. In particular, the cross-entropy method is described and applied. The fourth section explains the simulated application on which our model is applied, and presents some results.

2 Decision in a partially observable universe

This section is dedicated to the theoretical description of control with partial observation. A practical example of experimentation, a simulation, is detailed in section 4. One should keep in mind that we intend here to solve a control problem manipulating continuous parameters (decisions and observations). Now let us introduce the formal problem. It is assumed that a subject is acting in a given world with a given purpose or mission. The goal is to optimize the accomplishment of this mission. The subject will receive observations from the world and will produce actions on it.

The world.
The world is characterized by a hidden state x, which evolves with time. As an assumption, the time t is discretized from step 1 to the maximal step T. The temporal evolution of the hidden state is denoted by the vector x = x_1, ..., x_t, ..., x_T. During the evolution of the world, the subject will make some decisions d which will impact the evolution of the world. He is also able to make some partial and noisy observations, denoted y. The world is thus characterized by a law of evolution involving the decisions, the hidden state and the observations. It is hypothesized that this law, denoted P, is probabilistic: the hidden state x_t and observation y_t are obtained from the conditional law P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}), which depends on the past states, observations and decisions.

[Figure 1: The world. The hidden state x produces the observations y_t and consumes the decisions d_t.]

Moreover, it is assumed that d_t is generated by the subject after the observation y_t. There is no Markovian hypothesis about the law P. But it is assumed that the laws P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}) can be simulated very quickly. Notice that this lack of assumption makes a method based on dynamic programming impossible to use. The law of x, y given d is represented graphically by figure 1. In this description, output arrows are for the values produced by the world, i.e. in this case the observations. The input arrows are for the values consumed by the world, i.e. the decisions of the subject. The variables are placed chronologically: y_t appears before d_t because the decision d_t occurs after the observation y_t. In this paper, the observation y_t and the decision d_t are continuous values. From now on, we will use the notation P(x, y | d) for the full law of the world:

P(x, y | d) = \prod_{t=1}^{T} P(x_t, y_t | x_{1:t-1}, y_{1:t-1}, d_{1:t-1}).

Evaluation and optimal planning.

The previous paragraphs have built a modelling of the world, of the actions and of the observations.
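The factorized law above can be read as a sequential simulator: at each step the world draws (x_t, y_t) conditionally on the whole past, and only then does the subject emit d_t. The following minimal sketch illustrates this interface with toy one-dimensional dynamics; the history-dependent drift is purely illustrative (it shows that no Markovian hypothesis is required), and none of it corresponds to the experiment of section 4.

```python
import random

random.seed(0)

def sample_step(x_hist, y_hist, d_hist):
    """One draw from P(x_t, y_t | x_1:t-1, y_1:t-1, d_1:t-1).
    The law may depend on the whole past (no Markovian hypothesis):
    here the drift depends on the mean of all past states.  Toy 1-D
    dynamics, assumed for illustration only."""
    drift = sum(x_hist) / len(x_hist) if x_hist else 0.0
    d_prev = d_hist[-1] if d_hist else 0.0
    x_prev = x_hist[-1] if x_hist else 0.0
    x = x_prev + 0.2 * drift + d_prev + random.uniform(-1.0, 1.0)
    y = x + random.gauss(0.0, 0.5)      # partial, noisy observation of x
    return x, y

def simulate_world(policy, T):
    """Unroll the full law P(x, y | d); note that d_t is drawn after y_t."""
    xs, ys, ds = [], [], []
    for t in range(T):
        x, y = sample_step(xs, ys, ds)
        xs.append(x)
        ys.append(y)
        ds.append(policy(ys))           # decision from the whole past y_1:t
    return xs, ys, ds

xs, ys, ds = simulate_world(lambda obs: -0.3 * obs[-1], T=25)
```

The ordering inside the loop mirrors the chronology of figure 1: observation first, decision second.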
We now give a characterization of the mission to be accomplished. The mission is limited in time; let T be this maximum time. In full generality, the mission is evaluated by a function V(d, y, x) defined on the trajectories of d, y, x. Typically, the function V could be used for computing the time needed for the mission accomplishment. The purpose is to construct an optimal decision tree d(obs), depending on the observations obs, in order to maximize the mean evaluation. This is a dynamic optimization problem, since the actions depend on the previous observations. The related program consists in the optimization of (d_t(y_{1:t}))_{t=1}^{T} as follows:

d^* \in \arg\max_{(d_t(y_{1:t}))_{t=1}^{T}} \int_y \int_x P(x, y | (d_t(y_{1:t}))_{t=1}^{T}) \, V((d_t(y_{1:t}))_{t=1}^{T}, y, x) \, dx \, dy.   (1)

This optimization is schematized in figure 2. It is related to the family of Partially Observable Markov Decision Problems, although there is no Markovian hypothesis about the world here. In the figure, the double arrows characterize the variables to be optimized. More precisely, these arrows describe the flow of information between the observations and the actions. The cells are making decisions and transmitting
all the received and generated information (including the actions). This architecture illustrates that planning with observation is an indefinite-memory problem: the decision depends on the whole past of observations.

[Figure 2: POMDP planning. The double arrows carry the flow of information from the observations y_t to the decisions d_t.]

When the evaluation function V is additive, it is known that there is a finite-memory construction of the solution, by means of the dynamic programming paradigm. Notice, however, that this finite memory is a probabilistic posterior of the world, resulting from the past observations. In practice, it is too huge to be manipulated properly. An alternative method to Dynamic Programming is proposed subsequently. It relies on the optimization of a probabilistic template for the control policy.

Direct approximation of the decision tree.

In an optimization problem like (1), the value to be optimized, d, is a deterministic object. In this precise case, d is a decision tree, that is, a function which maps any sequence of observations y_{1:t} to a decision d_t. It is possible, however, to adopt a probabilistic viewpoint. Then the problem is equivalent to finding π(d | y), a probabilistic law of actions conditional on the past observations, which maximizes the mean evaluation:

V(π) = \int_d \int_y \int_x \prod_{t=1}^{T} π(d_t | d_{1:t-1}, y_{1:t}) \, P(x, y | d) \, V(d, y, x) \, dx \, dy \, dd.

Notice that this problem could still be schematized by figure 2, but the double arrows now describe the DBN (Dynamic Bayesian Network) structure of the law π. Since the memories are still indefinite, the schematized law is quite general. Actually, there will not be a great difference with the deterministic case for an optimal solution: when the solution d^* is unique, the optimal law π^* is a Dirac distribution on d^*. But things change when approximating π. Now, why use a probability to approximate the optimal strategy? The main point is that probabilistic models seem more suitable for approximation.
The second point is that we are sure to approximate continuously: indeed, π ↦ V(π) is continuous. There is a third point. When replacing in figure 2 the indefinite memories by finite memories, denoted m as in figure 3, a natural approximation of the law π is obtained. The approximated law is a Hidden Markov Model. As will be seen, HMMs are very practical for an optimization.

[Figure 3: Finite-memory planning approximation. The information flows from the observations y_t to the decisions d_t through the finite memories m_t.]

[Figure 4: A typical Hidden Markov Model, with memories m_t, observations y_t and decisions d_t.]

Policy approximation by a HMM.

Define for any time t a variable m_t ∈ M, called the memory at time t. Notice that m_t is intended to describe a finite memory. Nevertheless, M is not necessarily a finite set; for example, M could contain continuous or semi-continuous values. In the most general case, an HMM for the decision policy will take the form:

h(d | y) = \int_{m \in M^T} h(d, m | y) \, dm, where h(d, m | y) = \prod_{t=1}^{T} h(d_t | m_t) \, h(m_t | y_t, m_{t-1}).

This general model for an HMM is schematized in figure 4. This formalism is the most general, and it hides many possible HMM settings, more or less intricate. We will not discuss here the detailed structure of the HMM (see next paragraph), but instead the general principle of the approximation of π by such an HMM. The approach developed in [6][7] is quite general and can be split up into two points:

- Define a family of HMMs H, to be used as policy approximations,
- Optimize the parameters of such HMMs in order to maximize the mean evaluation: find h^* ∈ \arg\max_{h \in H} V(h).

In practice, a good choice of H implies that h^* is a good approximation of π^*.

A detailed description of the HMM family.

It is recalled that our purpose is to investigate a continuous control problem. Thus, any HMM h ∈ H should input continuous data (the observation y) and output a continuous decision d.
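Since the laws P and h can only be simulated, the mean evaluation V(h) that this optimization targets is in practice estimated by plain Monte Carlo: roll out the joint law of world and policy many times and average the evaluations. A minimal sketch follows, with a toy one-dimensional world and a memoryless policy standing in for the HMM; all dynamics and names here are illustrative assumptions, not the paper's model.

```python
import random

random.seed(0)

def rollout(policy, T=10):
    """Draw one trajectory (d, y, x) from the joint law of world and
    policy: toy 1-D world, observation = state + noise.  Illustrative."""
    x, d = 0.0, 0.0
    traj = []
    for t in range(T):
        x = x + random.uniform(-1.0, 1.0) + d   # world evolves under the decision
        y = x + random.gauss(0.0, 0.5)          # partial, noisy observation
        d = policy(y)                           # decision after observation
        traj.append((d, y, x))
    return traj

def estimate_V(policy, evaluate, n_rollouts=500):
    """Monte Carlo estimate of the mean evaluation V(h)."""
    return sum(evaluate(rollout(policy)) for _ in range(n_rollouts)) / n_rollouts

# Example: the mission rewards keeping the hidden state near 0;
# the candidate policy pulls back toward 0 from the observation.
V_hat = estimate_V(policy=lambda y: -0.5 * y,
                   evaluate=lambda traj: -max(abs(x) for _, _, x in traj))
```

Any policy family for which such rollouts are cheap can be plugged into `estimate_V`; this is the quantity the optimization of section 3 seeks to maximize over the family H.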
The choice here is to manipulate both a discrete memory and a continuous memory.
Let m^D denote the discrete memory of h, and assume m^D_t takes its values within the set {0, ..., 2^L - 1}. Let m^C denote the continuous memory of h, and assume that m^C_t is an IR-valued vector of dimension K. In addition, we define a continuous temporary memory, denoted μ^D, such that μ^D_t is an IR-valued vector of dimension L. The idea is to derive the continuous data m^C_t and μ^D_t from the previous memories m^C_{t-1}, m^D_{t-1} and the observation y_t by means of Gaussian laws; these laws will be optimized. The discrete memory m^D_t is obtained by discretizing the temporary memory μ^D_t; this process is fixed and cannot be optimized. The decision d_t is obtained from the memories m^C_t, m^D_t by means of a Gaussian law; this law will be optimized. Let N(Σ, μ) denote a multivariate Gaussian vector with variance matrix Σ and mean vector μ. The whole process can be detailed as follows:

- m^C_t = N(Σ^C[m^D_{t-1}], A^C[m^D_{t-1}] (1, m^C_{t-1}, y_t)), where the matrices Σ^C[m] and A^C[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^C[m] is of dimension K × K while A^C[m] is of dimension K × (1 + K + dim y_t),

- μ^D_t = N(Σ^D[m^D_{t-1}], A^D[m^D_{t-1}] (1, m^C_{t-1}, y_t)), where the matrices Σ^D[m] and A^D[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^D[m] is of dimension L × L while A^D[m] is of dimension L × (1 + K + dim y_t),

- m^D_t is the Boolean vector of dimension L which indicates in which hypercorner of IR^L the vector μ^D_t is placed. More precisely:

  m^D_t = \sum_{k=0}^{L-1} b_{k,t} 2^k, where b_{k,t} = 1 when μ^D_{k,t} ≥ 0 and b_{k,t} = 0 otherwise,

- d_t = N(Σ^dec[m^D_t], A^dec[m^D_t] (1, m^C_t)), where the matrices Σ^dec[m] and A^dec[m] have to be optimized for any m ∈ {0, ..., 2^L - 1}. Notice that Σ^dec[m] is of dimension dim d_t × dim d_t while A^dec[m] is of dimension dim d_t × (1 + K).

Figure 5 gives an illustration of the Markovian transition. The double arrows mean that the parameters have to be optimized. From now on, the set H will refer to these semi-continuous HMMs.
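One transition of this semi-continuous HMM can be sketched directly from the equations above. The dimensions below are illustrative (K = 3, L = 2, with dim y_t = 8 and dim d_t = 4 as in section 4), and the parameters are initialized arbitrarily; in the actual method they are the quantities tuned by the cross-entropy optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, DIM_Y, DIM_D = 3, 2, 8, 4   # illustrative dimensions

# One pair (Sigma[m], A[m]) per discrete memory value m in {0, ..., 2^L - 1}.
def init_params(rows, cols):
    return {m: (np.eye(rows), rng.normal(scale=0.1, size=(rows, cols)))
            for m in range(2 ** L)}

params_C   = init_params(K, 1 + K + DIM_Y)   # continuous memory m^C_t
params_D   = init_params(L, 1 + K + DIM_Y)   # temporary memory mu^D_t
params_dec = init_params(DIM_D, 1 + K)       # decision d_t

def hmm_step(m_C_prev, m_D_prev, y_t):
    """One transition of the semi-continuous HMM (illustrative sketch)."""
    z = np.concatenate(([1.0], m_C_prev, y_t))    # input vector (1, m^C_{t-1}, y_t)
    S_C, A_C = params_C[m_D_prev]
    m_C = rng.multivariate_normal(A_C @ z, S_C)   # m^C_t ~ N(A^C[m] z, Sigma^C[m])
    S_D, A_D = params_D[m_D_prev]
    mu_D = rng.multivariate_normal(A_D @ z, S_D)  # mu^D_t ~ N(A^D[m] z, Sigma^D[m])
    m_D = sum(1 << k for k in range(L) if mu_D[k] >= 0)   # fixed discretization
    S_d, A_d = params_dec[m_D]
    d = rng.multivariate_normal(A_d @ np.concatenate(([1.0], m_C)), S_d)
    return m_C, m_D, d

m_C, m_D, d = hmm_step(np.zeros(K), 0, np.zeros(DIM_Y))
```

The discretization step (sign pattern of μ^D_t packed into an integer) is the only non-optimized part of the transition, exactly as stated above.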
Why not use purely continuous HMMs? Continuous HMMs, particularly Gaussian ones, are too weak as structures. A semi-continuous scheme is necessary to achieve a sufficient abstraction. It remains now to explain how to optimize the choice of h among the family H. The following section explains a cross-entropic method for optimizing such a choice.

[Figure 5: Semi-continuous HMM transition. The memories m^C_{t-1}, m^D_{t-1} and observation y_t generate m^C_t, μ^D_t and d_t.]

3 Cross-entropic optimization

The reader interested in CE methods should refer to the tutorial on the CE method [8]. CE algorithms were first dedicated to estimating the probability of rare events. A slight change to the basic algorithm made it also suitable for optimization. In their article [13], Homem-de-Mello and Rubinstein have given some results about global convergence. In order to ensure such convergence, some refinements are introduced, particularly concerning the selective rate. This presentation is restricted to the basic CE optimization method. The improvements of the CE algorithm proposed in [13] have not been implemented, but the algorithm has been observed to work properly. For this reason, this paper does not deal with the choice of the selective rate.

3.1 General CE algorithm for the optimization

The Cross-Entropy algorithm repeats until convergence three successive phases:

1. Generate samples of random data according to a parameterized random mechanism,
2. Select the best samples according to an evaluation criterion,
3. Update the parameters of the random mechanism on the basis of the selected samples.

In the particular case of CE, the update in phase 3 is obtained by minimizing the Kullback-Leibler distance, or cross-entropy, between the updated random mechanism and the selected samples. The next paragraphs describe, on a theoretical example, how such a method can be used in an optimization problem.

Formalism. Let be given an easily computable function x ↦ f(x).
The value f(x) has to be maximized by optimizing the choice of x ∈ X. The function f will be the evaluation criterion. Now let be given a family of probabilistic laws, {P_σ : σ ∈ Σ}, applying to the variable x. The family P is the parameterized random mechanism; the variable x is the random data. Let ρ ∈ ]0, 1[ be a selective rate. The CE algorithm for (x, f, P) follows the synopsis:
1. Initialize σ ∈ Σ,
2. Generate N samples x_n according to P_σ,
3. Select the ρN best samples according to the evaluation criterion f,
4. Update σ as a minimizer of the cross-entropy with the selected samples:

   σ ∈ \arg\max_{σ ∈ Σ} \sum_{n selected} \ln P_σ(x_n),

5. Repeat from step 2 until convergence.

This algorithm requires f to be easily computable.

Interpretation. The CE algorithm tightens the law P_σ around the maximizer of f. Then, when the probabilistic family P is well suited to the maximization of f, it becomes equivalent to find a maximizer of f or to optimize the parameter σ by means of the CE algorithm. The problem is to find a good family... Another issue is the criterion for deciding convergence. Some answers are given in [13], but it is outside the scope of this paper to investigate these questions precisely. Our criterion was to stop after a given threshold of successive unsuccessful tries, and this very simple method has worked fine on our problem.

3.2 Application

The cross-entropy method, together with the probabilistic modelling of the policy, is now applied in order to approximate the optimal strategy for planning with partial observation. Our objective is to tune the semi-continuous HMM h ∈ H in order to have the best approximation of the optimal planning strategy π^*:

π^* ≃ h^* ∈ \arg\max_{h ∈ H} V(h).

Define P[h], the complete probabilistic law of the system Universe/Planner, by:

P[h](d, y, x, m) = P(x, y | d) h(d, m | y).

Notice here that the memory is composite, i.e. m = (m^D, m^C, μ^D), with one discrete and two continuous components. The approximated planning reduces to solving:

h^* ∈ \arg\max_{h ∈ H} \int_d \int_y \int_x \int_m P[h](d, y, x, m) \, V(d, y, x) \, dx \, dy \, dd \, dm.

Optimizing h means tuning the parameter h ∈ H in order to tighten the probability P[h] around optimal values of V. This is exactly solved by the Cross-Entropy optimization method. However, it is required that the evaluation function V be easily computable. Typically, the definition of V may be recursive, e.g.:
V(d, y, x) = v_T(d_T, y_T, x_T, v_{T-1}(d_{T-1}, y_{T-1}, x_{T-1}, \ldots v_1(d_1, y_1, x_1) \ldots)),

i.e. a nested composition of elementary evaluations v_t. Let the selective rate ρ be a positive number such that ρ < 1. The cross-entropy optimization method follows the synopsis:

1. Initialize h, for example a flat h,
2. Make N tossings θ_n = (d_n, y_n, x_n, m_n) according to the law P[h],
3. Choose the ρN best samples θ_n according to the evaluation V(d, y, x). Denote S the set of the selected samples,
4. Update h as the minimizer of the cross-entropy with the selected samples:

   h ∈ \arg\max_{h ∈ H} \sum_{n ∈ S} \ln P[h](θ_n),   (2)

5. Reiterate from step 2 until convergence.

In this case, the maximization (2) is not difficult. In particular, the Markovian property is widely used: ln P[h] decomposes into a sum, and the optimization subsequently splits into several elementary independent problems. Finally, this maximization (2) is solved as follows (v' denotes the transpose of a vector v):

For the continuous memory:

A^C[m] = \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} m^C_{n,t} (1, m^C_{n,t-1}, y_{n,t})' \Big] \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} (1, m^C_{n,t-1}, y_{n,t}) (1, m^C_{n,t-1}, y_{n,t})' \Big]^{-1}

and

Σ^C[m] = \frac{\sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} Γ_{n,t} Γ_{n,t}'}{\mathrm{card}\{(n, t) : n ∈ S, m^D_{n,t-1} = m\}}, where Γ_{n,t} = m^C_{n,t} - A^C[m] (1, m^C_{n,t-1}, y_{n,t}).

For the temporary memory:

A^D[m] = \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} μ^D_{n,t} (1, m^C_{n,t-1}, y_{n,t})' \Big] \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} (1, m^C_{n,t-1}, y_{n,t}) (1, m^C_{n,t-1}, y_{n,t})' \Big]^{-1}

and

Σ^D[m] = \frac{\sum_{n ∈ S} \sum_{t : m^D_{n,t-1} = m} Γ_{n,t} Γ_{n,t}'}{\mathrm{card}\{(n, t) : n ∈ S, m^D_{n,t-1} = m\}}, where Γ_{n,t} = μ^D_{n,t} - A^D[m] (1, m^C_{n,t-1}, y_{n,t}).
For the decision:

A^dec[m] = \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t} = m} d_{n,t} (1, m^C_{n,t})' \Big] \Big[ \sum_{n ∈ S} \sum_{t : m^D_{n,t} = m} (1, m^C_{n,t}) (1, m^C_{n,t})' \Big]^{-1}

and

Σ^dec[m] = \frac{\sum_{n ∈ S} \sum_{t : m^D_{n,t} = m} Γ_{n,t} Γ_{n,t}'}{\mathrm{card}\{(n, t) : n ∈ S, m^D_{n,t} = m\}}, where Γ_{n,t} = d_{n,t} - A^dec[m] (1, m^C_{n,t}).

4 Implementation

The algorithm has been applied to a target detection and interception problem.

4.1 The experiment.

A target R moves in the continuous space [-20, 20] × [-20, 20]. It is initially located in the area [0, 20] × [-20, 20], with a known distribution (actually a uniform distribution). R is tracked by two mobiles, B¹ and B², controlled by the subject. B¹ and B² are initially located at the coordinates (-20, 0). The mobiles receive the relative position of each other, with additive noise (Gaussian noise with variance 1). Each mobile receives only one piece of information about the target: it knows the direction of the target spot but has no information about its distance. Moreover, this direction information is noisy. The noise varies with the (Euclidean) distance d between the patrol and the target: in this simulation, the angular noise is a uniform random variable on the set [-(d/(d+1))(π/2), (d/(d+1))(π/2)]. Each mobile will receive this information as a spot drawn according to the noisy distribution (in particular, there is a very big variance on the distance). Thus, the dimension of the continuous information y_t is 8, since y_t contains two spot positions and two mobile relative positions. B¹ and B² are able to move according to a directive of the subject. The directive is a direction and a move intensity (a speed) for each mobile. The mobile maximal speed is 2, starting from 0; but the mobile cannot escape from the space. Thus, the dimension of the continuous decision d_t is 4 (moves will be truncated). Moreover, the patrol moves are noised additively (Gaussian noise with variance 1).
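The bearing-only sensor described above can be sketched as follows. The distance-dependent angular noise matches the text; the spread used for the uninformative range component is an assumption (the paper only states that the variance on the distance is very big), so the value 40.0 below is illustrative.

```python
import math
import random

random.seed(0)

def observe_target(patrol, target):
    """Bearing-only observation with distance-dependent angular noise:
    the angular error is uniform on [-(d/(d+1))*pi/2, (d/(d+1))*pi/2],
    and the reported range is essentially uninformative (drawn with a
    large, assumed spread), so the mobile receives a 'spot' that is only
    reliable in direction."""
    dx, dy = target[0] - patrol[0], target[1] - patrol[1]
    d = math.hypot(dx, dy)
    half_width = d / (d + 1.0) * math.pi / 2.0
    angle = math.atan2(dy, dx) + random.uniform(-half_width, half_width)
    r = random.uniform(0.0, 40.0)   # assumed spread for the unknown distance
    return (patrol[0] + r * math.cos(angle),
            patrol[1] + r * math.sin(angle))

spot = observe_target((-20.0, 0.0), (10.0, 5.0))
```

Note that the noise half-width d/(d+1) · π/2 tends to π/2 at long range, so a distant target is observed almost uniformly over the forward half-plane, while a nearby target is observed with an almost exact bearing.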
The target moves according to the following directives (unless other test directives are given):

- It cannot escape from the space, unless it reaches the escape line {-20} × [-20, 20],
- The target speed is characterized by its relative move from step t to step t+1. This relative move is chosen as a uniform random variable on the set [-4, 0] × [-2, 2]. The move is truncated if a constraint is reached.

As a consequence, the target moves toward the escape line. It moves twice as fast as the patrols. The purpose of the mission is to get as close as possible to the target (at least once and by means of at least one patrol) before it escapes. More precisely, the evaluation V of a sample is given by:

V = \max_{t before escape} \max\{ 1 / d(B¹_t, R_t)², 1 / d(B²_t, R_t)² \}.

Thus, we are just optimizing the expected maximal inverted (squared) distance, which results in strategies with close target contacts. The total number of turns is T = 100.

Results.

Owing to the conference deadline schedule, our tests have been limited. More results should be available later. In the tests described subsequently, the processes were run for one hour on a 2 GHz PC (almost all the processor time was used). This was sufficient to reach a good convergence, since most of the gains are obtained at the beginning of the process (convergence is almost complete after about ten minutes). In this version of the paper, we are interested in 3 different tests (test 1 is the simplest).

Test 1. In this case, the target does not move and is initially located at position (20, 0). After optimization of the strategy, the obtained mean reward is 1982, which means that a patrol typically contacts the target at a distance of about 1/√1982 ≈ 0.02.

Test 2. Again the target does not move, but it is initially located randomly on the space [0, 20] × [-20, 20] with a uniform distribution. After optimization of the strategy, the obtained mean reward is 17, which means that a patrol typically contacts the target at a distance of about 1/√17 ≈ 0.24.

Test 3.
In this case, the full location and moving hypotheses are made about the target. Notice that the period of possible contact is quite reduced (because of the escape) in comparison with the previous tests. After optimization of the strategy, the obtained mean reward is 16, which means that a patrol typically contacts the target at a distance of about 1/√16 = 0.25. It appears, fortunately, that our optimized policies are capable of good contact with the target. Such results are promising, but more tests should be done, and comparisons with other methods (for example a Q-learning approach on a discretized problem) are needed. More
intricate examples should be investigated too. These tests are planned for the near future.

5 Conclusion

In this paper, we proposed a method for approximating the optimal planning in a partially observable control problem. The planning involves an optimization of continuous decisions with regard to a sequence of continuous past observations. The method relies on a modelling of the policies by means of a semi-continuous probabilistic law family. The cross-entropy method is applied to find the optimal law. This method has been implemented for solving a problem of detection-investigation, where two mobiles have to catch a target while receiving a radial observation of this target. The tests on this scenario are promising. More tests are being done.

References

[1] S.S. Brown, Optimal Search for a Moving Target in Discrete Time and Space. Operations Research 28.
[2] J. de Guenin, Optimum Distribution of Effort: an Extension of the Koopman Basic Theory. Operations Research 9, pp. 1-7.
[3] B.O. Koopman, Search and Screening: General Principle with Historical Applications. MORS Heritage Series, Alexandria, VA.
[4] L.D. Stone, Theory of Optimal Search, 2nd ed. Operations Research Society of America, Arlington, VA.
[5] A.R. Washburn, Search for a Moving Target: The FAB Algorithm. Operations Research 31.
[6] F. Dambreville, Learning a Machine for the Decision in a Partially Observable Markov Universe. ISDA 2004, Budapest, Hungary, August 26-28.
[7] F. Dambreville, Learning a Machine for the Decision in a Partially Observable Markov Universe. Submitted to European Journal of Operational Research.
[8] De Boer, Kroese, Mannor and Rubinstein, A Tutorial on the Cross-Entropy Method.
[9] R. Bellman, Dynamic Programming. Princeton University Press, Princeton, New Jersey.
[10] E.J. Sondik, The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, Stanford, California, 1971.
[11] A.R. Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Brown University, Providence, Rhode Island.
[12] B. Bakker and J. Schmidhuber, Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization. Proceedings of the 8th Conference on Intelligent Autonomous Systems, Amsterdam, The Netherlands.
[13] T. Homem-de-Mello and R.Y. Rubinstein, Rare Event Estimation for Static Models via Cross-Entropy and Importance Sampling.
[14] K. Murphy and M. Paskin, Linear Time Inference in Hierarchical HMMs. Proceedings of Neural Information Processing Systems.
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationIntroduction to Artificial Intelligence (AI)
Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 10 Oct, 13, 2011 CPSC 502, Lecture 10 Slide 1 Today Oct 13 Inference in HMMs More on Robot Localization CPSC 502, Lecture
More informationOptimal Control of Partiality Observable Markov. Processes over a Finite Horizon
Optimal Control of Partiality Observable Markov Processes over a Finite Horizon Report by Jalal Arabneydi 04/11/2012 Taken from Control of Partiality Observable Markov Processes over a finite Horizon by
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationConstrained Minimax Optimization of Continuous Search Efforts for the Detection of a Stationary Target
Constrained Mini Optimization of Continuous Search Efforts for the Detection of a Stationary Target Frédéric Dambreville, 1 Jean-Pierre Le Cadre 2 1 Délégation Générale pour l Armement, 16 Bis, Avenue
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationSequential Decision Problems
Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationTemplate-Based Representations. Sargur Srihari
Template-Based Representations Sargur srihari@cedar.buffalo.edu 1 Topics Variable-based vs Template-based Temporal Models Basic Assumptions Dynamic Bayesian Networks Hidden Markov Models Linear Dynamical
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationDETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja
DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION Alexandre Iline, Harri Valpola and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O.Box
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More informationOptimal Control. McGill COMP 765 Oct 3 rd, 2017
Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps
More informationTemporal-Difference Q-learning in Active Fault Diagnosis
Temporal-Difference Q-learning in Active Fault Diagnosis Jan Škach 1 Ivo Punčochář 1 Frank L. Lewis 2 1 Identification and Decision Making Research Group (IDM) European Centre of Excellence - NTIS University
More informationProbabilistic Robotics
University of Rome La Sapienza Master in Artificial Intelligence and Robotics Probabilistic Robotics Prof. Giorgio Grisetti Course web site: http://www.dis.uniroma1.it/~grisetti/teaching/probabilistic_ro
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationNovel spectrum sensing schemes for Cognitive Radio Networks
Novel spectrum sensing schemes for Cognitive Radio Networks Cantabria University Santander, May, 2015 Supélec, SCEE Rennes, France 1 The Advanced Signal Processing Group http://gtas.unican.es The Advanced
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationKalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein
Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time
More informationAccuracy and Decision Time for Decentralized Implementations of the Sequential Probability Ratio Test
21 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 3-July 2, 21 ThA1.3 Accuracy Decision Time for Decentralized Implementations of the Sequential Probability Ratio Test Sra Hala
More information2-Step Temporal Bayesian Networks (2TBN): Filtering, Smoothing, and Beyond Technical Report: TRCIM1030
2-Step Temporal Bayesian Networks (2TBN): Filtering, Smoothing, and Beyond Technical Report: TRCIM1030 Anqi Xu anqixu(at)cim(dot)mcgill(dot)ca School of Computer Science, McGill University, Montreal, Canada,
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Hidden Markov Models Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell, Andrew Moore, Ali Farhadi, or Dan Weld 1 Outline Probabilistic
More informationArtificial Intelligence
Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Summary of last lecture We know how to do probabilistic reasoning over time transition model P(X t
More informationEfficient Sensitivity Analysis in Hidden Markov Models
Efficient Sensitivity Analysis in Hidden Markov Models Silja Renooij Department of Information and Computing Sciences, Utrecht University P.O. Box 80.089, 3508 TB Utrecht, The Netherlands silja@cs.uu.nl
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationUsing Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More information10 Robotic Exploration and Information Gathering
NAVARCH/EECS 568, ROB 530 - Winter 2018 10 Robotic Exploration and Information Gathering Maani Ghaffari April 2, 2018 Robotic Information Gathering: Exploration and Monitoring In information gathering
More informationApplication of probabilistic PCR5 Fusion Rule for Multisensor Target Tracking
Application of probabilistic PCR5 Fusion Rule for Multisensor Target Tracking arxiv:0707.3013v1 [stat.ap] 20 Jul 2007 Aloïs Kirchner a, Frédéric Dambreville b, Francis Celeste c Délégation Générale pour
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationDevelopment of a Deep Recurrent Neural Network Controller for Flight Applications
Development of a Deep Recurrent Neural Network Controller for Flight Applications American Control Conference (ACC) May 26, 2017 Scott A. Nivison Pramod P. Khargonekar Department of Electrical and Computer
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationCS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs
CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy
More informationHidden Markov Models Part 1: Introduction
Hidden Markov Models Part 1: Introduction CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Modeling Sequential Data Suppose that
More informationA Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems
A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems Daniel Meyer-Delius 1, Christian Plagemann 1, Georg von Wichert 2, Wendelin Feiten 2, Gisbert Lawitzky 2, and
More informationLecture 6: April 19, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models Spring 2002 Lecturer: Jeff Bilmes Lecture 6: April 19, 2002 University of Washington Dept. of Electrical Engineering Scribe: Huaning Niu,Özgür Çetin
More informationSequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes
Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes Ellida M. Khazen * 13395 Coppermine Rd. Apartment 410 Herndon VA 20171 USA Abstract
More information16.4 Multiattribute Utility Functions
285 Normalized utilities The scale of utilities reaches from the best possible prize u to the worst possible catastrophe u Normalized utilities use a scale with u = 0 and u = 1 Utilities of intermediate
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationLearning from Sequential and Time-Series Data
Learning from Sequential and Time-Series Data Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/? Sequential and Time-Series Data Many real-world applications
More informationForecasting Wind Ramps
Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationde Blanc, Peter Ontological Crises in Artificial Agents Value Systems. The Singularity Institute, San Francisco, CA, May 19.
MIRI MACHINE INTELLIGENCE RESEARCH INSTITUTE Ontological Crises in Artificial Agents Value Systems Peter de Blanc Machine Intelligence Research Institute Abstract Decision-theoretic agents predict and
More informationThe Cross Entropy Method for the N-Persons Iterated Prisoner s Dilemma
The Cross Entropy Method for the N-Persons Iterated Prisoner s Dilemma Tzai-Der Wang Artificial Intelligence Economic Research Centre, National Chengchi University, Taipei, Taiwan. email: dougwang@nccu.edu.tw
More informationCS532, Winter 2010 Hidden Markov Models
CS532, Winter 2010 Hidden Markov Models Dr. Alan Fern, afern@eecs.oregonstate.edu March 8, 2010 1 Hidden Markov Models The world is dynamic and evolves over time. An intelligent agent in such a world needs
More informationIntroduction to Mobile Robotics Probabilistic Robotics
Introduction to Mobile Robotics Probabilistic Robotics Wolfram Burgard 1 Probabilistic Robotics Key idea: Explicit representation of uncertainty (using the calculus of probability theory) Perception Action
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationAn Adaptive Neural Network Scheme for Radar Rainfall Estimation from WSR-88D Observations
2038 JOURNAL OF APPLIED METEOROLOGY An Adaptive Neural Network Scheme for Radar Rainfall Estimation from WSR-88D Observations HONGPING LIU, V.CHANDRASEKAR, AND GANG XU Colorado State University, Fort Collins,
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationLinear Dynamical Systems (Kalman filter)
Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete
More informationData Structures for Efficient Inference and Optimization
Data Structures for Efficient Inference and Optimization in Expressive Continuous Domains Scott Sanner Ehsan Abbasnejad Zahra Zamani Karina Valdivia Delgado Leliane Nunes de Barros Cheng Fang Discrete
More informationHuman-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg
Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationMath 350: An exploration of HMMs through doodles.
Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationDialogue as a Decision Making Process
Dialogue as a Decision Making Process Nicholas Roy Challenges of Autonomy in the Real World Wide range of sensors Noisy sensors World dynamics Adaptability Incomplete information Robustness under uncertainty
More informationON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose
ON SCALABLE CODING OF HIDDEN MARKOV SOURCES Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa Barbara, CA, 93106
More informationA Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley
A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study
More informationMarkov chain optimisation for energy systems (MC-ES)
Markov chain optimisation for energy systems (MC-ES) John Moriarty Queen Mary University of London 19th August 2016 Approaches to the stochastic optimisation of power systems are not mature at research
More informationMarkov localization uses an explicit, discrete representation for the probability of all position in the state space.
Markov Kalman Filter Localization Markov localization localization starting from any unknown position recovers from ambiguous situation. However, to update the probability of all positions within the whole
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationSymbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning
Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning Pascal Poupart (University of Waterloo) INFORMS 2009 1 Outline Dynamic Pricing as a POMDP Symbolic Perseus
More information