Decayed Markov Chain Monte Carlo for Interactive POMDPs


Yanlin Han and Piotr Gmytrasiewicz
Department of Computer Science, University of Illinois at Chicago, Chicago, IL

Abstract

To act optimally in a partially observable, stochastic, multi-agent environment, an autonomous agent needs to maintain a belief about the world at any given time. An extension of partially observable Markov decision processes (POMDPs), called interactive POMDPs (I-POMDPs), provides a principled framework for planning and acting in such settings. I-POMDPs augment POMDP beliefs by including models of other agents in the state space, which forms a hierarchical belief structure representing an agent's belief about the physical state, its belief about the other agents, and their beliefs about others' beliefs. This nested hierarchy results in a dramatic increase in belief space complexity. In order to perform belief update in such settings, we propose a new approximation method that utilizes decayed Markov chain Monte Carlo (D-MCMC). For problems of various complexities, we show that our approach effectively mitigates the belief space complexity and competes with other Monte Carlo sampling algorithms for multi-agent systems, such as the interactive particle filter (I-PF). We also compare their accuracy and efficiency, and suggest applicable scenarios for each algorithm.

1 Introduction

Partially observable Markov decision processes (POMDPs) (Kaelbling, Littman, and Cassandra 1998) provide a principled, decision-theoretic framework for planning under uncertainty in a partially observable, stochastic environment. An autonomous agent operates rationally in such settings by maintaining a belief over the physical state at any given time; in doing so, it sequentially chooses the optimal actions that maximize future rewards. Solutions of POMDPs are therefore mappings from an agent's beliefs to actions. Although POMDPs can be used in multi-agent settings, they do so under the strong assumption that the effects of other agents' actions are implicitly treated as noise and folded into the state transitions, as in recent Bayes-adaptive POMDPs (Ross, Chaib-draa, and Pineau 2007), the infinite generalized policy representation (Liu, Liao, and Carin 2011), and infinite POMDPs (Doshi-Velez et al. 2013). Thus, an agent's beliefs about other agents are not part of the solutions of POMDPs.

Interactive POMDPs (I-POMDPs) (Gmytrasiewicz and Doshi 2005) are a generalization of POMDPs to multi-agent settings that replaces POMDP belief spaces with interactive hierarchical belief systems. Specifically, an I-POMDP augments the plain beliefs about physical states in a POMDP by including models of other agents in the state space, which forms a hierarchical belief structure. The models of other agents included in the augmented state space comprise two types: intentional models and subintentional models. The former ascribe beliefs, preferences, and rationality to other agents (Gmytrasiewicz and Doshi 2005), while the latter, such as finite state controllers (Panella and Gmytrasiewicz 2016), do not. We focus on intentional models in this paper. Solutions of I-POMDPs map an agent's belief about the environment and other agents' models to actions; the framework is therefore applicable to important agent, human, and mixed agent-human applications.

It has been clearly shown (Gmytrasiewicz and Doshi 2005) that the added sophistication of modeling others as rational agents results in a higher value function, which dominates the one obtained by simply treating others as noise. However, for I-POMDPs, the interactive belief modification results in a dramatic increase in belief space complexity, adding to the curse of dimensionality: the complexity of the belief representation grows with the number of belief dimensions, due to the exponential growth of agent models as the nesting level increases. Since exact solutions to POMDPs are proven to be PSPACE-complete for finite horizons and undecidable for infinite horizons (Papadimitriou and Tsitsiklis 1987), the time complexity of generalized I-POMDPs, which may contain multiple POMDPs and I-POMDPs depending on the actual nesting level, is at least PSPACE-complete for finite horizons and undecidable for infinite horizons. Therefore, in order to apply I-POMDPs to more realistic settings, a good approximation algorithm for computing the nested interactive beliefs is crucial to the trade-off between solution quality and computation.

To address this issue, we propose methods that utilize Monte Carlo sampling algorithms to obtain approximate solutions to I-POMDPs. Specifically, we use decayed Markov chain Monte Carlo (D-MCMC) (Marthi et al. 2002) to concentrate sampling of beliefs on the recent past, and we use the interactive particle filter (I-PF) (Doshi and Gmytrasiewicz 2009) to descend the belief hierarchy and sample at each level. Applying these methods to I-POMDPs is nontrivial, since it requires generalizing D-MCMC to multi-agent settings with guaranteed convergence and improving I-PF for the infinite time horizon and the modeled agent of interest. Our method significantly mitigates the belief space complexity of I-POMDPs, and it works effectively and efficiently on various multi-agent problem settings. Compared with other sampling methods for I-POMDPs, such as I-PF, the generalized D-MCMC is competitive, non-divergent, and advantageous in certain application scenarios.

2 Background

2.1 POMDP

A partially observable Markov decision process (POMDP) (Kaelbling, Littman, and Cassandra 1998) is a general model for planning and acting in a single-agent, partially observable, stochastic domain. It is defined for a single agent i as:

POMDP_i = \langle S, A_i, \Omega_i, T_i, O_i, R_i \rangle    (1)

where S is the set of states of the environment; A_i is the set of agent i's possible actions; \Omega_i is the set of agent i's possible observations; T_i : S \times A_i \times S \to [0, 1] is the state transition function; O_i : S \times A_i \times \Omega_i \to [0, 1] is the observation function; and R_i : S \times A_i \to \mathbb{R} is the reward function.

An agent's belief about the state can be represented as a probability distribution over S. The belief update can be performed using the following formula, where \alpha is a normalizing constant:

b'(s') = \alpha \, O(o, s', a) \sum_{s \in S} T(s, a, s') \, b(s)    (2)

The optimal action, a, is then part of the set of optimal actions, OPT(b_i), for the belief state, defined as:

OPT(b_i) = \arg\max_{a_i \in A_i} \Big\{ \sum_{s \in S} b_i(s) R(s, a_i) + \gamma \sum_{o_i \in \Omega_i} P(o_i \mid a_i, b_i) \, U(SE(b_i, a_i, o_i)) \Big\}    (3)

2.2 Markov Chain Monte Carlo

The Markov chain Monte Carlo (MCMC) method (Gilks et al. 1996) is widely used to approximate probability distributions that cannot be computed directly. It generates samples from a posterior distribution \pi(x) over a state space by simulating a Markov chain p(x' \mid x) over the same state space whose stationary distribution is \pi(x). The samples drawn from p converge to the target distribution \pi as the number of samples goes to infinity. Such a Markov chain with the appropriate stationary distribution can be constructed using Gibbs sampling (Pearl 1988), which works on a target distribution \pi(x) with x = (x_1, ..., x_t): in each iteration we sample i \in \{1, ..., t\} and then sample x_i from its conditional distribution \pi(x_i \mid x_j : j \neq i); the stationary distribution of the resulting chain p is \pi.
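
To make the belief update of Eq. (2) concrete, the following minimal Python sketch applies it to a generic discrete POMDP; the array layout, the `pomdp_belief_update` name, and the single-agent tiger numbers below are illustrative placeholders, not code from the paper.

```python
import numpy as np

def pomdp_belief_update(b, a, o, T, O):
    """One step of the POMDP belief update in Eq. (2).

    b: current belief, shape (|S|,)
    a: index of the action just taken
    o: index of the observation just received
    T: transition probabilities, T[s, a, s'] = Pr(s' | s, a)
    O: observation probabilities, O[s', a, o] = Pr(o | s', a)
    """
    predicted = b @ T[:, a, :]                   # sum_s T(s, a, s') b(s)
    unnormalized = O[:, a, o] * predicted        # multiply by O(o, s', a)
    return unnormalized / unnormalized.sum()     # alpha normalizes the result

# Illustrative single-agent tiger problem: 2 states, 3 actions, 2 observations.
T = np.zeros((2, 3, 2)); O = np.zeros((2, 3, 2))
T[:, 0, :] = np.eye(2)                           # listening leaves the tiger in place
T[:, 1:, :] = 0.5                                # opening a door resets the tiger uniformly
O[:, 0, :] = [[0.85, 0.15], [0.15, 0.85]]        # growl is 85% accurate after listening
O[:, 1:, :] = 0.5                                # observations uninformative after opening
b = np.array([0.5, 0.5])
print(pomdp_belief_update(b, a=0, o=0, T=T, O=O))  # belief shifts toward state 0
```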

3 The Model

3.1 I-POMDP framework

An interactive POMDP of agent i, I-POMDP_i, is defined as:

I\text{-}POMDP_i = \langle IS_{i,l}, A, \Omega_i, T_i, O_i, R_i \rangle    (4)

where IS_{i,l} is the set of interactive states of the environment, defined as IS_{i,l} = S \times M_{j,l-1} for l \geq 1, where S is the set of states, M_{j,l-1} is the set of possible models of agent j, and l is the strategy level. A specific class of models are the (l-1)th-level intentional models, \Theta_{j,l-1}, of agent j: \theta_{j,l-1} = \langle b_{j,l-1}, A, \Omega_j, T_j, O_j, R_j, OC_j \rangle, where b_{j,l-1} is agent j's belief nested to level (l-1), b_{j,l-1} \in \Delta(IS_{j,l-1}), and OC_j is j's optimality criterion. The intentional model \theta_{j,l-1} can be rewritten as \theta_{j,l-1} = \langle b_{j,l-1}, \hat{\theta}_j \rangle, where \hat{\theta}_j includes all elements of the intentional model other than the belief and is called agent j's frame. IS_{i,l} can be defined inductively (note that when the frame \hat{\theta}_j is known, as is usually the case, the model \theta_j reduces to b_j):

IS_{i,0} = S,                         \Theta_{j,0} = \{ \langle b_{j,0}, \hat{\theta}_j \rangle : b_{j,0} \in \Delta(S) \}
IS_{i,1} = S \times \Theta_{j,0},     \Theta_{j,1} = \{ \langle b_{j,1}, \hat{\theta}_j \rangle : b_{j,1} \in \Delta(IS_{j,1}) \}
...
IS_{i,l} = S \times \Theta_{j,l-1},   \Theta_{j,l} = \{ \langle b_{j,l}, \hat{\theta}_j \rangle : b_{j,l} \in \Delta(IS_{j,l}) \}    (5)

All other components of an I-POMDP are similar to those of a POMDP: A = A_i \times A_j is the set of joint actions of all agents; \Omega_i is the set of agent i's possible observations; T_i : S \times A \times S \to [0, 1] is the state transition function; O_i : S \times A \times \Omega_i \to [0, 1] is the observation function; and R_i : IS \times A \to \mathbb{R} is the reward function.

3.2 Interactive belief update

Given the definitions above, the interactive belief update can be performed as follows:

b_i^t(is^t) = Pr(is^t \mid b_i^{t-1}, a_i^{t-1}, o_i^t)
            = \alpha \sum_{is^{t-1}} b_i^{t-1}(is^{t-1}) \sum_{a_j^{t-1}} Pr(a_j^{t-1} \mid \theta_j^{t-1}) \, T(s^{t-1}, a^{t-1}, s^t) \, O(s^t, a^{t-1}, o_i^t)
              \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t) \, \tau(b_j^{t-1}, a_j^{t-1}, o_j^t, b_j^t)    (6)

Unlike the plain belief update in a POMDP, the interactive belief update in an I-POMDP takes two additional sophistications into account. First, the probabilities of the other agent's actions given its models (the second summation) need to be computed, since the physical state now depends on both agents' actions. Second, the agent needs to update its beliefs based on its anticipation of what observations the other agent might receive and how the other agent updates its own beliefs (the third summation). The optimal action, a, for the infinite-horizon criterion with discounting, is then part of the set of optimal actions, OPT(\theta_i), for the belief state, defined as:

OPT(\theta_i) = \arg\max_{a_i \in A_i} \Big\{ \sum_{is \in IS} b_i(is) \, ER_i(is, a_i) + \gamma \sum_{o_i \in \Omega_i} P(o_i \mid a_i, b_i) \, U(\langle SE_{\theta_i}(b_i, a_i, o_i), \hat{\theta}_i \rangle) \Big\}    (7)
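
As a rough illustration of Eq. (6), the sketch below performs a level-1 interactive belief update in which agent j's candidate beliefs are discretized to a finite grid; the function names, arguments, and the discretization itself are hypothetical stand-ins for the model components, not an implementation from the paper.

```python
from collections import defaultdict

def interactive_belief_update(b_i, a_i, o_i, states, j_beliefs, obs_j,
                              T, O_i, O_j, policy_j, tau_j):
    """Level-1 version of Eq. (6), with agent j's beliefs on a finite grid.

    b_i      : dict mapping (s, k) -> probability, k indexing a belief in j_beliefs
    T(s, a_i, a_j, s')      : joint transition probability
    O_i(s', a_i, a_j, o_i)  : agent i's observation probability
    O_j(s', a_i, a_j, o_j)  : agent j's observation probability
    policy_j(b_j)           : list of (a_j, Pr(a_j | theta_j)) pairs
    tau_j(k, a_j, o_j)      : grid index of j's updated belief (j's own SE function)
    """
    new_b = defaultdict(float)
    for (s, k), p in b_i.items():
        for a_j, p_aj in policy_j(j_beliefs[k]):            # second summation in Eq. (6)
            for s2 in states:
                w = p * p_aj * T(s, a_i, a_j, s2) * O_i(s2, a_i, a_j, o_i)
                if w == 0.0:
                    continue
                for o_j in obs_j:                           # third summation in Eq. (6)
                    k2 = tau_j(k, a_j, o_j)
                    new_b[(s2, k2)] += w * O_j(s2, a_i, a_j, o_j)
    z = sum(new_b.values())                                 # alpha normalizes
    return {is_: v / z for is_, v in new_b.items()}
```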

4 Decayed Markov Chain Monte Carlo

The original decayed MCMC method was proposed as a filtering algorithm for problems such as dynamic Bayesian networks (Marthi et al. 2002), but it is limited to a single-agent perspective. Here we generalize decayed MCMC to multi-agent settings by applying it to models denoted by the interactive dynamic influence diagram (IDID) (Doshi, Zeng, and Chen 2009), two time slices of which are shown in Figure 1.

Figure 1: Two time slices of an IDID; red nodes are known.

The IDID explicitly models the I-POMDP structure by decomposing it into chance and decision variables and the dependencies between them. It captures the essence of the multi-agent interaction problem under the I-POMDP framework and thus provides a convenient structure for computation. Suppose there are two agents i and j; the subscripts in Figure 1 denote the corresponding agents. S is the physical state, O_j is agent j's observation, A_j is j's action, and \theta_j is j's model. The red nodes O_i and A_i are agent i's observation and action, respectively, and they are the only observable variables in this two-agent setting. In order to use Gibbs sampling, we identify the known nodes of agent i and sample all unknown variables from their conditional distributions given everything else. This can be simplified by sampling from the conditionals given the corresponding Markov blankets at different time steps, since any unknown variable is independent of the others given its Markov blanket. Then, for every t, we only need to sample from the following conditional distributions:

o_j^t      \sim Pr(o_j^t \mid s^t, \theta_j^{t-1}, a_j^{t-1}, a_i^{t-1})
s^t        \sim Pr(s^t \mid s^{t-1}, s^{t+1}, a_j^{t-1:t}, a_i^{t-1:t}, o_j^t, o_i^t)
a_j^t      \sim Pr(a_j^t \mid \theta_j^t, s^t, s^{t+1}, a_i^t, o_j^{t+1}, o_i^{t+1})
\theta_j^t \sim Pr(\theta_j^t \mid \theta_j^{t-1}, a_j^{t-1}, o_j^t, a_j^t)    (8)

For a particular time step t, say t = T, since our goal is the filtering distribution of all hidden variables given the observation history o_i^{1:T}, only the last hidden variables need to be retained and all previous ones can be discarded. We use the final sample of the chain from the last time step to initialize the new Markov chain for sampling the hidden variables at the current time step, hoping that the final sample of the previous time step lies near the high-probability region of the next step. The actual algorithm is as follows, where x^{(i)} denotes the ith sample of all variables in the network and x^{(i)}_t denotes its components at time t:

Algorithm 1: Decayed MCMC
  Initialize x^{(0)}_{1:t}
  For i = 1, ..., N:
    Sample a time index t' from the decay function d
    Sample x^{(i)}_{t'} from p(x_{t'} \mid x^{(i)}_{-t'}), where x^{(i)}_{-t'} denotes all components of x^{(i)} except x^{(i)}_{t'}
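
A minimal Python sketch of Algorithm 1 is given below, using the inverse-polynomial decay d(k) \propto k^{-\alpha} discussed in the next section; `gibbs_update` stands in for the conditional resampling steps of Eq. (8) and, like the other names here, is an illustrative placeholder rather than code from the paper.

```python
import random

def sample_decay_index(t, alpha=1.5):
    """Sample how far back to update: Pr(lag k) proportional to k^(-alpha), k = 1..t."""
    weights = [k ** (-alpha) for k in range(1, t + 1)]
    k = random.choices(range(1, t + 1), weights=weights)[0]
    return t - k + 1                         # convert lag k into an absolute time index

def decayed_mcmc(x, t, gibbs_update, num_sweeps=1000, alpha=1.5):
    """Algorithm 1: x holds the current sample of all hidden variables x_{1:t}.

    gibbs_update(x, idx) resamples the hidden variables at time idx from their
    conditionals given everything else (the Markov-blanket updates of Eq. (8)).
    """
    for _ in range(num_sweeps):
        idx = sample_decay_index(t, alpha)   # favor recent time steps
        gibbs_update(x, idx)                 # in-place Gibbs update at time idx
    return x                                 # final state approximates the filtering posterior
```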

Accordingly, the key to this algorithm is choosing appropriately the time steps at which we sample more frequently. Intuitively, since a Markov chain has an exponential forgetting effect (x_{t-k} has an exponentially small effect on p(x_t)), instead of picking the time index uniformly as in plain Gibbs sampling, we can pick it from a decay function d(k) that decays at or slower than the forgetting rate, so that the new sampling algorithm favors sampling from the recent past while maintaining accuracy. Notice that the decay function is essentially the probability of updating state x_{t-k}, and it must satisfy: (1) d(k) > 0 for all k; and (2) d(k) decays no faster than the forgetting rate at state x_{t-k}. Hence, an inverse polynomial decay d(k) \propto k^{-\alpha} (\alpha > 1) works well for this purpose, since it dominates the exponential forgetting rate asymptotically.

Regarding the time complexity, it is usually analyzed in terms of the cost of each update step of the sampling Markov chain and the mixing time of the chain. The former is simply linear in the size of the interactive state space, but the latter would in general depend on the time t, since samples need to be updated at each time step for the chain to mix. Since our aim here is to sample accurately from the marginal distribution of x_t, we follow the same approach as the original D-MCMC algorithm, which uses the marginal mixing time to ensure that the filtering distribution at the current time step is accurate. The marginal mixing time for any given observation sequence is proven to be O(1), so the combined update and mixing cost is O(1) with respect to t, i.e., independent of the time step.

5 Experiments

5.1 Setup

We present results for the multi-agent tiger game (Gmytrasiewicz and Doshi 2005) with various parameters. The multi-agent tiger game is a generalization of the classical single-agent tiger game (Kaelbling, Littman, and Cassandra 1998), obtained by adding observations caused by the other agent's actions. The generalized multi-agent game contains additional observations regarding the other player, while the state transition and reward functions also involve the other's actions. Specifically, the game goes as follows: a tiger and a pile of gold are behind two doors, respectively; the two players can both listen for the growl of the tiger and the creak caused by the other player, or open a door, which resets the tiger's location with equal probability. Their observation accuracies for the tiger and for the other player are both relatively high (0.85 and 0.9, respectively). Regardless of which player acts, the reward for listening is -1, for opening the tiger door is -100, and for opening the gold door is 10.

Recall that an interactive POMDP of agent i is defined as the six-tuple I\text{-}POMDP_i = \langle IS_{i,l}, A, \Omega_i, T_i, O_i, R_i \rangle. For the specific setting of the multi-agent tiger problem:

IS_{i,1} = S \times \Theta_{j,0}, where S = {tiger on the left (TL), tiger on the right (TR)} and \Theta_{j,0} = b_{j,0} = {p(TL), p(TR)}, assuming j's frame is known.

A = A_i \times A_j is the set of all combinations of each agent's possible actions: listen (L), open left door (OL), and open right door (OR).

\Omega_i is the set of all combinations of each agent's possible observations: growl from left (GL) or right (GR), combined with creak from left (CL), creak from right (CR), or silence (S).

T_i = T_j : S \times A_i \times A_j \times S \to [0, 1] now becomes a joint state transition probability that involves both actions: the tiger's position is reset to the left/right door with probability 0.5/0.5 whenever an agent opens a door, and remains unchanged with probability 1 when both agents listen.
O_i : S \times A_i \times A_j \times \Omega_i \to [0, 1] becomes a joint observation probability that involves both actions: agent i's observation accuracy is the accuracy of hearing a growl (0.85) times the accuracy of hearing a creak (0.9). O_j is symmetric to O_i with respect to the joint actions.

R_i : IS \times A_i \times A_j \to \mathbb{R}: agent i receives -1, -100, and 10 when it listens, opens the wrong door, and opens the correct door, respectively, independently of j's actions, and vice versa for agent j.
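
The following Python snippet encodes the multi-agent tiger parameters just listed as a rough sketch; the array layout, variable names, and the way the residual 0.1 creak mass and the growls after door openings are distributed are our own assumptions, not specifications from the paper.

```python
import numpy as np

states = ["TL", "TR"]                    # tiger on the left / right
actions = ["L", "OL", "OR"]              # listen, open left door, open right door
growls, creaks = ["GL", "GR"], ["CL", "CR", "S"]

# Joint transition T[s, a_i, a_j, s']: any door opening resets the tiger uniformly.
T = np.zeros((2, 3, 3, 2))
T[:, 0, 0, :] = np.eye(2)                # both agents listen: tiger stays put
T[:, 1:, :, :] = 0.5                     # agent i opens a door
T[:, :, 1:, :] = 0.5                     # agent j opens a door

# Growl accuracy 0.85 (informative only when i listens, an assumption otherwise).
def growl_prob(s, a_i, g):
    if a_i != 0:
        return 0.5
    correct = (s == 0 and g == 0) or (s == 1 and g == 1)
    return 0.85 if correct else 0.15

# Creak accuracy 0.9; remaining 0.1 split evenly between the wrong creaks (assumption).
def creak_prob(a_j, c):
    if a_j == 0:                          # j listened: silence is most likely
        return 0.9 if c == 2 else 0.05
    correct = (a_j == 1 and c == 0) or (a_j == 2 and c == 1)
    return 0.9 if correct else 0.05

# Joint observation probability factorizes into growl accuracy times creak accuracy.
def obs_prob_i(s, a_i, a_j, g, c):
    return growl_prob(s, a_i, g) * creak_prob(a_j, c)

# Agent i's reward depends only on the physical state and its own action.
def reward_i(s, a_i):
    if a_i == 0:
        return -1
    opened_tiger_door = (a_i == 1 and s == 0) or (a_i == 2 and s == 1)
    return -100 if opened_tiger_door else 10
```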

For the sake of brevity, we restrict the experiments to a two-agent setting and a nesting level of one, but the sampling algorithm extends to any number of agents and nesting levels in a straightforward way. For the actual experiments, we first fix the number of samples at 1000 and run the methods on a two-agent tiger game simulation as described above, in order to compare the accuracy and efficiency of D-MCMC and I-PF. We then vary the probability of the tiger's position resetting when the agents listen, and their observation accuracy, in order to test the mixing time of D-MCMC on different problem settings. Lastly, we vary the sample size and compare the total error rate of predicting the other agent's actions to show the consistency of the two methods. The experiments were run on a 64-bit Windows 10 computer with an Intel Core i5-6200U 2.3 GHz CPU, 8 GB of memory, and Matlab R2015b.

5.2 Results

Figure 2: Accuracy and efficiency comparisons.

The left plot in Figure 2 shows the prediction error for agent j's actions over time when i's observation accuracy is increased to 0.95 and every other parameter remains as in Section 5.1. We see that the errors remain bounded for both sampling algorithms, but I-PF can sometimes lose track of the actual belief (e.g., around time steps 35-40) when i's observation function is nearly deterministic, since the agent then tends to become overly sure and the samples collapse. D-MCMC does not suffer from this problem in this situation. The right plot in Figure 2 is a running-time comparison on the standard tiger game settings described in Section 5.1. Although the running time of both methods is independent of t, when the two methods use an equal number of samples (1000), I-PF is slightly more efficient than D-MCMC due to its recursive nature: the samples drawn at the previous time step are reused at the next step. By contrast, D-MCMC is non-recursive, consults part of the history, and also needs some mixing time at each time step before the samples from the sampling Markov chain can actually be used. However, the difference in running time is only a constant factor, so D-MCMC is still a better choice when dealing with observation outliers.

Figure 3 shows some important intrinsic properties of D-MCMC on I-POMDPs. The left plot is the mixing time of D-MCMC over the observation history. For tiger games with different transition functions (the tiger remains in place with probability 0.5 and 0.9 when a door opens) and observation functions (hearing accuracy of 0.65 and 0.85), this experiment shows that the mixing time of D-MCMC increases with transition determinism and decreases with observation determinism, since the increasing importance of the observations means that the history becomes less relevant. In all settings, the mixing time remains bounded as the time step increases. The right plot in Figure 3 shows the bounded total error rates (consistency) for the standard tiger game settings as the number of samples increases. We observe that the error rate of D-MCMC drops faster than that of I-PF until they become very close after roughly 1000 samples, and D-MCMC maintains a slightly lower error rate than I-PF due to its better performance when confronting outliers. Intuitively, D-MCMC consults both the history and the current observation, and should therefore receive more information, leading to higher accuracy.

Figure 3: Mixing time and consistency comparisons.

Table 1: Comparison of major differences between I-PF and D-MCMC

Name    | Samples                   | Resampling                   | Recursion     | Divergence          | Major drawbacks
D-MCMC  | entire state trajectories | resamples state at any time  | non-recursive | non-divergent       | incapable of handling determinism; slightly slower
I-PF    | most recent state         | only resamples current state | recursive     | sometimes divergent | inefficient in high dimensions; errors propagate forward

Lastly, we give a detailed comparison of the important aspects of D-MCMC and I-PF in Table 1. To summarize, when dealing with a complicated problem that may generate frequent observation outliers, D-MCMC is guaranteed to be non-divergent, while I-PF may temporarily lose track of the real posterior distribution. A complex, high-dimensional state space or a tight observation model are evident signs of such scenarios. However, there is a trade-off between accuracy and efficiency, since D-MCMC tackles these scenarios at the cost of sampling from the recent history and also spends a small amount of time on the burn-in period.

6 Conclusions and Future Work

We have described a new method to approximate the belief update in I-POMDP settings and used it to sample from the interactive beliefs. The results show that our approach mitigates the belief space complexity, tackles observation outliers, and competes with other sampling algorithms in terms of the accuracy of predicting the other agent's actions. Although empirical results show that D-MCMC is non-divergent, in future work we will formally prove its convergence for I-POMDPs. More examples with high-dimensional state spaces can also be tested; in such settings D-MCMC should outperform I-PF, since the latter suffers a long recovery time when propagating samples through the state transition model as the dimension increases.

References

Doshi, P., and Gmytrasiewicz, P. J. 2009. Monte Carlo sampling methods for approximating interactive POMDPs. Journal of Artificial Intelligence Research 34.

Doshi-Velez, F., and Konidaris, G. Hidden parameter Markov decision processes: A semiparametric regression approach for discovering latent task parametrizations. arXiv preprint.

Doshi, P., Zeng, Y., and Chen, Q. 2009. Graphical models for interactive POMDPs: representations and solutions. Autonomous Agents and Multi-Agent Systems 18(3).

Gmytrasiewicz, P. J., and Doshi, P. 2005. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research 24(1).

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. 1996. Introducing Markov chain Monte Carlo. In Markov Chain Monte Carlo in Practice.

Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1).

Liu, M., Liao, X., and Carin, L. 2011. The infinite regionalized policy representation. In Proceedings of the 28th International Conference on Machine Learning (ICML-11).

Marthi, B., Pasula, H., Russell, S., and Peres, Y. 2002. Decayed MCMC filtering. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence.

Panella, A., and Gmytrasiewicz, P. 2016. Bayesian learning of other agents' finite controllers for interactive POMDPs. In Thirtieth AAAI Conference on Artificial Intelligence.

Papadimitriou, C. H., and Tsitsiklis, J. N. 1987. The complexity of Markov decision processes. Mathematics of Operations Research 12(3).

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.

Ross, S., Chaib-draa, B., and Pineau, J. 2007. Bayes-adaptive POMDPs. In Advances in Neural Information Processing Systems.
