Piecewise Linear Dynamic Programming for Constrained POMDPs


Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)

Piecewise Linear Dynamic Programming for Constrained POMDPs

Joshua D. Isom, Sikorsky Aircraft Corporation, Stratford, CT 06615
Sean P. Meyn and Richard D. Braatz, University of Illinois at Urbana-Champaign, Urbana, IL 61801

Abstract

We describe an exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs). State-of-the-art exact solution of unconstrained POMDPs relies on implicit enumeration of the vectors in the piecewise linear value function, and on pruning operations to obtain a minimal representation of the updated value function. In dynamic programming for CPOMDPs, each vector takes two valuations, one with respect to the objective function and another with respect to the constraint function. The dynamic programming update consists of finding, for each belief state, the vector that has the best objective function valuation while still satisfying the constraint function. Whereas the pruning operation for an unconstrained POMDP requires solution of a linear program, the pruning operation for CPOMDPs requires solution of a mixed integer linear program.

Background

The partially observable Markov decision process (POMDP) is a model for decision-making under uncertainty with respect both to the current state and to the future evolution of the system. The model generalizes the fully observed Markov decision process, which allows only for uncertainty as to the future evolution of the system. The theory of solution of fully observed Markov decision processes with finite or countable state space is well established for both unconstrained (Puterman 2005) and constrained (Altman 1999) problem formulations. Despite a recent burst of algorithmic development for unconstrained POMDPs (Cassandra 1998; Hansen 1998; Poupart 2005; Feng & Zilberstein 2004; Spaan & Vlassis 2005), there has been relatively little development of algorithmic approaches for the constrained problem. A comprehensive treatment of countable-state constrained Markov decision processes is Altman's monograph (Altman 1999). The central solution concept is a countably infinite linear program, which has desirable theoretical properties but requires LP approximation or state truncation to be of practical use. Complexity reduction is addressed in (Meyn 2007) for network models; approaches include relaxation techniques and value function approximations based on a mean-flow model. Because the indefinite-horizon and infinite-horizon discounted POMDPs considered here are equivalent to Markov decision processes with an uncountably infinite state space, the countable-state results are not directly applicable.

The solution technique that we propose is related to an approach developed by Piunovskiy and Mao (Piunovskiy & Mao 2000). They considered an uncountably infinite-state Markov decision process and proposed augmenting the state space with a state representing the expected value of the constraint function, with a penalty function used to prohibit solutions that violate the constraint. The technique of Piunovskiy and Mao is independent of the representation of the value function, and was illustrated with an example that admitted an exact analytical solution. Our approach is similar in that it disallows, at every step, a solution that would violate the constraint.
However, our method uses a piecewise linear representation of the value function, building on techniques developed for unconstrained POMDPs and allowing for ε-exact numerical solutions. Our value function update can be the basis for value iteration or for policy iteration using Hansen's policy improvement technique.

A constrained partially observable Markov decision process is a 7-tuple

(S, A, Σ, p(s'|s, a), p(z|s), r(s, a), c(s, a)),   (1)

where
- S, A, and Σ are finite sets of states, actions, and observations,
- p(s'|s, a) is the state transition function, giving the probability that state s' is reached from state s on action a,
- p(z|s) is the observation function, giving the probability that observation z ∈ Σ is made in state s,
- r(s, a) is the cost incurred by taking action a in state s, and
- c(s, a) is a constraint function.

A deterministic stationary policy φ for a CPOMDP is a map from all possible observation sequences to an action in A, with the set of all policies given by

Φ = {φ : Σ* → A}.   (2)
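For concreteness in the sketches that follow, the components of this tuple can be collected in a small container. This is an illustrative Python sketch; the array layout (T[a, s, s'], O[s, z], r[s, a], c[s, a]) is our own convention and is not prescribed by the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CPOMDP:
    """The 7-tuple (S, A, Sigma, p(s'|s,a), p(z|s), r(s,a), c(s,a)) as dense arrays."""
    T: np.ndarray        # T[a, s, s'] = p(s'|s, a), shape (|A|, |S|, |S|)
    O: np.ndarray        # O[s, z]     = p(z|s),     shape (|S|, |Sigma|)
    r: np.ndarray        # r[s, a]     = per-step cost
    c: np.ndarray        # c[s, a]     = per-step constraint cost
    lam: float = 1.0     # discount: lam = 1 (indefinite horizon) or 0 < lam < 1

    def validate(self) -> None:
        # Every transition and observation distribution must sum to one.
        assert np.allclose(self.T.sum(axis=2), 1.0)
        assert np.allclose(self.O.sum(axis=1), 1.0)
```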

We consider two problems on CPOMDPs. The indefinite-horizon problem takes the form

minimize over Φ:  E^Φ [ Σ_{t=0}^{T} r(a_t, s_t) ],   (3)

subject to

E^Φ [ Σ_{t=0}^{T} c(a_t, s_t) ] ≤ α  for every initial state s_0,   (4)

where T is the time at which the system enters a set of one or more terminal states and s_0 is the initial state. Conditions that ensure that the dynamic programming operator for an indefinite-horizon problem has a unique fixed point are developed in the literature (Hansen 2007; Patek 2001). For the infinite-horizon discounted case, the constrained problem takes the form

minimize over Φ:  E^Φ [ Σ_{t=0}^{∞} λ^t r(a_t, s_t) ],   (5)

subject to

E^Φ [ Σ_{t=0}^{∞} λ^t c(s_t, a_t) ] ≤ α  for every initial state s_0,   (6)

with discount factor 0 < λ < 1 and initial state s_0. Discounting ensures that the dynamic programming operator is a contraction operator.

Motivating Problems

Below we discuss three examples that motivate the development of techniques for the solution of CPOMDPs. Practical problems that are naturally formulated as CPOMDPs are at least as prevalent as those formulated as unconstrained problems.

Change Detection

A classic example of a constrained indefinite-horizon POMDP is the problem of quickest change detection. The problem is to minimize detection delay subject to a constraint on the probability of false alarm. This problem was studied by Shiryaev (Shiryaev 1963), who elucidated the form of the optimal policy but did not establish numerical techniques for optimal parameterization of the solution. The difficulty of finding the optimal parameterization for the policy, or even of evaluating the probability of false alarm, has been noted in the literature (Tartakovsky & Veeravalli 2004).

An indefinite-horizon constrained Bayes change detection problem is a 5-tuple {Σ, α, f_0, f_1, T} with Σ a finite set of observations, change time parameter 0 < ρ < 1, probability of false alarm constraint 0 < α < 1, pre-change observation probability mass function f_0 : Σ → [0, 1], post-change observation probability mass function f_1 : Σ → [0, 1], and countable time set T. The change time ν is a geometric random variable with parameter ρ, and η is an adapted estimate of the change time. The problem statement is

minimize over φ ∈ Φ:  E[(η − ν − 1)^+]   (7)

subject to

p(η < ν) ≤ α,   (8)

with

Φ = {φ : Σ* → [0, 1]},   (9)

where the distribution of the random variables is given by

p(y_t = z | t < ν) = f_0(z),  z ∈ Σ,   (10)
p(y_t = z | t ≥ ν) = f_1(z),  z ∈ Σ,   (11)
p(η = t | η ≥ t) = φ(y_1, ..., y_t),   (12)
p(ν = t) = ρ(1 − ρ)^{t−1}.   (13)

[Figure 1: State transition diagram for the change detection problem with geometric change time distribution. Arcs are labeled with transition probabilities under the no-alarm / alarm actions. Nodes are labeled with the distribution of observations for the state.]

The problem of quickest change detection can be modeled as an indefinite-horizon POMDP with

S = {PreChange, PostChange, PostAlarm},
A = {Alarm, NoAlarm},
c(PreChange, Alarm) = 1,
r(PostChange, NoAlarm) = 1,   (14)

all other values of r and c zero, and constraint functional

E^Φ [ Σ_{t=0}^{T} c(a_t, s_t) ] ≤ α.   (15)

State transition and observation probabilities are as shown in Figure 1.

Structural Monitoring and Maintenance

Bridges, aircraft airframes, and other structural components in engineered systems are subject to fatigue failures. Nondestructive techniques for inspection of the damage state of a component are imperfect, so the true structural damage state of the component is imperfectly knowable. The maintainer may take several actions in structural monitoring, including inspection, repair, or replacement.
Because the consequences of structural failure are often severe, the structural monitoring and maintenance problem is most naturally formulated as an infinite-horizon discounted POMDP in which the objective is to minimize life-cycle maintenance and replacement costs subject to a constraint on the probability of failure, expressed at least qualitatively in terms like "no more than one in a million." Because the service life of the engineered system is usually uncertain, discounted life-cycle cost objectives and occupation measure constraints are appropriate (Puterman 2005, p. 125).

Opportunistic Spectrum Access

Limits on bandwidth for wireless services motivate dynamic spectrum sharing strategies such as opportunistic spectrum access (OSA), a strategy that allows secondary users to exploit instantaneous spectrum opportunities while limiting the level of interference for primary users. Zhao and Swami proposed a CPOMDP decision-theoretic framework for solution of the OSA problem (Zhao & Swami 2007). In this framework, the state is determined by whether or not the spectrum is in use by the primary user, the observations consist of partial spectrum measurements, the reward function is the number of bits delivered by the secondary user, and the constraint is on the probability of collision with a primary user.

Exact Dynamic Programming Update for CPOMDP

Having defined a CPOMDP and provided some motivating problems, we now describe how to extend dynamic programming techniques for POMDPs to solve CPOMDPs. A POMDP can be transformed into an infinite-state fully observed Markov decision process by defining a new belief state, which is a vector b ∈ [0, 1]^{|S|} with Σ_s b(s) = 1, representing the posterior probability that the system is in each state s ∈ S given the starting state, observation history, and action history. The belief state is a sufficient statistic and can be recursively updated via Bayes' rule. If observation z is made and action a taken, and if the previous belief state vector was b(s), the new belief state vector b'(s') is given by

b'(s') = p(z|s') Σ_s p(s'|a, s) b(s) / Σ_{s,s'} p(z|s') p(s'|a, s) b(s).   (16)

The dynamic programming update for a POMDP is

v_{n+1}(b) = min_{a ∈ A} [ b·r(·, a) + λ Σ_{z ∈ Σ} p(z | b, a) v_n(b'(b, a, z)) ].

Exact methods for efficiently performing the dynamic programming update for POMDPs include incremental pruning (Cassandra 1998) and restricted region incremental pruning (Feng & Zilberstein 2004). A key technique for both of these methods is representation of the value function as a piecewise linear function over the belief simplex. This representation allows for a finite representation of the value function as a set of value function vectors, the elements of which represent the intersection of a hyperplane with a given axis of the belief simplex. Stationary nonrandomized policies resulting from a stationary value function are sufficient for constrained discounted (Piunovskiy & Mao 2000) and indefinite-horizon (Feinberg & Piunovskiy 2000) Markov decision process problems with continuous state space.

Exact techniques for the POMDP dynamic programming update implicitly enumerate all value function vectors for the updated value function, and use a pruning operation to maintain a minimal representation. An explicitly enumerative dynamic programming update that converts a set of value function vectors V into a new set V' is

V' = ∪_{a ∈ A} ⊕_{z ∈ Σ} { v^{a,z,i} | v_i ∈ V },   (17)

where ⊕ denotes the cross sum of sets of vectors, with

v^{a,z,i}(s) = r(s, a)/|Σ| + λ Σ_{s' ∈ S} p(z|s', a) p(s'|s, a) v_i(s'),   (18)

and λ = 1 for the indefinite-horizon problem or 0 < λ < 1 for the infinite-horizon discounted problem. Typically, many of the value function vectors in V' are dominated and superfluous to a minimal value function representation. Pruning is the process of identifying vectors that are not part of the minimal value function representation.
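The belief update of Equation 16 and the explicit enumeration of Equations 17 and 18 translate directly into NumPy. The following sketch assumes the array layout introduced above; it is a naive enumerative update, not the incremental-pruning implementation used for the results in this paper.

```python
import itertools
import numpy as np

def belief_update(b, a, z, T, O):
    """Equation 16: b'(s') is proportional to p(z|s') * sum_s p(s'|s,a) b(s)."""
    unnormalized = O[:, z] * (b @ T[a])
    return unnormalized / unnormalized.sum()

def vectors_az(V, a, z, T, O, r, lam):
    """Equation 18: one vector per predecessor vector v_i, for fixed (a, z)."""
    n_obs = O.shape[1]
    return [r[:, a] / n_obs + lam * (T[a] @ (O[:, z] * v)) for v in V]

def enumerate_update(V, T, O, r, lam):
    """Equation 17: union over actions of the cross sum over observations."""
    n_actions, n_obs = T.shape[0], O.shape[1]
    new_vectors = []
    for a in range(n_actions):
        per_obs = [vectors_az(V, a, z, T, O, r, lam) for z in range(n_obs)]
        for choice in itertools.product(*per_obs):   # one vector chosen per observation
            new_vectors.append(sum(choice))
    return new_vectors
```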
At the heart of the pruning operation is the solution of a linear program that determines whether there is a belief point b at which a given value function vector w is better (lower cost) than all other value function vectors u^i ∈ U. The linear program takes the form

variables: h, b(s) ≥ 0 ∀ s ∈ S
maximize: h
subject to the constraints:
b·(u − w) ≥ h, ∀ u ∈ U,
Σ_{s ∈ S} b(s) = 1.

If the linear program is feasible, then w is a candidate for the minimal value function representation. Incremental pruning interleaves pruning with vector generation to reduce the computational cost associated with pruning for each dynamic programming update. With PR representing the pruning operation, the dynamic programming update is given by

V' = PR( ∪_{a ∈ A} V^a ),
V^a = PR( V^{a,z_1} ⊕ PR( V^{a,z_2} ⊕ ... PR( V^{a,z_{k−1}} ⊕ V^{a,z_k} ) ... ) ),
V^{a,z} = PR( { v^{a,z,i} | v_i ∈ V } ).   (19)

This dynamic programming technique can be extended to the constrained problem by generating two sets of value function vectors with a one-to-one correspondence between the vectors in the two sets. One set is valued with respect to the cost function r, and the other is valued with respect to the constraint function c. We write (v_r^i, v_c^i) for the vectors that correspond to each other in the two sets, and V for the collection of pairs. Let

v_r^{a,z,j}(s) = r(s, a)/|Σ| + λ Σ_{s' ∈ S} p(z|s', a) p(s'|s, a) v_r^j(s')   (20)

and

v_c^{a,z,j}(s) = c(s, a)/|Σ| + λ Σ_{s' ∈ S} p(z|s', a) p(s'|s, a) v_c^j(s').   (21)

Equation 17 is applied separately to {v_r^{a,z,j}(s)} and {v_c^{a,z,j}(s)}, and vectors (v_r^i, v_c^i) in the resulting sets correspond if they were generated from the same action and the same predecessor vector index for each observation.
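As a concrete illustration, the unconstrained dominance test above can be posed with scipy.optimize.linprog. The variable ordering [h, b(1), ..., b(n)] is our own choice, and because linprog minimizes, the objective is the negated h.

```python
import numpy as np
from scipy.optimize import linprog

def lp_dominate(w, U):
    """LP dominance test used in unconstrained pruning (illustrative sketch).

    w and the members of U are NumPy cost vectors over the state space.
    Returns a belief b at which w is at least as good (no higher cost) as
    every vector in U, or None if no such belief exists.
    Decision variables: x = [h, b(1), ..., b(n)]; the LP maximizes h.
    """
    n = len(w)
    if not U:                                   # nothing to compete against
        return np.full(n, 1.0 / n)
    obj = np.zeros(n + 1)
    obj[0] = -1.0                               # maximize h == minimize -h
    # h - b.(u - w) <= 0 for every u in U.
    A_ub = np.array([np.concatenate(([1.0], -(u - w))) for u in U])
    b_ub = np.zeros(len(U))
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)   # sum_s b(s) = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * n
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if res.status == 0 and -res.fun >= 0:       # optimal h >= 0
        return res.x[1:]
    return None
```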

To develop a pruning operation for the value function vectors, note that for a vector pair (v_r, v_c) to belong to the minimal set, v_r must have a lower cost b·v_r at some belief b than any other vector u_r^i ∈ U whose pair (u_r^i, u_c^i) satisfies the constraint b·u_c^i ≤ α at b. This concept is illustrated in Figure 2. If a vector u^i does not satisfy the constraint b·u_c^i ≤ α at b, the test vector w need not have a lower cost than it. The constraints are therefore conditional. A standard trick for incorporating conditional constraints into a linear program is to add a discrete variable d_i ∈ {0, 1}, resulting in a mixed integer linear program. Applying this technique to the problem at hand produces the mixed integer linear program

variables: h, b(s) ≥ 0 ∀ s ∈ S, d_i ∈ {0, 1} ∀ i ∈ {1, ..., |U|}
maximize: h
subject to the constraints:
b·v_c ≤ α,
b·(u_r^i − v_r) ≥ h − d_i M, ∀ (u_r^i, u_c^i) ∈ U,
b·u_c^i ≥ d_i α, ∀ (u_r^i, u_c^i) ∈ U,
Σ_{s ∈ S} b(s) = 1,

with M a large positive number. If d_i = 1, then u^i violates the constraint at b, and a candidate vector v_r need not have a lower cost than u_r^i at b. On the other hand, if d_i = 0, then u^i satisfies the constraint at b, and a candidate vector v_r must have a lower cost than u_r^i at b. If the program is feasible, then v_r is a candidate for the lowest-cost minimal value function representation satisfying the constraint. The entire pruning operation for a CPOMDP is presented in Table 1.

[Figure 2: The top plot shows value function vectors with objective function valuation v_r, while the bottom plot shows value function vectors with constraint valuation v_c. Vectors in the top plot are highlighted in the region of the belief space where the vector is the best (lowest cost) vector satisfying the constraint, for α = 0.2.]
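The mixed integer program can be sketched with scipy.optimize.milp (SciPy 1.9 or later). Again, the variable ordering [h, b(1..n), d(1..m)] and the default big-M value are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def mip_dominate(v_r, v_c, U, alpha, big_m=1e4):
    """Mixed-integer dominance test in the spirit of MIP-DOMINATE (Table 1).

    U is a list of (u_r, u_c) pairs of NumPy arrays.  Returns a belief b at
    which (v_r, v_c) satisfies the constraint and has cost no worse than every
    competitor that also satisfies the constraint at b, or None otherwise.
    Variable vector: [h, b(1..n), d(1..m)], with the d_i binary.
    """
    n, m = len(v_r), len(U)
    cost = np.zeros(1 + n + m)
    cost[0] = -1.0                                  # maximize h == minimize -h
    rows, lbs, ubs = [], [], []
    # b.v_c <= alpha: the candidate pair must satisfy the constraint at b.
    rows.append(np.concatenate(([0.0], v_c, np.zeros(m)))); lbs.append(-np.inf); ubs.append(alpha)
    for i, (u_r, u_c) in enumerate(U):
        e = np.zeros(m); e[i] = 1.0
        # h - b.(u_r - v_r) - M*d_i <= 0: dominance is enforced only when d_i = 0.
        rows.append(np.concatenate(([1.0], -(u_r - v_r), -big_m * e))); lbs.append(-np.inf); ubs.append(0.0)
        # b.u_c - alpha*d_i >= 0: d_i = 1 is allowed only when u^i is at or beyond the constraint at b.
        rows.append(np.concatenate(([0.0], u_c, -alpha * e))); lbs.append(0.0); ubs.append(np.inf)
    # Belief simplex: sum_s b(s) = 1.
    rows.append(np.concatenate(([0.0], np.ones(n), np.zeros(m)))); lbs.append(1.0); ubs.append(1.0)
    constraints = LinearConstraint(np.array(rows), np.array(lbs), np.array(ubs))
    integrality = np.concatenate((np.zeros(1 + n), np.ones(m)))      # only the d_i are integer
    bounds = Bounds(np.concatenate(([-big_m], np.zeros(n + m))),     # cap h so the MIP stays bounded
                    np.concatenate(([big_m], np.ones(n + m))))
    res = milp(c=cost, constraints=constraints, integrality=integrality, bounds=bounds)
    if res.success and -res.fun >= 0:
        return res.x[1:1 + n]
    return None
```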
The CPOMDP pruning operation can be used to perform value iteration with incremental pruning dynamic programming updates via Equation 19. After a dynamic programming update producing a minimal set of vectors V, the value function is given by v(b) = min{ b·v_r : b·v_c ≤ α, (v_r, v_c) ∈ V }. The value function converges to the optimal value function with each successive dynamic programming update.

As an alternative to value iteration, the exact dynamic programming update can be used to perform policy iteration using Hansen's algorithm (Hansen 1998), which uses a deterministic finite-state controller (Q, q_0, Σ, δ, A, α) as the policy representation, where
- Q is a finite set of controller states, one for each vector v ∈ V,
- q_0 ∈ Q is the start state,
- Σ is a finite input alphabet,
- δ is a function from Q × Σ into Q, called the transition function,
- A is a finite output alphabet, and
- α is an output function from Q into A, with α(i) as shorthand for α(q_i).

Because the combination of the system with the controller yields a Markov chain, any policy can be evaluated via the solution of a linear system with coefficient matrix (I − λA), with I an identity matrix, A the system/controller transition matrix, and λ = 1 for the indefinite-horizon problem or 0 < λ < 1 for the infinite-horizon problem. The system state index i and the controller state index j are mapped to a new index k via a one-to-one mapping (i, j) → k, and the coefficients of the controller/system transition matrix A are given by

A((i, j), (i', j')) = Σ_{z ∈ Σ} p(i'|i, α(j)) p(z|i') 1{δ(j, z) = j'}.   (22)

The linear system for finding the objective value function of a policy is (I − λA) x_r = r̄, where r̄_{(i,j)} = r(i, α(j)) and x_{r,(i,j)} is the value function intercept at the ith belief state vertex for the jth value function vector v_r^j. The linear system for finding the constraint value function of a policy is (I − λA) x_c = c̄, where c̄_{(i,j)} = c(i, α(j)) and x_{c,(i,j)} is the value function intercept at the ith belief state vertex for the jth value function vector v_c^j.

Under Hansen's policy iteration algorithm, an initial policy is evaluated with respect to both the objective and constraint functions, and then a dynamic programming update is performed using Equation 19 and the pruning operation described in Table 1. Each vector in the updated value function represents a new candidate state for the finite-state controller. Once new states have been added, the policy is evaluated to produce a new value function, and the algorithm repeats. Each iteration yields an improved finite-state controller that satisfies the problem constraints.
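Policy evaluation for a fixed controller then reduces to two dense linear solves. A minimal sketch under the same assumed array layout follows, with delta[j, z] the controller transition function and act[j] the action emitted in node j; the flat index k = i·|Q| + j is our own choice, and the sketch assumes I − λA is nonsingular (always true for 0 < λ < 1).

```python
import numpy as np

def evaluate_controller(T, O, r, c, delta, act, lam):
    """Solve (I - lam*A) x = cost for the objective and constraint value functions.

    T[a, s, s'] and O[s, z] are the model arrays; delta[j, z] is the controller
    transition function and act[j] the action emitted in controller node j.
    Flat index: k = i * n_q + j for system state i and controller node j.
    """
    n_s, n_q, n_z = T.shape[1], len(act), O.shape[1]
    n = n_s * n_q
    A = np.zeros((n, n))
    r_bar = np.zeros(n)
    c_bar = np.zeros(n)
    for i in range(n_s):
        for j in range(n_q):
            k = i * n_q + j
            a = act[j]
            r_bar[k] = r[i, a]
            c_bar[k] = c[i, a]
            for ip in range(n_s):
                for z in range(n_z):
                    # Equation 22: A((i,j),(i',j')) = sum_z p(i'|i,a) p(z|i') 1{delta(j,z)=j'}
                    kp = ip * n_q + delta[j, z]
                    A[k, kp] += T[a, i, ip] * O[ip, z]
    M = np.eye(n) - lam * A
    return np.linalg.solve(M, r_bar), np.linalg.solve(M, c_bar)
```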

Table 1: Algorithm for pruning value function vectors for a CPOMDP

Procedure: MIP-DOMINATE((w_r, w_c), U)
 1: solve the following mixed integer linear program:
 2:   variables:
 3:     h, b(s) ≥ 0 ∀ s ∈ S,
 4:     d_i ∈ {0, 1} ∀ i ∈ {1, ..., |U|}
 5:   maximize: h
 6:   subject to the constraints:
 7:     b·w_c ≤ α,
 8:     b·(u_r^i − w_r) ≥ h − d_i M,
 9:     b·u_c^i ≥ d_i α, ∀ (u_r^i, u_c^i) ∈ U,
10:     Σ_{s ∈ S} b(s) = 1.
11: if h ≥ 0 then
12:   return b
13: else
14:   return nil
15: end if

Procedure: BEST(b, U)
 1: min ← ∞
 2: for all (u_r, u_c) ∈ U do
 3:   if b·u_c < α then
 4:     if (b·u_r < min) or ((b·u_r = min) and (u_r <_lex w_r)) then
 5:       (w_r, w_c) ← (u_r, u_c)
 6:       min ← b·u_r
 7:     end if
 8:   end if
 9: end for
10: return (w_r, w_c)

Procedure: PR(W)
 1: D ← ∅
 2: while W ≠ ∅ do
 3:   (w_r, w_c) ← any element in W
 4:   b ← MIP-DOMINATE((w_r, w_c), D)
 5:   if b = nil then
 6:     W ← W \ {(w_r, w_c)}
 7:   else
 8:     (w_r, w_c) ← BEST(b, W)
 9:     D ← D ∪ {(w_r, w_c)}
10:     W ← W \ {(w_r, w_c)}
11:   end if
12: end while
13: return D

Example

The solution technique is illustrated with an example. Consider the indefinite-horizon change detection problem with geometric change time parameter ρ = 0.1, observation alphabet size |Σ| = 3, pre-change observation distribution

f_0(z_1) = 0.6, f_0(z_2) = 0.3, f_0(z_3) = 0.1,

and post-change observation distribution

f_1(z_1) = 0.2, f_1(z_2) = 0.4, f_1(z_3) = 0.4.

The problem statement is

minimize over φ ∈ Φ:  v = E[(η − ν − 1)^+]   (23)

subject to

p(η < ν) ≤ α,   (24)

with false alarm probability constraint α = 0.2. Starting with initial value function vectors v_r and v_c, we perform four dynamic programming updates. The results are shown in Figure 3. The policy corresponding to the value function consists of producing an alarm when the probability that the system is in the post-change state exceeds 0.4, with an expected detection delay of approximately 4. Tartakovsky and Veeravalli noted that a conservative policy for the problem of quickest detection consists of producing an alarm when the posterior probability of change exceeds 1 − α (Tartakovsky & Veeravalli 2004). Note that the computed policy is much less conservative (and therefore better valued) than the conservative policy of producing an alarm when the posterior probability of change exceeds 1 − α = 0.8.
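The example can be encoded in the array layout used in the earlier sketches. The transition structure below follows our reading of Figure 1, and the observation distribution assigned to the absorbing PostAlarm state (set to f_1 here) is an assumption of the sketch.

```python
import numpy as np

# Change detection example (rho = 0.1, alpha = 0.2) in the array layout above.
# States: 0 = PreChange, 1 = PostChange, 2 = PostAlarm;
# actions: 0 = NoAlarm, 1 = Alarm; observations: 0, 1, 2 = z1, z2, z3.
rho, alpha = 0.1, 0.2
f0 = np.array([0.6, 0.3, 0.1])
f1 = np.array([0.2, 0.4, 0.4])

T = np.zeros((2, 3, 3))
T[0] = [[1 - rho, rho, 0.0],   # NoAlarm: the change occurs with probability rho
        [0.0,     1.0, 0.0],
        [0.0,     0.0, 1.0]]
T[1] = [[0.0, 0.0, 1.0],       # Alarm: move to the absorbing PostAlarm state
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0]]
O = np.vstack([f0, f1, f1])    # PostAlarm observation row is an assumption (see above)

r = np.zeros((3, 2)); r[1, 0] = 1.0   # unit delay cost per post-change, pre-alarm step
c = np.zeros((3, 2)); c[0, 1] = 1.0   # unit constraint cost for a false alarm

# One Bayes update (Equation 16): start certain of PreChange, take NoAlarm, observe z3.
b = np.array([1.0, 0.0, 0.0])
b_next = O[:, 2] * (b @ T[0])
b_next /= b_next.sum()
print(b_next)                  # posterior puts roughly 0.31 on PostChange
```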

[Figure 3: A minimal representation of the value function for the change detection example after four exact dynamic programming updates. Each vector is presented with its objective valuation v_r and constraint valuation v_c, with b(s_2) the posterior probability that the system is in the post-change state. An approximation of the optimal value function v(b) is shown as dots on the top plot.]

Conclusion

This paper presents an exact dynamic programming update for CPOMDPs using a piecewise linear representation of the value function. The key concept is a modification of the standard pruning algorithm, with a mixed integer linear program replacing a linear program. The dynamic programming update can be used to perform value iteration or policy iteration. The method should be useful for finding solutions to modestly sized sequential decision problems that include uncertainty and constraints. The scaling behavior of the proposed CPOMDP dynamic programming update is expected to be similar to that of exact dynamic programming updates for POMDPs, which exhibit PSPACE complexity for many problems. Although this scaling behavior makes the proposed technique unsuitable for large-scale problems, we hope that the insight provided by an exact numerical technique will spur development of approximate methods for the solution of CPOMDPs.

References

Altman, E. 1999. Constrained Markov Decision Processes. Boca Raton, FL: CRC Press.

Cassandra, A. R. 1998. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Ph.D. Dissertation, Brown University.

Feinberg, E. A., and Piunovskiy, A. B. 2000. Multiple objective nonatomic Markov decision processes with total reward criteria. Journal of Mathematical Analysis and Applications 247.

Feng, Z., and Zilberstein, S. 2004. Region-based incremental pruning for POMDPs. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence.

Hansen, E. A. 1998. Finite Memory Control of Partially Observable Systems. Ph.D. Dissertation, University of Massachusetts Amherst.

Hansen, E. A. 2007. Indefinite-horizon POMDPs with action-based termination. In Proceedings of the 22nd National Conference on Artificial Intelligence.

Meyn, S. P. 2007. Control Techniques for Complex Networks. Cambridge, UK: Cambridge University Press.

Patek, S. D. 2001. On partially observed stochastic shortest path problems. In Proceedings of the 40th IEEE Conference on Decision and Control.

Piunovskiy, A. B., and Mao, X. 2000. Constrained Markovian decision processes: the dynamic programming approach. Operations Research Letters 27.

Poupart, P. 2005. Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. Ph.D. Dissertation, University of Toronto.

Puterman, M. L. 2005. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Hoboken, NJ: John Wiley & Sons.

Shiryaev, A. N. 1963. On optimum methods in quickest detection problems. Theory of Probability and Its Applications 8(1).

Spaan, M. T. J., and Vlassis, N. 2005. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24.

Tartakovsky, A. G., and Veeravalli, V. V. 2004. General asymptotic Bayesian theory of quickest change detection. Theory of Probability and Its Applications 49(3).

Zhao, Q., and Swami, A. 2007. A decision-theoretic framework for opportunistic spectrum access. IEEE Wireless Communications 14(4).
