CS 7180: Behavioral Modeling and Decision-making in AI


CS 7180: Behavioral Modeling and Decision-making in AI. Bayesian Networks for Dynamic and/or Relational Domains. Prof. Amy Sliva, October 12, 2012.

The world is not only uncertain, it is dynamic. Beliefs, observations, and relationships are not static: diabetic blood sugar and insulin levels, the economic activity of a nation, tracking a vehicle's location. Represent the world as a series of snapshots, or time slices. A temporal state-space model keeps track of the values of the evidence and outcome variables at each time slice (a state-variable representation). Assume time is bounded, discrete instants; the step size depends on the domain (e.g., hour vs. day); the interval between time slices is fixed (represent times by integers); time starts at t = 0.

Representing the state at a time slice. Two types of state variables: X_t, the unobserved random variables at time t (Rain_t, BloodSugar_t, StomachContents_t, QualityOfLife_t), and E_t, the observed evidence variables at time t (Umbrella_t, MeasuredBloodSugar_t, GDP_t). E_t = e_t is the actual observation at time t. Assume evidence starts arriving at time t = 1. Represent the domain by sequences of state and evidence variables: R_0, R_1, R_2, ... and E_1, E_2, E_3, .... X_a:b denotes the variables from X_a to X_b.

Representing state changes over time. How can we reason about states over time? Leverage the structural features of Bayesian networks: what are the parents? The transition model describes how the world evolves over time: the probability of the state variables given their previous values, P(X_t | X_0:t-1), which is unbounded in size as t increases. Assume a stationary process for transitions: the process of change is governed by rules that do not themselves change, so P(X_t | X_0:t-1) is the same for all t and need not be recomputed at each time slice. The observation model describes how evidence is sensed over time.

Markov process for transitions and observations. Markov assumption: the current state depends on only a finite, fixed number of previous states, so the future is conditionally independent of the past given a subset of previous states. In a Markov process (or Markov chain), a first-order process depends only on the previous state, P(X_t | X_0:t-1) = P(X_t | X_t-1); a second-order process depends on the previous two states, P(X_t | X_0:t-1) = P(X_t | X_t-2, X_t-1). Sensor Markov assumption: evidence depends only on the current state, P(E_t | X_0:t, E_1:t-1) = P(E_t | X_t). (Diagram: the chains ... → X_t-2 → X_t-1 → X_t → X_t+1 → X_t+2 → ... for the first- and second-order processes.)

Bayesian network with temporal model. Rain_t = it is raining at time t; Umbrella_t = our friend is carrying an umbrella at time t. (This is a dynamic Bayesian network; more to come!) Transition CPT: P(R_t = true | R_t-1 = true) = 0.7, P(R_t = true | R_t-1 = false) = 0.3. Observation CPT: P(U_t = true | R_t = true) = 0.9, P(U_t = true | R_t = false) = 0.2. Start with a prior probability distribution P(X_0) at time t = 0. Joint distribution over all variables in the network: P(X_0:t, E_1:t) = P(X_0) Π_{i=1..t} P(X_i | X_i-1) P(E_i | X_i).
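This factorization can be computed directly for the umbrella model. A minimal Python sketch, using the CPTs above and the prior P(R_0 = true) = 0.7 that appears later in the slides:

```python
# Joint probability P(x_0:t, e_1:t) for the umbrella DBN, via the
# factorization P(X_0) * prod_i P(X_i | X_i-1) * P(E_i | X_i).

P_R0 = {True: 0.7, False: 0.3}      # prior P(Rain_0)
P_TRANS = {True: 0.7, False: 0.3}   # P(Rain_t = true | Rain_t-1)
P_OBS = {True: 0.9, False: 0.2}     # P(Umbrella_t = true | Rain_t)

def joint(rains, umbrellas):
    """P(rain_0:t, umbrella_1:t) for one concrete sequence.
    rains has t+1 entries (times 0..t); umbrellas has t entries (times 1..t)."""
    p = P_R0[rains[0]]
    for i in range(1, len(rains)):
        p_r = P_TRANS[rains[i - 1]]
        p *= p_r if rains[i] else 1.0 - p_r          # P(X_i | X_i-1)
        p_u = P_OBS[rains[i]]
        p *= p_u if umbrellas[i - 1] else 1.0 - p_u  # P(E_i | X_i)
    return p

print(joint([True, True], [True]))  # 0.7 * 0.7 * 0.9 ≈ 0.441
```

Summing this joint over all state sequences would answer any query, but the sum has 2^(t+1) terms, which is exactly what the recursive filtering algorithms avoid.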

First- order Markov assumpeon unrealisec? First- order Markov not exactly true in real world! Rain only depends on if it rained yesterday? R t -1 t f P(R t) 0.7 0.3 Rain t 1 Rain t Rain t+1 R t f t P(U t ) 0.9 0.2 Umbrella t 1 Umbrella t Umbrella t+1 Improving accuracy of the model Increase order of Markov process Increase set of variables additional information and relationships Temperature t, BarametricPressure t

Inference in temporal models. Common reasoning patterns through the temporal model: Filtering, P(X_t | e_1:t): compute the belief state given the evidence sequence, to facilitate rational decision-making. Prediction, P(X_t+k | e_1:t) for k > 0: compute the posterior probability of a future state given the evidence sequence. Smoothing, P(X_k | e_1:t) for 0 ≤ k < t: compute the probability of a past state given the evidence sequence. Most likely explanation, argmax_{x_1:t} P(x_1:t | e_1:t): the sequence of states most likely to have generated the observations. Learning: learn the structure and probabilities from data, e.g., using expectation maximization (EM).

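Of these tasks, the most likely explanation admits a compact dynamic-programming solution (the Viterbi algorithm). A minimal sketch on the umbrella model; the CPT values match the slides, while the uniform prior over Rain_0 is an assumption of the sketch:

```python
# Most likely explanation argmax_{x_1:t} P(x_1:t | e_1:t) via the Viterbi
# recursion m_t(x) = P(e_t | x) * max_{x'} P(x | x') * m_{t-1}(x').

P_TRANS = {True: 0.7, False: 0.3}   # P(Rain_t = true | Rain_t-1)
P_OBS = {True: 0.9, False: 0.2}     # P(Umbrella_t = true | Rain_t)

def viterbi(umbrellas, prior=(0.5, 0.5)):
    m = {True: prior[0], False: prior[1]}   # m_0(x) = P(x_0)
    back = []                               # argmax pointer for each step
    for e in umbrellas:
        new_m, ptr = {}, {}
        for x in (True, False):
            def score(xp):                  # P(x | xp) * m(xp)
                return (P_TRANS[xp] if x else 1.0 - P_TRANS[xp]) * m[xp]
            best = max((True, False), key=score)
            lik = P_OBS[x] if e else 1.0 - P_OBS[x]
            new_m[x], ptr[x] = lik * score(best), best
        m = new_m
        back.append(ptr)
    # backtrack from the best final state
    x = max(m, key=m.get)
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    return path[::-1][1:]                   # drop x_0; return x_1..x_t

print(viterbi([True, True, False, True, True]))
```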

Dynamic Bayesian networks (DBNs). A Bayesian network representing a temporal probability model: a stationary, Markov process of state transitions. It includes the prior distribution P(X_0), the transition model P(X_t | X_t-1), and the observation model P(E_t | X_t). The model depends on the topology between time slices: the Bayesian networks at times t and t+1 are connected by transition arcs.

Basic approach to DBNs. Copy the state and evidence variables from one time slice to the next; only specify the first time slice and replicate it for all the others. For the umbrella model: P(R_0 = true) = 0.7, transition CPT P(R_1 = true | R_0) = 0.7 / 0.3, observation CPT P(U_1 = true | R_1) = 0.9 / 0.2. This process is called unrolling the DBN: the one-slice DBN X_t → X_t+1 with evidence Y_t, unrolled for time t = 0 to t = 10, gives X_0, X_1, X_2, ..., X_10 and Y_0, Y_1, Y_2, ..., Y_10.

Exact inference in DBNs. Naïve approach: unroll the whole network and apply any exact Bayesian reasoning algorithm. (Figure: the umbrella network unrolled for five slices, with the same transition CPT 0.7 / 0.3 and observation CPT 0.9 / 0.2 repeated at every slice.) The inference cost of each update grows with t. Instead, use variable elimination to sum out previous time slices, keeping at most two slices in memory at a time. This is still exponential in the number of state variables, so we need approximations!
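The two-slice update is the standard forward recursion: predict with the transition model, weight by the observation likelihood, and normalize. A minimal sketch on the umbrella model (CPTs from the slides; the uniform prior over Rain_0 is the textbook choice, not something stated on this slide):

```python
# Recursive filtering P(X_t+1 | e_1:t+1) from P(X_t | e_1:t): only the
# current belief state is kept in memory, never the unrolled network.

P_TRANS = {True: 0.7, False: 0.3}   # P(Rain_t+1 = true | Rain_t)
P_OBS = {True: 0.9, False: 0.2}     # P(Umbrella_t = true | Rain_t)

def filter_step(belief, umbrella):
    """One update; belief maps each Rain value to its probability."""
    predicted = {
        r1: sum((P_TRANS[r0] if r1 else 1.0 - P_TRANS[r0]) * belief[r0]
                for r0 in (True, False))
        for r1 in (True, False)
    }
    weighted = {r: (P_OBS[r] if umbrella else 1.0 - P_OBS[r]) * predicted[r]
                for r in (True, False)}
    z = sum(weighted.values())      # normalizing constant P(e_t+1 | e_1:t)
    return {r: w / z for r, w in weighted.items()}

belief = {True: 0.5, False: 0.5}    # uniform prior over Rain_0
for u in (True, True):              # umbrella observed on days 1 and 2
    belief = filter_step(belief, u)
print(round(belief[True], 3))       # P(Rain_2 = true | u_1:2) ≈ 0.883
```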

Unrolling is intractable in real-world models: consider the pathways, biological processes, cellular components, and molecular components that change as a bacterial infection grows.

Approximation in DBNs using particle filtering. Filtering, P(X_t | e_1:t): compute the belief state given the evidence sequence, to facilitate rational decision-making. A filtering algorithm maintains the current state and updates it with new evidence, rather than looking at the entire sequence: P(X_t+1 | e_1:t+1) = f(e_t+1, P(X_t | e_1:t)), a recursive estimation. Particle filtering performs importance sampling: focus the samples (particles) on high-probability regions by throwing away samples with low weights according to the observations and replicating those with high weights, keeping the population of samples representative of reality.

Particle filtering algorithm. Sample N initial states from P(X_0). Update cycle for each time step: 1. Propagate each sample forward using the Markov transition model P(X_t+1 | X_t). 2. Weight each sample by the likelihood of the new evidence under the observation model, P(e_t+1 | x_t+1). 3. Resample N new samples from the current population, with probability of selection proportional to weight; the new samples are unweighted.
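The three steps above, on the umbrella model. A minimal sketch; the CPTs are the slide values, while N and the uniform initial distribution are arbitrary choices for illustration:

```python
import random

# Particle filtering: propagate -> weight -> resample at each time step.

P_TRANS = {True: 0.7, False: 0.3}   # P(Rain_t+1 = true | Rain_t)
P_OBS = {True: 0.9, False: 0.2}     # P(Umbrella_t = true | Rain_t)

def particle_filter_step(particles, umbrella, rng=random):
    # 1. propagate each sample through the transition model
    particles = [rng.random() < P_TRANS[r] for r in particles]
    # 2. weight each sample by the likelihood of the new evidence
    weights = [P_OBS[r] if umbrella else 1.0 - P_OBS[r] for r in particles]
    # 3. resample N unweighted samples, proportional to weight
    return rng.choices(particles, weights=weights, k=len(particles))

N = 5000
particles = [random.random() < 0.5 for _ in range(N)]   # samples from P(X_0)
for u in (True, True):
    particles = particle_filter_step(particles, u)
print(sum(particles) / N)   # estimates P(Rain_2 | u_1:2); exact value ≈ 0.883
```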

Using particle filtering: N = 10 samples at each time slice. (a) Propagate. At time t, 8 samples indicate Rain is true and 2 false. Use the transition model to propagate to t+1, sampling Rain_t+1 from the CPT conditioned on each sample's Rain_t. At time t+1, 6 samples indicate Rain is true and 4 false.

Using particle filtering (cont.). (b) Weight. At time t+1 the observation is ¬Umbrella (no umbrella). Use this evidence to weight the samples just generated.

Using particle filtering (cont.). (c) Resample. Generate a refined set of 10 samples by weighted random selection from the current set: 2 samples indicate rain, 8 no rain. Now propagate this tuned sample population to time t+2.

Analysis of particle filtering. Consistent estimation: the estimates converge to the exact probabilities as N → ∞. Resampling lets us refine likelihood weighting: throw out small weights and focus on large ones. Drawback of particle filtering: it is inefficient in high-dimensional spaces (the variance becomes too large). Solution: Rao-Blackwellization. Sample only a subset of the variables, allowing the remainder to be integrated out exactly; the resulting estimates have lower variance.

Rao-Blackwellized particle filtering. How can we reduce the number of particles (samples) needed to achieve the same accuracy? Sample a subset of the variables, allowing the remainder to be integrated out exactly; this yields estimates with lower variance. Partition the state variables at time t s.t. X_t = (R_t, V_t), where P(R_0:t, V_0:t | E_1:t) = P(V_0:t | R_0:t, E_1:t) P(R_0:t | E_1:t). Assume we can tractably compute P(V_0:t | R_0:t, E_1:t); then we only need to estimate a probability over the lower-dimensional space: P(R_0:t | E_1:t) = P(E_t | E_1:t-1, R_0:t) P(R_t | R_t-1) P(R_0:t-1 | E_1:t-1) / P(E_t | E_1:t-1).

Rao-Blackwellised particle filtering (cont.). In the factorization P(R_0:t, V_0:t | E_1:t) = P(V_0:t | R_0:t, E_1:t) P(R_0:t | E_1:t), only sample the second factor, P(R_0:t | E_1:t)!

Rao-Blackwellised particle filtering (cont.). The first factor, P(V_0:t | R_0:t, E_1:t), is computed exactly: the rest of the values are conditionally independent given the sample and the evidence.
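A sketch of the idea in code. The model here (a sampled regime chain R_t, an exactly maintained road-wetness variable V_t, and a wetness sensor E_t) is invented for illustration and is not from the lecture; each particle stores a sampled R value plus an exact conditional belief over V, and its weight is the evidence likelihood obtained by summing V out:

```python
import random

# Rao-Blackwellized particle filter: X_t = (R_t, V_t); sample R, keep an
# exact per-particle belief over V. Illustrative model, not from the slides.

P_R = {'calm': {'calm': 0.9, 'storm': 0.1},       # P(R_t+1 | R_t)
       'storm': {'calm': 0.4, 'storm': 0.6}}
P_V = {  # P(V_t+1 | V_t, R_t+1): storms tend to wet the road
    'calm':  {'dry': {'dry': 0.9, 'wet': 0.1}, 'wet': {'dry': 0.5, 'wet': 0.5}},
    'storm': {'dry': {'dry': 0.3, 'wet': 0.7}, 'wet': {'dry': 0.05, 'wet': 0.95}},
}
P_E = {'dry': 0.1, 'wet': 0.9}                    # P(sensor reads wet | V_t)

def rbpf_step(particles, wet_reading, rng=random):
    new, weights = [], []
    for r, v_belief in particles:
        # sample the partitioned-out part: R_t+1 ~ P(R_t+1 | r)
        r1 = rng.choices(list(P_R[r]), weights=list(P_R[r].values()))[0]
        # exact prediction of V given the sampled regime
        predicted = {v1: sum(P_V[r1][v][v1] * v_belief[v] for v in v_belief)
                     for v1 in ('dry', 'wet')}
        # weight = evidence likelihood with V summed out
        lik = {v: (P_E[v] if wet_reading else 1.0 - P_E[v]) * predicted[v]
               for v in predicted}
        w = sum(lik.values())
        new.append((r1, {v: lik[v] / w for v in lik}))
        weights.append(w)
    return rng.choices(new, weights=weights, k=len(new))

particles = [('calm', {'dry': 0.5, 'wet': 0.5}) for _ in range(200)]
for e in (True, True):                            # two "wet" sensor readings
    particles = rbpf_step(particles, e)
p_wet = sum(b['wet'] for _, b in particles) / len(particles)
print(p_wet)   # posterior P(V_2 = wet), mixing exact beliefs over particles
```

Because each particle carries an exact distribution over V rather than a single sampled value, far fewer particles are needed for the same accuracy.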

Approximate inference with fewer samples. (Figure: three coupled chains A, B, C, each with its own evidence stream Y^A, Y^B, Y^C, unrolled for t = 0 to 10.) Goal: compute the joint filtering distribution P(A_t, B_t, C_t | Y_1:t).

Approximate inference with fewer samples (cont.). The filtering distribution factors as P(A_t, B_t, C_t | Y_1:t) = P(A_1:t, C_1:t | Y_1:t, B_1:t) P(B_1:t | Y_1:t) = P(A_1:t | Y^A_1:t, B_1:t-1) P(C_1:t | Y^C_1:t, B_1:t-1) P(B_1:t | Y_1:t).

Approximate inference with fewer samples (cont.). In this decomposition, only sample B: given a sampled trajectory of B, the A and C chains decouple and can be updated exactly.

Approximate inference with fewer samples (cont.). Where do we get these partitions? They are typically domain- or application-specific.

Limitations of only using random variables. DBNs extend traditional Bayesian networks to facilitate probabilistic reasoning over time, but the knowledge representation is still not very expressive: random variables are essentially propositions, with the same drawbacks. How do we express relationships and properties of objects? Exhaustively representing all possible objects and the relations among them is intractable in real-world relational domains. Instead, incorporate first-order logic into DBNs: Relational Dynamic Bayesian Networks.

Dynamic relational domains. A set of objects (constants, variables, functions) and attributes or relations (predicates) among them; the state is the set of ground predicates that are true. (Figure: object B_0 at state A and at state B, each with id, color, position(t), velocity(t), direction(t), decreasing_velocity(t), same_direction(t), distance(t).)

Relational domains (cont.). The attributes in the figure are id, color, position(t), velocity(t), and direction(t).

Relational domains (cont.). The relations in the figure are decreasing_velocity(t), same_direction(t), and distance(t).

Relational Bayesian Network (RBN). Syntax: a set of nodes, one for each FOL predicate; a DAG (directed acyclic graph); and a conditional distribution for each node given its parents. We no longer have to instantiate all ground atoms and use a propositional Bayesian network; instead we represent general relationships between objects. To ensure there are no cycles in the RBN, the predicates must be ordered.

Conditional model for each node. For each node, the conditional distribution is determined by relational information, using a first-order probability tree (FOPT): a conditional model of a ground node given its parents. Store an FOPT at each node rather than a conditional probability table. Construction: each interior node holds a first-order formula F_n over the parent predicates that makes the child either true or false; each leaf holds a probability distribution. (Example tree: test ∃c. Color(x,c) ∧ Color(y,c); the leaf probability is 0.3 if true and 0.05 if false.)
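Evaluating such a tree means walking from the root and testing each node's first-order formula against the current set of ground facts. A minimal sketch of the two-leaf tree on this slide; representing the state as a set of ground tuples, and the object names, are assumptions of the sketch:

```python
# Evaluate the slide's first-order probability tree: the interior node tests
# "exists c. Color(x, c) and Color(y, c)"; the leaves hold 0.3 / 0.05.
# State = set of ground atoms, e.g. ('Color', 'plate1', 'red').

def colors_match(state, x, y):
    cx = {c for (pred, obj, c) in state if pred == 'Color' and obj == x}
    cy = {c for (pred, obj, c) in state if pred == 'Color' and obj == y}
    return bool(cx & cy)   # some color c is shared by x and y

def fopt_prob(state, x, y):
    """P(child = true | parents): 0.3 if x and y share a color, else 0.05."""
    return 0.3 if colors_match(state, x, y) else 0.05

state = {('Color', 'plate1', 'red'), ('Color', 'bracket7', 'red'),
         ('Color', 'plate2', 'blue')}
print(fopt_prob(state, 'plate1', 'bracket7'))  # shared color red -> 0.3
print(fopt_prob(state, 'plate2', 'bracket7'))  # no shared color -> 0.05
```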

Relational Dynamic Bayesian Network (RDBN). It is infeasible to use an exact DBN over all ground predicates; instead, extend the RBN into an explicitly relational, dynamic network. (Figure: object B_0 at state A, time t-1, with id, color, position(t-1), velocity(t-1), same_direction(t-1), connected to B_0 at state B, time t, with the corresponding predicates at t.)

Relational Dynamic Bayesian Network (RDBN) (cont.). In the figure, the arcs from the t-1 slice to the t slice form the transition model, and the evidence arcs at time t form the observation model.

The transition model is first-order Markov. Predicates at time t depend only on those at t-1. Create a node at time t for every ground predicate, and use the conditional model (FOPT) based on the grounding at that node. The number of ground predicates (per slice!) is O(N^k), where N is the size of the domain and k is the arity of the predicate, and the domain size can be tens of thousands or more! Assume one action is performed per time slice.

Example of an RDBN: the factory assembly domain. Plates, brackets, etc. are welded and bolted together; plates and brackets have attributes such as size, shape, and color. (Network: Bolted-to(x,y,t-1) → Bolted-to(x,y,t) → Bolted-to(x,y,t+1); Color(x,c,t-1) → Color(x,c,t) → Color(x,c,t+1); Shape(y,s,t-1) → Shape(y,s,t) → Shape(y,s,t+1); action nodes Bolt(x,y,t) and Bolt(x,y,t+1).)

First-order probability tree for the RDBN. (Example FOPT for Bolted-to(x,y,t): if Bolted-to(x,y,t-1) is true, the leaf probability is 1.0; otherwise branch on the action Bolt(x,y,t) (leaf 0.9 when it holds), then on ∃z. Bolt(x,z,t) combined with the color test ∃c. Color(y,c,t-1) ∧ Color(z,c,t-1), with leaf probability 0.1 / count(w : Bracket(w) ∧ Color(w,c,t-1)) on the matching branch and 0.0 at the remaining leaves.)

Inference in an RDBN using FOL properties. DBN inference on the ground version: exact inference is completely intractable, and particle filtering samples poorly because of the high variance in large domains. Lifted versions of the existing algorithms make use of the FOL structure. Identify two categories of predicates: complex, if the domain size is large, e.g., Bolted-To(x,y,t), where items x and y are components (i.e., plates and brackets) that can be bolted together in manufacturing; and simple otherwise, e.g., Color(x,c,t), where the number of possible colors c in this application is small. What counts as a large domain depends on the application.

Rao-Blackwellization in RDBNs. Partition using simple and complex predicates (well, and make some assumptions...). Assumption 1: uncertain complex predicates do not appear in the RDBN as parents of other predicates; all parents of unobserved complex predicates are simple or known. Assumption 2: for any object o there is at most one other object o' s.t. the ground predicate R(o, o', t) is true, and at most one o'' s.t. R(o'', o, t) is true; objects in a relation are mutually exclusive.

Conditional independence gives the partitions. Complex predicates are independent of each other conditioned on the simple predicates and the known evidence (i.e., their parents); simple predicates are independent of the unknown complex ones given the known evidence. Rao-Blackwell partition of the (unknown) predicates P at time t: P_t = (Complex_t, Simple_t), so P(Simple_0:t, Complex_0:t | E_1:t) = P(Complex_0:t | Simple_0:t, E_1:t) P(Simple_0:t | E_1:t).

Conditional independence gives the partitions (cont.). In this factorization, sample the simple predicates, P(Simple_0:t | E_1:t), and compute the complex predicates exactly, P(Complex_0:t | Simple_0:t, E_1:t).

Efficiency of Rao-Blackwellization. Rao-Blackwellized particle filtering beats the standard algorithm, but domains with large numbers of objects and relations are still complex even with Rao-Blackwellization. Leverage context- or domain-specific independence to improve efficiency: group the related objects o and o' that give rise to R(o, o', t) into abstractions, disjoint sets A_R1, A_R2, ..., A_Rm, s.t. two pairs of objects (o_i, o_j) and (o_k, o_l) are in the same A_R iff P(R(o_i, o_j, t)) = P(R(o_k, o_l, t)). Specify the abstractions with FOL formulas, and maintain conditional probabilities for abstractions rather than for individual pairs. Abstractions improve performance by a factor of 30 to 70.

DBNs and RDBNs are not the only way. There are several approaches to handling time and uncertainty, depending on the task: Markov decision processes, hidden Markov models, and others. DBNs are generalizations of many of these other systems, and can be even more effective when domain knowledge allows additional conditional independence assumptions.