Artificial Intelligence


Roman Barták
Department of Theoretical Computer Science and Mathematical Logic

Summary of the last lecture
We know how to do probabilistic reasoning over time:
- transition model P(X_t | X_t-1), sensor model P(E_t | X_t)
- Markov assumptions
Basic inference tasks:
- filtering: P(X_t | e_1:t)
- prediction: P(X_t+k | e_1:t) for k > 0
- smoothing: P(X_k | e_1:t) for 0 <= k < t
- most likely explanation: argmax_x_1:t P(x_1:t | e_1:t)
Hidden Markov models (HMM) are a special case with a single state variable and a single sensor variable; the inference tasks can be solved by means of matrix operations.
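The "matrix operations" view of HMM filtering can be sketched in a few lines. This is an illustrative example, not code from the lecture; the two-state transition matrix and observation likelihoods are made-up numbers.

```python
import numpy as np

# Sketch of one HMM filtering step as matrix operations:
# f_{t+1} = alpha * O_{t+1} T^T f_t, where
#   T[i, j] = P(X_{t+1}=j | X_t=i)   (transition model)
#   obs_probs[i] = P(e | X=i)        (sensor model for the observed evidence)

def forward_step(f, T, obs_probs):
    """One filtering update of the state distribution f."""
    f_new = obs_probs * (T.T @ f)
    return f_new / f_new.sum()   # normalization plays the role of alpha

# toy two-state example (assumed numbers)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
f = np.array([0.5, 0.5])            # prior P(X_0)
obs = np.array([0.9, 0.2])          # P(e | X=i) for the observed evidence
f1 = forward_step(f, T, obs)
```

Repeating `forward_step` once per observation implements filtering; prediction is the same update with the observation term dropped.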

Localization (an example of HMM)
Assume a robot that moves randomly in a grid world, has a map of the world, and has (noisy) sensors reporting obstacles lying immediately to the north, south, east, and west. The robot needs to find its location.
A possible model:
- random variables X_t describe the robot's location at time t; the possible values are 1,...,n for n locations; Nb(i) is the set of neighboring locations of location i
- transition tables: P(X_t+1 = j | X_t = i) = 1/|Nb(i)| if j ∈ Nb(i), otherwise 0
- sensor variables E_t describe the observations (evidence) at time t (four sensors for the four directions NSEW); the values indicate detection of an obstacle in each direction (16 values for all combinations of directions)
- assume that the sensor error rate is ε; sensor tables: P(E_t = e_t | X_t = i) = (1-ε)^(4-d_it) · ε^d_it, where d_it is the number of deviations of observation e_t from the true values for square i
Localization (a practical example)
[Figure: posterior location distributions P(X_1 | E_1 = NSW) and P(X_2 | E_1 = NSW, E_2 = NS)]
The localization error is defined as the Manhattan distance from the true location.
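The sensor table above is easy to compute directly. A minimal sketch, with the observation and the map square encoded as assumed 4-bit NSEW tuples:

```python
# Localization sensor model from the text:
# P(E_t = e | X_t = i) = (1-eps)^(4-d) * eps^d,
# where d counts the directions where the noisy reading disagrees with the map.

def sensor_prob(observation, true_pattern, eps):
    """observation, true_pattern: 4-tuples of 0/1 for (N, S, E, W)."""
    d = sum(o != t for o, t in zip(observation, true_pattern))
    return (1 - eps) ** (4 - d) * eps ** d

# e.g. the sensor reports obstacles N, S, W while the map square has only N, S:
p = sensor_prob((1, 1, 0, 1), (1, 1, 0, 0), eps=0.1)   # one deviation
```

With ε = 0.1 and one deviation this gives 0.9³ · 0.1 = 0.0729.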

Dynamic Bayesian networks
A dynamic Bayesian network (DBN) is a Bayesian network that represents a temporal probability model. The variables and links are exactly replicated from slice to slice, so it is enough to describe one slice:
- prior distribution P(X_0)
- transition model P(X_1 | X_0)
- sensor model P(E_1 | X_1)
Each variable has parents either in the same slice or in the previous slice (Markov assumption).
DBN vs HMM
A hidden Markov model is a special case of a dynamic Bayesian network. Similarly, a dynamic Bayesian network can be encoded as a hidden Markov model: one random variable in the HMM whose values are n-tuples of the values of the state variables in the DBN.
What is the difference? The relationship between DBNs and HMMs is roughly analogous to the relationship between ordinary Bayesian networks and a full tabulated joint distribution.
- a DBN with 20 Boolean state variables, each of which has three parents: the transition model has 20 · 2^3 = 160 probabilities
- the corresponding hidden Markov model has one random variable with 2^20 values: the transition model has 2^20 · 2^20 ≈ 10^12 probabilities
The HMM requires much more space, and inference in it is much more expensive.
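The size comparison above can be checked with a quick calculation (purely illustrative):

```python
# Parameter counts from the DBN vs HMM comparison in the text.
n_vars, parents = 20, 3

# DBN: for each of the 20 Boolean variables, one CPT row per parent setting
dbn_transition = n_vars * 2 ** parents        # 20 * 8 = 160 probabilities

# HMM encoding: one mega-variable whose values are all 2^20 joint states
hmm_states = 2 ** n_vars
hmm_transition = hmm_states * hmm_states      # 2^40, on the order of 10^12
```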

Exact inference in DBNs
Dynamic Bayesian networks are Bayesian networks, and we already have algorithms for inference in Bayesian networks. We can construct the full Bayesian network representation of a DBN by replicating slices to accommodate the observations (unrolling).
A naive application of unrolling would not be particularly efficient, as the inference time would grow with each new observation. We can instead use an incremental approach that keeps only two slices in memory (by summing out the variables from the previous slices). This is similar to variable elimination in Bayesian networks.
The bad news is that the constant space needed to represent the largest factor is exponential in the number of state variables: O(d^(n+k)), where n is the number of state variables, d is the domain size, and k is the maximum number of parents of any state variable.
Approximate inference in DBNs
We can try approximate inference methods (likelihood weighting is most easily adapted to the DBN context). We sample the non-evidence nodes of the network in topological order, weighting each sample by the likelihood it accords to the observed evidence variables.
- each sample must go through the full network
- we can simply run all N samples together through the network (the N samples play a role similar to the forward message; the samples themselves are an approximate representation of the current state distribution)
There is a problem: the samples are generated completely independently of the evidence! This decreases the accuracy of the method, because the weights of the samples are usually very small and we need a much larger number of samples. In fact, the number of samples would have to grow exponentially with t; if a constant number of samples is used, the accuracy of the method degrades significantly.
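The "run all N samples together through the network" idea can be sketched as a single slice-by-slice update. The model interface here (`trans`, `sensor`) is an assumption chosen for illustration:

```python
# One slice of likelihood weighting over an unrolled DBN:
# propagate every weighted sample through the transition model, then
# multiply its weight by the likelihood of the observed evidence.
#   trans(x)      -- samples a successor state of x
#   sensor(e, x)  -- returns P(e | x)

def lw_step(samples, weights, trans, sensor, evidence):
    """Advance all weighted samples by one time slice."""
    new_samples = [trans(x) for x in samples]
    new_weights = [w * sensor(evidence, x)
                   for w, x in zip(weights, new_samples)]
    return new_samples, new_weights
```

Note that `trans` never looks at the evidence, which is exactly the weakness the text describes: weights shrink over time while the samples drift away from the observed trajectory.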

Particle filtering
We need to focus the set of samples on the high-probability regions of the sample space. Particle filtering does exactly that by resampling the population of samples:
- a population of N samples is created by sampling from the prior distribution P(X_0)
- each sample is propagated forward by sampling the next state value x_t+1 given the current value x_t of the sample, based on the transition model P(X_t+1 | x_t)
- each sample is weighted by the likelihood it assigns to the new evidence, P(e_t+1 | x_t+1)
- the population is resampled to generate a new population of N samples: each sample is selected from the current population with probability proportional to its weight (the new samples are unweighted)
[Figure: particle filtering algorithm pseudocode]
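The three steps above (propagate, weight, resample) can be sketched directly. The two-state world and all its numbers are assumptions chosen for illustration, not part of the lecture:

```python
import random

def particle_filter(particles, evidence, transition, sensor_likelihood):
    # 1. propagate each sample forward through the transition model
    particles = [transition(x) for x in particles]
    # 2. weight each sample by the likelihood of the new evidence
    weights = [sensor_likelihood(evidence, x) for x in particles]
    # 3. resample N unweighted particles in proportion to the weights
    return random.choices(particles, weights=weights, k=len(particles))

# toy two-state system: the state flips with probability 0.3
def transition(x):
    return 1 - x if random.random() < 0.3 else x

def sensor_likelihood(e, x):
    return 0.9 if e == x else 0.1   # sensor is right 90% of the time

random.seed(0)
particles = [random.choice([0, 1]) for _ in range(1000)]
particles = particle_filter(particles, evidence=1, transition=transition,
                            sensor_likelihood=sensor_likelihood)
```

After observing evidence for state 1, most particles end up in state 1, which is the "focus on high-probability regions" effect the text describes.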

Particle filtering (soundness)
Assume that the samples at time t are consistent with the probability distribution:
N(x_t | e_1:t) / N = P(x_t | e_1:t)
After propagation to time t+1, the number of samples reaching x_t+1 is
N(x_t+1 | e_1:t) = Σ_x_t P(x_t+1 | x_t) N(x_t | e_1:t)
The total weight of the samples in state x_t+1 after incorporating the evidence is
W(x_t+1 | e_1:t+1) = P(e_t+1 | x_t+1) N(x_t+1 | e_1:t)
After resampling we get
N(x_t+1 | e_1:t+1) / N = α W(x_t+1 | e_1:t+1)
  = α P(e_t+1 | x_t+1) N(x_t+1 | e_1:t)
  = α P(e_t+1 | x_t+1) Σ_x_t P(x_t+1 | x_t) N(x_t | e_1:t)
  = α P(e_t+1 | x_t+1) Σ_x_t P(x_t+1 | x_t) P(x_t | e_1:t)
  = P(x_t+1 | e_1:t+1)
Complex example (DBN)
Let us consider a battery-powered mobile robot and model its behavior using a DBN:
- we need state variables modelling the position of the robot X_t = (X_t, Y_t), its velocity Ẋ_t = (Ẋ_t, Ẏ_t), and the actual battery level
- the position at the next time step depends on the current position and velocity
- the velocity at the next step depends on the current velocity and the state of the battery
- we assume some method of measuring position (a fixed camera or onboard GPS), Z_t
- similarly, we assume a sensor measuring the battery level, BMeter_t

Sensor failures
A fully accurate sensor uses a sensor model with probabilities 1.0 along the diagonal and 0.0 elsewhere. In reality, noise always creeps into measurements, so a Gaussian distribution with a small variance might be used (for continuous measurements): the Gaussian error model.
Real sensors fail. When a sensor fails, it simply sends nonsense (but does not announce the failure). Then the Gaussian error model does not behave as we need:
- after two wrong measurements we are almost certain that the battery is empty
- if the failure disappears, the battery level quickly returns to 5, as if by magic (transient failure)
- in the meantime, the system may make some wrong decisions (e.g., shut down)
- if the wrong measurements keep coming, the model becomes certain that the battery is empty, even though the chance of such a sudden change is small (persistent failure)
Sensor transient failure model
The simplest kind of failure model allows a certain probability that the sensor will return some completely incorrect value, regardless of the true state of the world:
P(BMeter_t = 0 | Battery_t = 5) = 0.03
Let us call this the transient failure model. The model brings some inertia that helps to overcome temporary blips in the meter reading. However, after more wrong readings the robot gradually comes to believe that its battery is empty, while in fact it may be that the meter has failed!
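The transient failure model is just a mixture of the ordinary sensor model with a failure mode. A minimal discrete sketch, assuming an idealized accurate meter (readings 0..5), a failed meter that reads 0, and the 0.03 failure probability from the text:

```python
# Transient failure sensor model as a mixture:
# with probability p_fail the meter fails and reads 0 regardless of the
# true level; otherwise it reports the true level (idealized accurate meter).

def sensor_model(reading, battery, p_fail=0.03):
    """P(BMeter = reading | Battery = battery) under the transient failure model."""
    base = 1.0 if reading == battery else 0.0    # normal (accurate) behavior
    failure = 1.0 if reading == 0 else 0.0       # a failed meter reads 0
    return (1 - p_fail) * base + p_fail * failure
```

This reproduces P(BMeter_t = 0 | Battery_t = 5) = 0.03; in practice the `base` term would be the Gaussian error model rather than an exact match.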

Sensor persistent failure model
To model persistent failures, we use an additional state variable, BMBroken, that describes the status of the battery meter. The persistence arc has a CPT that gives a small probability of failure in any given time step (0.001), but specifies that the sensor stays broken once it breaks. When the sensor is OK, the sensor model is identical to the transient failure model. Once the sensor is known to be broken, the robot can only assume that its battery discharges at the normal rate.
Continuous variables
So far we have assumed discrete random variables, so the probability distributions could be captured by tables. How can we handle continuous variables?
- discretization: dividing the possible values into a fixed set of intervals; this often results in a considerable loss of accuracy and very large CPTs
- we can also use standard families of probability density functions that are specified by a finite number of parameters, e.g. the Gaussian (normal) distribution, described by the mean μ and the variance σ² as parameters

Continuous variables (conditional probability)
How are the conditional probability tables specified in hybrid Bayesian networks?
- dependence of a continuous variable on a continuous variable can be described using a linear Gaussian distribution: the mean varies linearly with the value of the parent, while the standard deviation is fixed
- dependence of a continuous variable on a discrete variable: for each value of the discrete variable we specify the parameters of a standard distribution
- dependence of a discrete variable on a continuous variable: a soft threshold function, either the probit (probability unit) distribution or the logit (logistic function) distribution; probit is often a better fit to real situations (the underlying decision process has a hard threshold, but the precise location of the threshold is subject to random Gaussian noise); logit has much longer tails and is sometimes easier to deal with mathematically
Kalman filters
Assume the problem of estimating the actual position of an aircraft from observations on a radar screen. We will model the problem using a dynamic Bayesian network:
- random state variables describe the actual location and velocity of the aircraft
- the next location depends on the previous location and velocity and can be modelled using a linear Gaussian distribution:
P(X_t+Δ = x_t+Δ | X_t = x_t, Ẋ_t = ẋ_t) = N(x_t + ẋ_t Δ, σ²)(x_t+Δ)
- we observe the location Z_t; again, we can use a Gaussian distribution in the sensor model
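In one dimension, the linear Gaussian transition model above leads to a particularly simple Kalman update. A minimal sketch with the velocity treated as known; all variances and numbers are illustrative assumptions:

```python
# One-dimensional Kalman filter step for the aircraft example:
# predict the position forward by x + xdot*dt (linear Gaussian transition
# with variance q), then correct it with a noisy position observation z
# (Gaussian sensor model with variance r).

def kalman_step(mu, var, xdot, dt, q, z, r):
    # predict: P(X_{t+dt} | x_t, xdot_t) = N(x_t + xdot*dt, q)
    mu_pred = mu + xdot * dt
    var_pred = var + q
    # update with observation z
    k = var_pred / (var_pred + r)      # Kalman gain
    mu_new = mu_pred + k * (z - mu_pred)
    var_new = (1 - k) * var_pred
    return mu_new, var_new

mu, var = kalman_step(mu=0.0, var=1.0, xdot=2.0, dt=1.0, q=0.5, z=2.4, r=0.5)
```

Both the predicted and the updated distributions remain Gaussian, which is exactly the closure property discussed in the next slide.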

Kalman filters (properties)
The Gaussian distribution has some nice properties when solving the tasks of filtering, prediction, and smoothing:
- if the distribution P(X_t | e_1:t) is Gaussian and the transition model P(X_t+1 | x_t) is linear Gaussian, then P(X_t+1 | e_1:t) is also a Gaussian distribution
- if P(X_t+1 | e_1:t) is Gaussian and the sensor model P(e_t+1 | X_t+1) is linear Gaussian, then P(X_t+1 | e_1:t+1) is also a Gaussian distribution
To solve these tasks, we can use the message passing technique.
Kalman filters (applications)
The classical application of the Kalman filter is tracking the movements of aircraft and missiles using radar. Kalman filters are also used to reconstruct the trajectories of particles in physics and to monitor ocean currents. The range of applications is much larger than motion tracking: any system characterized by continuous state variables and noisy measurements will do (pulp mills, chemical plants, nuclear reactors, plant ecosystems, national economies).

Kalman filters (non-linearities)
The Kalman filter assumes linear Gaussian transition and sensor models. What if the model is non-linear?
Example: assume a bird heading at high speed straight for a tree trunk. [Figure: the Kalman filter prediction vs a more realistic model]
A standard solution is the switching Kalman filter:
- multiple Kalman filters run in parallel, each using a different model of the system (one for straight flight, one for sharp left turns, and one for sharp right turns)
- a weighted sum of predictions is used, where the weight depends on how well each filter fits the current data
- this is a special case of a DBN, obtained by adding a discrete maneuver state variable to the network
Keeping track of many objects
We have considered state estimation problems involving a single object. What happens if two or more objects generate the observations? There is an additional problem: uncertainty about which object generated which observation!

Keeping track of many objects (approaches)
Data association problem: the problem of associating observation data with the objects that generated them.
Exact reasoning means summing out the variables over all possible assignments of objects to observations: for a single time slice and n objects there are n! mappings, and for T time slices there are (n!)^T mappings. Hence many different approximate methods are used:
- choose a single best assignment at each time step: the nearest-neighbor filter chooses the closest pairing of predicted position and observation; a better approach is to choose the assignment that maximizes the joint probability of the current observations given the predicted positions (the Hungarian algorithm)
- any method that commits to a single best assignment at each time step fails miserably under more difficult conditions (the prediction at the next step may be significantly wrong)
- particle filtering: maintains a large collection of possible current assignments
- MCMC: explores the space of possible current assignments
Keeping track of many objects (in reality)
Real applications of data association are typically much more complicated:
- false alarms (clutter): observations not caused by real objects
- detection failures: no observation is reported for a real object
- new and disappearing objects: new objects arrive and old ones disappear
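The nearest-neighbor filter mentioned above can be sketched as a greedy pairing. Everything in this toy (object names, coordinates, the use of Manhattan distance) is an assumption for illustration:

```python
# Nearest-neighbor data association: greedily pair each predicted object
# position with the closest not-yet-used observation.

def nearest_neighbor_assign(predictions, observations):
    """predictions: {object_name: (x, y)}; observations: list of (x, y).
    Returns {object_name: index of the observation assigned to it}."""
    pairs, used = {}, set()
    for name, pred in predictions.items():
        best, best_d = None, float("inf")
        for j, obs in enumerate(observations):
            if j in used:
                continue
            d = abs(pred[0] - obs[0]) + abs(pred[1] - obs[1])  # Manhattan distance
            if d < best_d:
                best, best_d = j, d
        pairs[name] = best
        used.add(best)
    return pairs

pairs = nearest_neighbor_assign({"a": (0, 0), "b": (5, 5)},
                                [(4.5, 5.0), (0.2, -0.1)])
```

Because each object commits greedily, the result can differ from the globally optimal assignment the Hungarian algorithm would find, which is exactly why the text calls the latter the better approach.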

© 2016 Roman Barták
Department of Theoretical Computer Science and Mathematical Logic
bartak@ktiml.mff.cuni.cz