Template-Based Representations. Sargur Srihari

Sargur Srihari (srihari@cedar.buffalo.edu)

Topics
- Variable-based vs. template-based models
- Temporal models
- Basic assumptions
- Dynamic Bayesian networks
- Hidden Markov models
- Linear dynamical systems
- Template variables

Introduction
- A PGM specifies a joint distribution over a fixed set χ of variables.
- A network for medical diagnosis can be applied to multiple patients, each with different symptoms and diseases. Such networks are variable-based.
- Sometimes we need a much more complex space than a fixed set of variables.

Temporal Setting
- Distributions over systems whose state changes with time.
- Example: monitoring a patient in the ICU. Sensor readings (heart rate, blood pressure, EKG) arrive at regular intervals and must be tracked over time.
- Example: robot location tracking as the robot moves around and gathers observations.
- We need a single model that applies to trajectories, possibly of infinite length.

ICU Monitoring Using a DBN
- The transition model for true-HR is linear Gaussian; at time 0, µ = 80 with σ = 30.
- Sensor models for ECG-HR and pleth-HR are Gaussians centered at true-HR.
- The sensor model for SpO2 is also a Gaussian.
- Transition models for ECG-detached, pleth-detached, and patient-movement assign high probability to persisting in the current state.
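A minimal simulation sketch of the true-HR transition and one sensor model. Only the time-0 mean (80) and standard deviation (30) come from the slide; the linear-Gaussian coefficients a, b and the noise parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial distribution for true-HR at time 0 (from the slide): N(80, 30^2).
# The transition coefficients below are illustrative placeholders, not
# values from the source model.
a, b, sigma_trans = 0.95, 4.0, 5.0   # true_hr' ~ N(a*true_hr + b, sigma_trans^2)
sigma_sensor = 3.0                   # ECG-HR reading ~ N(true_hr, sigma_sensor^2)

true_hr = rng.normal(80.0, 30.0)
for t in range(5):
    ecg_hr = rng.normal(true_hr, sigma_sensor)          # sensor model
    print(f"t={t}  true-HR={true_hr:6.1f}  ECG-HR={ecg_hr:6.1f}")
    true_hr = rng.normal(a * true_hr + b, sigma_trans)   # linear-Gaussian transition
```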

Template
- A single compact model provides a template for an entire class of distributions.
- The class may consist of trajectories of the same type (temporal modeling using dynamic Bayesian networks) or of different pedigrees.

Temporal Models
- The state of the world evolves over time.
- The system state at time t is a snapshot of the relevant attributes (hidden or observed): an assignment of values to a set of random variables χ.
- We use X_i^(t) to denote the instantiation of the variable X_i at time t.
- X_i is no longer a variable that takes a value; rather, it is a template variable.

Template Variable Notation
- The template is instantiated at different points in time t.
- Each X_i^(t) takes a value in Val(X_i).
- For a set of variables X ⊆ χ, we use X^(t1:t2) (t1 < t2) to denote the set of variables {X^(t) : t ∈ [t1, t2]}.
- An assignment of values to each variable X_i^(t), for each relevant time t, is a trajectory.
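One possible way to hold instantiated template variables in code, as a sketch; the dictionary keyed by (variable name, time index) and the toy values are assumptions for illustration, not part of the source notation.

```python
# A minimal sketch (not from the source): represent instantiations X_i^(t)
# of template variables as a dict keyed by (variable name, time index).
trajectory = {}          # maps (name, t) -> value
for t in range(3):
    trajectory[("Weather", t)] = "clear"
    trajectory[("Location", t)] = (t * 1.5, 0.0)   # toy values

# X^(0:2) for X = {Weather}: all instantiations of Weather at t in [0, 2]
weather_0_2 = {k: v for k, v in trajectory.items() if k[0] == "Weather"}
print(weather_0_2)
```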

Trajectory
- A trajectory is an assignment to each variable X_i^(t).
- The goal is to represent probability distributions over such trajectories.
- Representing such a distribution directly is difficult, so we need to make some simplifying assumptions.

Vehicle Localization
- Task: track the car's current location using faulty sensors.
- The system state is encoded using
  X1: Location (car's current location)
  X2: Velocity (car's current velocity)
  X3: Weather (current weather)
  X4: Failure (failure status of the sensor)
  X5: Obs (current observation)
- There is one such set of variables for every time point t.
- Queries: Where is the car now (time t)? Where is it likely to be in ten minutes? Did it stop at the red light?
[Figure: 2-TBN with nodes Weather, Velocity, Location, Failure in time slice t and Weather', Velocity', Location', Failure', Obs' in time slice t+1]

Basic Assumptions
- Discretize the timeline into time slices with a chosen time granularity.
- The set of random variables is χ^(0:T) = {χ^(0), χ^(1), ..., χ^(T)}.
- Using the chain rule, P(A,B,C) = P(A) P(B|A) P(C|A,B), we get

  P(χ^(0:T)) = P(χ^(0)) ∏_{t=0}^{T-1} P(χ^(t+1) | χ^(0:t))

- Note the dependence on all of time 0:t. The distribution over a trajectory is the product of conditional distributions.
- We need to simplify this formulation.

Markovian System
- The future is conditionally independent of the past given the present.
- A dynamic system over template variables χ satisfies the Markov assumption if (χ^(t+1) ⊥ χ^(0:t-1) | χ^(t)).
- This gives a more compact representation of the distribution:

  P(χ^(0:T)) = P(χ^(0)) ∏_{t=0}^{T-1} P(χ^(t+1) | χ^(t))

Stationary Markovian System
- A Markovian dynamic system is stationary if P(χ^(t+1) | χ^(t)) is the same for all t.
- In this case we can represent the process using a transition model P(χ' | χ), so that for any t ≥ 0

  P(χ^(t+1) = ξ' | χ^(t) = ξ) = P(χ' = ξ' | χ = ξ)

- Non-stationarity: the conditional distribution changes with t. E.g., variables in biological systems change more in early years than in later years.
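A small Python sketch of a stationary Markov chain: the same transition matrix is reused at every step, and the probability of a sampled trajectory is the product of the initial distribution and the per-step transition probabilities. All numbers are illustrative assumptions, not from the source.

```python
import numpy as np

# Illustrative stationary Markov chain over a single discrete state.
states = ["clear", "rainy"]
P0 = np.array([0.8, 0.2])               # initial distribution P(x^(0))
T = np.array([[0.9, 0.1],               # row i: P(x' | x = states[i])
              [0.4, 0.6]])

rng = np.random.default_rng(0)
traj = [rng.choice(2, p=P0)]
for _ in range(4):
    traj.append(rng.choice(2, p=T[traj[-1]]))   # same T at every step

# Probability of the sampled trajectory under the Markov factorization
prob = P0[traj[0]]
for a, b in zip(traj, traj[1:]):
    prob *= T[a, b]
print([states[s] for s in traj], prob)
```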

Dynamic Bayesian Networks
- Stationarity allows us to represent probability distributions over infinite trajectories.
- The transition model P(χ' | χ) can be represented as a conditional Bayesian network.

DBN for Monitoring a Vehicle
- The 2-TBN represents the system dynamics:
- Obs: the observation depends on the car's location and on the error status of the sensor (Failure); the map itself is not modeled.
- Failure: bad weather makes the sensor likely to fail.
- Location: the location depends on the previous position and on the velocity.
[Figure: 2-TBN with nodes Weather, Velocity, Location, Failure in time slice t and Weather', Velocity', Location', Failure', Obs' in time slice t+1]

Interface Variables
- All variables are interface variables except for Obs, since we assume the sensor observation is generated at each time point independently, given the other variables.
[Figure: the same 2-TBN; Weather, Velocity, Location, Failure at time slice t feed Weather', Velocity', Location', Failure', Obs' at time slice t+1]

2-Time-Slice Bayesian Network (2-TBN)
- A 2-time-slice BN for a process over χ is a conditional Bayesian network over χ' given χ_I, where χ_I ⊆ χ is a set of interface variables.
- χ = {Weather, Velocity, Location, Failure, Obs}
- χ' = {Weather', Velocity', Location', Failure', Obs'}
- Interface variables: χ_I = {Weather, Velocity, Location, Failure}
- The 2-TBN represents P(χ' | χ_I).
[Figure: the 2-TBN over time slices t and t+1]

Conditional Bayesian Network
- In a conditional Bayesian network, only the variables in χ' have parents or CPDs.
- χ' = {Weather', Velocity', Location', Failure', Obs'}
- The interface variables χ_I are those variables whose values at time t have a direct effect on variables at time t+1: χ_I = {Weather, Velocity, Location, Failure}.
- Thus only variables in χ_I can be parents of variables in χ'.
- The 2-TBN represents the conditional distribution

  P(χ' | χ) = P(χ' | χ_I) = ∏_{i=1}^{n} P(X_i' | Pa_{X_i'})
[Figure: the 2-TBN over time slices t and t+1]
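A structural sketch of this factorization for the vehicle 2-TBN: each primed variable gets a parent set drawn from χ_I and χ', and the conditional distribution is the product of the per-family CPDs. The exact parent sets below are read off the figure description and are partly assumed (for instance, whether Obs' conditions on the primed or unprimed Location and Failure).

```python
# Structure-only sketch (no CPD numbers from the source): encode the 2-TBN
# as a parent map over primed variables, and check that every parent lies
# in the interface set chi_I or in the primed set chi'.
chi_I = {"Weather", "Velocity", "Location", "Failure"}
parents = {
    "Weather'":  ["Weather"],
    "Velocity'": ["Weather", "Velocity"],      # assumed from the figure
    "Location'": ["Location", "Velocity"],
    "Failure'":  ["Weather", "Failure"],
    "Obs'":      ["Location'", "Failure'"],    # assumed intra-slice parents
}
allowed = chi_I | set(parents)
assert all(set(pa) <= allowed for pa in parents.values())

# P(chi' | chi_I) factors as the product over these families:
print(" * ".join(f"P({x} | {', '.join(pa)})" for x, pa in parents.items()))
```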

Example 2-TBN
- The simplest nontrivial DBN is an HMM, with a single state variable S and a single observation variable O.
[Figure: (a) the 2-TBN for a generic HMM (S → S', S' → O'); (b) the unrolled DBN for four time slices, S0 → S1 → S2 → S3 with observations O1, O2, O3]
- χ = {S, O}, χ' = {S', O'}, χ_I = {S}
- B→: P(χ' | χ_I) = P(S' | S) P(O' | S')
- B0: P(S) P(O | S)

Definition of DBN
- A dynamic Bayesian network is a pair (B0, B→).
- B0 is a Bayesian network over χ^(0) representing the initial distribution over states.
- B→ is a 2-TBN for the process.
- For any desired time span T ≥ 0, the distribution over χ^(0:T) is defined as an unrolled Bayesian network where, for any i = 1, ..., n:
  - the structure and CPD of X_i^(0) are the same as those for X_i in B0;
  - the structure and CPD of X_i^(t) for t > 0 are the same as those for X_i' in B→.
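A minimal sketch of unrolling the HMM DBN (B0, B→) from the previous slide and evaluating the joint probability of a trajectory. The CPD numbers are illustrative assumptions; for simplicity, B0 here is just P(S^(0)), matching the unrolled figure in which no observation appears at time 0.

```python
import numpy as np

pi = np.array([0.6, 0.4])          # B0: P(S^(0))          (illustrative)
A  = np.array([[0.7, 0.3],         # B->: P(S' | S)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],         # B->: P(O' | S')
               [0.3, 0.7]])

def unrolled_joint(states, obs):
    """P(S^(0:T), O^(1:T)) under the unrolled DBN."""
    p = pi[states[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t - 1]]
    return p

print(unrolled_joint(states=[0, 0, 1, 1], obs=[0, 0, 1]))
```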

DBN for Monitoring a Vehicle (unrolled)
[Figure: (a) the 2-TBN, (b) the time-0 network, (c) the resulting DBN unrolled over three time slices, with variables Weather, Velocity, Location, Failure, Obs at each slice]

DBN as a Compact Representation
- A DBN can be viewed as a compact representation from which we can generate an infinite number of Bayesian networks, one for every T > 0.

Classes of DBNs from HMMs
- (a) Factorial HMM: the 2-TBN has the structure of chains X_i → X_i' (i = 1, ..., n) with a single observed variable Y'. Example: several sources of sound captured through one microphone.
- (b) Coupled HMM: also a set of chains X_i, but each chain is an HMM with its own private observation Y_i'. Example: monitoring temperatures in a building for fire alarms; X_i is the hidden temperature of room i, Y_i is its sensor reading, and adjacent room temperatures interact.
[Figure: (a) factorial HMM 2-TBN, (b) coupled HMM 2-TBN]

State Observation Models
- An alternative way of thinking about a temporal process: the state evolves naturally on its own, and our observation of it is a separate process.
- This separates the system dynamics from the observation model.
[Figure: a four-state transition diagram over S1 to S4 with transition probabilities 0.3, 0.7, 0.4, 0.6, 0.5, 0.5, 0.9, 0.1]

State Observation Model
- Separates the dynamics of the system from our ability to sense it.
- Two independence assumptions:
  1. The state variables evolve in a Markovian way: (X^(t+1) ⊥ X^(0:t-1) | X^(t)).
  2. The observation at time t is conditionally independent of the entire state sequence given the current state: (O^(t) ⊥ X^(0:t-1), X^(t+1:∞) | X^(t)).
- The model therefore has two components: the transition model P(X' | X) and the observation model P(O | X).
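A filtering sketch under exactly these two assumptions: a transition model A = P(X' | X) and an observation model B = P(O | X), combined by the standard forward recursion. The numbers are illustrative assumptions.

```python
import numpy as np

pi = np.array([0.5, 0.5])            # P(X^(0))
A  = np.array([[0.8, 0.2],
               [0.3, 0.7]])          # A[i, j] = P(X' = j | X = i)
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])          # B[i, k] = P(O = k | X = i)

def filter_states(obs):
    """Return P(X^(t) | O^(0:t)) for each t (normalized forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    beliefs = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict with A, correct with B
        alpha /= alpha.sum()
        beliefs.append(alpha)
    return beliefs

for t, b in enumerate(filter_states([0, 0, 1, 1])):
    print(t, b)
```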

Converting a 2-TBN to State-Observation Form
- Any 2-time-slice Bayesian network can be converted to a state-observation representation.
- For any observed variable Y that does not already satisfy the structural restrictions, introduce a new variable Ŷ whose only parent is Y; view Y as hidden and interpret observations of Y as observations of Ŷ.
- In effect, Ŷ is a perfectly reliable sensor of Y.
- While the transformed network is probabilistically equivalent, it obscures independence properties.

Applications of State Observation Models
- Hidden Markov models
- Linear dynamical systems

HMMs
[Figure: (a) the 2-TBN for an HMM (S → S', S' → O'); (b) the unrolled DBN, S0 → S1 → S2 → S3 with observations O1, O2, O3]
- The transition model P(S' | S) is assumed to be sparse, with many possible transitions having zero probability.
- HMMs often use a different graphical notation, generally cyclic, in which nodes represent the different states of the system.

HMM Transition Graphs Are Very Different from PGMs
[Figure: state-transition diagram over S1 to S4 with the probabilities below]

Transition matrix P(S' | S):
        s1    s2    s3    s4
  s1   0.3   0.7   0     0
  s2   0     0     0.4   0.6
  s3   0.5   0     0     0.5
  s4   0     0.9   0     0.1

- Nodes are states, i.e., possible values of the state variable.
- Edges represent transitions between states, i.e., nonzero entries in the CPD.
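The same transition matrix written as code, with a check that each row is a distribution and a short sampled state trajectory. The matrix entries come from the slide; the choice of s1 as the starting state is arbitrary.

```python
import numpy as np

# The transition matrix P(S' | S) from the slide; rows index the current
# state s1..s4 and columns the next state.
P = np.array([
    [0.3, 0.7, 0.0, 0.0],   # s1
    [0.0, 0.0, 0.4, 0.6],   # s2
    [0.5, 0.0, 0.0, 0.5],   # s3
    [0.0, 0.9, 0.0, 0.1],   # s4
])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a distribution

# Sample a short state trajectory starting (arbitrarily) in s1.
rng = np.random.default_rng(0)
state = 0
path = [state]
for _ in range(8):
    state = rng.choice(4, p=P[state])
    path.append(state)
print(" -> ".join(f"s{i + 1}" for i in path))
```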

HMMs for Speech Recognition
Three distinct layers:
1. Language model: generates sentences as sequences of words.
2. Word model: describes a word as a sequence of phonemes, e.g., /p/ /u/ /sh/.
3. Acoustic model: describes the progression of the acoustic signal through a phoneme.

Language Model
- A probability distribution over sequences of words.
- Bigram model: a Markov model defined via distributions P(W_i | W_i-1) for the i-th word in the sequence. It does not take the position in the sentence into account: P(W_i | W_i-1) is the same for all i.
- Trigram model: models the distributions as P(W_i | W_i-1, W_i-2).
- Although naive, these models work well, due to the large amounts of training data available without manual labeling.
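A maximum-likelihood bigram estimate on a toy two-sentence corpus, as a sketch; the corpus and the sentence-boundary tokens are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative, not from the source).
corpus = [
    "the car stopped at the light".split(),
    "the car moved past the light".split(),
]

# Count bigrams, with <s> and </s> as sentence-boundary tokens.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for prev, word in zip(["<s>"] + sentence, sentence + ["</s>"]):
        bigram_counts[prev][word] += 1

def p_bigram(word, prev):
    """Maximum-likelihood estimate of P(W_i = word | W_i-1 = prev)."""
    counts = bigram_counts[prev]
    return counts[word] / sum(counts.values()) if counts else 0.0

print(p_bigram("car", "the"))     # P(car | the)   = 0.5
print(p_bigram("light", "the"))   # P(light | the) = 0.5
```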

Phoneme Model
[Figure: a phoneme-level HMM for a complex phoneme, with states S1 to S7]
- Phonemes are basic phonetic units corresponding to distinct sounds, e.g., P vs. B.
- A sound can be breathy, aspirated, nasalized, and more.
- The International Phonetic Alphabet has about 100 phonemes.

Acoustic Level
- The signal is segmented into short time frames (around 10-25 ms); a phoneme lasts over a sequence of these frames.
- The acoustics differ for the beginning, middle, and end of a phoneme, hence an HMM at the phoneme level.
- The observation represents features extracted from the acoustic signal, discretized into bins or modeled with a GMM.
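A framing sketch: slicing a waveform into short overlapping frames. The 25 ms window and 10 ms hop are common defaults used here as assumptions consistent with the 10-25 ms range mentioned above.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a 1-D waveform into overlapping frames (rows of the result)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]            # shape: (n_frames, frame_len)

sr = 16000
signal = np.random.default_rng(0).standard_normal(sr)   # 1 s of fake audio
frames = frame_signal(signal, sr)
print(frames.shape)   # (98, 400): 400-sample frames every 160 samples
```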

Combining Models with a Hierarchical HMM
- The three models (language, phoneme, and acoustic) are combined into one huge hierarchical HMM.
- It defines a joint probability distribution over words, phonemes, and basic acoustic units.
- In the bigram model, states have the form (w, i, j), where w is the current word, i is a phoneme within that word, and j is an acoustic position within that phoneme.
- The word HMM has a start state and an end state; each sequence is a trajectory through the acoustic HMMs of the individual phonemes.

Hierarchical HMM to DBN
- The DBN framework is much more flexible for introducing extensions to the model.
- Variables represent the states of the different levels of the hierarchy (word, phoneme, and intra-phone state), along with auxiliary variables that capture the control architecture of the hierarchical HMM.

Linear Dynamical Systems
- One or more real-valued variables evolve linearly over time with some Gaussian noise.
- Also called Kalman filters, after the algorithm used to perform tracking.
- A linear dynamical system can be viewed as a DBN where all variables are continuous and all dependencies are linear Gaussian.
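A one-dimensional Kalman filter sketch for tracking a single real-valued state with linear-Gaussian transition and observation models; all parameter values are illustrative assumptions.

```python
# Model (illustrative numbers, not from the source):
#   x_t = a * x_{t-1} + w,  w ~ N(0, q)   (transition model)
#   z_t = x_t + v,          v ~ N(0, r)   (observation model)
def kalman_filter(observations, a=1.0, q=0.1, r=1.0, m0=0.0, p0=10.0):
    m, p = m0, p0                               # posterior mean and variance
    estimates = []
    for z in observations:
        m, p = a * m, a * a * p + q             # predict
        k = p / (p + r)                         # Kalman gain
        m, p = m + k * (z - m), (1.0 - k) * p   # update with observation z
        estimates.append(m)
    return estimates

print(kalman_filter([1.2, 0.9, 1.1, 1.0]))
```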

Another Template Model: Genetics
- A family tree (pedigree) consists of individuals, each with their own properties; a PGM encodes the joint distribution over the properties of all individuals.
- We cannot have a single variable-based model, since each family has a different family tree, yet the mechanism by which genes are transmitted is identical.
- G: genotype; B: blood type.