Graphical Event Models and Causal Event Models. Chris Meek Microsoft Research

Similar documents
Foundations of Statistical Inference. Sufficient statistics. Definition (Sufficiency) Definition (Sufficiency)

A Bayesian Approach to Spectral Analysis

Vehicle Arrival Models : Headway

Y. Xiang, Learning Bayesian Networks 1

5. Stochastic processes (1)

Random Processes 1/24

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Answers to QUIZ

Written HW 9 Sol. CS 188 Fall Introduction to Artificial Intelligence

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Comparing Means: t-tests for One Sample & Two Related Samples

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Anno accademico 2006/2007. Davide Migliore

Notes for Lecture 17-18

SOLUTIONS TO ECE 3084

Continuous Time. Time-Domain System Analysis. Impulse Response. Impulse Response. Impulse Response. Impulse Response. ( t) + b 0.

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Inference of Sparse Gene Regulatory Network from RNA-Seq Time Series Data

Outline. lse-logo. Outline. Outline. 1 Wald Test. 2 The Likelihood Ratio Test. 3 Lagrange Multiplier Tests

Stochastic Structural Dynamics. Lecture-6

RC, RL and RLC circuits

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

SPH3U: Projectiles. Recorder: Manager: Speaker:

Transform Techniques. Moment Generating Function

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Causal Reasoning for Events in Continuous Time: A Decision Theoretic Approach

3.6 Derivatives as Rates of Change

Temporal probability models

Speech and Language Processing

Block Diagram of a DCS in 411

Chapter 2. Models, Censoring, and Likelihood for Failure-Time Data

Matlab and Python programming: how to get started

8. Basic RL and RC Circuits

Reliability of Technical Systems

An introduction to the theory of SDDP algorithm

Sensors, Signals and Noise

Hidden Markov Models

Self assessment due: Monday 4/29/2019 at 11:59pm (submit via Gradescope)

Stochastic models and their distributions

CS Homework Week 2 ( 2.25, 3.22, 4.9)

Estimation of Poses with Particle Filters

Mechanical Fatigue and Load-Induced Aging of Loudspeaker Suspension. Wolfgang Klippel,

CHAPTER 2 Signals And Spectra

Maximum Likelihood Parameter Estimation in State-Space Models

GMM - Generalized Method of Moments

Lab 10: RC, RL, and RLC Circuits

Predator - Prey Model Trajectories and the nonlinear conservation law

EECE251. Circuit Analysis I. Set 4: Capacitors, Inductors, and First-Order Linear Circuits

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time.

Logic in computer science

The Arcsine Distribution

Filtering Turbulent Signals Using Gaussian and non-gaussian Filters with Model Error

KINEMATICS IN ONE DIMENSION

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Christos Papadimitriou & Luca Trevisan November 22, 2016

1. Consider a pure-exchange economy with stochastic endowments. The state of the economy

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

Failure of the work-hamiltonian connection for free energy calculations. Abstract

School and Workshop on Market Microstructure: Design, Efficiency and Statistical Regularities March 2011

Economics 8105 Macroeconomic Theory Recitation 6

Phys 221 Fall Chapter 2. Motion in One Dimension. 2014, 2005 A. Dzyubenko Brooks/Cole

MOMENTUM CONSERVATION LAW

Part III: Chap. 2.5,2.6 & 12

Homework 4 SOLUTION EE235, Summer 2012

22. Inbreeding. related measures: = coefficient of kinship, a measure of relatedness of individuals of a population; panmictic index, P = 1 F;

Math 10C: Relations and Functions PRACTICE EXAM

Lecture 4 Notes (Little s Theorem)

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

5.2. The Natural Logarithm. Solution

Kinematics Vocabulary. Kinematics and One Dimensional Motion. Position. Coordinate System in One Dimension. Kinema means movement 8.

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n

Some Basic Information about M-S-D Systems

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3

Testing for a Single Factor Model in the Multivariate State Space Framework

Homework-8(1) P8.3-1, 3, 8, 10, 17, 21, 24, 28,29 P8.4-1, 2, 5

13.3 Term structure models

EXERCISES FOR SECTION 1.5

Math 105 Second Midterm March 16, 2017

Brock University Physics 1P21/1P91 Fall 2013 Dr. D Agostino. Solutions for Tutorial 3: Chapter 2, Motion in One Dimension

Object tracking: Using HMMs to estimate the geographical location of fish

Demodulation of Digitally Modulated Signals

Linear Time-invariant systems, Convolution, and Cross-correlation

Timed Circuits. Asynchronous Circuit Design. Timing Relationships. A Simple Example. Timed States. Timing Sequences. ({r 6 },t6 = 1.

Financial Econometrics Jeffrey R. Russell Midterm Winter 2009 SOLUTIONS

Adaptation and Synchronization over a Network: stabilization without a reference model

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes

Explore 2 Proving the Vertical Angles Theorem

( ) = b n ( t) n " (2.111) or a system with many states to be considered, solving these equations isn t. = k U I ( t,t 0 )! ( t 0 ) (2.

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Solutions from Chapter 9.1 and 9.2

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi

Internet Traffic Modeling for Efficient Network Research Management Prof. Zhili Sun, UniS Zhiyong Liu, CATR

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

Transcription:

Graphical Even Models and Causal Even Models Chris Meek Microsof Research

Graphical Models Defines a join disribuion P X over a se of variables X = X 1,, X n A graphical model M =< G, Θ > G =< X, E > is a direced acyclic graph. Θ = {Θ 1,, Θ n } where Θ i defines he condiional disribuion P(X i π i ) where π i are he parens of X i in G. Learning: Assume we see many draws from P X. X 1 X 2 X 3 P X 1 = f 1 X 1, Θ 1 P X 2 = f 2 X 2, Θ 2 P X 3 X 1, X 2 = f 3 (X 3, X 1, X 2, Θ 3 )

Graphical Models Explaining away ype reasoning Wha is probabiliy of Burglary given AlarmSound? Wha is probabiliy of Burglary given AlarmSound and a NewsRepor of an earhquake? Earhquake? Burglary? NewsRepor? AlarmSound?

Graphical Models Explaining away ype reasoning Wha is probabiliy of Burglary given AlarmSound? Wha is probabiliy of Burglary given AlarmSound and a NewsRepor of an earhquake? Wha if he NewsRepor said he earhquake was afer he Alarm wen off?

Ouline Temporal Even Sequences Graphical Even Models Learning Graphical Even Models Learning Causal Dependencies Causal Even Model Graphical Even Models

Modeling emporal even sreams A emporal even sream is a ime-samped sream of labeled evens. This ype of daa is pervasive: daacener even logs, search queries, Wan o model: Wha evens will happen when, based on wha evens have happened when.

Temporal Even Sequences: Even Logs from a Daacener LinkFail SQLTimeou CDriveFail LinkFail SQLTimeou LinkFail HomailAuhFail LinkFail QueueOverflow SQLTimeou QueueOverflow QueueOverflow QueueOverflow CDriveFail QueueOverflow ReimageFail SendTech 2:15 pm 2:30 pm 2:45 pm 3:00 pm L is he se of possible evens (i.e., hings ha can happen) D = 1, l 1, 2, l 2, 3, l 3, n, l n where l L and i < i+1

Marked poin processes Trea daa as a realizaion of a marked poin process: x = 1, l 1,, n, l n Forward in ime likelihood: p x = n i=1 p( i, l i h i ) where he hisory h i = h i (x) = 1, l 1,, i 1, l i 1 Any p( i, l i h i ) can be represened via condiional inensiies λ l i h i : p i, l i h i = l l l=l λ l i h i i e i λ 0 l τ h i dτ

Proof skech Given pdf p() define: λ p() 1 0 p τ dτ P() Then, Calculus: P = λ 1 P P = 1 e 0 λ τ dτ λ τ dτ p = λ e 0 Given p, l = p p(l ) define λ l λ p l Then p, l = p p l = λ l e l 0 λl τ dτ

Condiional inensiies p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h p i, l i h i = l λ l i h l l=l i i e i 0 λ l τ h i dτ

Condiional inensiies λ h λ h 1) Behavior of differen even ypes specified separaely 2) Highlighs dependency of each even ypes on hisory

Even Sequence Noaion Example: L = {a, b, c} D = 1, l 1, n, l n e.g., 1, a, 3, b, 3 = 5, a, 8, c h i = h(, D) is he hisory up o i h 3 = 1, a, 3, b, (5, a) h A is he filered hisory for A L D a = { 1, a, 5, a }

Graphical Even Models A Graphical Even Model (GEM) is a pair G, Θ Verices for each even ype L = a, b, c Edges represen poenial dependencies Θ l Θ parameerizes inensiy funcion for l λ a h, Θ a λ b h, Θ b λ c h, Θ c = λ a h πa, Θ a = λ b h πb, Θ b = λ c h πc, Θ c

Learning Graphical Even Models Specify funcional form(s) for inensiies λ l wih separae parameers for each even Likelihood facors according o L so we can learn each inensiy funcion separaely. Bayesians also require facorizaion of prior Search over space of direced graphs Add/remove parens ha improve he score

Piecewise-Consan CIMs (PCIM) Idea: resric λ l i h i o be piecewise consan in for all even sequences A sae funcion σ(, h) maps hisories o a discree se of saes Σ A PCIM is a pair M = S, Θ where Srucure S= σ l, h, Σ l l L Parameers Θ = Θ l l L and Θ l = λ ls s Σl

Piecewise-Consan CIMs σ a (, h) σ b (, h) σ c (, h) a: b: c: 10 15 20 25 30 35 Sae Funcions (coloring) λ a ( ) = 0 λ a ( ) = 0.1 λ a ( ) = 10. λ c ( ) = 10

Piecewise-Consan CIM A PCIM (CIM) M = S, Θ (where Θ = {Θ 1,, Θ L } and Θ l = λ ls s Σl ) has likelihood c(l,s) p D M = λ ls e λ ls d(l,s) l L s Σ l c(l, s) is he coun of even l when σ l, h = s in D d(l, s) is he oal duraion of σ l, h = s in D

Piecewise-Consan CIM Produc of Gammas is conjugae prior, even hough he likelihood isn a produc of exponenials! p λ ls α ls, β ls = β ls α λ ls 1 Γ α ls e β ls λ ls ls Closed-form poserior: p λ ls α ls, β ls, D, S = p λ ls α ls + c(l, s), β ls + d(l, s) Closed-form marginal likelihood: p D S = γ ls D ls α ls γ ls D = β ls Γ α ls Γ α ls + c(l, s) β ls + d(l, s) α ls+c(l,s)

Piecewise-Consan CIM Defining PCIM Srucures Example Le B = f 1, h,, f n (, ) where f i, h is a basis sae funcion (BSF) A family of srucures S(B) is obained by combining BSFs. We use decision rees bu one could use decision graphs. no σ l, h = r λ lr f i, h = s i? no σ l, h = s λ ls yes f j, h = s j? yes σ l, h = λ l

Piecewise-Consan CIM Example Types of basis sae funcions f, h Even-ype specific sae funcions f, h = f(, h l ) depends only on he hisory of a specific even ype Windowed sae funcions f, h = f, h s, e depends only on he hisory during a window relaive o ime. Hisorical sae funcions f, h depends on he las evens ha have happened bu no heir imes.

Piecewise-Consan CIM σ a (, h) σ b (, h) σ c (, h)

Piecewise-Consan CIM:Learning We use a Bayesian Model selecion approach o choose S S B For each l L Sar wih empy decision ree (i.e.,, h σ l, h = k) For each leaf in decision ree Evaluae each possible spli (f B, s) Choose spli ha mos improves he marginal likelihood Alernaively one could use MCMC o average over S B.

Example: λ RebooFail decision ree false IniReboo [30 min] rue 216 Even ypes λ = 0 false SuccReboo [30min] rue false Unhealh VersionCheck [30 min] rue λ = 0 λ = 3.36 false Healhy VersionCheck [30 min] rue λ = 41.4 λ = 0

Learning Causal Graphical Models Direced Acyclic Assumpion (DAG) causal model can be represened by a direced acyclic graph G = X, E Daa Assumpion: world is described by some P X and he observed world is some P O O X Reliable Informaion Assumpion (Reliable) he world provides reliable informaion abou independencies among observed variables O X. I(A, C, B) means A is independen of B given C

Learning Causal Graphical Models Assumpions ha connec observed world P and causal model G Causal Markov Assumpion (CMA): If d G A, B, C I P A, B, C Noe 1: d G (A, B, C) is d-separaion: a verex separaion crierion Noe 2: A graphical model non-causal Markov w.r.. P X i defines Causal Faihfulness Assumpion (CFA): If I P A, B, C d G (A, B, C)

Learning Causal Graphical Models Learning Scenarios Complee daa: All variables observed (X = O) Causal sufficiency: No pair of observed variables have unobserved common ancesors. General case: O X Goal: In each learning scenario, use independence facs o idenify causal informaion common o every graph wih hose independencies/separaion facs.

Learning Causal Graphical Models Manra: Correlaion does no imply causaion Complee daa: Indisinguishable A B A B General case: Also indisinguishable H H H A B A B A B

Learning Causal Graphical Models Ineresing general case resul Under he assumpions DAG, Reliable, CMA and CFA If he only independence facs we observe o hold are I A,, B, I A, C, D, I(B, C, D) hen C is a cause of D. A C D B

Learning Causal GEMs Sep 1: Change he separaion crierion from d-separaion o δ-separaion δ(a, C, B) in G = L, E if and only if d(a, C, B) in G B where G B = L, E B and E B = l 1, l 2 E l 1 B Sep 2: Change from independence ess o facorizaion (process independence) ess Sep 3: Assume analog of CMA, CFA, Reliable Sep 4: Prove hings

Learning Causal GEMs Learning Scenarios Complee daa: All variables observed (X = O) Resul: Can recover he srucure. Causal sufficiency: No pair of observed variables have unobserved common ancesors. Resul: Can recover he srucure over O. (all causes) General case: O X In Progress: Some sufficien condiions for cause Some sufficien condiions for non-cause Some sufficien condiions for exisence of unmeasured common cause

Open Issues Characerize wha one can learn in he general case Jusificaion of using Process Independence o learn cause (e.g., compleeness of δ-separaion) Principled approaches o esing process independence saemens. Consisency of score learning for process independence. Relaxing assumpions such as he reliabiliy assumpion