Graphical Event Models and Causal Event Models
Chris Meek, Microsoft Research
Graphical Models
A graphical model defines a joint distribution P(X) over a set of variables X = {X_1, ..., X_n}. A graphical model is a pair M = <G, Θ> where:
- G = <X, E> is a directed acyclic graph.
- Θ = {Θ_1, ..., Θ_n}, where Θ_i defines the conditional distribution P(X_i | π_i) and π_i are the parents of X_i in G.
Learning: assume we see many draws from P(X).
Example (DAG over X_1, X_2, X_3 with X_1 → X_3 and X_2 → X_3):
P(X_1) = f_1(X_1, Θ_1)
P(X_2) = f_2(X_2, Θ_2)
P(X_3 | X_1, X_2) = f_3(X_3, X_1, X_2, Θ_3)
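The factorization on this slide can be sketched in a few lines of code. This is a minimal illustration with hypothetical numbers (the conditional probability tables and variable arities are our own, not from the talk): the joint over three binary variables factors into the three local distributions defined by the DAG.

```python
# Minimal sketch (hypothetical CPTs): a graphical model with edges
# X1 -> X3 and X2 -> X3, so the joint factors as
# P(X1, X2, X3) = P(X1) * P(X2) * P(X3 | X1, X2).

p_x1 = {0: 0.7, 1: 0.3}                      # P(X1), parameters Theta_1
p_x2 = {0: 0.6, 1: 0.4}                      # P(X2), parameters Theta_2
p_x3 = {(x1, x2): {0: 0.9 - 0.2 * (x1 + x2), 1: 0.1 + 0.2 * (x1 + x2)}
        for x1 in (0, 1) for x2 in (0, 1)}   # P(X3 | X1, X2), Theta_3

def joint(x1, x2, x3):
    """Joint probability via the factorization defined by the DAG."""
    return p_x1[x1] * p_x2[x2] * p_x3[(x1, x2)][x3]

# Since every factor is a proper distribution, the joint sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```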
Graphical Models
Explaining-away-type reasoning:
- What is the probability of Burglary given AlarmSound?
- What is the probability of Burglary given AlarmSound and a NewsReport of an earthquake?
(figure: DAG with Earthquake? → AlarmSound?, Burglary? → AlarmSound?, Earthquake? → NewsReport?)
Graphical Models
Explaining-away-type reasoning:
- What is the probability of Burglary given AlarmSound?
- What is the probability of Burglary given AlarmSound and a NewsReport of an earthquake?
- What if the NewsReport said the earthquake was after the Alarm went off?
Outline
- Temporal Event Sequences
- Graphical Event Models
- Learning Graphical Event Models
- Learning Causal Dependencies
- Causal Graphical Event Models
Modeling temporal event streams
A temporal event stream is a time-stamped stream of labeled events. This type of data is pervasive: datacenter event logs, search queries, ...
We want to model: what events will happen when, based on what events have happened when.
Temporal Event Sequences: Event Logs from a Datacenter
(figure: timeline from 2:15 pm to 3:00 pm with events such as LinkFail, SQLTimeout, CDriveFail, HotmailAuthFail, QueueOverflow, ReimageFail, SendTech)
L is the set of possible events (i.e., things that can happen).
D = <(t_1, l_1), (t_2, l_2), (t_3, l_3), ..., (t_n, l_n)> where l_i ∈ L and t_i < t_{i+1}.
Marked point processes
Treat the data as a realization of a marked point process: x = (t_1, l_1), ..., (t_n, l_n).
Forward-in-time likelihood: p(x) = ∏_{i=1}^n p(t_i, l_i | h_i), where the history h_i = h_i(x) = (t_1, l_1), ..., (t_{i-1}, l_{i-1}).
Any p(t_i, l_i | h_i) can be represented via conditional intensities λ_l(t | h_i):
p(t_i, l_i | h_i) = ∏_{l∈L} λ_l(t_i | h_i)^{1[l = l_i]} · exp(−∫_{t_{i−1}}^{t_i} λ_l(τ | h_i) dτ)
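The forward-in-time likelihood is easy to compute once the conditional intensities are fixed. Below is a minimal sketch for the simplest special case, constant intensities per event type (a hypothetical choice for illustration; the rates and sequence are our own), where the integral in the exponent reduces to rate × elapsed time:

```python
import math

# Sketch of log p(x) = sum_i log p(t_i, l_i | h_i) with (hypothetical)
# constant conditional intensities lambda_l(t | h) = rates[l], so each
# integral term is just rates[l] * (t_i - t_{i-1}).

rates = {"a": 0.5, "b": 0.2}   # assumed constant intensities per label

def log_likelihood(x, rates, t_end):
    """log-likelihood of x = [(t_1, l_1), ..., (t_n, l_n)], observed on [0, t_end]."""
    ll, t_prev = 0.0, 0.0
    for t, label in x:
        ll += math.log(rates[label])                          # event that occurred
        ll -= sum(r * (t - t_prev) for r in rates.values())   # no events in between
        t_prev = t
    ll -= sum(r * (t_end - t_prev) for r in rates.values())   # quiet tail
    return ll

x = [(1.0, "a"), (3.0, "b"), (5.0, "a")]
ll = log_likelihood(x, rates, t_end=8.0)
```

With richer, history-dependent intensities only the integral term changes; the additive structure of the log-likelihood stays the same.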
Proof sketch
Given a pdf p(t), define λ(t) ≡ p(t) / (1 − ∫_0^t p(τ) dτ) = p(t) / (1 − P(t)).
Then, by calculus: P′(t) = λ(t)(1 − P(t)), so
P(t) = 1 − exp(−∫_0^t λ(τ) dτ) and p(t) = λ(t) exp(−∫_0^t λ(τ) dτ).
Given p(t, l) = p(t) p(l | t), define λ_l(t) ≡ λ(t) p(l | t). Then
p(t, l) = p(t) p(l | t) = λ_l(t) exp(−∫_0^t Σ_{l′} λ_{l′}(τ) dτ).
Conditional intensities
(figure: example intensity functions λ_l(t | h) plotted against the event timeline)
p(t_i, l_i | h_i) = ∏_{l∈L} λ_l(t_i | h_i)^{1[l = l_i]} · exp(−∫_{t_{i−1}}^{t_i} λ_l(τ | h_i) dτ)
Conditional intensities
(figure: intensity functions λ_a(t | h), λ_b(t | h) over the event timeline)
1) The behavior of different event types is specified separately.
2) Highlights the dependency of each event type on the history.
Event Sequence Notation
D = <(t_1, l_1), ..., (t_n, l_n)>
Example: L = {a, b, c}, D = <(1, a), (3, b), (5, a), (8, c)>
h(t, D) is the history up to time t; e.g., for 5 ≤ t < 8, h(t, D) = <(1, a), (3, b), (5, a)>.
h|_A is the filtered history for A ⊆ L; e.g., D|_{a} = <(1, a), (5, a)>.
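Both notational operations on this slide are one-liners in code. This sketch uses the slide's example sequence; the function names are our own:

```python
# History and filtered-history operations on an event sequence D,
# represented as a time-ordered list of (t, label) pairs.

D = [(1, "a"), (3, "b"), (5, "a"), (8, "c")]

def history(t, D):
    """h(t, D): all events that happened up to (and including) time t."""
    return [(ti, li) for (ti, li) in D if ti <= t]

def filtered(D, A):
    """D|_A: keep only events whose label lies in the subset A of L."""
    return [(ti, li) for (ti, li) in D if li in A]

h6 = history(6, D)        # events up to t = 6
Da = filtered(D, {"a"})   # only the "a" events
```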
Graphical Event Models
A Graphical Event Model (GEM) is a pair <G, Θ>:
- Vertices: one for each event type in L = {a, b, c}
- Edges represent potential dependencies
- Θ_l ∈ Θ parameterizes the intensity function for l:
λ_a(t | h, Θ) = λ_a(t | h|_{π_a}, Θ_a)
λ_b(t | h, Θ) = λ_b(t | h|_{π_b}, Θ_b)
λ_c(t | h, Θ) = λ_c(t | h|_{π_c}, Θ_c)
Learning Graphical Event Models
- Specify functional form(s) for the intensities λ_l, with separate parameters for each event type.
- The likelihood factors according to L, so we can learn each intensity function separately. (Bayesians also require factorization of the prior.)
- Search over the space of directed graphs: add/remove parents that improve the score.
Piecewise-Constant CIMs (PCIM)
Idea: restrict λ_l(t | h_i) to be piecewise constant in t for all event sequences.
A state function σ(t, h) maps histories to a discrete set of states Σ.
A PCIM is a pair M = <S, Θ> where:
- Structure S = {<σ_l(t, h), Σ_l>}_{l∈L}
- Parameters Θ = {Θ_l}_{l∈L} with Θ_l = {λ_{ls}}_{s∈Σ_l}
Piecewise-Constant CIMs
(figure: timelines for event types a, b, c from t = 10 to 35, colored by the state functions σ_a(t, h), σ_b(t, h), σ_c(t, h); within each state the rate is constant, e.g., λ_a(·) = 0, λ_a(·) = 0.1, λ_a(·) = 10, λ_c(·) = 10)
Piecewise-Constant CIM
A PCIM M = <S, Θ> (where Θ = {Θ_1, ..., Θ_|L|} and Θ_l = {λ_{ls}}_{s∈Σ_l}) has likelihood
p(D | M) = ∏_{l∈L} ∏_{s∈Σ_l} λ_{ls}^{c(l,s)} · exp(−λ_{ls} d(l, s))
where c(l, s) is the count of events of type l while σ_l(t, h) = s in D, and d(l, s) is the total duration for which σ_l(t, h) = s in D.
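Given the sufficient statistics c(l, s) and d(l, s), evaluating the PCIM likelihood is a short computation. A minimal sketch with hypothetical statistics (we assume the counts and durations have already been extracted from D by running the state functions σ_l over the sequence):

```python
import math

# Log-likelihood of a PCIM from its sufficient statistics:
# log p(D | M) = sum_{l,s} [ c(l,s) * log(lambda_ls) - lambda_ls * d(l,s) ].

# hypothetical statistics for one event type "a" with two states 0 and 1:
c = {("a", 0): 3, ("a", 1): 1}          # counts c(l, s)
d = {("a", 0): 10.0, ("a", 1): 2.0}     # durations d(l, s)
lam = {("a", 0): 0.3, ("a", 1): 0.5}    # rates lambda_ls

def log_lik(c, d, lam):
    return sum(c[k] * math.log(lam[k]) - lam[k] * d[k] for k in c)

ll = log_lik(c, d, lam)
```

One useful consequence of this form: the maximum-likelihood rate for each cell is simply lambda_ls = c(l, s) / d(l, s).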
Piecewise-Constant CIM
A product of Gammas is a conjugate prior, even though the likelihood isn't a product of exponentials!
p(λ_{ls} | α_{ls}, β_{ls}) = (β_{ls}^{α_{ls}} / Γ(α_{ls})) · λ_{ls}^{α_{ls}−1} · exp(−β_{ls} λ_{ls})
Closed-form posterior:
p(λ_{ls} | α_{ls}, β_{ls}, D, S) = p(λ_{ls} | α_{ls} + c(l, s), β_{ls} + d(l, s))
Closed-form marginal likelihood:
p(D | S) = ∏_{ls} γ_{ls}(D), where
γ_{ls}(D) = (β_{ls}^{α_{ls}} / Γ(α_{ls})) · Γ(α_{ls} + c(l, s)) / (β_{ls} + d(l, s))^{α_{ls} + c(l, s)}
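The per-cell marginal likelihood γ_ls(D) is a ratio of Gamma functions, best evaluated in log space. A small sketch (function name and hyperparameter values are our own) using the standard library's `math.lgamma`:

```python
import math

# Log of the closed-form marginal likelihood of one (l, s) cell:
# gamma_ls(D) = beta^alpha / Gamma(alpha)
#               * Gamma(alpha + c) / (beta + d)^(alpha + c)

def log_marginal(alpha, beta, count, duration):
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + count)
            - (alpha + count) * math.log(beta + duration))

# Sanity check: with no data (c = 0, d = 0) the marginal likelihood is 1.
lm_empty = log_marginal(2.0, 1.0, 0, 0.0)
lm = log_marginal(2.0, 1.0, 3, 10.0)
```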
Piecewise-Constant CIM
Defining PCIM Structures
Let B = {f_1(t, h), ..., f_n(t, h)} where each f_i(t, h) is a basis state function (BSF).
A family of structures S(B) is obtained by combining BSFs. We use decision trees, but one could use decision graphs.
(figure: an example decision tree — test f_i(t, h) = s_i?; if no, σ_l(t, h) = r with rate λ_{lr}; if yes, test f_j(t, h) = s_j?, whose branches assign σ_l(t, h) = s with rate λ_{ls} and σ_l(t, h) = t with rate λ_{lt})
Piecewise-Constant CIM
Example types of basis state functions f(t, h):
- Event-type-specific state functions: f(t, h) = f(t, h|_l) depends only on the history of a specific event type.
- Windowed state functions: f(t, h) depends only on the history during a window (with offsets t_s, t_e) relative to time t.
- Historical state functions: f(t, h) depends on the last events that have happened but not their times.
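A basis state function that is both event-type-specific and windowed can be sketched in a few lines. The construction below (names and window semantics are our own, for illustration) returns 1 if an event of the given type occurred in the window [t − w, t):

```python
# Sketch of a windowed, event-type-specific basis state function:
# f(t, h) = 1 if an event with the given label occurred in [t - w, t).

def windowed_bsf(label, w):
    def f(t, h):
        return int(any(li == label and t - w <= ti < t for (ti, li) in h))
    return f

h = [(1, "a"), (3, "b"), (5, "a")]
f = windowed_bsf("a", w=3)
s1 = f(6, h)   # event "a" at t = 5 lies in the window [3, 6)
s2 = f(9, h)   # no "a" event in the window [6, 9)
```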
Piecewise-Constant CIM: Learning
We use a Bayesian model selection approach to choose S ∈ S(B).
For each l ∈ L:
- Start with the empty decision tree (i.e., ∀t, h: σ_l(t, h) = k).
- For each leaf in the decision tree:
  - Evaluate each possible split (f ∈ B, s).
  - Choose the split that most improves the marginal likelihood.
Alternatively, one could use MCMC to average over S(B).
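One greedy step of this search can be sketched as follows: score a leaf's (count, duration) statistics against the statistics partitioned by a candidate split, and accept the split only if the Gamma-conjugate marginal likelihood improves. All statistics and hyperparameters below are hypothetical:

```python
import math

# One greedy step of Bayesian structure search for a PCIM decision tree:
# compare the marginal likelihood of a leaf against the product of
# marginal likelihoods of the two states a candidate binary split induces.

def log_marginal(count, duration, alpha=1.0, beta=1.0):
    """Gamma-conjugate log marginal likelihood of one (l, s) cell."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + count)
            - (alpha + count) * math.log(beta + duration))

leaf = (8, 20.0)               # (c, d) at the current leaf
split = [(7, 4.0), (1, 16.0)]  # (c, d) in the candidate split's two states

score_no_split = log_marginal(*leaf)
score_split = sum(log_marginal(c, d) for c, d in split)
accept = score_split > score_no_split  # split only if the score improves
```

Here the split separates a high-rate regime (7 events in 4 time units) from a low-rate one (1 event in 16 time units), so the split wins; a split that merely shuffles homogeneous data would lose to the Occam penalty built into the marginal likelihood.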
Example: λ_RebootFail decision tree (learned over 216 event types)
- InitReboot [30 min]? false → λ = 0
- true → SuccReboot [30 min]?
  - false → Unhealthy VersionCheck [30 min]? false → λ = 0; true → λ = 3.36
  - true → Healthy VersionCheck [30 min]? false → λ = 41.4; true → λ = 0
Learning Causal Graphical Models
- Directed Acyclic Assumption (DAG): the causal model can be represented by a directed acyclic graph G = <X, E>.
- Data Assumption: the world is described by some P(X), and the observed world is some P(O), O ⊆ X.
- Reliable Information Assumption (Reliable): the world provides reliable information about independencies among the observed variables O ⊆ X. I(A, C, B) means A is independent of B given C.
Learning Causal Graphical Models
Assumptions that connect the observed world P and the causal model G:
- Causal Markov Assumption (CMA): if d_G(A, C, B) then I_P(A, C, B).
  Note 1: d_G(A, C, B) is d-separation, a vertex separation criterion.
  Note 2: a graphical model defines a non-causal Markov assumption w.r.t. P(X).
- Causal Faithfulness Assumption (CFA): if I_P(A, C, B) then d_G(A, C, B).
Learning Causal Graphical Models
Learning scenarios:
- Complete data: all variables observed (X = O).
- Causal sufficiency: no pair of observed variables has unobserved common ancestors.
- General case: O ⊆ X.
Goal: in each learning scenario, use independence facts to identify the causal information common to every graph with those independence/separation facts.
Learning Causal Graphical Models
Mantra: correlation does not imply causation.
Complete data — indistinguishable: A → B and A ← B.
General case — also indistinguishable: A → B, A ← B, and A ← H → B with H unobserved (and combinations thereof).
Learning Causal Graphical Models
An interesting general-case result: under the assumptions DAG, Reliable, CMA, and CFA, if the only independence facts we observe to hold are I(A, ∅, B), I(A, C, D), and I(B, C, D), then C is a cause of D.
(figure: the "Y" structure A → C ← B, C → D)
Learning Causal GEMs
Step 1: Change the separation criterion from d-separation to δ-separation:
δ(A, C, B) in G = <L, E> if and only if d(A, C, B) in G_B, where G_B = <L, E_B> and E_B = {(l_1, l_2) ∈ E : l_1 ∉ B}.
Step 2: Change from independence tests to factorization (process independence) tests.
Step 3: Assume analogs of CMA, CFA, and Reliable.
Step 4: Prove things.
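Step 1 is a small amount of graph surgery on top of an ordinary d-separation test. The sketch below is our own construction for illustration: d-separation is checked via the standard ancestral-moral-graph method (take ancestors of A ∪ B ∪ C, moralize, delete C, test reachability), and δ-separation first deletes every edge whose source lies in B.

```python
# delta(A, C, B): delete every edge whose source vertex is in B, then test
# ordinary d-separation in the pruned graph G_B. A, C, B are disjoint
# vertex sets; `edges` is a list of directed (u, v) pairs of a DAG.

def ancestors(nodes, edges):
    """All vertices with a directed path into `nodes`, plus `nodes` itself."""
    anc, changed = set(nodes), True
    while changed:
        changed = False
        for u, v in edges:
            if v in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def d_separated(A, C, B, edges):
    """True iff A and B are d-separated given C (ancestral-moral-graph test)."""
    anc = ancestors(A | B | C, edges)
    sub = [(u, v) for u, v in edges if u in anc and v in anc]
    # moralize: undirected skeleton plus links between co-parents
    und = {frozenset((u, v)) for u, v in sub}
    for u1, v1 in sub:
        for u2, v2 in sub:
            if v1 == v2 and u1 != u2:
                und.add(frozenset((u1, u2)))
    # delete C, then test undirected reachability from A to B
    reach, frontier = set(A), list(A)
    while frontier:
        x = frontier.pop()
        for e in und:
            if x in e:
                y = next(iter(e - {x}))
                if y not in C and y not in reach:
                    reach.add(y)
                    frontier.append(y)
    return not (reach & B)

def delta_separated(A, C, B, edges):
    """delta(A, C, B): d-separation after deleting the edges out of B."""
    return d_separated(A, C, B, [(u, v) for u, v in edges if u not in B])

# delta-separation is asymmetric: with the single edge c -> a,
# a is delta-separated from c, but c is not delta-separated from a.
sep_ac = delta_separated({"a"}, set(), {"c"}, [("c", "a")])
sep_ca = delta_separated({"c"}, set(), {"a"}, [("c", "a")])
```

The asymmetry in the example is the point of the criterion: it separates "c influences a" from "a influences c", which symmetric d-separation cannot do.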
Learning Causal GEMs
Learning scenarios:
- Complete data: all variables observed (X = O). Result: can recover the structure.
- Causal sufficiency: no pair of observed variables has unobserved common ancestors. Result: can recover the structure over O (all causes).
- General case: O ⊆ X. In progress:
  - some sufficient conditions for cause
  - some sufficient conditions for non-cause
  - some sufficient conditions for the existence of an unmeasured common cause
Open Issues
- Characterize what one can learn in the general case.
- Justification for using process independence to learn causes (e.g., completeness of δ-separation).
- Principled approaches to testing process independence statements.
- Consistency of score-based learning for process independence.
- Relaxing assumptions such as the reliability assumption.