Learning Partially Observable Markov Models from First Passage Times


Learning Partially Observable Markov Models from First Passage Times
Jérôme Callut and Pierre Dupont
European Conference on Machine Learning (ECML), September 2007
Computing science & engineering dept. (INGI), UCL Machine Learning Group

Outline
1. FPT in models and sequences
2. Partially Observable Markov Models (POMMs)
3. FPT dynamics in POMMs
4. POMM induction: POMMSTRUCT
5. Experimental results

HMM induction
Problem: Estimate the model structure and its probabilistic parameters from observed sequences.
To do what?
- Predict the future outcomes of the process
- Predict when future events will occur

FPT in models and sequences
Special focus: First Passage Times (FPT) between events of interest. From a sequence s = ..., the passage times observed between two symbols a and b form a set FPT(a, b) = {...}.
Contribution: a novel induction algorithm to induce models from FPT.
- FPT statistics can be computed from models or from sequences (see the sketch below)
- The FPT dynamics of a process denotes its FPT distributions
- FPT features are time-unbounded features; unlike N-grams, they characterize long-term dependencies and temporal dynamics
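To make the notion concrete, here is a minimal Python sketch of how FPT statistics can be read off a sample. It is an illustration under my own conventions, not the authors' code, and the function name is hypothetical: for each occurrence of a, it records the number of steps until the next occurrence of b.

    def empirical_fpt(sequence, a, b):
        """Collect the first passage times from symbol a to symbol b.

        For every occurrence of `a`, record the number of steps until the
        next occurrence of `b` (if any). Returns the list FPT(a, b).
        """
        fpt = []
        for i, sym in enumerate(sequence):
            if sym != a:
                continue
            for j in range(i + 1, len(sequence)):
                if sequence[j] == b:
                    fpt.append(j - i)
                    break
        return fpt

    # Toy example over the alphabet {a, b}
    s = "abbaaababb"
    print(empirical_fpt(s, "a", "b"))  # [1, 3, 2, 1, 1]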

Partially Observable Markov Models (POMMs)
A POMM is a HMM such that any state emits a single letter with probability 1. The same letter can be emitted by several states.
[Figure: a small POMM with its emission letters and transition probabilities.]
We have shown that for any HMM there is an equivalent POMM. We use POMMs as target formalism (convenient FPT computations).
States can be gathered in blocks w.r.t. their unique emission; the FPT observed in the sequences concern these blocks.

FPT dynamics in POMMs
[Figure: a POMM and its FPT(a, b) distribution, plotted as the probability of the time to absorption.]
The distributions of FPT in POMMs are of phase-type (computed as sketched below):
- Merge the states of the target block into an absorbing state
- Start in the states of the source block according to the relative proportion of time spent in these states
The multimodal distribution reveals the presence of dominant path lengths. Long-term dependencies are also reflected in the FPT dynamics.
POMM dynamics is poorly approximated by a Markov chain (MC).
[Figure: a fixed-order MC estimated from sequences drawn from the POMM poorly reproduces the FPT(a, b) distribution, as measured by the JS divergence to the true distribution.]
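The phase-type characterization translates directly into a computation: make the states of the target block absorbing, start in the source block, and read P(FPT = t) as the probability mass newly absorbed at step t. A minimal NumPy sketch under those assumptions (the chain and the indexing are illustrative, not taken from the paper):

    import numpy as np

    def fpt_distribution(T, start, absorbing, horizon):
        """FPT distribution of a Markov chain, via absorption (phase-type).

        T         : (n, n) transition matrix
        start     : initial distribution, concentrated on the source block
                    (weighted by the time spent in its states)
        absorbing : indices of the target-block states, made absorbing
        horizon   : return P(FPT = t) for t = 1..horizon
        """
        T = np.array(T, dtype=float)
        T[absorbing, :] = 0.0
        T[absorbing, absorbing] = 1.0  # target block becomes an absorbing trap
        dist = np.asarray(start, dtype=float)
        absorbed_prev = dist[absorbing].sum()
        probs = []
        for _ in range(horizon):
            dist = dist @ T
            absorbed = dist[absorbing].sum()
            probs.append(absorbed - absorbed_prev)  # mass newly absorbed at step t
            absorbed_prev = absorbed
        return np.array(probs)

    # Toy 3-state chain: states 0 and 1 form the source block, state 2 the target
    T = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3],
         [0.2, 0.2, 0.6]]
    print(fpt_distribution(T, start=[0.7, 0.3, 0.0], absorbing=[2], horizon=5))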

POMM induction: POMMStruct

    EP_0 <- initialize(S, r);
    FPT <- extractFPT(S);
    P <- selectDivPairs(EP_0, FPT, p);
    EP_0 <- POMMPHIT(EP_0, FPT, P, n_r);
    Lik <- FPTLikelihood(EP_0, FPT);
    i <- 0
    repeat
        Lik_last <- Lik;
        b <- probeBlocks(EP_i, FPT);
        EP_{i+1} <- addStateInBlock(EP_i, b);
        EP_{i+1} <- POMMPHIT(EP_{i+1}, FPT, P, n_r);
        Lik <- FPTLikelihood(EP_{i+1}, FPT);
        i <- i + 1
    until (Lik - Lik_last) / |Lik_last| <= epsilon
    return {EP_0, ..., EP_i}

Feature selection/weighting
[Figures: empirical vs. fitted FPT distributions for several symbol pairs, each annotated with its JS divergence.]
FPT pairs are filtered/weighted according to their Jensen-Shannon (JS) divergence:

    D_JS(P_1 || P_2) = H(M) - 1/2 H(P_1) - 1/2 H(P_2), where M = 1/2 (P_1 + P_2) and H(.) is the Shannon entropy
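The JS divergence used for pair selection is straightforward to compute for two FPT distributions over a common finite support; a small Python sketch (the truncation to a common support is my assumption, not the paper's):

    import numpy as np

    def js_divergence(p1, p2, eps=1e-12):
        """D_JS(P1 || P2) = H(M) - H(P1)/2 - H(P2)/2 with M = (P1 + P2)/2."""
        p1 = np.asarray(p1, dtype=float)
        p2 = np.asarray(p2, dtype=float)

        def shannon_entropy(p):
            p = p[p > eps]  # ignore zero-probability entries
            return -np.sum(p * np.log2(p))

        m = 0.5 * (p1 + p2)
        return shannon_entropy(m) - 0.5 * shannon_entropy(p1) - 0.5 * shannon_entropy(p2)

    # Two FPT distributions over passage times 1..4
    print(js_divergence([0.7, 0.2, 0.1, 0.0], [0.1, 0.2, 0.3, 0.4]))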

Parameter estimation: POMMPHIT
FPT(a, b) = {z_1, ..., z_l}; which states were visited during each passage?
POMMPHIT is a novel EM-based algorithm to maximize the FPT likelihood:
- Each z_i is a partial observation of a couple (z_i, h_i), where h_i is the sequence of states reached during the FPT z_i
- Re-estimation formulas are derived to maximize E[P(Z, H | ρ) | Z]
- Additionally, a trimming procedure removes the transitions with the lowest expected passage times

Re-estimation formulas
Expectation step:

    S_{a,b}(q) = Σ_{k=1..l} σ_q β(q, z_k) / Σ_{q'∈κ_b} α_{a,b}(q', z_k)
    N_{a,b}(q, q') = Σ_{k=1..l} Σ_{t=0..z_k-1} α_{a,b}(q, t) A_{qq'} β(q', z_k - t) / Σ_{q''∈κ_b} α_{a,b}(q'', z_k)

Maximization step (sketched in code below):

    σ_q = Σ_{(a,b)∈P} S_{a,b}(q) / Σ_{q'∈κ_a} Σ_{(a,b)∈P} S_{a,b}(q')  if q ∈ κ_a, 0 otherwise
    A_{qq'} = Σ_{(a,b)∈P} N_{a,b}(q, q') / Σ_{q''∈Q} Σ_{(a,b)∈P} N_{a,b}(q, q'')

Computational complexity: O(p L n_t) per iteration, where L is the longest FPT and n_t is the number of transitions.

Adding a state in a block
[Figure: a new state q is inserted in a block; its incoming transitions come from the predecessor states Pred(.) and its outgoing transitions go to the successor states Succ(.).]
- Pred(.) and Succ(.) need not be disjoint
- Local transitions are initialized following their type (color-coded in the figure)
- POMMPHIT first estimates only the local transitions, and the trimming procedure is applied to these transitions
- The complete model is then re-estimated with POMMPHIT, and all transitions are candidates to be trimmed
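The maximization step has the usual EM shape: expected counts are normalized into probabilities, with the initial vector restricted to the source block. A schematic NumPy sketch of that normalization and of a trimming rule, for a single pair (a, b); the names and the fixed trimming threshold are my assumptions, not the paper's:

    import numpy as np

    def m_step(S, N, kappa_a):
        """Maximization step: turn expected counts into probabilities.

        S       : (n,) expected number of FPTs starting in each state
        N       : (n, n) expected number of uses of each transition q -> q'
        kappa_a : indices of the states emitting the source symbol a
        """
        S = np.asarray(S, dtype=float)
        N = np.asarray(N, dtype=float)
        sigma = np.zeros_like(S)
        sigma[kappa_a] = S[kappa_a] / S[kappa_a].sum()  # initial mass only on block a
        rows = N.sum(axis=1, keepdims=True)
        A = np.divide(N, rows, out=np.zeros_like(N), where=rows > 0)
        return sigma, A

    def trim_transitions(A, N, threshold):
        """Remove transitions with the lowest expected usage, then renormalize rows."""
        A = np.asarray(A, dtype=float)
        N = np.asarray(N, dtype=float)
        A = np.where(N >= threshold, A, 0.0)
        rows = A.sum(axis=1, keepdims=True)
        return np.divide(A, rows, out=np.zeros_like(A), where=rows > 0)

    # Toy expected counts for a 3-state model, block a = states {0, 1}
    S = [2.0, 1.0, 0.0]
    N = [[0.0, 3.0, 1.0],
         [2.0, 0.0, 2.0],
         [1.0, 1.0, 0.0]]
    sigma, A = m_step(S, N, kappa_a=[0, 1])
    print(trim_transitions(A, N, threshold=1.5))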

Experimental results
[Figures: FPT divergence and perplexity as a function of the training data ratio, on a synthetic GenDep target and on the Splice (Exon -> Intron) task, comparing POMMStruct with Baum-Welch and Stolcke.]

HMM induction overview
Discrete HMM induction approaches and their limitations:
- Parameter estimation (Baum-Welch / gradient-based; Brand): local optimization; not always concerned with the HMM topology explicitly; long-term dynamics badly estimated
- State merging (ALERGIA / MDI; Stolcke): restricted to PDFA; strong bias towards left-to-right / sparse models, not always appropriate
- State splitting/adding (Ostendorf; POMMStruct, the contribution of this work): no clear improvement over B-W; no standard method for parameter estimation; difficult to estimate in practice
- Discriminative techniques (conditional likelihood; margin-based): often focus on sequence labeling; unsupervised techniques are time consuming

Conclusion and future work
We proposed a novel approach to induce POMMs based on the FPT dynamics observed in the sample:
- The FPT are informative about the structure of the model
- Structural induction is made by iterative state addition and by transition trimming
- Parameter estimation is performed by POMMPHIT, which maximizes the model likelihood w.r.t. the observed FPT
Future work:
- Return a HMM rather than a POMM
- Fit FPT between higher-order events