An introduction to PRISM and its applications


Yoshitaka Kameya, Tokyo Institute of Technology
FJ-2007, 2007/9/17

Contents

- What is PRISM?
- Two examples: one from population genetics, one from statistical natural language processing
- Details of the PRISM system
- Applications

What is PRISM?

- PRISM = PRogramming In Statistical Modeling
- PRISM is a programming language/system for probabilistic modeling: a probabilistic extension of Prolog
- A tool for probabilistic modeling with complex data beyond traditional data matrices: sequences, trees, graphs and relations
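As a first taste, here is a minimal sketch of a PRISM program (the file name coin.psm and the parameter values are illustrative, not from the talk):

    % coin.psm -- a two-outcome random switch
    values(coin,[heads,tails]).    % declare switch 'coin' with outcomes heads/tails
    toss(X) :- msw(coin,X).        % msw/2 makes a probabilistic choice

    % A possible session:
    %   ?- prism(coin).               % load and compile the program
    %   ?- set_sw(coin,[0.7,0.3]).    % set P(heads)=0.7, P(tails)=0.3
    %   ?- sample(toss(X)).           % X = heads about 70% of the time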

What is PRISM? - example (1)

Inheritance of ABO blood type. (The slide shows a family: a father of type A and a mother of type B with a daughter of type B.) By Hardy-Weinberg's law, the frequencies P_A, P_B, P_O, P_AB of the blood types are determined by the frequencies θ_a, θ_b, θ_o of the genes:

    P_A  = θ_a² + 2·θ_a·θ_o
    P_B  = θ_b² + 2·θ_b·θ_o
    P_O  = θ_o²
    P_AB = 2·θ_a·θ_b
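For concreteness, with illustrative gene frequencies θ_a = 0.4, θ_b = 0.2, θ_o = 0.4 (these values are assumptions, not from the talk), the law gives

    P_A  = 0.4² + 2·0.4·0.4 = 0.48
    P_B  = 0.2² + 2·0.2·0.4 = 0.20
    P_O  = 0.4²             = 0.16
    P_AB = 2·0.4·0.2        = 0.16

and the four phenotype frequencies sum to 1, as they must.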

What is PRISM? - example (1)

Problem: estimate the gene frequencies {θ_a, θ_b, θ_o} from the blood-type frequencies {P_A, P_B, P_O, P_AB}. Gene frequencies are sometimes useful for characterizing the population of interest.

Note: we can only observe the blood types, so there is an ambiguity; for example, blood type A can arise from genotype AA or from genotype AO.

What is PRISM? - example (1)

Solving this estimation problem in PRISM. The program uses a random switch named 'gene' whose parameters correspond to the gene frequencies; genotype/2 picks two genes X and Y from a pool of genes, and bloodtype/1 maps genotype to phenotype using disjunction (;) and if-then (->):

    values(gene,[a,b,o]).                        % switch 'gene' with outcomes a, b, o

    genotype(X,Y) :- msw(gene,X), msw(gene,Y).   % pick two genes from the pool

    bloodtype(P) :-                              % map genotype to phenotype
        genotype(X,Y),
        ( X = Y -> P = X
        ; X = o -> P = Y
        ; Y = o -> P = X
        ; P = ab
        ).

The EM (expectation-maximization) algorithm provided by the PRISM system is then run with:

    ?- learn.
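A hedged sketch of how the program can be exercised before learning (the file name bloodABO.psm and the parameter values are assumptions):

    ?- prism(bloodABO).              % load and compile bloodABO.psm
    ?- set_sw(gene,[0.4,0.2,0.4]).   % set theta_a, theta_b, theta_o by hand
    ?- sample(bloodtype(P)).         % generate one individual's phenotype
    ?- prob(bloodtype(a),Pr).        % Pr = theta_a^2 + 2*theta_a*theta_o = 0.48

sample/1, set_sw/2 and prob/2 are standard PRISM built-ins.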

What is PRISM? - example (1)

(Demo)

What is PRISM? - example (2)

Probabilistic parsing using probabilistic context-free grammars (PCFGs). An example grammar, from [Charniak 93]:

    s  → np vp       (0.8)        verb → swat   (0.2)
    s  → vp          (0.2)        verb → flies  (0.4)
    np → noun        (0.4)        verb → like   (0.4)
    np → noun pp     (0.4)        noun → swat   (0.05)
    np → noun np     (0.2)        noun → flies  (0.45)
    vp → verb        (0.3)        noun → ants   (0.5)
    vp → verb np     (0.3)        prep → like   (1.0)
    vp → verb pp     (0.2)
    vp → verb np pp  (0.2)
    pp → prep np     (1.0)

A sentence is derived by applying probabilistic rules:

    s ⇒ vp ⇒ verb np ⇒ swat np ⇒ swat noun pp ⇒ swat flies pp
      ⇒ swat flies prep np ⇒ swat flies like np
      ⇒ swat flies like noun ⇒ swat flies like ants
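A quick sanity check (simple arithmetic on the rule probabilities above): multiplying the probabilities of the nine rules applied in this derivation gives 0.2 × 0.3 × 0.2 × 0.4 × 0.45 × 1.0 × 1.0 × 0.4 × 0.5 = 0.000432, which is exactly P(t1) on the next slide.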

What is PRISM? - example (2)

Probabilistic parsing: compute the most probable parse for a given sentence, e.g. "swat flies like ants". The slide shows its two parse trees:

    t1 ("swat" as a verb, imperative reading):
      [s [vp [verb swat] [np [noun flies] [pp [prep like] [np [noun ants]]]]]]
      P(t1) = 0.2 × 0.3 × 0.2 × 0.4 × 0.45 × 1.0 × 1.0 × 0.4 × 0.5 = 0.000432

    t2 ("swat flies" as a noun phrase, "like" as a verb):
      [s [np [noun swat] [np [noun flies]]] [vp [verb like] [np [noun ants]]]]
      P(t2) = 0.8 × 0.2 × 0.05 × 0.4 × 0.45 × 0.3 × 0.4 × 0.4 × 0.5 = 0.00003456
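Since P(t1)/P(t2) = 0.000432 / 0.00003456 = 12.5, parse t1 is 12.5 times more probable than t2, so t1 is the Viterbi (most probable) parse, matching the viterbif output two slides later.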

What is PRISM? - example (2)

PRISM program for PCFGs. A random switch is declared for each nonterminal: for instance, msw(noun,[swat]), msw(noun,[flies]) and msw(noun,[ants]) make the choice among the rules noun → swat, noun → flies, noun → ants, and each choice msw(LHS,RHS) replaces LHS with RHS by a rule 'LHS → RHS':

    values(s,[[np,vp],[vp]]).
    values(np,[[noun],[noun,pp],[noun,np]]).
    values(vp,[[verb],[verb,np],[verb,pp],[verb,np,pp]]).
    values(pp,[[prep,np]]).
    values(verb,[[swat],[flies],[like]]).
    values(noun,[[swat],[flies],[ants]]).
    values(prep,[[like]]).

    pcfg(L) :- pcfg(s,L-[]).      % to generate a sentence L, start with symbol 's'

    pcfg(LHS,L0-L1) :-
        ( nonterminal(LHS) ->
            msw(LHS,RHS),          % choose a rule LHS -> RHS
            proj(RHS,L0-L1)
        ; L0 = [LHS|L1]            % LHS is a terminal symbol
        ).

    proj([],L-L).                  % symbols in RHS are processed one by one
    proj([X|Xs],L0-L1) :- pcfg(X,L0-L2), proj(Xs,L2-L1).

    nonterminal(s).
    nonterminal(np).
    nonterminal(vp).
    nonterminal(pp).
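A hedged usage sketch for this program (the file name pcfg.psm is an assumption):

    ?- prism(pcfg).                            % load and compile pcfg.psm
    ?- sample(pcfg(L)).                        % generate a random sentence L
    ?- prob(pcfg([swat,flies,like,ants]),P).   % P sums the probabilities of
                                               % all parses of the sentence

Note that pcfg/1 acts as a generator as well as a parser: the same clauses serve for sampling, probability computation, Viterbi computation and EM learning.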

What is PRISM? - example (2)

Probabilistic parsing by the viterbif routine:

    ?- viterbif(pcfg([swat,flies,like,ants])).
    pcfg([swat,flies,like,ants])
       <= pcfg(s,[swat,flies,like,ants]-[])
    pcfg(s,[swat,flies,like,ants]-[])
       <= pcfg(vp,[swat,flies,like,ants]-[]) & msw(s,[vp])
       :
    pcfg(noun,[ants]-[])
       <= pcfg(ants,[ants]-[]) & msw(noun,[ants])
    pcfg(ants,[ants]-[])
    Viterbi_P = 0.000432

The most probable parse is [s [vp [verb swat] [np [noun flies] [pp [prep like] [np [noun ants]]]]]] and its generative probability is 0.000432.

What is PRISM? - example (2)

(Demo)

What is PRISM?

Merits of PRISM programming:

- High expressivity from first-order representations: we can write our model in a compact and readable form
- Declarative semantics (the distribution semantics [Sato 95])
- A lot of built-ins for ease of probabilistic modeling: we need not derive or implement model-specific probabilistic inference algorithms; all we need to do is write our model

Basic probabilistic inferences

The PRISM system provides built-in predicates for:

- Sampling: for a given goal G, return an answer substitution σ with probability P_θ(Gσ)
- Probability computation: for a given goal G, compute P_θ(G)
- Viterbi computation: for a given goal G, find the most probable explanation E* = argmax_{E ∈ ψ(G)} P_θ(E), where ψ(G) = {E_1, E_2, ..., E_K} is the set of possible explanations for G
- Hindsight computation: for a given goal G, compute P_θ(G') or P_θ(G' | G), where G' is a subgoal of G
- EM learning: given a bag {G_1, G_2, ..., G_T} of goals, estimate the parameters θ that maximize the likelihood Π_t P_θ(G_t)
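A hedged sketch of these built-ins applied to the blood-type program from example (1) (the goal arguments are illustrative):

    ?- sample(bloodtype(X)).      % sampling: draw X with probability P_theta
    ?- prob(bloodtype(a),P).      % probability computation: P = P_theta(bloodtype(a))
    ?- viterbif(bloodtype(a)).    % Viterbi: print the most probable explanation
    ?- hindsight(bloodtype(a)).   % hindsight probabilities of subgoals
    ?- learn([bloodtype(a),bloodtype(b),bloodtype(o)]).  % EM from a bag of goals

learn/1 with an explicit goal list is assumed here; ?- learn. with no argument, as used in the demo, takes its observations from the program's data source.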

Generative modeling

Basically we model the generation process of the data at hand, just like the derivation process of a sentence using a PCFG: starting from a start state, a sequence of probabilistic choices branches into generation paths, each ending in an observable goal G_1, G_2, ... (depicted as a tree in the original slide).

Key assumptions in PRISM:

- Exclusiveness condition: the generation paths for a goal are exclusive to each other
- Uniqueness condition: the observable goals G are exclusive and Σ_G P(G) = 1
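To see the exclusiveness condition at work in example (1): the goal bloodtype(a) has exactly three explanations,

    msw(gene,a) & msw(gene,a)
    msw(gene,a) & msw(gene,o)
    msw(gene,o) & msw(gene,a)

and since these are mutually exclusive, their probabilities simply add up: P(bloodtype(a)) = θ_a² + θ_a·θ_o + θ_o·θ_a = θ_a² + 2·θ_a·θ_o, which is exactly the Hardy-Weinberg formula for P_A.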

Efficient probabilistic inferences

Data flow in EM learning of parameters:

1. Input: PRISM program + observed data (training data)
2. Exhaustive search for explanations, with tabling, yields tabled solutions
3. Extraction and sorting turn these into explanation graphs (AND/OR graphs)
4. A dynamic-programming-based EM algorithm produces the estimated (trained) parameters

This general learning procedure works as efficiently as the existing model-specific EM algorithms.
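The explanation graph on which the dynamic-programming EM runs can be inspected directly; a hedged sketch using the PCFG program from example (2):

    ?- probf(pcfg([swat,flies,like,ants])).
    % prints the AND/OR explanation graph; a shared subgoal such as
    % pcfg(np,[flies,like,ants]-[]) (illustrative) appears once and is
    % reused, which is what lets the general EM procedure match the
    % Inside-Outside algorithm on PCFGs.

probf/1 is a standard PRISM built-in.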

Advanced probabilistic inferences

- Handling failures in the generation process (version 1.8)
- Model selection (version 1.10)
- Variational Bayesian learning (version 1.11)
- Data-parallel EM learning (version 1.11)
- Deterministic annealing EM algorithm (version 1.11)

Basically there is no need to modify the model part of the program to use any of these.

Applications (1)

Another hypothesis on blood-type inheritance (from J. F. Crow (1983), Genetics Notes):

- Theory I: alleles at two loci, A/a and B/b
- Theory II: three alleles A, B, O at a single locus (now known to be correct)

    Phenotype | Genotype in theory I | Genotype in theory II
    ----------|----------------------|----------------------
    A         | A- bb                | AA, AO
    B         | aa B-                | BB, BO
    O         | aa bb                | OO
    AB        | A- B-                | AB

Goal: compare the two theories by BIC (Bayesian Information Criterion), a well-known Bayesian model score.

Applications (1)

Another theory on blood-type inheritance. PRISM program for theory I:

    values(locus1,['A',a]).       % alleles at the first locus
    values(locus2,['B',b]).       % alleles at the second locus

    genotype(L,X,Y) :- msw(L,X), msw(L,Y).

    bloodtype(P) :-
        genotype(locus1,X1,Y1),   % 1st locus
        genotype(locus2,X2,Y2),   % 2nd locus
        ( X1 = a, Y1 = a, X2 = b, Y2 = b          -> P = o
        ; ( X1 = 'A' ; Y1 = 'A' ), X2 = b, Y2 = b -> P = a
        ; X1 = a, Y1 = a, ( X2 = 'B' ; Y2 = 'B' ) -> P = b
        ; P = ab
        ).

100 samples from a typical Japanese population:

    Blood type | A  | B  | O  | AB
    #persons   | 38 | 22 | 31 | 9
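With the observed counts above, learning can be run in one call; a hedged sketch (the count/2 notation for repeated observations is assumed to be accepted by learn/1):

    ?- learn([count(bloodtype(a),38), count(bloodtype(b),22),
              count(bloodtype(o),31), count(bloodtype(ab),9)]).
    ?- show_sw.     % display the learned distribution of each switch

The same goal list works for the theory II program, so the two theories are fitted to identical data.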

Applications (1)

Another theory on blood-type inheritance. Run the programs for theories I and II:

    ?- prism(bloodAaBb).
    ?- learn.
    ?- prism_statistics(bic,BIC).
    BIC = -135.64984669945866 ?

    ?- prism(bloodABO).
    ?- learn.
    ?- prism_statistics(bic,BIC).
    BIC = -132.66708143397111 ?

The value of BIC can be obtained after learning; it is a penalized log-likelihood, so higher is better. Theory II (ABO) scores higher than theory I (AaBb), so theory II is more reasonable according to the data. (The earlier approach used a χ²-test instead.)

Conclusion

- PRISM is a general programming language/system for probabilistic modeling
- The PRISM language allows us to handle complex data such as sequences, trees, graphs, relations, ...
- The PRISM system provides a lot of built-ins for ease of probabilistic modeling
- We believe PRISM is fairly useful, but currently its main target is generative models; eliminating the modeling assumptions is future work

Further materials

Please visit: http://sato-www.cs.titech.ac.jp/prism/

Latest version: 1.11 beta 7. You can download the software and the papers there.