Deductive and Inductive Probabilistic Programming
1 Deductive and Inductive Probabilistic Programming
Fabrizio Riguzzi
2 Outline
Probabilistic programming
Probabilistic logic programming
Inference
Learning
Applications
3 Probabilistic Programming
Users specify a probabilistic model in its entirety (e.g., by writing code that generates a sample from the joint distribution) and inference follows automatically given the specification.
PP languages provide the full power of modern programming languages for describing complex distributions:
Reuse of libraries of models
Interactive modeling
Abstraction
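A minimal sketch of this idea in cplint, the LPAD system used throughout this talk (the coin model and its probabilities are invented for illustration): the user writes only the generative model, and prob/2 performs inference automatically.

  :- use_module(library(pita)).
  :- pita.
  :- begin_lpad.
  biased(_):0.1.                       % a coin is biased with probability 0.1
  heads(C):0.9 :- toss(C), biased(C).  % a biased coin lands heads 90% of the time
  heads(C):0.6 :- toss(C), \+ biased(C).
  toss(coin).
  :- end_lpad.

  % ?- prob(heads(coin), P).
  % P = 0.1*0.9 + 0.9*0.6 = 0.63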
4 Probabilistic Programming
5 Probabilistic Programming Languages

Name                 Extends from     Host language
Venture              Scheme           C++
Probabilistic-C      C                C
Anglican             Scheme           Clojure
IBAL                 OCaml
PRISM                B-Prolog
Infer.NET            .NET Framework   .NET Framework
dimple                                MATLAB, Java
chimple                               MATLAB, Java
BLOG                                  Java
PSQL                 SQL
BUGS
FACTORIE                              Scala
PMTK                 MATLAB           MATLAB
Alchemy                               C++
Dyna                 Prolog
Figaro                                Scala
Church               Scheme           JavaScript, Scheme
ProbLog              Prolog           Python, Jython
ProBT                                 C++, Python
Stan (software)                       C++
Hakaru               Haskell          Haskell
BAli-Phy (software)  Haskell          C++
ProbCog                               Java, Python
Gamble               Racket
Tuffy                                 Java
PyMC                 Python           Python
Lea                  Python           Python
WebPPL               JavaScript       JavaScript
Picture              Julia            Julia
Turing.jl            Julia            Julia

Source: en.wikipedia.org/wiki/Probabilistic_programming_language
6 Probabilistic Programming Languages
Only three are logic-based; the others are imperative, functional, or object-oriented.
In 2013 DARPA released the funding call Probabilistic Programming for Advancing Machine Learning (PPAML).
Aim: develop probabilistic programming languages and accompanying tools to facilitate the construction of new machine learning applications across a wide range of domains.
Focus: functional PP
7 Probabilistic Logic Programming
What are we missing?
Is logic programming to blame?
8 Thesis
Probabilistic logic programming is alive and kicking!
9 Strengths
Relationships are first-class citizens
Conceptually easier to lift
Strong semantics
Inductive systems
10 Weaknesses
Handling non-termination
Continuous variables
11 Non-termination
Possible when the number of explanations for the query is infinite.
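As an illustration (an assumed example, not from the talk), the following LPAD tosses a coin until it lands heads. The query eventually_heads(0) has the infinite set of explanations {heads(0)}, {\+ heads(0), heads(1)}, ..., so exact inference must sum an infinite series and SLD derivations need not terminate.

  heads(_):0.5.
  eventually_heads(N) :- heads(N).
  eventually_heads(N) :- \+ heads(N), N1 is N + 1, eventually_heads(N1).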
12 Non-termination: Inducing Arithmetic Functions
Church code:

  (define (random-arithmetic-fn)
    (if (flip 0.3)
        (random-combination (random-arithmetic-fn)
                            (random-arithmetic-fn))
        (if (flip)
            (lambda (x) x)
            (random-constant-fn))))

  (define (random-combination f g)
    (define op (uniform-draw (list + -)))
    (lambda (x) (op (f x) (g x))))

  (define (random-constant-fn)
    (define i (sample-integer 10))
    (lambda (x) i))
13 Non-termination: Inducing Arithmetic Functions
LPAD (cplint) code, example/inference/arithm.pl:

  eval(X,Y) :-
    random_fn(X,0,F),
    Y is F.
  op(L,+):0.5 ; op(L,-):0.5.
  random_fn(X,L,F) :-
    comb(L),
    random_fn(X,l(L),F1),
    random_fn(X,r(L),F2),
    op(L,Op),
    F =.. [Op,F1,F2].
  random_fn(X,L,F) :-
    \+ comb(L),
    base_random_fn(X,L,F).
  comb(_):0.3.
  base_random_fn(X,L,X) :-
    identity(L).
  base_random_fn(_X,L,C) :-
    \+ identity(L),
    random_const(L,C).
  identity(_):0.5.
  random_const(_,C):discrete(C,[0:0.1,1:0.1,2:0.1,3:0.1,4:0.1,
    5:0.1,6:0.1,7:0.1,8:0.1,9:0.1]).
14 Non-termination: Inducing Arithmetic Functions
Aim: given observations of input-output pairs for the random function, predict the output for a new input.
Arbitrarily complex functions have a non-zero probability of being selected, so the program has non-terminating executions.
Exact inference: infinite number of explanations.
15 Non-termination: Inducing Arithmetic Functions

  (define (sample)
    (rejection-query
      (define my-proc (random-arithmetic-fn))
      (my-proc 2)
      (= (my-proc 1) 3)))

  (hist (repeat 100 sample))
16 Solution
Use (T. Sato, P. Meyer, Infinite probability computation by cyclic explanation graphs, Theor. Pract. Log. Prog.) or (A. Gorlin, C. R. Ramakrishnan, S. A. Smolka, Model checking with probabilistic tabled logic programming, Theor. Pract. Log. Prog. 12(4-5), 2012), or resort to sampling: as complexity increases, the probability of a function tends to 0, so the infinite trace has probability 0.
Metropolis-Hastings: (Nampally, A., Ramakrishnan, C.: Adaptive MCMC-based inference in probabilistic logic programs. arXiv preprint)
Monte Carlo sampling is attractive for the simplicity of its implementation and because the estimate improves as more time is available.
17 Monte Carlo
The disjunctive clause
  C_r = H_1:a_1 ; ... ; H_n:a_n :- L_1, ..., L_m.
is transformed into the set of clauses MC(C_r):
  MC(C_r, 1) = H_1 :- L_1, ..., L_m, sample_head(n, r, VC, NH), NH = 1.
  ...
  MC(C_r, n) = H_n :- L_1, ..., L_m, sample_head(n, r, VC, NH), NH = n.
Sample truth value of query Q:
  ..., (call(Q) -> ..., NT1 is NT + 1 ; NT1 = NT), ...
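For instance (an illustrative instance of the transformation, assuming clause index 1 and an empty variable list VC), the mixture clause heads:0.6 ; tails:0.4. becomes:

  heads :- sample_head(2, 1, [], NH), NH = 1.
  tails :- sample_head(2, 1, [], NH), NH = 2.

where sample_head/4 samples, and memoizes for the current world, which of the two heads of clause 1 holds, so that repeated calls within the same sample are consistent.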
18 Metropolis-Hastings MCMC
A Markov chain is built by taking an initial sample and by generating successor samples.
The initial sample is built by randomly sampling choices so that the evidence is true.
A successor sample is obtained by deleting a fixed number of sampled probabilistic choices. Then the evidence is queried; if the query succeeds, the goal is queried.
The sample is accepted with probability min{1, N_0/N_1}, where N_0 (N_1) is the number of choices sampled in the previous (current) sample.
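A minimal sketch of this acceptance test in plain SWI-Prolog (illustrative only, not cplint's actual implementation):

  :- use_module(library(random)).

  % accept(+N0, +N1): succeed with probability min{1, N0/N1}, where N0 and
  % N1 are the numbers of choices sampled in the previous and current sample.
  accept(N0, N1) :-
      A is min(1.0, N0 / N1),
      random(U),
      U =< A.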
19 Solution
In cplint:
  ?- mc_mh_sample(eval(2,4),eval(1,3),100,100,3,T,F,P).
Probability of eval(2,4) given that eval(1,3) is true:
  F = 90, T = 10, P = 0.1
You can also try rejection sampling (usually slower):
  ?- mc_rejection_sample(eval(2,4),eval(1,3),100,T,F,P).
20 Solution
You may be interested in the distribution of the output. In cplint:
  ?- mc_mh_sample_arg_bar(eval(2,Y),eval(1,3),100,100,3,Y,V).
21 Solution
You may be interested in the expected value of the output. In cplint:
  ?- mc_mh_expectation(eval(2,Y),eval(1,3),100,100,3,Y,E).
  E = 3.21
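Under the hood this is the standard Monte Carlo estimator (standard notation, not from the slide): the mean of the values of Y in the N Metropolis-Hastings samples,

  E[Y | e] ~ (1/N) * (y_1 + ... + y_N)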
22 Continuous Random Variables
Distributional clauses (B. Gutmann, I. Thon, A. Kimmig, M. Bruynooghe, and L. De Raedt, The magic of logical inference in probabilistic programming, Theory and Practice of Logic Programming, 2011)
Gaussian mixture model in cplint:

  heads:0.6 ; tails:0.4.
  g(X): gaussian(X, 0, 1).
  h(X): gaussian(X, 5, 2).
  mix(X) :- heads, g(X).
  mix(X) :- tails, h(X).
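Writing N(x; mean, variance) for a Gaussian density (gaussian/3 above takes mean and variance), this program defines the mixture density

  p(x) = 0.6 * N(x; 0, 1) + 0.4 * N(x; 5, 2)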
23 Continuous Random Variables
Inference by sampling.
Without evidence, or with evidence on discrete random variables only, you can reuse the same methods.
Sampling arguments of goals builds a probability density of the arguments.
24 Gaussian Mixture Model

  heads:0.6 ; tails:0.4.
  g(X): gaussian(X, 0, 1).
  h(X): gaussian(X, 5, 2).
  mix(X) :- heads, g(X).
  mix(X) :- tails, h(X).

  ?- mc_sample_arg(mix(X),10000,X,L0),
     histogram(L0,40,Chart).
25 Evidence on Continuous Random Variables
You cannot use rejection sampling or Metropolis-Hastings, as the probability of the evidence is 0.
You can use likelihood weighting to obtain samples of continuous arguments of a goal.
(Nitti, D., De Laet, T., De Raedt, L.: Probabilistic logic programming for hybrid relational domains. Mach. Learn. 103(3))
26 Likelihood Weighting
For each sample to be taken, likelihood weighting samples the query and then assigns a weight to the sample on the basis of the evidence.
The weight is computed by deriving the evidence backward in the same sample of the query, starting with a weight of one.
Each time a choice should be taken or a continuous variable sampled, if the choice/variable has already been sampled, the current weight is multiplied by the probability of the choice / by the density value of the continuous variable.
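A self-contained sketch of this scheme for a single Gaussian observation model (plain SWI-Prolog with invented helper names; cplint's mc_lw_sample_arg implements the general case over arbitrary programs):

  :- use_module(library(random)).
  :- use_module(library(apply)).

  % Draw X ~ N(Mean, Variance) via Box-Muller.
  gauss_sample(Mean, Variance, X) :-
      random(U1), random(U2),
      Z is sqrt(-2*log(U1)) * cos(2*pi*U2),
      X is Mean + sqrt(Variance)*Z.

  % D = density of N(Mean, Variance) at X.
  gauss_density(Mean, Variance, X, D) :-
      D is exp(-((X - Mean)**2) / (2*Variance)) / sqrt(2*pi*Variance).

  % One weighted sample: sample the query forward (the latent mean Mu from
  % its prior N(1,5)), then weight it by the likelihood of the evidence,
  % i.e. the densities of the observations under N(Mu,2).
  lw_sample(Obs, Mu, W) :-
      gauss_sample(1.0, 5.0, Mu),
      foldl([X,W0,W1]>>(gauss_density(Mu,2.0,X,D), W1 is W0*D),
            Obs, 1.0, W).

  % Posterior-mean estimate from N weighted samples.
  lw_expectation(Obs, N, E) :-
      findall(Mu-W, (between(1,N,_), lw_sample(Obs,Mu,W)), Pairs),
      foldl([Mu-W,S0-T0,S1-T1]>>(S1 is S0 + Mu*W, T1 is T0 + W),
            Pairs, 0.0-0.0, S-T),
      E is S / T.

  % ?- lw_expectation([9,8], 10000, E).   % E ~ 7.25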
27 Bayesian Estimation
Problem from examples/viewer/?worksheet=gaussian-posteriors
Estimate the true value of a Gaussian distributed random variable, given some observed data.
The variance is known, and we suppose that the mean itself has a Gaussian distribution with mean 1 and variance 5 (prior on the parameter).
We take different measurements (e.g. at different times), indexed by an integer.
28 Bayesian Estimation
Anglican code:

  (def dataset [9 8])

  (defquery gaussian-model [data]
    (let [mu (sample (normal 1 (sqrt 5)))
          sigma (sqrt 2)]
      (doall (map (fn [x] (observe (normal mu sigma) x)) data))
      mu))

  (def posterior
    ((conditional gaussian-model :smc :number-of-particles 10) dataset))

  (def posterior-samples (repeatedly #(sample* posterior)))
29 Bayesian Estimation
cplint code, example/inference/gauss_mean_est.pl:

  value(I,X) :-
    mean(M),
    value(I,M,X).
  mean(M): gaussian(M, 1.0, 5.0).
  value(_,M,X): gaussian(X, M, 2.0).

  ?- mc_sample_arg(value(0,Y),10000,Y,L0),
     mc_lw_sample_arg(value(0,X),(value(1,9),value(2,8)),10000,X,L),
     densities(L0,L,40,Chart).
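As a sanity check (standard conjugate-normal formulas, not from the slides): with prior N(1, 5), known observation variance 2, and observations 9 and 8,

  posterior precision = 1/5 + 2/2 = 1.2
  posterior mean      = (1/5 * 1 + 2/2 * (9+8)/2) / 1.2 = (0.2 + 8.5) / 1.2 = 7.25
  posterior variance  = 1/1.2 ~ 0.83

so the sampled posterior density should concentrate around 7.25.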
30 Learning
Parameter learning
Structure learning: more developed for PLP, but see
(Perov, Yura N., and Frank D. Wood. Learning Probabilistic Programs. arXiv preprint)
(Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 2015)
(Gaunt, Alexander L., et al. TerpreT: A Probabilistic Programming Language for Program Induction. arXiv preprint)
31 Parameter Learning
Problem: given a set of interpretations and a program, find the parameters maximizing the likelihood of the interpretations (or of instances of a target predicate).
Exploit the equivalence with BNs to use BN learning algorithms.
The interpretations record the truth value of ground atoms, not of the choice variables.
Unseen data: relative frequency can't be used.
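In symbols (standard notation, assumed rather than taken from the slide): given a program with parameters P and a set E of interpretations, parameter learning seeks

  P* = argmax_P prod_{I in E} Pr_P(I) = argmax_P sum_{I in E} log Pr_P(I)

and since the choice variables are unobserved in the interpretations, the maximization is performed with EM rather than relative frequency.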
32 Parameter Learning
(Thon et al. ECML 2008) proposed an adaptation of EM for CPT-L, a simplified version of LPADs. The algorithm computes the counts efficiently by repeatedly traversing the BDDs representing the explanations.
(Ishihata et al. ILP 2008) independently proposed a similar algorithm.
LFI-PROBLOG (Gutmann et al. ECML 2011) is the adaptation of EM to ProbLog.
EMBLEM (Riguzzi & Bellodi IDAJ 2013) adapts (Ishihata et al. ILP 2008) to LPADs.
33 Structure Learning
Given a trivial or empty LPAD and a set of interpretations (data), find the model and the parameters that maximize the probability of the data (log-likelihood).
SLIPCOVER: Structure LearnIng of Probabilistic logic programs by searching OVER the clause space:
1 Beam search in the space of clauses to find the promising ones
2 Greedy search in the space of probabilistic programs guided by the LL of the data
Parameter learning by means of EMBLEM
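A hedged sketch of how EMBLEM and SLIPCOVER are invoked in cplint (file layout modeled on cplint's documented learning examples; the predicate names, interpretations, and probability values here are invented, and directives vary between cplint versions):

  :- use_module(library(slipcover)).
  :- sc.                                      % initialize the learning library

  bg([]).                                     % background knowledge
  in([(pos:0.5 :- circle(A), inside(B,A))]).  % initial program, tunable parameter
  fold(train, [1,2]).                         % interpretations forming the fold
  output(pos/0).                              % target predicate
  input_cp(circle/1).                         % input predicates with certain facts
  input_cp(inside/2).

  % interpretations: ground atoms recorded per model
  begin(model(1)). pos. circle(c1). inside(t1,c1). end(model(1)).
  begin(model(2)). circle(c2). end(model(2)).

  % ?- induce_par([train], P).  % EMBLEM: fit parameters of the initial program
  % ?- induce([train], P).      % SLIPCOVER: learn clauses too (also needs
  %                             % modeh/modeb language-bias declarations)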
34 Applications
Link prediction: given a (social) network, compute the probability of the existence of a link between two entities (UWCSE)

  advisedby(X,Y):0.3 :-
    publication(P,X),
    publication(P,Y),
    student(X).
35 Applications
Classify web pages on the basis of the link structure (WebKB)

  coursepage(Page1):0.3 :-
    linkto(Page2,Page1), coursepage(Page2).
  coursepage(Page1):0.3 :-
    linkto(Page2,Page1), facultypage(Page2).
  ...
  coursepage(Page):0.3 :-
    has(abstract,Page).
  ...
36 Applications
Entity resolution: identify identical entities in text or databases

  samebib(A,B):0.3 :- samebib(A,C), samebib(C,B).
  sameauthor(A,B):0.3 :- sameauthor(A,C), sameauthor(C,B).
  sametitle(A,B):0.3 :- sametitle(A,C), sametitle(C,B).
  samevenue(A,B):0.3 :- samevenue(A,C), samevenue(C,B).
  samebib(B,C):0.3 :- author(B,D), author(C,E), sameauthor(D,E).
  samebib(B,C):0.3 :- title(B,D), title(C,E), sametitle(D,E).
  samebib(B,C):0.3 :- venue(B,D), venue(C,E), samevenue(D,E).
  samevenue(B,C):0.3 :- haswordvenue(B,word_06), haswordvenue(C,word_06).
  ...
37 Applications
Chemistry: given the chemical composition of a substance, predict its mutagenicity or its carcinogenicity

  active(A):0.5 :- atm(A,B,c,29,C), gteq(C,-0.003), ring_size_5(A,D).
  active(A):0.5 :- lumo(A,B), lteq(B,-2.072).
  active(A):0.5 :- bond(A,B,C,2), bond(A,C,D,1), ring_size_5(A,E).
  active(A):0.5 :- carbon_6_ring(A,B).
  active(A):0.5 :- anthracene(A,B).
  ...
38 Applications
Medicine: diagnose diseases on the basis of patient information (Hepatitis), influence of genes on HIV, risk of falling of elderly people (FFRAT)
39 Experiments - Area Under the PR Curve

System     HIV       UW-CSE    Mondial
SLIPCOVER  0.82 ±    ±         ± 0.07
SLIPCASE   0.78 ±    ±         ± 0.06
LSM        0.37 ±    ±
ALEPH      ±         ± 0.07
RDN-B      0.28 ±    ±         ± 0.07
MLN-BT     0.29 ±    ±         ± 0.10
MLN-BC     0.51 ±    ±         ± 0.09
BUSL       0.38 ±    ±
40 Experiments - Area Under the PR Curve

System     Carcinogenesis  Mutagenesis  Hepatitis
SLIPCOVER  ±               ± 0.01
SLIPCASE   ±               ± 0.05
LSM        ± 0.04
ALEPH      ±
RDN-B      ±               ± 0.01
MLN-BT     ±               ± 0.02
MLN-BC     ±               ± 0.02
BUSL       ± 0.03
41 PLP Online
cplint:
  Inference (knowledge compilation, Monte Carlo)
  Parameter learning (EMBLEM)
  Structure learning (SLIPCOVER)
ProbLog:
  Inference (knowledge compilation, Monte Carlo)
  Parameter learning (LFI-ProbLog)
42 Conclusions
PLP is still a fertile field but...
...we must look at other communities and build bridges and...
...join forces!
Much is left to do:
Tractable sublanguages (see following talk)
Lifted inference
Structure/parameter learning (also for programs with continuous variables)