Mixtures of Truncated Basis Functions
Helge Langseth, Thomas D. Nielsen, Rafael Rumí, and Antonio Salmerón

This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (NILS mobility project). Supported and coordinated by Universidad Complutense de Madrid.
The ground work

Mixture of Truncated Exponentials potential (Moral, Rumí & Salmerón, 2001)

f(z) = e^{1.7z}     if −3 ≤ z < −1
       e^{1.64z}    if −1 ≤ z < 0
       e^{−1.64z}   if 0 ≤ z < 1
       e^{−1.7z}    if 1 ≤ z < 3

Mixture of Truncated Polynomials potential (Shenoy & West, 2009)

The MoP potential is similar to the MTE, except that a polynomial is used as the basis function instead of an exponential. It appears as a natural consequence of a Taylor expansion of the target f.
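As a concrete reading of the definition, here is a minimal sketch that evaluates the piecewise MTE potential above (split points and exponents as reconstructed):

```python
import numpy as np

def mte_potential(z):
    """Evaluate the 4-piece MTE potential on [-3, 3)."""
    z = np.asarray(z, dtype=float)
    conds = [(-3 <= z) & (z < -1), (-1 <= z) & (z < 0),
             (0 <= z) & (z < 1), (1 <= z) & (z < 3)]
    funcs = [lambda z: np.exp(1.7 * z), lambda z: np.exp(1.64 * z),
             lambda z: np.exp(-1.64 * z), lambda z: np.exp(-1.7 * z)]
    return np.piecewise(z, conds, funcs)

print(mte_potential([-2.0, -0.5, 0.5, 2.0]))
```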
The MoTBF model

MoTBF potential

The Mixture of Truncated Basis Functions potential is defined as for MTEs and MoPs, just abstracting the basis functions from exponentials/polynomials to functions ψ_i(·), where {ψ_i(·)}_{i=1}^∞ is a collection of real basis functions that is closed under product and marginalisation.

MoTBF rationale and properties

- More flexible than MTEs and MoPs (MoPs and MTEs are special cases of MoTBFs).
- Any distribution function can be approximated arbitrarily well by an MoTBF distribution (under certain assumptions).
- The class of MoTBF potentials is closed under combination and marginalisation, which allows (approximate) inference in any hybrid domain using Shenoy-Shafer propagation.

The subject of this talk

An efficient method for translating a density to an MoTBF, with an online tradeoff between complexity and quality.
Geometry of approximations

A quick recall of how to do approximations in R^n. We want to approximate the vector f = (3, 2, 5):

- With a vector along e_1 = (1, 0, 0). The best choice is ⟨f, e_1⟩ e_1 = (3, 0, 0).
- Now, add a vector along e_2. The best choice is ⟨f, e_2⟩ e_2, independently of the choice made for e_1. Also, the choice we made for e_1 is still optimal, since e_1 ⊥ e_2.

All of this maps over to approximations of functions! We only need a definition of the inner product and the equivalent of orthonormal basis vectors.
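A minimal numeric sketch of the projection argument (vectors as in the example above):

```python
import numpy as np

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
f = np.array([3.0, 2.0, 5.0])

# Best one-term approximation along e1: <f, e1> e1.
f_hat1 = np.dot(f, e1) * e1           # -> [3. 0. 0.]

# Adding the e2 term does not change the e1 coefficient,
# because e1 and e2 are orthogonal.
f_hat2 = f_hat1 + np.dot(f, e2) * e2  # -> [3. 2. 0.]

print(f_hat1, f_hat2)
```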
Function approximations

Inner product

For two functions f(x) and g(x) defined on Ω ⊆ R, define ⟨f, g⟩ = ∫_Ω f(x) g(x) dx.

Finding an orthonormal basis

If we have a set of functions {ψ_k}_{k=1}^∞ in x, e.g. {1, exp(−x), exp(x), exp(−2x), ...}, we can orthonormalise the set using the Gram-Schmidt process.

We now have the machinery to approximate any univariate function (including densities) as a sum of basis functions.
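As an illustration of this step, a sketch of Gram-Schmidt with the inner product evaluated numerically; the interval Ω = [−3, 3] is an assumption made for the example:

```python
import numpy as np
from scipy.integrate import quad

OMEGA = (-3.0, 3.0)  # assumed truncation interval for the example

def inner(f, g):
    """Inner product <f, g> = integral of f(x) * g(x) over Omega."""
    value, _ = quad(lambda x: f(x) * g(x), *OMEGA)
    return value

def gram_schmidt(psis):
    """Orthonormalise a list of univariate functions."""
    phis = []
    for psi in psis:
        # Project out the components along the already-built phi's.
        coeffs = [inner(psi, phi) for phi in phis]
        def residual(x, psi=psi, phis=tuple(phis), coeffs=tuple(coeffs)):
            return psi(x) - sum(c * phi(x) for c, phi in zip(coeffs, phis))
        norm = np.sqrt(inner(residual, residual))
        phis.append(lambda x, r=residual, n=norm: r(x) / n)
    return phis

# The exponential family {1, exp(-x), exp(x), exp(-2x), ...} from the slide.
psis = [lambda x: 1.0 + 0.0 * x,
        lambda x: np.exp(-x),
        lambda x: np.exp(x),
        lambda x: np.exp(-2.0 * x)]
phis = gram_schmidt(psis)
print(inner(phis[0], phis[1]))  # ~ 0 (orthogonal)
print(inner(phis[3], phis[3]))  # ~ 1 (normalised)
```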
Function approximations

Doing the approximation

The function f is approximated by f̂ = Σ_i ⟨f, φ_i⟩ φ_i (known as the Generalised Fourier Series approximation). f̂ minimises ∫_Ω (f(x) − f̂(x))² dx.

Properties

- The approximation can be made arbitrarily good even without splitting Ω into sub-intervals (given some technicalities that are fulfilled for both the polynomials and the exponentials).
- The approximation supports the Shenoy-Shafer scheme.
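A sketch of the Generalised Fourier approximation for a density, using orthonormalised Legendre polynomials (which is what Gram-Schmidt yields from {1, x, x², ...}); the target and the interval are illustrative choices:

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad
from scipy.stats import norm

A, B = -3.0, 3.0  # assumed truncation interval

def phi(i):
    """i-th Legendre polynomial, shifted to [A, B] and normalised."""
    c = np.zeros(i + 1)
    c[i] = 1.0
    scale = np.sqrt((2 * i + 1) / (B - A))
    return lambda x: scale * legendre.legval(2 * (x - A) / (B - A) - 1, c)

def coefficient(f, i):
    """Generalised Fourier coefficient <f, phi_i>."""
    value, _ = quad(lambda x, p=phi(i): f(x) * p(x), A, B)
    return value

f = norm.pdf                       # target: the standard Gaussian density
alphas = [coefficient(f, i) for i in range(9)]

def f_hat(x):
    return sum(a * phi(i)(x) for i, a in enumerate(alphas))

print(f_hat(0.0), f(0.0))          # approximation vs target at x = 0
```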
Example: Polynomials vs. the Std. Gaussian

[Plots of three approximations: f̂ = 0.436 φ_1; f̂ = 0.436 φ_1 + α_2 φ_2 + α_3 φ_3; f̂ = 0.436 φ_1 + α_2 φ_2 + ... + α_9 φ_9.]

The constant term (φ_1) is chosen so that the probability mass of f̂ is allocated correctly. φ_i, for i > 1, does not contribute to the probability mass: ∫_Ω φ_i(x) dx = 0, because φ_i ⊥ φ_1 for i > 1.
Positivity condition

f̂ is chosen to minimise ∫_Ω (f(x) − f̂(x))² dx, but is f̂ a density?

- ∫_Ω f̂(x) dx = ∫_Ω f(x) dx = 1, as long as φ_1(x) is a constant.
- However, we cannot guarantee f̂(x) ≥ 0 for all x ∈ Ω in general.

[Plots: approximation of the χ² distribution, and zooming in towards the origin.]

We must optimise the error subject to the constraint that f̂(x) ≥ 0 for all x ∈ Ω.
Positivity condition

Property: If h_k(x) = Σ_{i=1}^k α_i φ_i(x), then

(error(f, h_k))² = Σ_{i=1}^k (⟨f, φ_i⟩ − α_i)² + Σ_{i=k+1}^∞ ⟨f, φ_i⟩².

Ensuring positivity

We now have the following optimisation problem:

Minimise    Σ_{i=1}^k (⟨f, φ_i⟩ − α_i)²
Subject to  Σ_{i=1}^k α_i φ_i(x) ≥ 0 for all x ∈ Ω, and α_1 = ⟨f, φ_1⟩

This is a convex optimisation problem, and can be solved using standard methods (semi-definite programming).
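A sketch of the constrained fit using the cvxpy modelling package; for simplicity it enforces positivity only on a finite grid of points (a relaxation of the exact semi-definite formulation). The interval, target density and basis are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad
from scipy.stats import chi2

A, B = 0.0, 10.0  # assumed truncation interval

def phi(i):
    """i-th orthonormal Legendre polynomial rescaled to [A, B]."""
    c = np.zeros(i + 1)
    c[i] = 1.0
    s = np.sqrt((2 * i + 1) / (B - A))
    return lambda x: s * legendre.legval(2 * (x - A) / (B - A) - 1, c)

f = chi2(2).pdf   # target density (degrees of freedom are illustrative)
k = 6
c_true = np.array([quad(lambda x, p=phi(i): f(x) * p(x), A, B)[0]
                   for i in range(k)])

grid = np.linspace(A, B, 200)
Phi = np.column_stack([phi(i)(grid) for i in range(k)])  # basis on the grid

alpha = cp.Variable(k)
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(c_true - alpha)),
    [Phi @ alpha >= 0,          # positivity, enforced on the grid only
     alpha[0] == c_true[0]])    # keep the total probability mass
problem.solve()
print(alpha.value)
```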
A probabilistic interpretation

So far, f̂ lacks a probabilistic interpretation. However, it can be shown that

D(f ‖ f̂) ≤ (error(f, f̂))² / min_{x∈Ω} f̂(x).

The final estimation procedure

The updated optimisation problem wrt. (α, ξ) is convex:

Minimise    (1/ξ) Σ_{i=1}^k (⟨f, φ_i⟩ − α_i)²
Subject to  Σ_{i=1}^k α_i φ_i(x) ≥ ξ for all x ∈ Ω, and α_1 = ⟨f, φ_1⟩

This is an approximation to the KL-minimising MoTBF.
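A sketch of the updated (α, ξ) problem, reusing c_true and Phi from the previous sketch; cvxpy's quad_over_lin atom expresses the error²/ξ objective directly:

```python
import cvxpy as cp

def fit_with_xi(c_true, Phi):
    """Jointly optimise (alpha, xi): minimise error^2 / xi with f_hat >= xi."""
    k = len(c_true)
    alpha = cp.Variable(k)
    xi = cp.Variable(nonneg=True)
    constraints = [Phi @ alpha >= xi,        # f_hat(x) >= xi on the grid
                   alpha[0] == c_true[0]]    # keep the total probability mass
    # quad_over_lin(u, xi) = ||u||^2 / xi is jointly convex for xi > 0.
    objective = cp.Minimize(cp.quad_over_lin(c_true - alpha, xi))
    cp.Problem(objective, constraints).solve()
    return alpha.value, xi.value
```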
Conditional MoTBF densities

Definition: Conditional MoTBF density, version 0.1

f̂(y | x) = Σ_{j=1}^m α_j φ_j(y, x),  x ∈ I_k.

Requirement: For each x, we must have that ∫_y f̂(y | x) dy = 1.

Consequence: We define conditional MoTBFs to depend on their conditioning variable(s) only through the relevant hypercube, and not through the numerical value.

Definition: Conditional MoTBF density, version 0.2

f̂(y | x) = Σ_{j=1}^{m_k} α_j^{(k)} φ_j(y),  x ∈ I_k.
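A minimal sketch (class name, coefficients and basis are illustrative) of how a version 0.2 conditional can be represented: the conditioning value matters only through the index of its hypercube:

```python
import bisect
import math

class ConditionalMoTBF:
    """f(y | x) depends on x only through which interval I_k contains x."""

    def __init__(self, cutpoints, coefficients, basis):
        self.cutpoints = cutpoints        # sorted interior split points for x
        self.coefficients = coefficients  # coefficients[k] = alphas for I_k
        self.basis = basis                # shared basis functions phi_j(y)

    def pdf(self, y, x):
        k = bisect.bisect_right(self.cutpoints, x)  # index of I_k
        return sum(a * phi(y)
                   for a, phi in zip(self.coefficients[k], self.basis))

# Usage: two hypercubes (x < 0 and x >= 0), exponential basis for y.
cond = ConditionalMoTBF(cutpoints=[0.0],
                        coefficients=[[0.5, 0.1], [0.5, -0.1]],
                        basis=[lambda y: 1.0, lambda y: math.exp(-y)])
print(cond.pdf(0.3, x=-1.2), cond.pdf(0.3, x=2.0))
```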
What is the target density?

Question: Assume we have I_k, and want to define f̂(y | x) = Σ_{j=1}^m α_j φ_j(y) for x ∈ I_k. What should it approximate?

Choice 1: f̂(y | x) ≈ f(y | x*), for some fixed x*:
- x* chosen as the midpoint of hypercube I_k.
- x* chosen as the mass-center of hypercube I_k.

[Plot: f̂(y | x) vs. f̂(y).]

Choice 2: f̂(y | x) ≈ f(y | x ∈ I_k) = ∫_{x ∈ I_k} f(y | x) f(x | x ∈ I_k) dx ≈ Σ_t f(y | x_t) f(x_t | x ∈ I_k).

[Plot: f̂(y | x) vs. f̂(y).]
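A sketch of the second target choice, computing f(y | x ∈ I_k) by numerical integration; the network and the hypercube are illustrative stand-ins:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b = 0.0, 1.0                  # hypercube I_k for the parent X (illustrative)
f_x = norm(0, 1).pdf             # X ~ N(0, 1)

def f_y_given_x(y, x):           # illustrative conditional density
    return norm(0.1 * x, 1).pdf(y)

mass = quad(f_x, a, b)[0]        # P(X in I_k)

def target(y):
    """f(y | x in I_k) = (1 / P(X in I_k)) * int_{I_k} f(y | x) f(x) dx."""
    return quad(lambda x: f_y_given_x(y, x) * f_x(x), a, b)[0] / mass

print(target(0.0))
```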
Cost-Benefit translation of arbitrary models

Algorithm

1. Initialisation: All distributions use 1 BF; all conditionals are defined on only one hypercube.
2. For each marginal: Calculate the KL gain of adding a BF.
3. For each conditional: Calculate the KL benefit of splitting each conditioning hypercube, and the KL benefit of adding one BF for each hypercube.
4. Perform the operation with the best KL benefit vs. increase in complexity.
5. If the KL is not good enough, go to step 2.
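A high-level sketch of the greedy loop; every helper (the KL gains, the cost function, apply_op) is a hypothetical stand-in for the quantities the algorithm names:

```python
def translate(model, threshold, kl_total, gain_add_bf, gain_split, cost, apply_op):
    """Greedy cost-benefit translation; helpers are hypothetical stand-ins."""
    while kl_total(model) > threshold:                        # step 5 check
        candidates = []
        for dist in model.marginals:                          # step 2
            candidates.append(("add_bf", dist, gain_add_bf(dist)))
        for cond in model.conditionals:                       # step 3
            for cube in cond.hypercubes:
                candidates.append(("split", (cond, cube),
                                   gain_split(cond, cube)))
                candidates.append(("add_bf", (cond, cube),
                                   gain_add_bf((cond, cube))))
        # Step 4: best KL benefit per unit of added complexity.
        best = max(candidates, key=lambda c: c[2] / cost(c))
        apply_op(model, best)
```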
Preliminary experiments

An MoP representation of the network given by X ~ N(0, 1) and Y | X = x ~ N(0.1x², 1):

[Plots: f̂(y) and f̂(y | x).]

The approximation was found after 15 iterations; 3 basis vectors were used for both the parent and the child (for each interval).
Future work

- Investigate heuristics for efficient calculation of approximate KL gain (ongoing).
- How to exploit the structure of MoTBFs when doing inference (ongoing).
- How to use the current translation procedure for learning MoTBFs from data (ongoing).
- How to efficiently re-translate the model to incorporate evidence ("dynamic discretisation").
Some experiments

Comparison with the translation procedure by Cobb, Shenoy, and Rumí (2006):

Distribution          Cobb et al.   MoTBFs (MTE)   MoTBFs (MoP)
N(0, 1)                             [8]            [1]
Gamma(6, 1)                         []             [13]
Gamma(8, 1)                         [7]            [1]
Gamma(11, 1)                        [1]            [13]
Beta(2, 2)                          [9]            [15]
Beta(0.7, 1.3)                      [8]            [11]
Beta(1.3, 0.7)                      [8]            [16]
LogNormal(0, 0.25)                  [19]           [11]
LogNormal(0, 0.5)                   []             [1]
LogNormal(0, 1)                     [6]            [6]