Learning Mixtures of Truncated Basis Functions from Data



1 Learning Mixtures of Truncated Basis Functions from Data
Helge Langseth, Thomas D. Nielsen, and Antonio Salmerón
PGM
This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (Nils mobility project). Supported and coordinated by Universidad Complutense de Madrid, by the Spanish Ministry of Science and Innovation through projects TIN-9-C--, and by ERDF (FEDER) funds.
Learning MoTBFs from data

2 Background: Approximations

6 Geometry of approximations
A quick recall of how to do approximations in $\mathbb{R}^n$:
We want to approximate the vector $f$ = (,,5) with
- A vector along $e_1$ = (,,). Best choice is $\langle f, e_1 \rangle e_1$ = (,,).
- Now, add a vector along $e_2$. Best choice is $\langle f, e_2 \rangle e_2$, independently of the choice made for $e_1$. Also, the choice we made for $e_1$ is still optimal since $e_1 \perp e_2$.
- Best approximation is in general $\sum_l \langle f, e_l \rangle e_l$.
All of this maps over to approximations of functions! We only need a definition of the inner product and the equivalent to orthonormal basis vectors.
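The projection argument can be sketched in a few lines of Python. The concrete vectors below are illustrative assumptions rather than values taken from the slides, and the helper names `dot` and `project` are hypothetical.

```python
def dot(u, v):
    """Standard inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))


def project(f, basis):
    """Best approximation of f in the span of ORTHONORMAL vectors `basis`:
    the sum over l of <f, e_l> e_l."""
    approx = [0.0] * len(f)
    for e in basis:
        c = dot(f, e)  # coefficient along e, independent of the other vectors
        approx = [a + c * x for a, x in zip(approx, e)]
    return approx


# Assumed example data (not from the slides).
f = [3.0, 2.0, 5.0]
e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]

one_term = project(f, [e1])       # [3.0, 0.0, 0.0]
two_terms = project(f, [e1, e2])  # [3.0, 2.0, 0.0]
```

Adding `e2` leaves the coefficient along `e1` unchanged, which is the property that carries over to orthonormal basis functions.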

7 Geometry of approximations
Inner product for functions
For two functions $u(\cdot)$ and $v(\cdot)$ defined on $\Omega \subseteq \mathbb{R}$, we use $\langle u, v \rangle = \int_\Omega u(x)\, v(x)\, dx$.

8 Generalised Fourier series
Definition (Legal set of basis functions)
Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$ be an indexed set of basis functions. Let $Q$ be the set of all linear combinations of functions in $\Psi$. $\Psi$ is a legal set of basis functions if:
- $\psi_0$ is constant;
- $u \in Q$ and $v \in Q$ implies that $(u \cdot v) \in Q$;
- For any pair of real numbers $s$ and $t$, $s \neq t$, there exists a function $\psi_i \in \Psi$ s.t. $\psi_i(s) \neq \psi_i(t)$.
Legal basis functions
- $\{1, x, x^2, x^3, \ldots\}$ is a legal set of basis functions.
- $\{1, \exp(-x), \exp(x), \exp(-2x), \exp(2x), \ldots\}$ is also legal.
- $\{1, \log(x), \log(2x), \log(3x), \ldots\}$ is not a legal set of basis functions.

9 Generalised Fourier series
Generalized Fourier series
Assume $\Psi$ is legal and contains orthonormal basis functions (if not, they can be made orthonormal through a Gram-Schmidt process). Then, the Generalized Fourier Series approximation to a function $f$ is defined as $\hat{f}(\cdot) = \sum_l \langle f, \psi_l \rangle \psi_l(\cdot)$.
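A minimal sketch of the construction, assuming $\Omega = [-1, 1]$, a monomial starting set, and a midpoint rule in place of the exact integral. The function names `inner`, `gram_schmidt` and `fourier_coefficients`, the target function `math.exp`, and the grid size are all illustrative choices, not part of the original material.

```python
import math


def inner(u, v, a, b, n=2000):
    """Midpoint-rule approximation of <u, v> = integral of u(x) v(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += u(x) * v(x)
    return total * h


def gram_schmidt(funcs, a, b):
    """Orthonormalise `funcs` with respect to inner() (classical Gram-Schmidt)."""
    ortho = []
    for f in funcs:
        prev = list(ortho)
        coeffs = [inner(f, q, a, b) for q in prev]

        def residual(x, f=f, coeffs=coeffs, prev=prev):
            # Remove the components of f along the functions found so far.
            return f(x) - sum(c * q(x) for c, q in zip(coeffs, prev))

        norm = math.sqrt(inner(residual, residual, a, b))
        ortho.append(lambda x, r=residual, s=norm: r(x) / s)
    return ortho


def fourier_coefficients(f, basis, a, b):
    """Generalised Fourier coefficients <f, psi_l> for each basis function."""
    return [inner(f, psi, a, b) for psi in basis]


a, b = -1.0, 1.0
monomials = [lambda x, p=p: x ** p for p in range(4)]  # 1, x, x^2, x^3
basis = gram_schmidt(monomials, a, b)
coeffs = fourier_coefficients(math.exp, basis, a, b)


def f_hat(x):
    """Generalised Fourier series approximation of exp on [-1, 1]."""
    return sum(c * psi(x) for c, psi in zip(coeffs, basis))
```

Because the basis is orthonormal, each coefficient is computed on its own; adding a further basis function would not change the coefficients already found.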

10 Generalised Fourier series
Important properties
- Any function, including density functions, can be approximated arbitrarily well by this approach.
- For any real coefficients $c_l$,
  $\int_\Omega \left( f(x) - \sum_{l=0}^{k} c_l \psi_l(x) \right)^2 dx \;\geq\; \int_\Omega \left( f(x) - \sum_{l=0}^{k} \langle f, \psi_l \rangle \psi_l(x) \right)^2 dx$,
  so the generalized Fourier series approximation is optimal in the $L^2$ sense.
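The $L^2$ optimality can be checked by expanding the squared error. The short derivation below is a sketch that uses only the orthonormality of the $\psi_l$ and the inner product $\langle u, v \rangle = \int_\Omega u(x) v(x)\, dx$; the coefficients $c_l$ are arbitrary real numbers.

```latex
\begin{align*}
\int_\Omega \Big( f(x) - \sum_{l=0}^{k} c_l \psi_l(x) \Big)^2 dx
  &= \langle f, f \rangle
     - 2 \sum_{l=0}^{k} c_l \langle f, \psi_l \rangle
     + \sum_{l=0}^{k} c_l^2 \\
  &= \langle f, f \rangle
     - \sum_{l=0}^{k} \langle f, \psi_l \rangle^2
     + \sum_{l=0}^{k} \big( c_l - \langle f, \psi_l \rangle \big)^2 .
\end{align*}
```

Only the last sum depends on the $c_l$, and it vanishes exactly when $c_l = \langle f, \psi_l \rangle$ for every $l$.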

11 MoTBFs

12 The marginal MoTBF potential
Definition
Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$ with $\psi_i : \mathbb{R} \to \mathbb{R}$ define a legal set of basis functions on $\Omega \subseteq \mathbb{R}$. Then $g_k : \Omega \to \mathbb{R}^+$ is an MoTBF potential at level $k$ wrt. $\Psi$
- ... if $g_k(x) = \sum_{i=0}^{k} a_i \psi_i(x)$ for all $x \in \Omega$, where $a_i$ are real constants;
- ... or there is a partition of $\Omega$ into intervals $I_1, \ldots, I_m$ s.t. $g_k$ is defined as above on each $I_j$.
Special cases
- An MoTBF potential at level $k = 0$ is simply a standard discretisation.
- MoPs (original definition) and MTEs are also special cases of MoTBFs.

13 The marginal MoTBF potential
Simplification
We do not utilize the option to split the domain into subdomains here.

14 Example: Polynomials vs. the Std. Gaussian
[Figure: the standard Gaussian density together with polynomial MoTBF approximations of increasing level, up to $g_8$.]
- Use orthonormal polynomials (shifted & scaled Legendre polynomials).
- Approximation always integrates to unity.
- Direct computations give the $g_k$ closest in $L^2$-norm.
- Positivity constraint and KL minimisation: convex optimization.
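A rough sketch of the direct $L^2$ computation, under stated assumptions: the Gaussian is truncated to a hypothetical interval $[-3, 3]$ and renormalised, integrals are approximated with a midpoint rule, and no positivity constraint is enforced (so this is the plain generalised Fourier series, not the KL-based convex optimisation). All names are illustrative.

```python
import math

A, B = -3.0, 3.0  # assumed truncation interval, not taken from the slides


def legendre(l, t):
    """Legendre polynomial P_l(t) via the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, t
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p


def psi(l, x):
    """Shifted and scaled Legendre polynomial, orthonormal on [A, B]."""
    t = (2.0 * x - A - B) / (B - A)
    return math.sqrt((2 * l + 1) / (B - A)) * legendre(l, t)


def integrate(fn, n=4000):
    """Midpoint rule on [A, B]."""
    h = (B - A) / n
    return h * sum(fn(A + (i + 0.5) * h) for i in range(n))


def gauss(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


MASS = integrate(gauss)


def target(x):
    """Standard Gaussian renormalised to integrate to one on [A, B]."""
    return gauss(x) / MASS


def fit(k):
    """Coefficients <target, psi_l> for l = 0..k."""
    return [integrate(lambda x, l=l: target(x) * psi(l, x)) for l in range(k + 1)]


def g(coeffs, x):
    """Evaluate the level-k approximation at x."""
    return sum(c * psi(l, x) for l, c in enumerate(coeffs))


def l2_error(coeffs):
    return integrate(lambda x: (target(x) - g(coeffs, x)) ** 2)


errors = {k: l2_error(fit(k)) for k in (0, 2, 8)}
```

The approximation integrates to one at every level because $\psi_0$ is constant and every other basis function is orthogonal to it, so only the first coefficient contributes to the integral.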

15 Learning Univariate Distributions

17 Relationship between KL and ML
Idea for learning MoTBFs from data
Generate a kernel density for a (marginal) probability distribution, and use the translation-scheme to approximate it with an MoTBF.
Setup
- Let $f(x)$ be the density generating $\{x_1, \ldots, x_N\}$.
- Let $g_k(x \mid \theta) = \sum_{i=0}^{k} \theta_i \psi_i(x)$ be an MoTBF of order $k$.
- Let $h_N(x)$ be a kernel density estimator.
Result: KL minimization is likelihood maximization in the limit
Let $\hat{\theta}_N = \arg\min_\theta D\left( h_N(\cdot) \,\|\, g_k(\cdot \mid \theta) \right)$. Then $\hat{\theta}_N$ converges to the maximum likelihood estimator of $\theta$ as $N \to \infty$ (given certain regularity conditions).
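An informal way to see the connection is sketched below; it is a heuristic, not the formal argument behind the result. It assumes a kernel density estimator of the form $h_N(x) = \frac{1}{N} \sum_{i=1}^{N} K_b(x - x_i)$, where the kernel $K_b$ and its bandwidth $b$ are notation introduced here for illustration.

```latex
\begin{align*}
D\big( h_N(\cdot) \,\|\, g_k(\cdot \mid \theta) \big)
  &= \int_\Omega h_N(x) \log h_N(x)\, dx
   - \int_\Omega h_N(x) \log g_k(x \mid \theta)\, dx , \\
\hat{\theta}_N
  &= \arg\max_\theta \int_\Omega h_N(x) \log g_k(x \mid \theta)\, dx , \\
\int_\Omega h_N(x) \log g_k(x \mid \theta)\, dx
  &\approx \frac{1}{N} \sum_{i=1}^{N} \log g_k(x_i \mid \theta)
  \quad \text{when the bandwidth } b \text{ is small.}
\end{align*}
```

The first integral does not depend on $\theta$, so minimising the divergence amounts to maximising a smoothed version of the average log-likelihood.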

22 Example: Learning the standard Gaussian
Density estimate; 5 samples.
g : BIC = 9.5.
g : BIC = 8..
g : BIC = 76..
g : BIC = Best BIC score.
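Selecting the level by BIC can be sketched as follows. This assumes the convention in which a larger score is better (log-likelihood minus a complexity penalty) and counts $k + 1$ coefficients for a level-$k$ model; both are assumptions, as are the function names and the made-up log-likelihood values, which are not the results shown on the slide.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """BIC in the 'larger is better' convention (an assumed convention)."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


def select_level(log_likelihoods, num_samples):
    """log_likelihoods[k] is the log-likelihood of the fitted level-k model.

    Returns the level with the highest BIC score and all scores.
    """
    scores = [bic(ll, k + 1, num_samples) for k, ll in enumerate(log_likelihoods)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical log-likelihoods for levels 0..4 on 50 samples.
hypothetical_ll = [-80.0, -75.0, -70.5, -70.4, -70.3]
best_level, scores = select_level(hypothetical_ll, num_samples=50)
```

With these invented numbers the gain in fit beyond level 2 is too small to pay for the extra parameters, so the search stops favouring higher levels.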

23 Comparison to State-of-the-art
Direct ML optimization
At PGM 8/IJAR we presented ML-learning of univariate MTEs:
- Divides the support of the function into intervals.
- Direct ML optimization inside each interval.
- Computationally difficult.
Summary of results
- Precision of the new method in terms of log likelihood is comparable to (but slightly poorer than) previous results.
- Speedup factor from to 5.
- Fewer parameters chosen by the BIC selection criterion.

24 Conditional Distributions

25 Definition of conditional distributions
Assume we have $x \in I_m$, and want to define $g_k^{(m)}(y \mid x)$ there. We define conditional MoTBFs to only depend on their conditioning variable(s) through the relevant hypercube, and not the numerical value:
$g_k^{(m)}(y \mid x) = \sum_{j=0}^{k} \theta_j^{(m)} \psi_j(y)$ for $x \in I_m$.
[Figure: the domain of the conditioning variables $X$ is partitioned into a grid of hypercubes, each with its own density $g^{(\cdot,\cdot)}(y)$.]
Conditioning hypercubes learned by optimizing BIC-score.
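A small sketch of how such a conditional could be evaluated, assuming a single conditioning variable whose domain is cut at given split points. The function name, the split point, the coefficients and the simple basis $\{1, y\}$ on $[-1, 1]$ are all hypothetical values chosen for illustration.

```python
from bisect import bisect_right


def conditional_motbf(y, x, split_points, thetas, basis):
    """Evaluate g(y | x): x only selects the interval I_m, never enters numerically.

    split_points: sorted cut points of the conditioning variable's domain.
    thetas[m]:    coefficient list used when x falls in the m-th interval.
    basis:        basis functions psi_j of the head variable y.
    """
    m = bisect_right(split_points, x)  # index of the interval containing x
    return sum(t * psi(y) for t, psi in zip(thetas[m], basis))


# Hypothetical model: two intervals for x, (-inf, 0) and [0, inf).
split_points = [0.0]
basis = [lambda y: 1.0, lambda y: y]   # assumed basis on y in [-1, 1]
thetas = [[0.5, 0.0],                  # x < 0:  g(y | x) = 0.5
          [0.5, 0.3]]                  # x >= 0: g(y | x) = 0.5 + 0.3 y

left = conditional_motbf(0.5, -2.0, split_points, thetas, basis)
right_a = conditional_motbf(0.5, 1.0, split_points, thetas, basis)
right_b = conditional_motbf(0.5, 3.0, split_points, thetas, basis)
```

Here `right_a` and `right_b` coincide because both values of `x` fall in the same interval, which is the sense in which the conditional depends on `x` only through the hypercube.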

26 Results: X ~ N(,), Y | {X = x} ~ N(x/,)
[Figure: four panels of learned conditional distributions, each labelled with its number of training cases.]

27 Concluding Remarks

28 Summary
Conclusions:
- KL-guided learning is much faster than the current implementations of direct ML optimization. There is, however, a loss in precision.
- The KL-guided learning results do not use splitpoints for the head variable. This can be exploited by inference algorithms.
Future work:
- Look for improvements with respect to computational speed and numerical stability of the learning algorithm.
- Investigate the formal properties of the estimators.
- Compare our approach to López-Cruz et al. (): Learning mixtures of polynomials from data using B-spline interpolation.
