How to Take into Account the Discrete Parameters in the BIC Criterion?

Size: px

Start display at page:

Download "How to Take into Account the Discrete Parameters in the BIC Criterion?"

Claud Simpson
6 years ago
Views:

1 How to Take into Account the Discrete Parameters in the BIC Criterion? V. Vandewalle University Lille 2, IUT STID COMPSTAT 2010 Paris August 23th, 2010

2 Issue Intorduction Some models involve discrete parameters. The discrete parameters play a part in the likelihood overfitting. But, they cannot be penalized using standard BIC approximation. Study Study the influence of the discrete parameters in the BIC approximation Focus on a simple model : the modal modality model Study the accuracy of differents approximations

3 Outline The modal modality model Integrated likelihood and BIC approximations Numerical experiments Conclusion and perspectives

4 Model The modal modality model X M(1, α 1,..., α m ) ( m h=1 α h = 1, α h > 0). x = (x 1, x 2,..., x n ) an n i.i.d. sample coming from X Constraint proposed by Biernacki et al. (2006) : { 1 ε if h = h α h = otherwise, ε m 1 h the location of the modal modality and 0 ε m 1 m. Two parameters must be estimated : ε which is continuous and h which is discrete. Comments Intuitive interpretation. Useful to get parsimonious models in clustering.

5 The modal modality model Both continuous and discrete parameters, in a simple case. In a Bayesian setting integration over both continuous and discrete parameters.

6 Integrated likelihood Prior on (ε, h ) : p(ε, h ) = 1 m p(ε). Integrated likelihood : p(x) = 1 m m h =1 m 1 m Truncated Dirichlet prior for p(ε) 0 p(x ε, h )p(ε)dε. p(ε) = Cε 1 2 (1 ε) 1 2 1[0, m 1 m ] (ε), with C some normalization constant.

7 Integrated likelihood Let n h = n x ih, the logarithm of the integrated likelihood (IL) is i=1 ( 1 m IL = log m h=1 m 1 m 0 (1 ε) n h How can we approximate this integral? Neglect discrete parameters. ( ) ) ε n nh Cε 1 2 (1 ε) 1 2 dε m 1 Make Laplace approximation for each term of the sum. Take into account the number of states of the discrete variable into account in the penalization.

8 Standard BIC approximation Maximum likelihood estimator of the parameters ( ) (ˆε, ĥ ) = arg max (1 ε,h ε)n h ε n nh, m 1 which gives ĥ = arg max h n h and ˆε = 1 n h c n. If the discrete parameters are not taken into account, the BIC criterion is : ( ( ) ) BIC 1 = log (1 ˆε) n h c ˆε n nch 1 log n, m 1 2 However this approximation is not justified when considering discrete parameters.

9 Taking the discrete parameters into account For the sum into IL, there are terms for which the maximum in reached on the border for which we need the following proposition. Proposition Let L : [a, b] R, such that L be one time differentiable on [a, b] and that it reaches its maximum at b with L (b) > 0. Then ( ) b log e nl(u) du = nl(b) log n + O(1). a For a comparison note that ( ) b log e nl(u) du = nl(c) 1 log n + O(1), 2 a if L would reach its maximum for c ]a, b[.

10 Taking the discrete parameters into account Applying the previous proposition ( m 1 ( ) ) m log (1 ε) n h ε n nh Cε 1 2 (1 ε) 1 2 dε = m 1 0 log p(x ˆε, h) 1 + s h 2 log n + O(1) where s h = 1 if the constraint is saturated (i.e. ˆε = m 1 m ) and 0 otherwise. Then replacing these approximations in IL we get ( 1 m ( ) n nh BIC 2 = log (1 ˆε h ) n ˆεh h n 1+s h 2 m m 1 h=1 where ˆε h is the maximum likelihood estimator of ε when h is constrained to be the modal modality. )

11 Taking the discrete parameters into account Simplify BIC 2 to avoid the integration on the states of the discrete variable, which gives the alternative criterion ( ( ) ) n nch BIC 3 = log (1 ˆε ch ) n ˆεch h c 1 log n log m. m 1 2 It is the standard BIC criterion penalized by the logarithm of the number of possible states of the discrete variable.

12 Numerical experiments Study the accuracy of the approximation in a simple case. Study the accuray for parsimonious models on binary data.

13 X M(1, 0.40, 0.35, 0.25) Number of selection of the right model (M1) IL BIC1 BIC2 BIC3 Behavior of each criterion according to the number of data when M1 is true Number of data X M(1, 0.40, 0.30, 0.30) Number of selection of the right model (M1) Behavior of each criterion according to the number of data when M2 is true Number of data IL BIC1 BIC2 BIC3 FIG.: Number of times where the true model is selected. FIG.: Number of times where the parsimonious model is selected.

14 Model Binary simulated data Binary data in the mulvariate case (in dimension d). x i (i {1,..., n}) with x i = (x 1 i, x 2 i,..., x d i ). x j i drawn from a Bernoulli distribution. Equality of ε for each variable (Celeux and Govaert (1991)). Experimental setting If d is large it is not possible to perform the integration over all the states of the discrete variable. Importance sampling (IS) to compute the sum. Compare the different approximations of the integrated likelihood without considering the model choice issue. d = 5, d = 10 and d = 20 variables. ε = 0.45 for each variable. 100 datasets, simulate 10, 000 modal positions for IS.

15 Binary simulated data Crit \ n d = 5 dimensions IL (0.9) (1.2) (1.7) (7.2) BIC (1.4) (1.7) (2.2) (7.2) BIC (0.8) (1.2) (1.6) (7.2) BIC (1.4) (1.7) (2.2) (7.2) d = 10 dimensions IL (1.0) (1.2) (2.4) (10) BIC (2.1) (2.1) (3.3) (10) BIC (1.0) (1.2) (2.3) (10) BIC (2.1) (2.1) (3.3) (10) d = 20 dimensions IL (0.8) (1.4) (2.4) (14) BIC (2.6) (3.2) (3.5) (11) BIC (0.8) (1.4) (2.4) (14) BIC (2.6) (3.2) (3.5) (11) TAB.: Mean value of the criterion according the values of n and d, the standard deviation is given into parenthesis.

16 Binary real data Binary data from the UCI database repository and the Statlog database. Parsimonious product of binary distributions model. Comparison the integrated likelihood without considering the model choice issue. If the initial data are continuous they are discretized using the Fisher algorithm (Fisher (1958)).

17 Binary real data Dataset n d IL BIC 1 BIC 2 BIC 3 SPECT Heart (Test) SPECT Heart (Train) Acute Inflammations Abalone Breast Cancer Diagnostic Crab Cushings Fglass TAB.: Comparison of the approximations of the log-likelihood value for binary data of the UCI and Statlog databases.

18 Conclusion and perspectives Conclusion The number of possible states of the discrete variable should be taken into account. At least in the penalty of the BIC criterion. Perspectives Estimate the integrated likelihood via posterior simulation using the harmonic mean identity. Study the setting where the number of possible states of the discrete variable grows to infinity.

Variable selection for model-based clustering

Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition