An Information Criterion for Order-restricted Inference


Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang 1,2,b

a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 63130, U.S.A.
b Key Laboratory for Applied Statistics of MOE and School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, Jilin Province, P. R. China

Abstract: A general information criterion, with a general penalty that depends on the sample size, is developed for nested and non-nested models in the context of inequality constraints. The true parameters may be defined by a specified parametric model or by a set of specified estimating functions. When the true parameters are defined by estimating functions, we use the empirical likelihood approach to construct the information criterion. The consistency of the relevant detection procedures is also established. A Monte Carlo study indicates that our new criterion is effective, compared to Anraku's (1999) information criterion, for detecting the configuration of normal means satisfying the simple order restriction.

Keywords: AIC; BIC; Empirical likelihood; Changepoint; No-observed-adverse-effect level; Simple order restriction; Umbrella order restriction; Consistency.

1 Introduction

Many types of problems are concerned with identifying meaningful structure in real-world situations. Structure involving orderings and inequalities is often useful since it is easy to interpret, understand, and explain. When experimental conditions have an inherent ordering, making use of the ordering information can improve inference. In bioassays of dose-effect experiments, the no-observed-adverse-effect level (NOAEL) is used to decide the highest dose level at which the adverse effect is acceptably small. In order to determine the NOAEL, Kikuchi, Yanagawa & Nishiyama (1993) and Yanagawa, Kikuchi & Brown (1994) suggested several multiple comparison procedures, and also derived

1 bxzhang@nenu.edu.cn
2 The authors are partly supported by NSF of China (No. 8737).

a method based on Akaike's information criterion, AIC (Akaike, 1973). In Kikuchi et al. (1993), independent random samples corresponding to k dose levels are supposed to come from k normal populations with an unknown common variance σ² and means θ_1 ≤ θ_2 ≤ ... ≤ θ_k. In their paper, the NOAEL may be defined as a changepoint of the mean parameters, that is, an index m (1 ≤ m < k) such that θ_1 = ... = θ_m < θ_{m+1} ≤ ... ≤ θ_k. Kikuchi et al. (1993) and Yanagawa et al. (1994) also examined the probabilities of correct decision of the AIC method and other testing procedures by simulation. However, as far as we know, in classical statistics hardly any model selection procedure is available for nested or non-nested models in the context of inequality constraints. The problem is that most model selection criteria, such as AIC and BIC, carry a penalty for the number of parameters, and in inequality-constrained models this number is not clear. Recently, Anraku (1999) proposed a theoretically better information criterion for parameters under a simple order restriction by employing a more favorable penalty number. Peng et al. (2002) proposed an information criterion for detecting the multiplicity of the largest parameter as well as the configuration of the true parameters under the simple order restriction, and also established strong consistency of their procedures. In this paper, a general information criterion, with a general penalty that depends on the sample size, is developed for nested or non-nested models in the context of inequality constraints. The true parameters may be defined by a specified parametric model or by a set of specified estimating functions. When the true parameters are defined by estimating functions, we use the empirical likelihood approach to construct the information criterion. The consistency of the relevant detection procedures is also established.
2 An Information Criterion for Order-restricted Inference

Suppose we have independent random samples from each of k populations specified by scalar-valued, unknown parameters θ_1, ..., θ_k satisfying some unknown order restriction. Our concern is to seek the distinct parameters among θ_1, ..., θ_k based on the data. The following definition helps us in constructing the information criterion.

Definition: Let M(θ) be a saturated model with parameter θ ∈ Θ ⊆ R^k, where Θ is the parameter space. Let Θ_0 = {θ : θ_1 = ... = θ_k} and let Θ_t be a measurable set such

that Θ_0 ⊆ Θ_t ⊆ Θ. Furthermore, let M(Θ_t) := {M(θ) : θ ∈ Θ_t}, which means M(Θ_t) is a sub-model with parameter space Θ_t. Define a model family as follows:

F = {M(Θ_t) : Θ_0 ⊆ Θ_1 ⊆ Θ_2 ⊆ ... ⊆ Θ_T},

where T is finite. It is easily seen that M(Θ_t) is nested in M(Θ) for all t, and that when t_1 < t_2, M(Θ_{t_1}) is nested in M(Θ_{t_2}). For example, the classic information criteria consider the model M(θ), θ = (θ_1, ..., θ_T)^τ ∈ R^T, with (θ_1, ..., θ_t)^τ ∈ Θ_t = R^t, that is, (θ_{t+1}, ..., θ_T)^τ fixed; it is easy to see that F = {M(Θ_t)}_{t=1}^T is a model family. Under order restriction, let

M_1: θ_1 = θ_2 = θ_3, M_2: θ_1 = θ_2 ≤ θ_3, M_3: θ_1 ≤ θ_2 ≤ θ_3, M_4: θ_1 ≤ θ_2 ≷ θ_3,

where "≷" means the independence of the two parameters. Then F = {M_t}_{t=1}^4 is a model family. Suppose we have s_0 model families

F_s = {M(Θ_st) : Θ_0 ⊆ Θ_{s1} ⊆ Θ_{s2} ⊆ ... ⊆ Θ_{s t_s}}, s = 1, 2, ..., s_0. (2.1)

For convenience, let M(s, t) denote the model M(Θ_st). In this paper, we consider information criteria of the form

c_n(s, t) = −2 log(L_n(x, s, t))/n + p(s, t)ϕ(n)/n, (2.2)

where p(s, t) is the dimension of the model M(s, t) and L_n(x, s, t) is the maximum likelihood based on the model M(s, t). The selected model is the one corresponding to

(ŝ, t̂) = arg min_{s ≤ s_0, t ≤ t_s} c_n(s, t). (2.3)

Theorem 2.1 Let s_0 and t_s be finite positive integers. Suppose the model families {F_s}_{s=1}^{s_0} satisfy F_i ∩ F_j = ∅ for i ≠ j. Let (s_1, t_1) be the correct model family and the correct number of parameters, with s_1 ≤ s_0 and t_1 ≤ t_{s_1}, which means that the model M(s_1, t_1) is the correct model. Assume that 0 < p(s, t) < p_0 for s = 1, ..., s_0 and t = 1, ..., t_s, where p_0 is a finite constant, and that, for fixed s, p(s, t_1) < p(s, t_2) if t_1 < t_2. Furthermore, assume F_{s_1} is the largest model family that contains the true model M(s_1, t_1). Let L_n(s, t) be the maximum likelihood based on the model M(s, t). For t > t_1 and M(s_1, t) ∈ F_{s_1}, assume that

2[log(L_n(s_1, t)) − log(L_n(s_1, t_1))] →_d W(s_1, t_1, t), (2.4)

where →_d denotes convergence in distribution and W(s_1, t_1, t) is a non-degenerate distribution. Furthermore, we assume

lim ϕ(n)/n = 0 and lim ϕ(n) = ∞.
(2.5)

Then

lim P[ŝ = s_1, t̂ = t_1] = 1. (2.6)
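To make (2.2)–(2.3) concrete, here is a minimal numerical sketch (all names and data are hypothetical, not from the paper): three normal samples with known variance 1, candidate models given by equality-pooling of the means, a BIC-type ϕ(n) = log n, and selection by minimizing c_n.

```python
import numpy as np

def c_n(loglik, p_dim, n, phi):
    """General criterion (2.2): c_n = -2*log-likelihood/n + p*phi(n)/n."""
    return -2.0 * loglik / n + p_dim * phi(n) / n

# Toy data: k = 3 normal samples, known variance 1, true configuration
# theta_1 = theta_2 < theta_3.
rng = np.random.default_rng(0)
data = [rng.normal(m, 1.0, 200) for m in (0.0, 0.0, 1.0)]
n = sum(len(x) for x in data)

def gaussian_loglik(data, means):
    return sum(-0.5 * np.sum((x - m) ** 2) - 0.5 * len(x) * np.log(2 * np.pi)
               for x, m in zip(data, means))

# Candidate models: partitions of {0,1,2} into groups of equal means.  The
# restricted MLE under pure equality constraints is the pooled group mean;
# a "<=" constraint would additionally require isotonic regression.
models = {"m1=m2=m3": [[0, 1, 2]],
          "m1=m2<m3": [[0, 1], [2]],
          "m1<m2<m3": [[0], [1], [2]]}
phi = np.log  # BIC-type penalty: phi(n) -> infinity, phi(n)/n -> 0

scores = {}
for name, groups in models.items():
    means = np.empty(3)
    for g in groups:
        means[[*g]] = np.concatenate([data[i] for i in g]).mean()
    scores[name] = c_n(gaussian_loglik(data, means), len(groups), n, phi)

best = min(scores, key=scores.get)  # the selection rule (2.3)
```

With the penalty growing like log n, the under-fitted pooled model is rejected because its likelihood deficit is of order n, while over-fitted models are ruled out asymptotically; this is exactly the role of the two conditions on ϕ(n) in Theorem 2.1.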

3 Order-restricted Information Criteria Based on Parametric Likelihood

In this section, suppose we specify a probability model p(x | θ) for the data x. Let L_n(x, θ) be the likelihood under the model M(θ), where θ ∈ Θ. Denote by θ̂_λ the MLE of θ under the model M(Θ_λ), that is,

θ̂_λ(x) = arg sup_{θ ∈ Θ_λ} L_n(x, θ), (3.1)

where Θ_0 = {θ : θ_1 = ... = θ_k}. Furthermore, let

A_n(y, λ) = 2{log(L_n(y, θ̂_λ(y))) − log(L_n(y, θ̂_0(x)))}. (3.2)

Then the dimension for the model M(Θ_λ) is defined as

p_w(λ) = ∫ A_n(y, λ) p(y | θ̂_0(x)) dy. (3.3)

It is easy to verify that the model dimension p_w(λ) satisfies the conditions of Theorem 2.1, so the information criterion (2.2) reduces to

ORIC_w(λ) = −2l(λ) + p_w(λ)ϕ(n), (3.4)

where l(λ) is the maximum log-likelihood under the λ-th candidate model. According to Theorem 2.1, (3.4) is a consistent criterion. Let ν_1(λ) denote the number of maximal chains of parameters in the λ-th candidate model (groups linked by "=" or "≤"), and let ν_2(λ) denote the number of "≤" constraints it contains. Then we define the model dimension as follows:

p(λ) = ν_1(λ) + Σ_{i=1}^{ν_2(λ)} 1/(1 + i). (3.5)

(For the full simple order θ_1 ≤ ... ≤ θ_k this gives ν_1 = 1, ν_2 = k − 1, and hence p(λ) = Σ_{i=1}^k 1/i.) It is easy to verify that the model dimension p(λ) satisfies the conditions of Theorem 2.1, so the information criterion (2.2) reduces to

ORIC(λ) = −2l(λ) + p(λ)ϕ(n), (3.6)

where l(λ) is the maximum log-likelihood under the λ-th candidate model. According to Theorem 2.1, (3.6) is a consistent criterion. Under order restrictions, we choose

ϕ_1(n) = 2, ϕ_2(n) = a log(log(n)), ϕ_3(n) = b log(n), ϕ_4(n) = c√n,

where a, b and c are constants. Generally, we can take a = 2/log(log(k)), b = 2/log(k), and c = 2/√k.
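The paper does not spell out the computation behind (3.6), but for normal means under the simple order the restricted MLE is the weighted isotonic regression of the sample means, computable by the standard pool-adjacent-violators algorithm (PAVA). The sketch below (function names ours; the reading of (3.5) follows the reconstruction above) assumes known variance 1:

```python
import numpy as np

def pava(y, w):
    """Weighted pool-adjacent-violators algorithm: least-squares fit of y
    under the simple order fit_1 <= ... <= fit_k (standard algorithm)."""
    vals, wts, counts = [], [], []
    for yi, wi in zip(map(float, y), map(float, w)):
        vals.append(yi); wts.append(wi); counts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:  # pool adjacent violators
            wtot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wtot
            wts[-2] = wtot
            counts[-2] += counts[-1]
            vals.pop(); wts.pop(); counts.pop()
    return np.repeat(vals, counts)

def penalty(n_chains, n_leq):
    """Dimension (3.5) as reconstructed: nu_1 + sum_{i=1}^{nu_2} 1/(1+i)."""
    return n_chains + sum(1.0 / (1 + i) for i in range(1, n_leq + 1))

def oric(data, phi):
    """ORIC (3.6) for the full simple-order model, known variance 1."""
    ybar = np.array([x.mean() for x in data])
    ns = np.array([len(x) for x in data])
    fit = pava(ybar, ns)  # restricted MLE of the k ordered means
    loglik = sum(-0.5 * np.sum((x - m) ** 2) for x, m in zip(data, fit))
    k = len(data)
    return -2.0 * loglik + penalty(1, k - 1) * phi(ns.sum())
```

For k = 4 and the full simple order, penalty(1, 3) = 1 + 1/2 + 1/3 + 1/4 = Σ_{i=1}^4 1/i, the expected number of level sets of the isotonic fit when all means are equal.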

4 Order-restricted Information Criteria Based on Empirical Likelihood

Suppose we have k independent random samples. For the i-th sample, let x_{i1}, ..., x_{i n_i} be i.i.d. observations from a d_i-variate distribution F_i, and suppose there is a 1-dimensional parameter θ_i associated with F_i, where information about θ_i and F_i is available in the form of Eg_i(x_i, θ_i) = 0. Let n = Σ_{i=1}^k n_i; then the log-empirical likelihood is given by

l_n(F_1, ..., F_k) = Σ_{i=1}^k Σ_{j=1}^{n_i} log p_{ij}. (4.1)

Obviously, we should maximize l_n(F_1, ..., F_k) under the constraints Σ_{j=1}^{n_i} p_{ij} = 1 and Σ_{j=1}^{n_i} p_{ij} g_i(x_{ij}, θ_i) = 0 for i = 1, ..., k. Then it is easy to obtain

p_{ij}(θ_i) = 1 / (n_i {1 + λ_i^τ g_i(x_{ij}, θ_i)}),

where, for fixed θ_i, λ_i is a Lagrange multiplier satisfying Σ_{j=1}^{n_i} p_{ij}(θ_i) g_i(x_{ij}, θ_i) = 0 for i = 1, ..., k. The corresponding profile log-empirical likelihood is given by

l_E(θ) = Σ_{i=1}^k Σ_{j=1}^{n_i} log{p_{ij}(θ_i)}, (4.2)

where θ = (θ_1^τ, ..., θ_k^τ)^τ. Denote by θ̂_λ the maximum empirical likelihood estimate (MELE) of θ under the model M(Θ_λ), that is,

θ̂_λ = arg sup_{θ ∈ Θ_λ} l_E(θ). (4.3)

Then we define the empirical-likelihood-based information criterion as

ORIC_EL(λ) = −2l_E(θ̂_λ) + p(λ)ϕ(n), (4.4)

where p(λ) is defined in (3.5). According to Theorem 2.1, (4.4) is a consistent criterion.
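The inner optimization behind (4.2) — finding the multiplier λ_i for a fixed θ_i — is smooth and one-dimensional in the scalar case, so Newton's method suffices. A minimal sketch for a scalar mean, with g(x, θ) = x − θ as our illustrative choice of estimating function:

```python
import numpy as np

def el_logprofile(x, theta, iters=50):
    """Profile log-empirical-likelihood for one sample, scalar mean, with
    g(x, theta) = x - theta (illustrative choice of estimating function).
    Solves sum_j g_j / (1 + lam*g_j) = 0 for the Lagrange multiplier by
    Newton's method; no safeguarding of the steps, which is adequate when
    theta lies strictly inside the range of the data."""
    g = np.asarray(x, dtype=float) - theta
    n = len(g)
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * g
        f = np.sum(g / denom)              # the constraint equation in lam
        fp = -np.sum(g ** 2 / denom ** 2)  # its derivative (always negative)
        step = f / fp
        lam -= step
        if abs(step) < 1e-12:
            break
    # p_ij = 1 / (n_i * (1 + lam * g_j)); log-EL = sum_j log p_ij
    return -np.sum(np.log(n * (1.0 + lam * g)))
```

At θ_i = x̄_i the multiplier is λ_i = 0 and the profile attains its maximum, −n_i log n_i; l_E(θ) in (4.2) is the sum of such one-sample terms, and maximizing it over Θ_λ gives the MELE in (4.3).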

5 A simulation study

To examine the performance of the order-restricted information criterion, we executed Monte Carlo simulations for k = 4. We generated normal variates with means µ_1, µ_2, µ_3, µ_4 and variance 1. The following configurations of the parameters were selected for µ = (µ_1, µ_2, µ_3, µ_4):

H: µ_1 = µ_2 = µ_3 = µ_4, µ = (0, 0, 0, 0);
K_1: µ_1 = µ_2 = µ_3 ≤ µ_4, µ = (0, 0, 0, 0.5);
K_2: µ_1 = µ_2 ≤ µ_3 = µ_4, µ = (0, 0, 0.5, 0.5);
K_3: µ_1 ≤ µ_2 = µ_3 = µ_4, µ = (0, 0.5, 0.5, 0.5);
K_4: µ_1 = µ_2 ≤ µ_3 ≤ µ_4, µ = (0, 0, 0.5, 1);
K_5: µ_1 ≤ µ_2 = µ_3 ≤ µ_4, µ = (0, 0.5, 0.5, 1);
K_6: µ_1 ≤ µ_2 ≤ µ_3 = µ_4, µ = (0, 0.5, 1, 1);
K_7: µ_1 ≤ µ_2 ≤ µ_3 ≤ µ_4, µ = (0, 0.5, 1, 1.5);
M_1: µ_1 = µ_2 = µ_3 ≷ µ_4, µ = (0, 0, 0, 0.5);
M_2: µ_1 = µ_2 ≷ µ_3 = µ_4, µ = (0, 0, 0.5, 0.5);
M_3: µ_1 ≷ µ_2 = µ_3 = µ_4, µ = (0, 0.5, 0.5, 0.5);
M_4: µ_1 = µ_2 ≷ µ_3 ≷ µ_4, µ = (0, 0, 0.5, 1);
M_5: µ_1 ≷ µ_2 = µ_3 ≷ µ_4, µ = (0, 0.5, 0.5, 1);
M_6: µ_1 ≷ µ_2 ≷ µ_3 = µ_4, µ = (0, 0.5, 1, 1);
M_7: µ_1 ≷ µ_2 ≷ µ_3 ≷ µ_4, µ = (0, 0.5, 1, 1.5).

We examined two types of comparison. First, we compared the performance of our order-restricted criterion with that of Anraku (1999) for detecting the true configuration of the parameters. In this simulation study the models H, K_1, K_2, ..., K_7, M_1, M_2, ..., M_7 are jointly considered, and we assume the variance is known to be 1. First, the penalty numbers corresponding to H, K_1, ..., K_7, M_1, ..., M_7 are given by (3.3) for Figures 1 and 2; second, they are given by (3.5) for Figures 3 and 4. Furthermore, in order to provide some clues for how to select ϕ(n) for moderate, or even small, sample sizes, we choose ϕ(n) = (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n for our three cases. The sample sizes for each population are taken to be equal, ranging from 10 to 200 with step 10. Figures 1 and 2 show the frequency of error detection by these four methods based on the simulations for the true models H, K_1, K_2, ..., K_7, M_1, M_2, ..., M_7, respectively.
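The error-detection frequency reported in the figures can be reproduced in miniature as follows (a hypothetical, stripped-down version of the study, not the paper's code: only the true configuration K_1 against H, integer parameter counts in place of the dimensions (3.3)/(3.5), and plain pooling in place of the order-restricted MLE):

```python
import numpy as np

# Miniature study: true model K_1 with mu = (0, 0, 0, 0.5); candidates are
# H (all means equal) and K_1 (first three equal).
rng = np.random.default_rng(1)
k, n_i, reps = 4, 100, 200
mu = np.array([0.0, 0.0, 0.0, 0.5])
phi = lambda n: (2 / np.log(4)) * np.log(n)  # one of the phi(n) choices

def neg2loglik(x, means):
    # Known variance 1; additive constants cancel between candidate models.
    return np.sum((x - means[:, None]) ** 2)

errors = 0
for _ in range(reps):
    x = rng.normal(mu[:, None], 1.0, (k, n_i))
    n = k * n_i
    fit_H = np.full(k, x.mean())
    m123 = x[:3].mean()
    fit_K1 = np.array([m123, m123, m123, x[3].mean()])
    # n * c_n(s, t): multiplying (2.2) by n does not change the arg min.
    c_H = neg2loglik(x, fit_H) + 1 * phi(n)
    c_K1 = neg2loglik(x, fit_K1) + 2 * phi(n)
    errors += int(c_H <= c_K1)  # H wrongly selected
error_freq = errors / reps  # analogue of the error-detection frequency
```

Varying phi here (e.g. substituting the √n choice) illustrates the trade-off discussed below: a faster-growing ϕ(n) drives the error frequency to zero more quickly under H, at the cost of more missed detections at small samples.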
We observe that our order information criterion works much better than Anraku's (1999) order information criterion when ϕ(n) = (2/log(log(4))) log(log(n)) or ϕ(n) = (2/log(4)) log(n). And generally, under the restriction that lim ϕ(n)/n = 0,

frequencies of error detection for criteria with ϕ(n) increasing to ∞ more rapidly converge to 0 more quickly than those with slower ϕ(n). But the slower ϕ(n) is more robust than the faster ϕ(n); see K_6 and K_7 of Figure 2. When we use the penalty numbers (3.5), from Figures 3 and 4 we can draw the same conclusions. Secondly, we compared four procedures for detecting the first changepoint (see (6) of Anraku (1999)). The models H, M_1, M_2, ..., M_7 are considered. Here we assumed the variances of the normal variates are common and unknown, the penalty numbers are given by (3.3), and ϕ(n) is chosen as above. The sample sizes for each population are taken to be equal, ranging from 10 to 200 with step 10. Figure 5 shows the frequency of error detection by these four methods based on the simulations for the true models H, M_1, M_2, ..., M_7, respectively. We again observe that our order information criterion works much better than Anraku's (1999) when ϕ(n) = (2/log(log(4))) log(log(n)) or ϕ(n) = (2/log(4)) log(n), and that the slower ϕ(n) is more robust than the faster ϕ(n); see M_3, M_6 and M_7 of Figure 5. For the empirical likelihood inference, we generated X_1 ∼ N(µ_1, 1), X_2 ∼ N(µ_2, 2), X_3 ∼ t(6) + µ_3, X_4 ∼ χ²(1) + µ_4. Figures 6 and 7 show the frequency of error detection based on the simulations for the true models H, K_1, K_2, ..., K_7, M_1, M_2, ..., M_7, respectively. We observe that our order information criterion works well when ϕ(n) = (2/log(log(4))) log(log(n)) or ϕ(n) = (2/log(4)) log(n). And generally, under the restriction that lim ϕ(n)/n = 0, frequencies of error detection for criteria with ϕ(n) increasing to ∞ more rapidly converge to 0 more quickly than those with slower ϕ(n). But the slower ϕ(n) is more robust than the faster ϕ(n); see K_4, K_5, K_6 and K_7 of Figures 6 and 7.

6 Comments and Conclusions

As is well known, the calculation of P(i, k, w) may be difficult except for special weights under various order restrictions.
Our information criterion avoids this computational problem.

7 Appendix

Proof of Theorem 2.1. When s ≠ s_1 and t ≤ t_s, the model M(s, t) with p(s, t) parameters is misspecified, so that

plim log(L_n(s, t))/n < plim log(L_n(s_1, t_1))/n, (A.1)

where plim denotes convergence in probability. Hence, it follows from (A.1) and lim ϕ(n)/n = 0 that

lim P[c_n(s_1, t_1) ≥ c_n(s, t)]
= lim P[−2 log(L_n(s_1, t_1))/n + p(s_1, t_1)ϕ(n)/n ≥ −2 log(L_n(s, t))/n + p(s, t)ϕ(n)/n]
= lim P[log(L_n(s_1, t_1))/n − log(L_n(s, t))/n ≤ 0.5(p(s, t) − p(s_1, t_1))ϕ(n)/n]
≤ lim P[log(L_n(s_1, t_1))/n − log(L_n(s, t))/n ≤ p_0 ϕ(n)/n] = 0, (A.2)

so that

lim P[ŝ ≠ s_1] ≤ lim P[c_n(s_1, t_1) ≥ c_n(s, t) for some t ≤ t_s, s ≠ s_1]
≤ Σ_{s ≠ s_1} Σ_{t ≤ t_s} lim P[c_n(s_1, t_1) ≥ c_n(s, t)] = 0. (A.3)

When s = s_1, if t < t_1 then the model with p(s_1, t) parameters is misspecified, so that

plim log(L_n(s_1, t))/n < plim log(L_n(s_1, t_1))/n. (A.4)

Hence, it follows from (A.4) and lim ϕ(n)/n = 0 that

lim P[c_n(s_1, t_1) ≥ c_n(s_1, t)]
= lim P[−2 log(L_n(s_1, t_1))/n + p(s_1, t_1)ϕ(n)/n ≥ −2 log(L_n(s_1, t))/n + p(s_1, t)ϕ(n)/n]
= lim P[log(L_n(s_1, t_1))/n − log(L_n(s_1, t))/n ≤ 0.5(p(s_1, t) − p(s_1, t_1))ϕ(n)/n] = 0, (A.5)

so that

lim P[t̂ < t_1, ŝ = s_1] = lim P[t̂ < t_1 | ŝ = s_1] P[ŝ = s_1]
≤ lim P[c_n(s_1, t_1) ≥ c_n(s_1, t) for some t < t_1 | ŝ = s_1] P[ŝ = s_1]
≤ Σ_{t < t_1} lim P[c_n(s_1, t_1) ≥ c_n(s_1, t) | ŝ = s_1] P[ŝ = s_1] = 0. (A.6)

For t > t_1, it follows from the assumption (2.4) that

2[log(L_n(s_1, t)) − log(L_n(s_1, t_1))] →_d W, (A.7)

where →_d denotes convergence in distribution. Therefore, lim ϕ(n) = ∞ together with (A.7) implies that

plim 2[log(L_n(s_1, t)) − log(L_n(s_1, t_1))]/ϕ(n) = 0,

hence

plim n(c_n(s_1, t_1) − c_n(s_1, t))/ϕ(n)
= plim 2[log(L_n(s_1, t)) − log(L_n(s_1, t_1))]/ϕ(n) + p(s_1, t_1) − p(s_1, t)
= p(s_1, t_1) − p(s_1, t) < 0,

so that lim P[c_n(s_1, t_1) ≥ c_n(s_1, t)] = 0. This implies that

lim P[t̂ > t_1, ŝ = s_1] = lim P[t̂ > t_1 | ŝ = s_1] P[ŝ = s_1] = 0.

Thus, we have

lim P[ŝ = s_1, t̂ = t_1] = 1. (A.8)

References

[1] Anraku, K. (1999). An information criterion for parameters under a simple order restriction. Biometrika, 86, 141-152.
[2] Barlow, R.E., Bartholomew, D.J., Bremner, J.M. and Brunk, H.D. (1972). Statistical Inference under Order Restrictions. New York: Wiley.
[3] El Barmi, H. (1996). Empirical likelihood ratio test for or against a set of inequality constraints. Journal of Statistical Planning and Inference, 55, 191-204.
[4] Bartholomew, D.J. (1959). A test of homogeneity for ordered alternatives. Biometrika, 46, 36-48.
[5] Bohrer, R. and Chow, W. (1978). Weights for one-sided multivariate inference. Appl. Statist., 27, 100-104.
[6] Childs, D.R. (1967). Reduction of the multivariate normal integral to characteristic form. Biometrika, 54.
[7] Konishi, S. and Kitagawa, G. (1996). Generalised information criteria in model selection. Biometrika, 83, 875-890.

[8] Kullback, S. and Leibler, R.A. (1951). On information and sufficiency. Ann. Math. Statist., 22, 79-86.
[9] Lee, C.I.C., Robertson, T. and Wright, F.T. (1993). Bounds on distributions arising in order restricted inferences with restricted weights. Biometrika, 80.
[10] Perlman, M.D. (1969). One-sided problems in multivariate analysis. Ann. Math. Statist., 40, 549-567.
[11] Qin, J. and Lawless, J. (1995). Empirical likelihood and tests of parameters subject to constraints. Canad. J. Statist., 23, 145-159.
[12] Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd ed. New York: Wiley.
[13] Robertson, T. and Wright, F.T. (1982). Bounds on mixtures of distributions arising in order restricted inference. Ann. Statist., 10.
[14] Robertson, T., Wright, F.T. and Dykstra, R.L. (1988). Order Restricted Statistical Inference. New York: Wiley.
[15] Williams, D.A. (1971). A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics, 27, 103-117.
[16] Zhao, L. and Peng, L. (2002). Model selection under order restriction. Statistics & Probability Letters, 57, 301-306.

[Figure 1: panels for the true models H, K_1, K_2, K_3, K_4, K_5.]

Figure 1: One curve is for Anraku's (1999) order information criterion; those for our restricted criteria with the penalty numbers (3.3) and ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n are shown by the x, +, and * curves, respectively.

[Figure 2: panels for the true models K_6, K_7, M_1, M_2, ..., M_7.]

Figure 2: One curve is for Anraku's (1999) order information criterion; those for our restricted criteria with the penalty numbers (3.3) and ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n are shown by the x, +, and * curves, respectively.

[Figure 3: panels for the true models H, K_1, K_2, K_3, K_4, K_5.]

Figure 3: One curve is for Anraku's (1999) order information criterion; those for our restricted criteria with the penalty numbers (3.5) and ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n are shown by the x, +, and * curves, respectively.

[Figure 4: panels for the true models K_6, K_7, M_1, M_2, ..., M_7.]

Figure 4: One curve is for Anraku's (1999) order information criterion; those for our restricted criteria with the penalty numbers (3.5) and ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n are shown by the x, +, and * curves, respectively.

[Figure 5: panels for the true models H, M_1, M_2, ..., M_7.]

Figure 5: One curve is for Anraku's (1999) order information criterion; those for our restricted criteria with the penalty numbers (3.3) and ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n are shown by the x, +, and * curves, respectively.

[Figure 6: panels for the true models H, K_1, K_2, K_3, K_4, K_5.]

Figure 6: The curves for our empirical-likelihood-based information criterion (4.4), with ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n, are shown by the x, +, and * curves, respectively.

[Figure 7: panels for the true models K_6, K_7, M_1, M_2, ..., M_7.]

Figure 7: The curves for our empirical-likelihood-based information criterion (4.4), with ϕ(n) chosen as (2/log(log(4))) log(log(n)), (2/log(4)) log(n), and (2/√4)√n, are shown by the x, +, and * curves, respectively.


More information

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo Comment aout AR spectral estimation Usually an estimate is produced y computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo simulation approach, for every draw (φ,σ 2 ), we can compute

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Model Fitting, Bootstrap, & Model Selection

Model Fitting, Bootstrap, & Model Selection Model Fitting, Bootstrap, & Model Selection G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu http://astrostatistics.psu.edu Model Fitting Non-linear regression Density (shape) estimation

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

Model selection in penalized Gaussian graphical models

Model selection in penalized Gaussian graphical models University of Groningen e.c.wit@rug.nl http://www.math.rug.nl/ ernst January 2014 Penalized likelihood generates a PATH of solutions Consider an experiment: Γ genes measured across T time points. Assume

More information

Modelling the Covariance

Modelling the Covariance Modelling the Covariance Jamie Monogan Washington University in St Louis February 9, 2010 Jamie Monogan (WUStL) Modelling the Covariance February 9, 2010 1 / 13 Objectives By the end of this meeting, participants

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

over the parameters θ. In both cases, consequently, we select the minimizing

over the parameters θ. In both cases, consequently, we select the minimizing MONOTONE REGRESSION JAN DE LEEUW Abstract. This is an entry for The Encyclopedia of Statistics in Behavioral Science, to be published by Wiley in 200. In linear regression we fit a linear function y =

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Label Switching and Its Simple Solutions for Frequentist Mixture Models

Label Switching and Its Simple Solutions for Frequentist Mixture Models Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching

More information

Information Theory Based Estimator of the Number of Sources in a Sparse Linear Mixing Model

Information Theory Based Estimator of the Number of Sources in a Sparse Linear Mixing Model Information heory Based Estimator of the Number of Sources in a Sparse Linear Mixing Model Radu Balan University of Maryland Department of Mathematics, Center for Scientific Computation And Mathematical

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

COMPARING TRANSFORMATIONS USING TESTS OF SEPARATE FAMILIES. Department of Biostatistics, University of North Carolina at Chapel Hill, NC.

COMPARING TRANSFORMATIONS USING TESTS OF SEPARATE FAMILIES. Department of Biostatistics, University of North Carolina at Chapel Hill, NC. COMPARING TRANSFORMATIONS USING TESTS OF SEPARATE FAMILIES by Lloyd J. Edwards and Ronald W. Helms Department of Biostatistics, University of North Carolina at Chapel Hill, NC. Institute of Statistics

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle 2015 IMS China, Kunming July 1-4, 2015 2015 IMS China Meeting, Kunming Based on joint work

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

Akaike criterion: Kullback-Leibler discrepancy

Akaike criterion: Kullback-Leibler discrepancy Model choice. Akaike s criterion Akaike criterion: Kullback-Leibler discrepancy Given a family of probability densities {f ( ; ψ), ψ Ψ}, Kullback-Leibler s index of f ( ; ψ) relative to f ( ; θ) is (ψ

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Empirical likelihood-based methods for the difference of two trimmed means

Empirical likelihood-based methods for the difference of two trimmed means Empirical likelihood-based methods for the difference of two trimmed means 24.09.2012. Latvijas Universitate Contents 1 Introduction 2 Trimmed mean 3 Empirical likelihood 4 Empirical likelihood for the

More information

Bayesian Analysis of Risk for Data Mining Based on Empirical Likelihood

Bayesian Analysis of Risk for Data Mining Based on Empirical Likelihood 1 / 29 Bayesian Analysis of Risk for Data Mining Based on Empirical Likelihood Yuan Liao Wenxin Jiang Northwestern University Presented at: Department of Statistics and Biostatistics Rutgers University

More information

Maximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p.

Maximum likelihood: counterexamples, examples, and open problems. Jon A. Wellner. University of Washington. Maximum likelihood: p. Maximum likelihood: counterexamples, examples, and open problems Jon A. Wellner University of Washington Maximum likelihood: p. 1/7 Talk at University of Idaho, Department of Mathematics, September 15,

More information

A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS

A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN FINITE MIXTURE MODELS Hanfeng Chen, Jiahua Chen and John D. Kalbfleisch Bowling Green State University and University of Waterloo Abstract Testing for

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Discrepancy-Based Model Selection Criteria Using Cross Validation

Discrepancy-Based Model Selection Criteria Using Cross Validation 33 Discrepancy-Based Model Selection Criteria Using Cross Validation Joseph E. Cavanaugh, Simon L. Davies, and Andrew A. Neath Department of Biostatistics, The University of Iowa Pfizer Global Research

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Simulation of truncated normal variables Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Abstract arxiv:0907.4010v1 [stat.co] 23 Jul 2009 We provide in this paper simulation algorithms

More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Estimating GARCH models: when to use what?

Estimating GARCH models: when to use what? Econometrics Journal (2008), volume, pp. 27 38. doi: 0./j.368-423X.2008.00229.x Estimating GARCH models: when to use what? DA HUANG, HANSHENG WANG AND QIWEI YAO, Guanghua School of Management, Peking University,

More information

ICSA Applied Statistics Symposium 1. Balanced adjusted empirical likelihood

ICSA Applied Statistics Symposium 1. Balanced adjusted empirical likelihood ICSA Applied Statistics Symposium 1 Balanced adjusted empirical likelihood Art B. Owen Stanford University Sarah Emerson Oregon State University ICSA Applied Statistics Symposium 2 Empirical likelihood

More information

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness

Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness Information in Data Sufficiency, Ancillarity, Minimality, and Completeness Important properties of statistics that determine the usefulness of those statistics in statistical inference. These general properties

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper McGill University Faculty of Science Department of Mathematics and Statistics Part A Examination Statistics: Theory Paper Date: 10th May 2015 Instructions Time: 1pm-5pm Answer only two questions from Section

More information

Quantifying Stochastic Model Errors via Robust Optimization

Quantifying Stochastic Model Errors via Robust Optimization Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations

More information

A Bayesian Multiple Comparison Procedure for Order-Restricted Mixed Models

A Bayesian Multiple Comparison Procedure for Order-Restricted Mixed Models A Bayesian Multiple Comparison Procedure for Order-Restricted Mixed Models Junfeng Shang 1, Joseph E. Cavanaugh 2, Farroll T. Wright 3, 1 Department of Mathematics and Statistics, 450 Math Science Building,

More information

Model Selection in the Presence of Incidental Parameters

Model Selection in the Presence of Incidental Parameters Model Selection in the Presence of Incidental Parameters Yoonseok Lee University of Michigan February 2012 Abstract This paper considers model selection of nonlinear panel data models in the presence of

More information

Investigation of goodness-of-fit test statistic distributions by random censored samples

Investigation of goodness-of-fit test statistic distributions by random censored samples d samples Investigation of goodness-of-fit test statistic distributions by random censored samples Novosibirsk State Technical University November 22, 2010 d samples Outline 1 Nonparametric goodness-of-fit

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Augmented Likelihood Estimators for Mixture Models

Augmented Likelihood Estimators for Mixture Models Augmented Likelihood Estimators for Mixture Models Markus Haas Jochen Krause Marc S. Paolella Swiss Banking Institute, University of Zurich What is mixture degeneracy? mixtures under study are finite convex

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Introduction to graphical models: Lecture III

Introduction to graphical models: Lecture III Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools. Joan Llull. Microeconometrics IDEA PhD Program

Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools. Joan Llull. Microeconometrics IDEA PhD Program Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools Joan Llull Microeconometrics IDEA PhD Program Maximum Likelihood Chapter 1. A Brief Review of Maximum Likelihood, GMM, and Numerical

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Mixture Modeling. Identifying the Correct Number of Classes in a Growth Mixture Model. Davood Tofighi Craig Enders Arizona State University

Mixture Modeling. Identifying the Correct Number of Classes in a Growth Mixture Model. Davood Tofighi Craig Enders Arizona State University Identifying the Correct Number of Classes in a Growth Mixture Model Davood Tofighi Craig Enders Arizona State University Mixture Modeling Heterogeneity exists such that the data are comprised of two or

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

Model Selection in the Presence of Incidental Parameters

Model Selection in the Presence of Incidental Parameters Model Selection in the Presence of Incidental Parameters Yoonseok Lee University of Michigan July 2012 Abstract This paper considers model selection of nonlinear panel data models in the presence of incidental

More information

LTI Systems, Additive Noise, and Order Estimation

LTI Systems, Additive Noise, and Order Estimation LTI Systems, Additive oise, and Order Estimation Soosan Beheshti, Munther A. Dahleh Laboratory for Information and Decision Systems Department of Electrical Engineering and Computer Science Massachusetts

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information