Optimal Screening for multiple regression models with interaction


Marília Antunes¹, Natércia Durão², Maria Antónia Amaral Turkman¹
¹ DEIO-CEAUL, Faculdade de Ciências da Universidade de Lisboa
² Universidade Portucalense

Abstract

The predictive approach to the screening problem, in which the relation between the variable of interest Y and a vector of covariates X is established through a multiple regression model with interaction, presented by Durão [2], is adapted and implemented. For the implementation, the methodology developed by Antunes [1] was followed. The results are illustrated by an example, and a simulation study was carried out for comparison purposes.

Keywords: predictive distributions, multiple regression, screening.

1 Introduction

An individual is considered a success if the measured value y of a random variable Y belongs to a certain region C_y. Let γ represent the proportion of successful individuals in the population, that is, γ = P(Y ∈ C_y). Admit that Y is difficult and/or expensive to measure and suppose that we wish to identify (usually to retain for future observation) a set of individuals for whom Y ∈ C_y. In that case, it is desirable to select (to be analyzed in detail) only individuals seen as having a high probability of being a success. This can be achieved by measuring a feature vector X = (X_1, ..., X_q)^t, q ≥ 1, which is correlated with Y and easier and/or cheaper to measure. The screening can then be described by a region C_x of IR^q such that, if x ∈ C_x, the individual is retained. C_x is called the specification region.

Let p(y, x | θ) be the joint density function of (Y, X), where θ is an unknown parameter vector. If data D = {(y_1, x_1), ..., (y_n, x_n)} from the unscreened population are available and a prior distribution p(θ) for the parameter is known, then the specification region C_x can be chosen so as to raise the predictive probability that an individual will be rated a success. That is, the predictive probability of an individual being a success,

    γ = P(Y ∈ C_y | D) = ∫_Θ P(Y ∈ C_y | θ) p(θ | D) dθ,    (1)

is raised by screening to a value δ such that

    δ = P(Y ∈ C_y | X ∈ C_x, D) = [ ∫_Θ P(Y ∈ C_y, X ∈ C_x | θ) p(θ | D) dθ ] / [ ∫_Θ P(X ∈ C_x | θ) p(θ | D) dθ ].    (2)

δ represents the predictive probability that an individual selected by the screening procedure is a success. For a future individual (y, x), the following predictive probabilities are also of interest:

    α = P(X ∈ C_x | D) = ∫_Θ P(X ∈ C_x | θ) p(θ | D) dθ,    (3)

the predictive probability of an individual being selected by the screening procedure, and

    ε = P(Y ∈ C_y | X ∉ C_x, D) = [ ∫_Θ P(Y ∈ C_y, X ∉ C_x | θ) p(θ | D) dθ ] / [ ∫_Θ P(X ∉ C_x | θ) p(θ | D) dθ ],    (4)

the predictive probability of a non-selected individual being a success. Note that ε = (γ − δα)/(1 − α).

The optimal screening problem consists in obtaining a specification region C_x which is optimal in some sense. From the predictive point of view, the specification region C_x of size α is optimal if it maximizes the probability of a selected individual being a success.

Definition 1. The specification region C_x of size α is optimal if

    P(Y ∈ C_y | X ∈ C_x, D) = sup_B P(Y ∈ C_y | X ∈ B, D),    (5)

where the supremum is taken over all sets B in the σ-algebra generated by X such that P(X ∈ B | D) = α.

Theorem 1 ([3]). Let p(x | Y ∈ C_y, D) and p(x | D) be, respectively, the predictive density function of X given Y ∈ C_y and the marginal predictive density function of X. The optimal specification region of size α is given by

    C_x = { x ∈ IR^q : p(x | Y ∈ C_y, D) / p(x | D) ≥ k }    (6)

or, equivalently,

    C_x = { x ∈ IR^q : P(Y ∈ C_y | x, D) / P(Y ∈ C_y | D) ≥ k },    (7)

where k is such that P(X ∈ C_x | D) = α.

This result ensures that, for a given size α, P(Y ∈ C_y | X ∈ C_x, D) is maximal. A reasonable choice for α seems to be α = γ = P(Y ∈ C_y | D), since, in principle, we do not wish to retain more cases than those potentially considered successes.
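To make the interplay between γ, δ, α and ε tangible, the following minimal Python sketch (not part of the paper) evaluates the screening rule (7) in a toy setting where the parameters are known and no data D are involved: a single standard normal covariate X correlated with Y, with C_y = (−∞, l); the values of ρ, l and α are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Toy illustration of rule (7) with a single covariate and known parameters:
# (Y, X) standard bivariate normal with correlation rho, success region C_y = (-inf, l).
rho, l, alpha = 0.7, -1.0, 0.2

# P(Y <= l | X = x) = Phi((l - rho*x)/sqrt(1 - rho^2)) is decreasing in x for rho > 0,
# so the optimal size-alpha region of the form (7) is the lower tail {x <= q_alpha}.
q_alpha = stats.norm.ppf(alpha)                 # cut-off with P(X <= q_alpha) = alpha
gamma = stats.norm.cdf(l)                       # unconditional success probability

# delta = P(Y <= l, X <= q_alpha) / alpha, via the bivariate normal CDF.
joint = stats.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[1.0, rho], [rho, 1.0]]).cdf([l, q_alpha])
delta = joint / alpha
epsilon = (gamma - joint) / (1.0 - alpha)       # equals (gamma - delta*alpha)/(1 - alpha)
print(f"gamma = {gamma:.3f}, delta = {delta:.3f}, epsilon = {epsilon:.3f}")
```

In this toy case the region of the form (7) is simply a lower tail of X, and the printed values should satisfy δ > γ > ε: screening raises the success rate among the selected individuals.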

2 Optimal Screening for multiple regression models with interaction

We consider the regression model with interaction Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + β_3 X_{1i} X_{2i} + e_i, where Y_i, i = 1, ..., n, is the response variable and X_{1i} and X_{2i}, i = 1, ..., n, are the two explanatory variables. Using matrix notation, the model can be written as

    Y = Xβ + e,    (8)

where Y is an n × 1 vector of observations Y_i, X is an n × 4 matrix (of full rank), β = (β_0, β_1, β_2, β_3)^t is the vector of parameters and e is an n × 1 vector of random errors, e ~ N_n(0, σ_0² I_n). The elements of e are uncorrelated, that is, E(e_i e_k) = E(e_i) E(e_k) for i ≠ k.

Given the model parameters β and σ_0², and assuming that (X_{1i}, X_{2i})^t ~ N_2(µ, Σ), i = 1, ..., n, independently of e_i, then X_i | µ, Σ ~ N_2(µ, Σ) and Y_i | X_i = x_i, β, σ_0² ~ N(β_0 + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{1i} x_{2i}, σ_0²), for i = 1, ..., n, where X_i = (X_{1i}, X_{2i})^t and µ = (µ_1, µ_2)^t ∈ IR². Σ = [ σ_1² σ_{12} ; σ_{12} σ_2² ] is a symmetric positive definite matrix, β ∈ IR^4 and σ_0² ∈ IR^+.

The joint likelihood function, based on the data set D = {(y_i, x_{1i}, x_{2i}) : i = 1, ..., n} obtained independently from the unscreened population in a natural informative experiment, is L(β, σ_0², µ, Σ | y, X) = p(y | X, β, σ_0²) p(X | µ, Σ). Since the statistics m = (x̄_1, x̄_2)^t and V = (n − 1) [ s_1² s_{12} ; s_{12} s_2² ], where x̄_j = Σ_{i=1}^n x_{ji}/n, s_j² = Σ_{i=1}^n (x_{ji} − x̄_j)²/(n − 1) and s_{12} = Σ_{i=1}^n (x_{1i} − x̄_1)(x_{2i} − x̄_2)/(n − 1), for j = 1, 2, are jointly sufficient with respect to the model X_i | µ, Σ ~ N_2(µ, Σ), i = 1, ..., n, the joint likelihood function is

    L(β, σ_0², µ, Σ | y, X) = p(y | X, β, σ_0²) p(m, V | µ, Σ)
        ∝ (σ_0²)^{-n/2} exp{ −[k_1 s² + (β − β̂)^t X^t X (β − β̂)] / (2σ_0²) }
          × |Σ|^{-n/2} exp{ −(n/2) (m − µ)^t Σ^{-1} (m − µ) } exp{ −(1/2) tr(Σ^{-1} V) },    (9)

where k_1 = n − 4, k_1 s² = (y − Xβ̂)^t (y − Xβ̂) is the residual sum of squares and β̂ = (X^t X)^{-1} X^t y is the least squares (equivalently, maximum likelihood) estimate of β.

2.1 Bayesian Predictive Analysis with a non-informative prior distribution

Suppose that β, σ_0² and (µ, Σ) are, a priori, independently distributed. Applying Jeffreys' rules for the specification of the marginal distributions, the non-informative joint prior distribution is
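As a purely illustrative Python sketch of the quantities entering the likelihood (9) — the n × 4 design matrix with the interaction column, the estimate β̂, the residual sum of squares k_1 s², and the sufficient statistics m and V — the code below uses simulated data with the regression coefficients of the application in Section 3 (the covariates are drawn independently here, since no value of Σ_{12} is needed for this step):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: two covariates and a response from the interaction model.
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.15 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

# Design matrix with intercept, main effects and the interaction term (n x 4).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Least squares / maximum likelihood estimate and residual sum of squares k1*s^2, k1 = n - 4.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
k1 = n - 4
s2 = resid @ resid / k1

# Sufficient statistics for (mu, Sigma): sample mean m and sums-of-squares matrix V = (n-1)*S.
Z = np.column_stack([x1, x2])
m = Z.mean(axis=0)
V = (n - 1) * np.cov(Z, rowvar=False)
print(beta_hat, s2, m, V, sep="\n")
```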

    p(β, σ_0², µ, Σ) = p(β) p(σ_0²) p(µ) p(Σ) ∝ (σ_0²)^{-1} |Σ|^{-3/2}.    (10)

The posterior distributions of the parameters are σ_0² | y, X ~ GI(k_1/2, k_1 s²/2), µ | Σ, m, V ~ N_2(m, Σ/n), β | σ_0², y, X ~ N_4(β̂, σ_0² (X^t X)^{-1}), and Σ | m, V ~ Inv-Wishart_{n−1}(V^{-1}).

For a future individual (y_{n+1}, x_{n+1}) = (y, x), the joint predictive distribution can be written as

    p(y, x | y, X) = p(y | x, y, X) p(x | m, V)
        = ∫_{IR^4} ∫_0^{+∞} p(y | x, β, σ_0²) p(β, σ_0² | y, X) dσ_0² dβ × ∫∫ p(x | µ, Σ) p(µ, Σ | m, V) dµ dΣ.    (11)

The marginal predictive density function of X, p(x | m, V), is given by

    p(x | m, V) ∝ [ 1 + (n/(n + 1)) (x − m)^t V^{-1} (x − m) ]^{-n/2},    (12)

that is, X | m, V ~ t_2( n − 2; m, (n + 1) V / (n(n − 2)) ), and the marginal predictive distribution of Y is a Student-t,

    Y | x, y, X ~ t( k_1; X_0 β̂, s² C ),    (13)

where C = 1 + X_0 (X^t X)^{-1} X_0^t and X_0 = [1  x_{1,n+1}  x_{2,n+1}  x_{1,n+1} x_{2,n+1}], since the distribution of the dependent variable for the future individual is N(X_0 β, σ_0²), β | σ_0², y, X ~ N_4(β̂, σ_0² (X^t X)^{-1}) and the marginal posterior distribution of σ_0² is GI(k_1/2, k_1 s²/2).
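The two predictive ingredients just derived can be evaluated directly. The sketch below (illustrative names, assuming numpy and scipy are available) computes P(Y ≤ l | x, y, X) from the Student-t predictive (13) and the marginal predictive density p(x | m, V) from the bivariate Student-t (12), as reconstructed above:

```python
import numpy as np
from scipy import stats

def pred_prob_success(x, l, X, y):
    """P(Y <= l | x, data): univariate Student-t predictive distribution, eq. (13)."""
    n, p = X.shape                                  # p = 4 columns: 1, x1, x2, x1*x2
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    k1 = n - p
    s2 = np.sum((y - X @ beta_hat) ** 2) / k1
    x0 = np.array([1.0, x[0], x[1], x[0] * x[1]])
    c = 1.0 + x0 @ np.linalg.solve(X.T @ X, x0)     # C = 1 + x0 (X'X)^{-1} x0'
    return stats.t.cdf(l, df=k1, loc=x0 @ beta_hat, scale=np.sqrt(s2 * c))

def pred_density_x(x, m, V, n):
    """Marginal predictive density of X: bivariate Student-t, eq. (12)."""
    shape = (n + 1) / (n * (n - 2)) * V             # scale matrix of t_2(n - 2; m, .)
    return stats.multivariate_t.pdf(x, loc=m, shape=shape, df=n - 2)
```

These two functions are all that is needed for the screening computations in the next subsection.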

2.2 Optimal specification region and predictive probabilities

Without loss of generality, admit that C_y = (−∞, l). Recall that X ∈ IR² and, therefore, C_x ⊂ IR². Also note that all the necessary information extracted from D is contained in y, X, m and V. The predictive probabilities can now be written in a more specific form as:

1. γ = P(Y ∈ C_y | D) = ∫_{IR²} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx, the predictive probability of a future individual being a success;

2. δ = P(Y ∈ C_y | X ∈ C_x, D) = [ ∫_{C_x} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx ] / α, the predictive probability of a selected individual being a success;

3. α = P(X ∈ C_x | D) = ∫_{C_x} p(x | m, V) dx, the predictive probability of an individual being selected by the screening procedure;

4. ε = P(Y ∈ C_y | X ∉ C_x, D) = [ ∫_{C_x^c} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx ] / (1 − α), the predictive probability of a non-selected individual being a success.

Under these conditions, the specification region is given by

    C_x = { x ∈ IR² : ∫_{−∞}^{l} p(y | x, y, X) dy ≥ k },    (14)

where k is such that ∫_{C_x} p(x | m, V) dx = α. Since the condition in (14) cannot be solved analytically, it is not possible to fix α in advance and then obtain k and C_x as functions of α. We must start by fixing the value of k; then C_x(k), the specification region for that fixed k, is obtained by approximation and, finally, its size is evaluated. We followed the procedure introduced by Antunes [1], described as follows:

1. Build a sufficiently fine grid G = {(x_{1i}, x_{2i}) ∈ IR²} such that P(X ∈ G | D) ≈ 1.

2. For each x ∈ G, calculate P(Y ∈ C_y | x, y, X).

3. For several appropriately chosen values of k, form the sets Ĉ_x(k) = {x ∈ G : P(Y ∈ C_y | x, y, X) ≥ k}.

4. Fit suitable smooth functions to the borders of Ĉ_x(k) to further approximate the specification region C_x(k).

5. For each k, calculate α̂_k = P(X ∈ Ĉ_x(k)).

α̂_k and Ĉ_x(k) are used instead of α_k and C_x(k) to evaluate the predictive probabilities. The results are obtained by numerical integration (a numerical sketch of these steps is given below, after the application data are described).

3 Application

One hundred observations (x_{1i}, x_{2i}) from X = (X_1, X_2)^t ~ N_2(µ, Σ), with µ = (0, 0)^t, Σ_{11} = Σ_{22} = 1.0 and a fixed value of Σ_{12}, were generated. Then, the values of y_i = α + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{1i} x_{2i} + ε_i, with α = 1.0, β_1 = 0.15, β_2 = 0.5, β_3 = 0.3 and ε_i ~ N(0, 1), were simulated. We chose l = 1.0, the 0.2 quantile of the generated data, and hence defined C_y = {y : y ≤ 1.0}.
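Below is a minimal end-to-end sketch of steps 1–3 and 5 on data simulated as just described; since the value of Σ_{12} used in the paper is not stated here, the off-diagonal value in the code is an arbitrary assumption, and the grid resolution is likewise an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# --- simulate data as in the application (the Sigma_12 value is an assumption) ---
n = 100
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
x1, x2 = Z[:, 0], Z[:, 1]
y = 1.0 + 0.15 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)
l = 1.0

# --- posterior/predictive ingredients (cf. Section 2.1) ---
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
k1 = n - 4
s2 = np.sum((y - X @ beta_hat) ** 2) / k1
m = Z.mean(axis=0)
V = (n - 1) * np.cov(Z, rowvar=False)

# --- step 1: grid over [-4, 4] x [-4, 4] ---
g1, g2 = np.meshgrid(np.linspace(-4, 4, 161), np.linspace(-4, 4, 161))
grid = np.column_stack([g1.ravel(), g2.ravel()])
cell = (8 / 160) ** 2                                # area of one grid cell

# --- step 2: P(Y <= l | x, data) on the grid, eq. (13) ---
X0 = np.column_stack([np.ones(len(grid)), grid[:, 0], grid[:, 1], grid[:, 0] * grid[:, 1]])
c = 1 + np.einsum("ij,ij->i", X0 @ np.linalg.inv(X.T @ X), X0)
p_succ = stats.t.cdf(l, df=k1, loc=X0 @ beta_hat, scale=np.sqrt(s2 * c))

# --- predictive density of X on the grid, eq. (12) ---
px = stats.multivariate_t.pdf(grid, loc=m, shape=(n + 1) / (n * (n - 2)) * V, df=n - 2)

# --- steps 3 and 5: threshold at k and approximate the region size alpha_k ---
for k in (0.2, 0.3, 0.4, 0.5):
    in_region = p_succ >= k
    alpha_k = np.sum(px[in_region]) * cell           # Riemann-sum approximation of the integral
    print(f"k = {k:.1f}: alpha_k ~ {alpha_k:.3f}")
```

Step 4 (smoothing the borders of the thresholded sets) is sketched separately together with the application results below.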

3.1 Specification Region and predictive probabilities

In order to obtain the specification region we considered a grid of points covering [−4.0, 4.0] × [−4.0, 4.0]. The results were as follows.

1. Predictive probability of a future individual from the population being a success:

    γ = P(Y ∈ C_y | D) = ∫_{IR²} ∫_{−∞}^{1.0} p(y | x, y, X) p(x | m, V) dy dx = 0.1.

2. Conditional probability of an individual being considered a success, given x ∈ [−4.0, 4.0] × [−4.0, 4.0]: P(Y ≤ 1.0 | x, y, X) was evaluated for every x in the grid. Figure 1 shows the points (x_{1i}, x_{2i}, P(Y ≤ 1.0 | x, y, X)), for (x_{1i}, x_{2i}) ∈ G.

Figure 1: The points (x_{1i}, x_{2i}, P(Y ≤ 1.0 | x, y, X)), for (x_{1i}, x_{2i}) ∈ G.

3. The sets Ĉ_x(k) = {x ∈ G : P(Y ∈ C_y | x, y, X) ≥ k}, for k = 0.2, 0.3, 0.4 and 0.5, are represented in Figure 2.

4. Functions were fitted to the borders of each Ĉ_x(k) (one possible way to perform such a fit is sketched below). Every fitted region consists of two branches of the form x_1 > g_k^U(x_2) and x_1 < g_k^L(x_2), with x_2 ranging over subintervals of [−4.0, 4.0], where g_k^U and g_k^L are the polynomial border functions fitted for each of k = 0.2, 0.3, 0.4 and 0.5.
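One possible, purely illustrative way to carry out step 4 of the procedure — fitting a polynomial border x_1 = g(x_2) to one branch of a thresholded grid set — is sketched below; the inputs grid and p_succ are assumed to come from the earlier grid sketch, and nothing here reproduces the actual fits of the paper.

```python
import numpy as np

def fit_border(grid, in_region, degree=3):
    """Fit a polynomial x1 = g(x2) to one border of a thresholded grid set.

    grid      -- (N, 2) array of grid points (x1, x2)
    in_region -- boolean mask, e.g. p_succ >= k from the previous sketch
    Only the branch with larger x1 is handled here, purely for illustration.
    """
    border_x1, border_x2 = [], []
    for v in np.unique(grid[:, 1]):
        row = (grid[:, 1] == v) & in_region & (grid[:, 0] > 0)
        if row.any():
            border_x1.append(grid[row, 0].min())     # smallest x1 still inside the branch
            border_x2.append(v)
    return np.polynomial.Polynomial.fit(border_x2, border_x1, deg=degree)

# Possible usage with the objects from the grid sketch:
# g_upper = fit_border(grid, p_succ >= 0.3)
# C_hat_03 = grid[:, 0] > g_upper(grid[:, 1])        # smoothed upper branch of C_x(0.3)
```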

Figure 2: The regions Ĉ_x(k) for k = 0.2, 0.3, 0.4 and 0.5.

5. For each k, the size of the specification region C_x(k) is approximated by the integral of the marginal predictive density of X over Ĉ_x(k),

    α(k) = P(X ∈ C_x(k) | D) ≈ ∫_{Ĉ_x(k)} p(x | m, V) dx.

Results are presented in Table 1.

Table 1: Size α(k) of the regions Ĉ_x(k).

6. To obtain the remaining predictive probabilities it is necessary to calculate, for each k,

    ζ(k) = P(X ∈ C_x(k), Y ∈ C_y | D) ≈ ∫_{Ĉ_x(k)} ∫_{−∞}^{1.0} p(y | x, y, X) p(x | m, V) dy dx.

Note that δ, the probability of a selected individual being a success, is given by ζ/α. Results are presented in Table 2.

Table 2: Probability of a selected individual being a success.
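All the quantities in items 5 and 6 are single or double integrals over the grid, so they can be approximated by the same Riemann sums. A small illustrative helper (its inputs are assumed precomputed as in the earlier grid sketch) is given below.

```python
import numpy as np

def screening_summary(p_succ, px, in_region, cell):
    """Riemann-sum approximations of gamma, alpha(k), zeta(k), delta(k) and epsilon(k).

    p_succ[i]    -- P(Y in C_y | x_i, data) at grid point x_i
    px[i]        -- predictive density p(x_i | m, V)
    in_region[i] -- True if x_i belongs to the (approximated) region C_x(k)
    cell         -- area of one grid cell
    """
    gamma = np.sum(p_succ * px) * cell
    alpha = np.sum(px[in_region]) * cell
    zeta = np.sum(p_succ[in_region] * px[in_region]) * cell
    delta = zeta / alpha
    epsilon = (gamma - zeta) / (1.0 - alpha)   # success mass falling outside C_x(k)
    return gamma, alpha, zeta, delta, epsilon
```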

A summary of the predictive probabilities is presented in Table 3. Note that γ, α and ζ were calculated directly, whereas δ and ε were obtained through the expressions relating them to those three probabilities. The specification region with size closest to γ was obtained for k = 0.3. Concerning the predictive capacity, with this region the probability of an individual being a success is about 2.3 times the success rate in the unscreened population. When larger values of k are considered, the success rate increases significantly. However, in real situations such regions may be a bad choice because of their small size: in the first stage of the screening procedure only a very small proportion of the screened individuals would be retained for further analysis, and the screening procedure could itself become very expensive. The introduction of a cost function is useful to find a good compromise.

Table 3: γ = P(individual is a success); α = P(individual is selected by the screening procedure); δ = P(selected individual is a success); ε = P(non-selected individual is a success).

3.2 Simulation Study

We generated data sets of size 100 according to the model considered earlier in the application. For each data set, the values of the predictive probabilities were estimated by the counting ratios described in Table 4 (a sketch of this ratio-based estimation is given after the tables). The mean and standard error of the estimates are given in Table 5. Except for δ, the standard error of the estimates decreases as k increases. Note that these values correspond to specification regions of very small size, all of them smaller than the probability of an individual being a success. Also note that these regions are located in the tails of the joint distribution of the covariates. Therefore, the number of observations falling in such regions will always be very small, especially if the sample size is not large, which originates larger standard errors.

predictive probability    numerator                           denominator
α                         #{x ∈ Ĉ_x}                          100
γ                         #{y ∈ C_y}                          100
δ                         #{(x, y) : x ∈ Ĉ_x and y ∈ C_y}     #{x ∈ Ĉ_x}
ε                         #{(x, y) : x ∉ Ĉ_x and y ∈ C_y}     #{x ∉ Ĉ_x}

Table 4: Ratios used to estimate α = P(individual is selected by the screening procedure); γ = P(individual is a success); δ = P(selected individual is a success); ε = P(non-selected individual is a success).

Table 5: Simulation study results: mean and standard error of the estimates γ̂, α̂, δ̂ and ε̂ for each k.
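A minimal sketch of the ratio-based estimates of Table 4 for one simulated data set follows; the names are illustrative, and the boolean inputs are assumed to have been computed beforehand from the fitted specification region and from C_y.

```python
import numpy as np

def empirical_rates(x_in_C, y_success):
    """Ratio estimates of alpha, gamma, delta and epsilon for one data set (cf. Table 4).

    x_in_C[i]    -- True if individual i falls inside the estimated specification region
    y_success[i] -- True if y_i belongs to C_y
    """
    x_in_C = np.asarray(x_in_C, dtype=bool)
    y_success = np.asarray(y_success, dtype=bool)
    n = x_in_C.size                                   # 100 in the simulation study
    alpha = x_in_C.sum() / n
    gamma = y_success.sum() / n
    delta = (x_in_C & y_success).sum() / max(x_in_C.sum(), 1)       # guard: region may be empty
    epsilon = (~x_in_C & y_success).sum() / max((~x_in_C).sum(), 1)
    return alpha, gamma, delta, epsilon
```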

4 Conclusions

The development of methodologies to obtain optimal specification regions in screening problems is of great importance and utility, not only in practical situations such as the ones referred to in this work, but also in situations where a huge amount of data is available and only a small part of it really needs to be analysed in detail. In real situations, it is useful to include the costs associated with bad decisions. Since each region is the one producing the best results, in predictive terms, among all regions of its size, the choice of the optimal procedure reduces to the choice of the value of k leading to the best compromise between the predictive probabilities of interest.

References

[1] Antunes, M. Some Problems in Non-Linear Prediction. PhD Thesis, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, 2002.

[2] Durão, N. Metodologia Bayesiana na Análise de Problemas de Triagem. PhD Thesis, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, 2004.

[3] Turkman, K. F. and Amaral Turkman, M. A. Optimal screening methods. J. Royal Statist. Soc. B, 51:287–295, 1989.
