UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ESTATÍSTICA E INVESTIGACIÓN OPERATIVA


GOODNESS OF FIT TEST FOR LINEAR REGRESSION MODELS WITH MISSING RESPONSE DATA

González Manteiga, W. and Pérez González, A.

Reports in Statistics and Operations Research

Goodness of fit test for linear regression models with missing response data

González Manteiga, W. and Pérez González, A.

Key words and phrases: Bootstrap; Goodness-of-fit test; Missing at random; Multivariate local linear smoother; Nonparametric regression.

Abstract: In this paper we propose tests to check the hypothesis of a linear regression model when there are missing data in the response variable. The test statistics are based on the $L_2$ distance between nonparametric regression estimators and root-$n$-consistent estimators of the regression function under the parametric model. We obtain the limit distribution of each statistic and prove the validity of its bootstrap version. Finally, in a simulation study, we compare the level and the power of the tests computed from incomplete samples with those obtained from the complete samples.

1. INTRODUCTION

Let $m(x) = E[Y \mid X = x]$, $x \in \mathbb{R}^d$, be the regression function associated with a random vector $(X, Y) \in \mathbb{R}^{d+1}$. In a parametric regression context, $m$ is assumed to belong to a certain family $M_\Theta = \{m_\theta(\cdot),\ \theta \in \Theta\}$ depending on some $p$-dimensional parameter $\theta$; important examples are linear models (Seber (1977)), among others. For many years numerous efforts have been directed towards inference on these parameters. However, it is clear that before any conclusion can be drawn about them, it is necessary to study whether or not the assumed parametric model is correct. This can be done by testing

$$H_0: m \in M_\Theta = \{m_\theta(\cdot),\ \theta \in \Theta\},\ \Theta \subset \mathbb{R}^p \quad \text{versus} \quad H_1: m \notin M_\Theta.$$

In contrast to parametric estimation, over the last three decades several nonparametric estimators of $m$ have been developed, which avoid the problem of assuming a parametric form for the regression function (kernel-type estimators, local polynomial smoothers, splines, etc.; see for instance Fan and Gijbels (1996)). On certain occasions these estimators have been used as pilot estimators in tests that validate a parametric model (Härdle and Mammen (1993), Alcalá, Cristóbal and González-Manteiga (1999), among others). All these test statistics have been developed for the context of complete

samples. In practice, however, sample observations are often unavailable, owing to errors in the measuring apparatus, flaws in the experimental design, refusals to answer certain questions in surveys, and so on. In the field of regression estimation with missing data, both the parametric context (Little (1992), Wang C. Y. et al. (2002), etc.) and the nonparametric context (Chu and Cheng (1995), Wang C. Y. et al. (1998), González Manteiga and Pérez González (2003), etc.) have been studied.

The aim of this paper is to consider a goodness of fit test for a linear regression model adapted to the case in which the response variable $Y$ has missing observations and the covariate $X$ is completely observed. As a test statistic, we shall use the distance (based on a weighted $L_2$ norm) between the nonparametric and parametric estimates of the regression function under the null hypothesis of linearity. When estimating the regression function in this situation there are two alternatives: either to consider only the complete observations, or to first impute the incomplete ones and subsequently carry out the estimation with the completed sample. Bearing this in mind, two possible nonparametric estimators are considered, based on the Multivariate Local Linear Smoother (Ruppert and Wand (1994)): the Simplified Multivariate Local Linear Smoother and the Imputed Multivariate Local Linear Smoother. Both estimators have recently been studied by González Manteiga, W. and Pérez González, A. (2003). The two test statistics studied in this paper are based on these two estimators.

As in the case of complete data, the convergence of the distribution of the test statistics to their normal-type asymptotic distribution is, in general, slow, so a bootstrap procedure is proposed for the approximation of the critical values of the test. In this paper we design a bootstrap resampling mechanism adapted to the absence of data in the response variable, and we show the asymptotic validity of the bootstrap versions of the proposed test statistics.

Our interest lies not only in obtaining a goodness of fit test, which up until now has been nonexistent, for a linear regression model with missing data in the response variable, but also in ascertaining which of the two proposed statistics behaves better. The asymptotic results obtained, as well as the simulations carried out, reveal that with an adequate choice of the smoothing parameters, the test based on the imputed estimator behaves better than the one based on the simplified estimator, as can be seen in the simulation study of Section 5. Moreover, the asymptotic results we obtain extend existing studies for complete data to this context of incompleteness, such as those of Härdle and Mammen (1993), who proposed a goodness of fit test for a parametric model using the Nadaraya-Watson estimator (1964) as the nonparametric smoother, or those of Alcalá, Cristóbal and González-Manteiga (1999) for polynomial parametric models using the local polynomial smoother.

The rest of the article is organized as follows. In the next section we

present the regression model with missing data, as well as the nonparametric estimators used. In Section 3 we derive the asymptotic distribution of both statistics. Section 4 focuses on the bootstrap approximation of the test statistics. In Section 5 we present some of the simulation results, obtained through the aforementioned resampling mechanism, which compare the behavior of both tests, and of the test based on the complete sample, with respect to power, sample size, etc. Section 6 collects the conclusions and, finally, Section 7 contains the proofs of the results appearing in the previous sections.

2. THE REGRESSION MODEL WITH MISSING DATA AND THE LOCAL LINEAR SMOOTHERS

We shall consider the general heteroscedastic regression model:

$$Y = m(X) + \sigma(X)\,\varepsilon = m(X) + \eta,$$

where $\varepsilon$ is the error term, assumed to have mean zero and unit variance, and $\sigma^2(x) = \mathrm{Var}[Y \mid X = x]$. In the case of no missing observations, we have a sample $\{(X_i, Y_i)\}_{i=1}^n$ of i.i.d. (independent and identically distributed) copies of the random vector $(X, Y) \in \mathbb{R}^{d+1}$. In our case, $Y_i$ may not be observed for some indices $i$, which means that we observe $(X_i, Y_i) \in \mathbb{R}^{d+1}$ if $Y_i$ is available, and only $X_i \in \mathbb{R}^d$ otherwise. To record whether or not an observation is complete, a new variable $\delta$, an indicator of the missing observations, is introduced in the model. Thus, for each index $i$, $\delta_i = 1$ if $Y_i$ is observed, and $\delta_i = 0$ if $Y_i$ is missing.
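As an illustration of this setup, the following Python sketch draws a sample $\{(X_i, Y_i, \delta_i)\}$ from a heteroscedastic model with responses missing at random. It is only a sketch of our own: the function names and the particular choices of $m$, $\sigma$ and $p$ below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(n, m, sigma, p):
    """Draw (X_i, Y_i, delta_i) from Y = m(X) + sigma(X)*eps, with the
    response missing at random: P(delta = 1 | X, Y) = p(X)."""
    x = rng.uniform(0.0, 1.0, n)              # covariate, always observed
    eps = rng.standard_normal(n)              # error: mean zero, unit variance
    y = m(x) + sigma(x) * eps
    delta = rng.binomial(1, p(x))             # 1 = response observed, 0 = missing
    y_obs = np.where(delta == 1, y, np.nan)   # missing responses stored as NaN
    return x, y_obs, delta

# illustrative choices (ours, not prescribed by the model above):
x, y, delta = generate_sample(
    200,
    m=lambda t: 5 * t,
    sigma=lambda t: 0.5 * np.ones_like(t),
    p=lambda t: 1 - 0.4 * np.exp(-5 * (t - 0.4) ** 2),
)
```

Note that $p(\cdot)$ depends only on the covariate, which is exactly the MAR mechanism formalized in (1) below.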

Following the guidelines laid down in the literature (see Little and Rubin (2002), among others), it is necessary to establish whether or not the loss of a datum is independent of the values of the observed and/or missing data. In this paper we model the aforementioned loss by assuming that the data are missing at random (MAR), i.e.:

$$P(\delta = 1 \mid Y, X) = P(\delta = 1 \mid X) = p(X), \quad X \in \mathbb{R}^d. \quad (1)$$

This model has been previously used by various authors, such as Cheng (1994), Chu and Cheng (1995) and Wang and Rao (2001, 2002), among others. In this case, the variable in which the missing data appear is not the cause of the loss.

When there are no missing observations, one possible nonparametric estimator of the multidimensional regression function is the Multivariate Local Linear Smoother (MLLS) studied by Ruppert and Wand (1994), among others. This estimator results from minimizing:

$$\min_{\alpha, \beta} \sum_{i=1}^{n} \left\{ Y_i - \alpha - \beta^t (X_i - x) \right\}^2 K_H(X_i - x),$$

where $K_H(u) = |H|^{-1/2} K(H^{-1/2} u)$, $H$ is a symmetric positive definite $d \times d$ matrix, and $K$ is a $d$-dimensional kernel function, that is, $K \geq 0$ and

$\int K(u)\,du = 1$. The explicit expression for this estimator is:

$$\widehat{m}_H(x) = e_1^t \left( X_x^t W_{x,H} X_x \right)^{-1} X_x^t W_{x,H} Y, \quad (2)$$

where

$$X_x = \begin{pmatrix} 1 & (X_1 - x)^t \\ \vdots & \vdots \\ 1 & (X_n - x)^t \end{pmatrix}, \quad W_{x,H} = \mathrm{diag}\left\{ (K_H(X_i - x))_{i=1}^n \right\},$$

$Y = (Y_1, \ldots, Y_n)^t$ and $e_1$ is the $(d+1) \times 1$ vector with 1 in the first coordinate and zeros elsewhere.

A very simple way of estimating the regression function with missing data in the response variable is the Simplified Multivariate Local Linear Smoother (SMLLS), which uses only the complete observations, that is, those with $\delta_i = 1$. The minimization problem becomes:

$$\min_{\alpha, \beta} \sum_{i=1}^{n} \left\{ Y_i - \alpha - \beta^t (X_i - x) \right\}^2 K_H(X_i - x)\,\delta_i,$$

from which the explicit expression of the estimator is deduced as:

$$\widehat{m}_{S,H}(x) = \widehat{\alpha} = e_1^t \left( X_x^t W_{x,H}^\delta X_x \right)^{-1} X_x^t W_{x,H}^\delta Y, \quad (3)$$

where $X_x$ has the same expression as for complete data and $W_{x,H}^\delta = \mathrm{diag}\{(K_H(X_i - x)\,\delta_i)_{i=1}^n\}$.

Another possibility is the Imputed Multivariate Local Linear Smoother (IMLLS), constructed in two stages. At the first stage the missing observations are estimated by means of the SMLLS, and the sample is completed. In this way a completed sample of the form $(X_i^t, \widehat{Y}_i) \in \mathbb{R}^{d+1}$, $i = 1, \ldots, n$, is obtained, where $\widehat{Y}_i = \delta_i Y_i + (1 - \delta_i)\,\widehat{m}_{S,G,(L)}(X_i)$, with $\widehat{m}_{S,G,(L)}(X_i)$ the estimate of $m$ at the point $X_i$ given by the SMLLS (3), using a bandwidth matrix $G$ and a kernel function $L$. Once the sample has been completed, the Multivariate Local Linear Smoother is applied to the data $\{(X_i^t, \widehat{Y}_i)\}_{i=1}^n$, from which the estimator is deduced as:

$$\widehat{m}_{I,H,G}(x) = \widehat{\alpha} = e_1^t \left( X_x^t W_{x,H} X_x \right)^{-1} X_x^t W_{x,H} \widehat{Y}, \quad (4)$$

where $\widehat{Y} = (\widehat{Y}_1, \ldots, \widehat{Y}_n)^t$ is the imputed response vector.

3. ASYMPTOTIC RESULTS

The aim of this paper is to test whether or not the regression function $m$ is linear, that is:

$$H_0: m \in M_\Theta \quad \text{versus} \quad H_1: m \notin M_\Theta,$$

with $M_\Theta = \{m_\theta(x) = \theta_0 + \theta_1^t x,\ \theta = (\theta_0, \theta_1)^t \in \Theta\}$, $\Theta \subset \mathbb{R}^{d+1}$, in the case of the response variable having missing observations. In order to do this, we shall compare the parametric estimation under the null hypothesis and the nonparametric one in both of the previously considered

situations, that is, considering only the complete observations, or imputing the missing observations. To measure the distance between the estimates we shall use the weighted $L_2$ norm, previously used by Härdle and Mammen (1993) and Alcalá, Cristóbal and González Manteiga (1999), among others. As a nonparametric estimator the former authors used the Nadaraya-Watson estimator (1964), which is biased even under the null hypothesis of linearity, so that this bias had to be corrected by smoothing the parametric residuals. This did not occur in the work of Alcalá, Cristóbal and González Manteiga (1999), which, in spite of testing a polynomial model, guaranteed the unbiasedness of the nonparametric estimator by using a local polynomial smoother of sufficient order. Since the objective of this paper is to test a linear model, the use of the simplified and imputed estimators guarantees unbiasedness under our null hypothesis (due to their construction based on the local linear smoother), thus avoiding the need to smooth the parametric estimate.

For the SMLLS estimator (3) we consider the following test statistic:

$$T_{n,S} = n\,|H|^{1/4} \int \left( \widehat{m}_{S,H}(x) - m_{\widehat{\theta}_n}(x) \right)^2 w(x)\,dx.$$

In the same way, we consider the following statistic for the IMLLS estimator (4):

$$T_{n,I} = n\,|H|^{1/4} \int \left( \widehat{m}_{I,H,G}(x) - m_{\widehat{\theta}_n}(x) \right)^2 w(x)\,dx.$$

In general, the asymptotic distribution of the test statistics is analyzed

under a local alternative, which in this case we shall take to be:

$$m(x) = m_{\theta_0}(x) + c_n s(x),$$

with $c_n = \left( n\,|H|^{1/4} \right)^{-1/2}$ and $s(x)$ belonging to the class of functions orthogonal to $M_\Theta$ with respect to the inner product $\langle s, t \rangle = \int s(x)\,t(x)\,w(x)\,dx$.

Next, some considerations regarding the notation used throughout this paper. We denote by $\stackrel{L}{\longrightarrow}$ the convergence in distribution, the symbol $*$ means convolution, and $K^{(j)}(a)$ represents the $j$-th convolution of the function $K$ at the point $a$. Furthermore, we define

$$A(t) = \int K(u)\,L\!\left( G^{-1/2} H^{1/2} (t - u) \right) du,$$

with $A_H(t) = |H|^{-1/2} A(H^{-1/2} t)$, and $q(x) = 1 - p(x)$.

The hypotheses needed to obtain the asymptotic distribution of the statistics are:

(A.1) The variable $X$, with density function $f$, lies in a compact set with probability one.

(A.2) The functions $m$, $f$ and $p$ are twice continuously differentiable. The function $w$ is positive and continuously differentiable.

(A.3) The functions $f$ and $p$ are bounded away from zero and from infinity.

(A.4) The conditional variance function $\mathrm{Var}(Y \mid X = x) = \sigma^2(x)$ is bounded away from $0$ and from $\infty$, and is continuous.

(A.5) The kernel functions $K$ and $L$ are symmetric continuous densities with compact support, and such that $\int K(u)\,u\,du = 0$ and $\int K(u)\,u u^t\,du = \mu_2(K)\,I$, where $\mu_2(K)$ is a scalar and $I$ is the $d$-dimensional identity matrix. We will denote $R[K] = \int K^2(u)\,du$.

(A.6) The orthogonal component $s(x)$ is bounded uniformly in $x$.

(A.7) $E[\varepsilon^4]$ exists.

(A.8) The difference $\widehat{m}_{\theta_n}(x) - m_{\theta_0}(x) = O_P\!\left( n^{-1/2} \right)$ uniformly in $x$.

(A.9) The matrix $H$ is symmetric and positive definite, with each of its elements tending towards zero, and $|H|^{1/2}\, n^{d/(d+4)} \to \infty$ as $n \to \infty$.

(A.10) (Only applicable for the imputed estimator.) Apart from being symmetric, positive definite and having all its elements tending towards zero, the imputation matrix $G$ should verify that $n\,|H|\,|G|^2 = O(1)$ and $n^{3/2}\,|H|^{1/2}\,|G| \to \infty$ as $n \to \infty$.

(A.11) (Only applicable for the imputed estimator.) The kernel function $L$ is Lipschitz continuous.

(A.12) (Only applicable for the imputed estimator.) The modulus of continuity of the function $p(x)^{-1}$ is uniformly bounded (see the definition in Billingsley (1999), p. 80, for example).
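To fix ideas before stating the theorems, the simplified (3) and imputed (4) smoothers can be sketched numerically in one dimension. The following Python sketch is our own illustration: function names are ours, and we use a Gaussian kernel for numerical stability even though (A.5) asks for a compactly supported one.

```python
import numpy as np

def local_linear(x0, x, y, weights, h):
    """Weighted local linear fit at x0: the intercept of the kernel-weighted
    least squares line, as in the multivariate formula (2), with d = 1."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel (our choice)
    w = k * weights
    X = np.column_stack([np.ones_like(x), x - x0])
    # solve (X' W X) beta = X' W y; beta[0] is the fitted value at x0
    beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
    return beta[0]

def m_simplified(x0, x, y, delta, h):
    """SMLLS (3): only complete cases enter, weight K_h(X_i - x0) * delta_i."""
    return local_linear(x0, x, np.where(delta == 1, y, 0.0),
                        delta.astype(float), h)

def m_imputed(x0, x, y, delta, h, g):
    """IMLLS (4): first impute each missing Y_i by the SMLLS with bandwidth g,
    then apply the ordinary local linear smoother to the completed sample."""
    y_hat = np.array([y[i] if delta[i] == 1
                      else m_simplified(x[i], x, y, delta, g)
                      for i in range(x.size)])
    return local_linear(x0, x, y_hat, np.ones_like(x), h)
```

Because the local linear fit reproduces straight lines exactly, both smoothers are unbiased under the null hypothesis of linearity, which is the property exploited by the test statistics above.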

Theorem 1 Under hypotheses (A.1)-(A.9), the asymptotic distribution of the test statistic based on the Simplified estimator is:

$$T_{n,S} \stackrel{L}{\longrightarrow} N(b_S, V_S),$$

where

$$b_S = \int \left[ (K_H * s)(x) \right]^2 w(x)\,dx + |H|^{-1/4} \int \frac{w(x)\,\sigma^2(x)}{f(x)\,p(x)}\,R[K]\,dx,$$

and

$$V_S = 2 K^{(4)}(0) \int \left( \frac{w(x)\,\sigma^2(x)}{f(x)\,p(x)} \right)^2 dx.$$

Theorem 2 Under hypotheses (A.1)-(A.12), the asymptotic distribution of the test statistic based on the Imputed estimator is obtained. We distinguish three cases according to the asymptotic behaviour of the smoothing parameter $G$:

a) For the case in which $|G|^{1/2} = O(1)\,|H|^{1/2}$, that is, $|G|^{1/2} = \alpha\,|H|^{1/2}$ for some scalar $\alpha > 0$:

$$T_{n,I} \stackrel{L}{\longrightarrow} N(b_{I,1}, V_{I,1}),$$

where the asymptotic bias is

$$b_{I,1} = \int \left[ p(x)(K_H * s)(x) + q(x)\,\alpha^{-1}(A_H * s)(x) \right]^2 w(x)\,dx + |H|^{-1/4} \int \frac{w(x)\,\sigma^2(x)}{f(x)}\,v_1(x)\,dx,$$

and the asymptotic variance is

$$V_{I,1} = 2 \int \left( \frac{w(x)\,\sigma^2(x)}{f(x)\,p(x)} \right)^2 c_1(x)\,dx,$$

where

$$v_1(x) = p(x)\,R[K] + \alpha^{-2}\,\frac{q^2(x)}{p(x)} \int A^2(u)\,du + 2\,\alpha^{-1}\,q(x) \int K(u)\,A(u)\,du, \quad (5)$$

and

$$c_1(x) = K^{(4)}(0) + 4\,\frac{|H|^{1/2}}{|G|^{1/2}}\,\frac{q(x)}{p(x)}\,[K * K * K * A](0) + 6\,\frac{|H|}{|G|}\,\frac{q^2(x)}{p^2(x)}\,[K * K * A * A](0) + 4\,\frac{|H|^{3/2}}{|G|^{3/2}}\,\frac{q^3(x)}{p^3(x)}\,[A * A * A * K](0) + \frac{|H|^2}{|G|^2}\,\frac{q^4(x)}{p^4(x)}\,A^{(4)}(0).$$

b) If $|G|^{1/2} / |H|^{1/2} \to 0$, that is, if $G$ tends to zero faster than $H$:

$$T_{n,I} \stackrel{L}{\longrightarrow} N(b_{I,2}, V_{I,2}),$$

where the asymptotic bias is

$$b_{I,2} = \int \left[ (K_H * s)(x) \right]^2 w(x)\,dx + |H|^{-1/4} \int \frac{w(x)\,\sigma^2(x)}{f(x)}\,v_2(x)\,dx,$$

and the asymptotic variance is

$$V_{I,2} = 2 K^{(4)}(0) \int \left( \frac{w(x)\,\sigma^2(x)}{f(x)\,p(x)} \right)^2 dx,$$

with $v_2(x) = R[K]/p(x)$.

c) Finally, when $|H|^{1/2} / |G|^{1/2} \to 0$, that is, if $H$ tends to zero faster than $G$:

$$T_{n,I} \stackrel{L}{\longrightarrow} N(b_{I,3}, V_{I,3}),$$

where the asymptotic bias is

$$b_{I,3} = \int \left[ p(x)(K_H * s)(x) + q(x)(L_G * s)(x) \right]^2 w(x)\,dx + |H|^{-1/4} \int \frac{w(x)\,\sigma^2(x)}{f(x)}\,v_3(x)\,dx,$$

and the asymptotic variance is

$$V_{I,3} = 2 \int \left( \frac{w(x)\,\sigma^2(x)}{f(x)\,p(x)} \right)^2 c_3(x)\,dx,$$

with

$$v_3(x) = p(x) \int K^2(u)\,du + \frac{q^2(x)}{p(x)}\,\frac{|H|^{1/2}}{|G|^{1/2}} \int L^2(u)\,du + 2\,q(x)\,\frac{|H|^{1/2}}{|G|^{1/2}}\,L(0),$$

and

$$c_3(x) = K^{(4)}(0) + \frac{|H|^{1/2}}{|G|^{1/2}} \left[ 4\,\frac{q(x)}{p(x)}\,L(0) + 6\,\frac{q^2(x)}{p^2(x)}\,L^{(2)}(0) + 4\,\frac{q^3(x)}{p^3(x)}\,L^{(3)}(0) + \frac{q^4(x)}{p^4(x)}\,L^{(4)}(0) \right].$$

The following remarks are of interest.

Remark 1 The asymptotic distribution of the tests is generally obtained under an alternative hypothesis which converges asymptotically to the null hypothesis. It can be observed that the convergence rate we use here, $c_n = (n\,|H|^{1/4})^{-1/2}$, extends that used by Härdle and Mammen (1993) with a scalar smoothing matrix ($H = h^2 I$, $c_n = (n h^{d/2})^{-1/2}$), or by Alcalá et al. (1999) for the one-dimensional case with complete samples ($c_n = (n h^{1/2})^{-1/2}$).

Regarding the hypotheses needed to obtain these results, hypotheses A.1-A.7 are similar to those used for complete data, with the only difference that here we also need some conditions on the missing data model $p$ in (1). Hypothesis A.8 is analogous to that used for complete data (see Alcalá et al. (1999)), where $\widehat{m}_{\theta_n}$ is a parametric linear estimator of the regression function with missing data. It is easy to see that, by applying the least squares method, the estimator of $\theta$ obtained from using only the complete observations, $\widehat{\theta}_n = (\widehat{\theta}_0, \widehat{\theta}_1^t)^t$, coincides with that obtained from the imputed sample when the imputations are made under the null hypothesis, $\widehat{Y}_i = \widehat{\theta}_0 + \widehat{\theta}_1^t X_i$ (see Little and Rubin (2002) for more details). This implies that the parametric least squares estimator $\widehat{\theta}_n$ is the same for $T_{n,S}$ and $T_{n,I}$. If we have the whole sample, then the rate of convergence of the parametric estimator is known to be $n^{-1/2}$. However, if we have missing observations and we use only the complete subsample, then the rate of convergence is $(n_1)^{-1/2}$, where $n_1$ is the size of the complete subsample; but $n_1$ is of order $n\,E[p(X)]$ in probability, so under the hypotheses bounding $p$ away from zero, our estimator can be considered root-$n$-consistent. Hypothesis A.9 extends that used for complete data (see for example Alcalá et al. (1999)). Finally, hypotheses A.10-A.12 are used to obtain the asymptotic representation of the weights of the imputed estimator (4).

Remark 2 Note that the terms $v_1(x)$, $v_2(x)$ and $v_3(x)$ are also the expressions that appear in the variance of the Imputed estimator, IMLLS (4), for the cases $|G|^{1/2} = \alpha\,|H|^{1/2}$, $|G|^{1/2}/|H|^{1/2} \to 0$ and $|H|^{1/2}/|G|^{1/2} \to 0$, respectively (see González Manteiga, W. and Pérez González, A. (2003) for more details).

Remark 3 It is important to point out that the asymptotic distributions obtained extend the existing results for complete samples to the case of absence of data in the response variable. In the particular case of no missing observations, it is immediately obvious that both test statistics $T_{n,S}$ and $T_{n,I}$ coincide, and therefore so do their asymptotic distributions. If we denote the statistic based on the complete sample by $T_{n,C}$, we get that:

$$T_{n,C} \stackrel{L}{\longrightarrow} N(b, V),$$

with

$$b = \int \left[ (K_H * s)(x) \right]^2 w(x)\,dx + |H|^{-1/4} \int \frac{w(x)\,\sigma^2(x)}{f(x)}\,R[K]\,dx,$$

and

$$V = 2 K^{(4)}(0) \int \left( \frac{w(x)\,\sigma^2(x)}{f(x)} \right)^2 dx.$$

This asymptotic distribution coincides (taking $H = h^2 I$) with the one obtained by Härdle and Mammen (1993) with a statistic based on the smoothing of the parametric residuals. It is also clear from their work that if the smoothing is not carried out then, for the reasons mentioned above, a term appears in the bias of the asymptotic distribution depending on the first-order derivatives of the function $m$. On the other hand, the result obtained by Alcalá, Cristóbal and González-Manteiga (1999) for the local linear smoother is extended here to the multidimensional case.

Remark 4 A general characteristic of the previous distributions worth pointing out is that the data observation probability (1) affects the asymptotic distributions considerably. It can immediately be seen that the bias and variance terms increase as we lose data (as the value of $p(\cdot)$ decreases).

Observing the asymptotic distributions obtained for $T_{n,S}$ and $T_{n,I}$ in the case $|G|^{1/2}/|H|^{1/2} \to 0$, it can be seen that both expressions are asymptotically equivalent, which in turn indicates that carrying out the imputation with a bandwidth parameter $G$ of order of convergence lower than that of the estimation parameter $H$ does not bring about any improvement in the test performance; rather, it merely implies an increase in computational time. From this it follows that imputation is not recommended in this case. This situation also arose when comparing the asymptotic mean squared errors of both estimators for this case in the work of González Manteiga, W. and Pérez González, A. (2003).

The behaviour of the test in the other two cases is more complex. Bearing

in mind the considerations given in the aforementioned work, the case in which $|H|^{1/2}/|G|^{1/2} \to 0$ implies a degree of oversmoothing in the imputation which may considerably worsen the behaviour of the Imputed estimator. However, since our aim is to test a linear model and we are using a local linear estimator, the degree of oversmoothing in the parameter $G$ can lead to good behaviour of the Imputed test under the null hypothesis. This choice of imputation parameter provokes more conservative behaviour compared to the Simplified test. In the case $|G|^{1/2} = \alpha\,|H|^{1/2}$, on the other hand, it was observed in the estimation context that, with an appropriate choice of the imputation parameter, the imputed estimator is considerably better than the simplified one. Analogous behaviour could be expected for the tests; but, due to the complexity of the asymptotic distribution of the statistic $T_{n,I}$ in this case, we opted for a simulation study in order to carry out the aforementioned comparisons. It can be seen in the simulation study of Section 5 that the test based on the Imputed estimator is slightly better than that based on the Simplified one when the selection of the imputation parameter $G$ is appropriate.

4. BOOTSTRAP RESAMPLING

Since the speed of convergence of the distribution of the statistic to its Normal asymptotic distribution is generally quite slow, obtaining the critical points from this asymptotic distribution is in general not recommended. One method to approximate these critical values is a bootstrap resampling mechanism. In this paper we propose a method

based on the Wild Bootstrap, which Härdle and Mammen (1993) used previously, demonstrating its validity in the case of complete data. In our case we have to design a mechanism adapted to the situation in which there are missing observations in the response variable. We now describe the method for the Imputed estimator; the method for the Simplified estimator is obtained through the appropriate modifications.

Starting from a random sample $\{(X_i, Y_i)\}_{i=1}^n$, where $Y_i$ may not be observed for some indices $i$, we follow these steps:

1) Construction of the residuals. At the first stage the residuals are constructed:

$$\widehat{\eta}_i = Y_i - \widehat{m}_{\theta_n}(X_i), \quad \text{if } \delta_i = 1,$$

where $\widehat{m}_{\theta_n}$ is the least squares linear estimator with missing observations.

2) Construction of the bootstrap errors. Subsequently, the resampling of the available residuals is performed following the Wild Bootstrap methodology, obtaining $\{\eta_i^*\}_{i \in J}$ such that:

$$E^*[\eta_i^*] = 0, \quad E^*[(\eta_i^*)^2] = (\widehat{\eta}_i)^2, \quad E^*[(\eta_i^*)^3] = (\widehat{\eta}_i)^3, \quad i \in J.$$

Here $J$ is the set of indices such that $\delta_i = 1$.

3) Construction of the bootstrap sample. If $\delta_i = 1$, then $Y_i^* = \eta_i^* + \widehat{m}_{\theta_n}(X_i)$, where $\widehat{m}_{\theta_n}(X_i)$ is the parametric estimate at the point $X_i$; if $\delta_i = 0$, $Y_i^*$ is missing. In this way the bootstrap sample ends up as $(X_i, Y_i^*, \delta_i)$, $i = 1, 2, \ldots, n$. The process is repeated as many times as bootstrap samples we wish to construct.

The next theorem proves the validity of the bootstrap.

Theorem 3 Assume that hypotheses A.1-A.12 hold. Let $T_{n,S}^*$ and $T_{n,I}^*$ be the test statistics for the Simplified (3) and Imputed (4) estimators, respectively, computed over the bootstrap sample $\{(X_i, Y_i^*, \delta_i)\}_{i=1}^n$. Then

$$T_{n,S}^* \stackrel{L}{\longrightarrow} N(b_S, V_S) \quad \text{in probability};$$

furthermore, for the three cases considered in Theorem 2,

$$T_{n,I}^* \stackrel{L}{\longrightarrow} N(b_{I,j}, V_{I,j}) \quad \text{in probability, with } j = 1, 2, 3.$$

5. A SIMULATION STUDY

In this section we describe a simulation study designed to compare the performance of the simplified and imputed tests. To do this, we observed their behaviour using the test which results from having all the data in the sample ($T_{n,C}$) as a reference. We have used the Wild Bootstrap method, described in the previous section, in order to approximate the critical value of each test,

so that $H_0$ is rejected if $T_{n,C}$ ($T_{n,S}$ or $T_{n,I}$, respectively) is greater than the $1 - \alpha$ ($\alpha = 0.05$) quantile of the bootstrap distribution of the statistic, which is approximated by Monte Carlo.

We have considered the following one-dimensional regression model:

$$Y_i = 5 X_i + a X_i^2 + \sigma(X_i)\,\varepsilon_i, \quad 1 \leq i \leq n, \quad (6)$$

where the $X_i$ were generated from the uniform distribution on the unit interval $[0, 1]$ and $\varepsilon_i \sim N(0, 1)$. The Wild Bootstrap resampling was performed 500 times for each sample.

In the first place we considered the sample sizes $n = 50$ and $100$, two possible choices of $\sigma$ (0.5 and 1) and $a = 0, 1, 5$; furthermore, we took the missing data model as $p(x) = 1 - 0.4\exp(-5(x - 0.4)^2)$ (see Figure 1). The statistics $T_{n,C}$, $T_{n,S}$ and $T_{n,I}$ were calculated for various selections of the one-dimensional bandwidth parameter $h$ (or $g$ in the imputed case). For each combination of factors, the experiment was repeated 1000 times, and the percentage of rejections was calculated. The results appear in Table 1.

Figure 1: Model of missing data: $p(x) = 1 - 0.4\exp(-5(x - 0.4)^2)$.
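The resampling of Section 4 can be sketched in Python as follows. This is our own minimal illustration (the function names are ours), using the classical two-point wild bootstrap distribution, which satisfies the three moment conditions of step 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_fit(x, y, delta):
    """Least squares fit of m_theta(x) = theta_0 + theta_1 * x
    using only the complete cases (delta_i = 1)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X[delta == 1], y[delta == 1], rcond=None)
    return coef

def wild_errors(residuals):
    """Step 2: eta*_i = V_i * residual_i, where V takes the values
    (1 - sqrt5)/2 and (1 + sqrt5)/2 with probabilities (5 + sqrt5)/10
    and (5 - sqrt5)/10, so that E[eta*] = 0, E[eta*^2] = residual^2
    and E[eta*^3] = residual^3."""
    a, b = (1 - 5 ** 0.5) / 2, (1 + 5 ** 0.5) / 2
    v = np.where(rng.random(residuals.size) < (5 + 5 ** 0.5) / 10, a, b)
    return v * residuals

def bootstrap_sample(x, y, delta):
    """Steps 1-3: residuals of the parametric fit on the complete cases,
    wild bootstrap errors, and Y*_i = m_thetahat(X_i) + eta*_i when
    delta_i = 1; missing responses stay missing."""
    t0, t1 = linear_fit(x, y, delta)
    fitted = t0 + t1 * x
    y_star = np.full(x.size, np.nan)
    obs = delta == 1
    y_star[obs] = fitted[obs] + wild_errors(y[obs] - fitted[obs])
    return y_star
```

Repeating `bootstrap_sample` and recomputing the statistic on each bootstrap sample gives the Monte Carlo approximation of the $1 - \alpha$ quantile used above.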

Table 1: Percentage of times that the null hypothesis was rejected, $\alpha = 0.05$. (The table reports, for $n = 50$ and $n = 100$, $a = 0, 1, 5$ and $\sigma = 0.5, 1$, the rejection percentages of the complete-data test and the simplified test for the bandwidths $h = 0.1, 0.25, 0.4$, and of the imputed test for several choices of $g$; the numerical entries are not reproducible here.)

It is evident that the results are better in the case of the complete sample, especially for $n = 100$; furthermore, as the sample size increases or the variance decreases, the performance of the three tests improves considerably; the effect of the variance is seen mainly in the power of the tests. As expected under the null hypothesis ($a = 0$), the percentage of rejections approaches the nominal level (5%), except for very small bandwidth parameters, perhaps due to the small amount of data available for the estimate. As we move away from the null hypothesis ($a = 1$ and $5$), the power grows. We can see how, with an appropriate choice of the imputation parameter $g$, the behaviour of the imputed test can be improved, on most occasions beyond that of the simplified one. In this table it is difficult to appreciate the degree

of oversmoothing which we mentioned in Remark 4, so we have created a graph (Figure 2) in which we present the rejection percentage under the null hypothesis for the imputed estimator test with various choices of the parameters $h$ and $g$. In addition, we have compared this with the best level obtained for the simplified test with those values of $h$, which was 5.1% for $h = 0.15$. In the graph we can see that the test for the imputed estimator becomes more conservative as $g$ increases. Furthermore, the optimum choice of $g$ is located in a neighborhood close to $h$, which coincides with our comments in Remark 4.

Figure 2: Percentage of rejections under the null hypothesis for the Imputed test. The black dashed line represents the best level obtained for the simplified test ($h = 0.15$, 5.1%), the black continuous line the imputed test with $h = 0.15$, the gray continuous line the imputed test with $h = 0.1$, and the gray dashed line the imputed test with $h = 0.05$.

We shall now present some graphs (Figures 3, 4 and 5) where we can see

the behaviour of the test (for fixed values of $g$) for various values of $a$ and of the bandwidth parameter $h$, comparing these with the behaviour in the complete data case. Here we can observe more clearly what we previously remarked: even fixing the imputation parameter, over the larger part of the variation range of $h$ the imputed test approaches the complete data case far more closely than the simplified test does.

Figure 3: Estimated probability of rejection of $H_0$ under the null hypothesis ($a = 0$). Percentages are for complete data (black continuous line), for the simplified test (dashed line) and for the imputed test with $g = 0.05$ (gray line); $n = 100$ and $\sigma^2 = 0.25$.

Figure 4: Estimated probability of rejection of $H_0$ with $a = 1$. Percentages are for complete data (black continuous line), for the simplified test (dashed line) and for the imputed test with $g = 0.25$ (gray line); $n = 100$ and $\sigma^2 = 0.25$.

Figure 5: Estimated probability of rejection of $H_0$ with $a = 5$. Percentages are for complete data (black continuous line), for the simplified test (dashed line) and for the imputed test with $g = 0.15$ (gray line); $n = 100$ and $\sigma^2 = 0.25$.

The previous study may seem restrictive, since a particular model of missing data (1) is assumed, so we shall now study the behaviour of the tests with respect to it. In order to do this, we have assumed that the missing data model (1) is constant, and we have taken various choices of it (1, 0.9, 0.8, 0.75, 0.6 and 0.5); evidently, $p = 1$ reflects the complete data case. In the study, the percentages of rejections under the null hypothesis ($a = 0$) and the power with $a = 1$ are calculated for several values of the bandwidth parameter; samples of size 100 from model (6) were generated, and the standard deviation used was 0.5.

Table 2: Percentage of times that $H_0$ was rejected at level 0.05 for several values of the missing data model (1); $n = 100$ and $\sigma = 0.5$. (The table reports, for $h = 0.1, 0.2, 0.3$ and several choices of $g$, the rejection percentages of the simplified and imputed tests for each value of $p$, with $a = 0$ and $a = 1$; the numerical entries are not reproducible here.)

As expected, the behaviour of both tests worsens as more data are lost (as $p$ decreases): the significance levels are not well attained and the power drops. For values of $p$ near 1, it may be that no improvement is brought about by the imputation, since very few data are lost; furthermore,

greater variance may be introduced in the estimates; but as $p$ decreases, it becomes clear that, through an appropriate selection of $g$, the behaviour of the imputed test is better than that of the simplified one.

Figures 6 and 7 present the empirical approximations of the power function of the tests for complete data, and for the Simplified and Imputed ones, with $\alpha = 0.05$. The curves were drawn by joining the points $(a, P(a))$ with lines, where $P(a)$ denotes the estimated probability of rejection. We see that the empirical powers of the Simplified and Imputed tests are similar.

Figure 6: Estimated power function with $p = 0.7$. Approximations are for complete data (black continuous line), for the simplified test (dashed line) and for the imputed test (gray line); $n = 100$, $\sigma^2 = 0.25$, $h = 0.15$ and $g = 0.15$. Computations are based on 100 samples.

Figure 7: Estimated power function with variable $p$. Approximations are for complete data (black continuous line), for the simplified test (dashed line) and for the imputed test (gray line); $n = 100$, $\sigma^2 = 0.25$, $h = 0.15$, $g = 0.15$ and $p(x) = 1 - 0.4\exp(-5(x - 0.4)^2)$. Computations are based on 100 samples.

In the first figure (Figure 6) it is assumed that $p = 0.7$; that is, the data are Missing Completely At Random (MCAR) and the incomplete sample is a random subsample of the original sample; this is the most unfavourable case for the Imputed estimator. Even so, in the figure the Imputed test shows a slight improvement over the Simplified one for any choice of $a$. In Figure 7 we have taken $p(x) = 1 - 0.4\exp(-5(x - 0.4)^2)$, observing a greater gain of the Imputed test for any value of $a$ (the tests display similar behaviour under the null hypothesis).

6. CONCLUSIONS

In this paper we have focused on the goodness of fit test of a linear regression model when there are missing observations in the response variable. To do this we have proposed two test statistics ($T_{n,S}$ and $T_{n,I}$). The analysis of the asymptotic distribution of each of them, and the results obtained through the simulation study, allow us to reach several conclusions.

In the first place, it is necessary to point out that both tests perform quite well in the presence of missing data; their behaviour evidently depends on the choice of the smoothing parameters of the nonparametric estimators. If we compare the asymptotic distributions obtained for both tests, varying the parameter $G$ as a function of $H$, we can conclude that, under the null hypothesis, the best choice of $G$ is such that $|G|^{-1/2}|H|^{1/2} \to 0$, providing less bias and variance in the distribution of the imputed estimator. On the other hand, under the alternative hypothesis this choice would provoke a degree of oversmoothing which would lead to the Imputed test being too conservative; consequently, the appropriate choice is $|H|^{1/2}|G|^{-1/2} = O(1)$, since in the case $|H|^{-1/2}|G|^{1/2} \to 0$ both statistics have the same asymptotic distribution and their comparison lacks interest. It is difficult to prove this analytically, due to the complexity of the variance term of the distribution of $T_{n,I}$ in the case $|H|^{1/2}|G|^{-1/2} = O(1)$. However, this fact is reflected in the simulation study.

In short, for both the level and the power analysis, it was observed that an appropriate choice of the smoothing parameter implies a clear advantage

for the imputed test over the simplified one.

7. SKETCHES OF THE PROOFS

Proof of Theorem 1. The proof is similar to the complete-data case of Alcalá et al. (1999), taking into account the missing-data model (1) and the multidimensional calculus.

Proof of Theorem 2. The proof begins by obtaining a more manageable representation of the imputed estimator (4); to do this we rely on the following development:

$$\hat m_{I,H,G}(x) = \sum_{i=1}^{n} l_i^{\delta}(x; H, G)\, Y_i,$$

where

$$l_i^{\delta}(x; H, G) = w_i(x, H)\,\delta_i + \sum_{j=1}^{n} w_j(x, H)(1-\delta_j)\, w_i^{\delta}(X_j, G),$$

with

$$w_i(x, H) = e_1^{t}\,(X_x^{t} W_{x,H} X_x)^{-1}\,[1, (X_i - x)^{t}]^{t}\, K_H(X_i - x)$$

and

$$w_i^{\delta}(x, H) = e_1^{t}\,(X_x^{t} W_{x,H}^{\delta} X_x)^{-1}\,[1, (X_i - x)^{t}]^{t}\, K_H(X_i - x)\,\delta_i.$$

We now use the asymptotically equivalent representation of a local linear smoother in terms of a kernel estimator (see Fan and Gijbels (1996) for more

details):

$$l_i^{\delta}(x; H, G) \approx n^{-1} f(x)^{-1} K_H(X_i - x)\,\delta_i + f(x)^{-1}\,\delta_i\, n^{-2} \sum_{j=1}^{n} K_H(X_j - x)(1-\delta_j)\, f(X_j)^{-1} p(X_j)^{-1} L_G(X_i - X_j),$$

where $\approx$ means "asymptotically equivalent".

Our next goal is to obtain an asymptotic expression (uniformly in $X_i$) for the second term of the above expression.

Lemma 1. Denote by $\tilde l_i^{\delta}(x; H, G)$ the asymptotic representation of $l_i^{\delta}(x; H, G)$. For the case $H^{1/2}G^{-1/2} = O(1)$, its expression is

$$\tilde l_i^{\delta}(x; H, G) = n^{-1} f(x)^{-1} K_H(X_i - x)\,\delta_i + n^{-1} f(x)^{-1}\,\delta_i\, q(x)\, p(x)^{-1}\, \alpha^{-1} A_H(X_i - x);$$

when $G^{1/2}H^{-1/2} \to 0$ as $n \to \infty$:

$$\tilde l_i^{\delta}(x; H, G) = n^{-1} f(x)^{-1} K_H(X_i - x)\,\delta_i + n^{-1} f(x)^{-1}\,\delta_i\, q(x)\, p(x)^{-1} L_G(X_i - x);$$

and finally, when $H^{1/2}G^{-1/2} \to 0$ as $n \to \infty$:

$$\tilde l_i^{\delta}(x; H, G) = n^{-1} f(x)^{-1} p(X_i)^{-1} K_H(X_i - x)\,\delta_i.$$
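As an illustrative sketch only (one covariate, Gaussian kernel, hypothetical function names `loclin` and `imputed_fit`), the two-stage construction behind the imputed estimator can be coded directly from the weights above: a pilot complete-case fit with bandwidth $g$ fills in the missing responses, and the completed sample is then smoothed with bandwidth $h$:

```python
import numpy as np

def loclin(x0, X, Y, w, h):
    """Local linear fit at x0; w carries the observation indicators (delta)."""
    K = np.exp(-0.5 * ((X - x0) / h) ** 2) * w      # Gaussian kernel times weights
    Xc = X - x0
    S0, S1, S2 = K.sum(), (K * Xc).sum(), (K * Xc ** 2).sum()
    T0, T1 = (K * Y).sum(), (K * Xc * Y).sum()
    return (S2 * T0 - S1 * T1) / (S0 * S2 - S1 ** 2)  # intercept = fit at x0

def imputed_fit(x0, X, Y, delta, h, g):
    """Two-stage imputed estimator: missing Y_j are replaced by a pilot
    complete-case fit with bandwidth g, then the completed sample is
    smoothed with bandwidth h."""
    Y0 = np.nan_to_num(Y)                           # missing entries get weight 0 anyway
    Yc = np.where(delta == 1, Y0,
                  [loclin(xj, X, Y0, delta, g) for xj in X])
    return loclin(x0, X, Yc, np.ones_like(X), h)
```

Since a local linear fit reproduces straight lines exactly, both the simplified (complete-case) and the imputed fits recover a linear regression function up to rounding error, which is the situation under the null hypothesis.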

Proof of Lemma 1. In order to prove the lemma, we need the two following lemmas. Let

$$R(u) = n^{-2} \sum_{i=1}^{n} K_H(X_i - x)(1-\delta_i)\, f(X_i)^{-1} p(X_i)^{-1} L_G(u - X_i) = n^{-2} \sum_{i=1}^{n} Z_i(u).$$

Lemma 2. Under hypotheses A.1-A.3, A.5 and A.9-A.11, $\sup_u |R(u) - E[R(u)]| \to 0$ as $n \to \infty$.

Lemma 3. Furthermore, $\sup_u |E[R(u)] - R_0(u)| \to 0$ as $n \to \infty$, where:

if $H^{1/2}G^{-1/2} = O(1)$, then

$$R_0(u) = n^{-1}\, \frac{q(x)}{p(x)}\, |G|^{-1/2} \int K(v)\, L\big(G^{-1/2}(x + H^{1/2}v - u)\big)\, dv = n^{-1}\, \frac{q(x)}{p(x)}\, \alpha^{-1} A_H(u - x);$$

if $G^{1/2}H^{-1/2} \to 0$ as $n \to \infty$, then $R_0(u) = n^{-1}\, \frac{q(x)}{p(x)}\, L_G(u - x)$; and if $H^{1/2}G^{-1/2} \to 0$ as $n \to \infty$, then $R_0(u) = n^{-1}\, \frac{q(u)}{p(u)}\, K_H(u - x)$.

Proof of Lemma 2. We have

$$\sup_u |R(u) - E[R(u)]| = n^{-1} \sup_u \Big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u) - E[n^{-1} Z_i(u)] \big) \Big|.$$

Since the support $D$ of the variable $X$ is a compact set, we can carry out a partition into $L_n$ cubes $I_k$, such that $D = \bigcup_{k=1}^{L_n} I_k$ and $I_k \cap I_j = \emptyset$ if $k \neq j$. For simplicity of notation we assume that the support is cubic; if it is not, since $D$ is compact, it can be covered by a cubic set. From this we obtain the following decomposition of $n^{-1} \sup_u \big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u) - E[n^{-1} Z_i(u)] \big) \big|$:

$$n^{-1} \max_{1\le k\le L_n}\, \sup_{u \in I_k \cap D} \Big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u) - E[n^{-1} Z_i(u)] \big) \Big|$$
$$\le n^{-1} \max_{1\le k\le L_n}\, \sup_{u \in I_k \cap D} \Big| \sum_{i=1}^{n} n^{-1} \big( Z_i(u) - Z_i(u_k) \big) \Big| + n^{-1} \max_{1\le k\le L_n} \Big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u_k) - E[n^{-1} Z_i(u_k)] \big) \Big|$$
$$\quad + n^{-1} \max_{1\le k\le L_n}\, \sup_{u \in I_k \cap D} \Big| \sum_{i=1}^{n} n^{-1} \big( E[Z_i(u_k)] - E[Z_i(u)] \big) \Big| = Q_1 + Q_2 + Q_3,$$

where $u_k$ ($k = 1, \ldots, L_n$) are the centers of the cubes.

Now, let us consider the first term:

$$Q_1 = \max_{1\le k\le L_n}\, \sup_{u \in D \cap I_k} \Big| n^{-2} \sum_{i=1}^{n} K_H(X_i - x)(1-\delta_i)\, f(X_i)^{-1} p(X_i)^{-1} \big( L_G(u - X_i) - L_G(u_k - X_i) \big) \Big|.$$

From the hypotheses on the functions $f$, $p$, $K$ and $L$, we can bound the previous expression by

$$\frac{C_0}{n\, |H|^{1/2} |G|}\, \|u - u_k\|,$$

where $\|\cdot\|$ denotes a norm and $C_0$ is a positive constant.

Let $L_n$ be the number of $d$-dimensional cubes, and let $l_n$ be the side length (area, volume) of each of these cubes. Clearly $l_n = Cte\, (L_n)^{-1/d}$, where $Cte$ is a positive constant. Taking $L_n = (n/\log n)^{d/2}$ gives $\|u - u_k\| \le O(l_n) = O\big( (\log n / n)^{1/2} \big)$, and then

$$\frac{C_0}{n\, |H|^{1/2} |G|}\, \|u - u_k\| \le C_1 \Big( \frac{\log n}{n} \Big)^{1/2} \frac{1}{n\, |H|^{1/2} |G|} \to 0,$$

with $C_1$ another positive constant; this proves that $Q_1 \to 0$ a.s.

The proof that $Q_3 \to 0$ a.s. is straightforward from $Q_1 \to 0$ a.s., using that $E[Q_3] \le E[|Q_3|]$.

Finally, we consider the term $Q_2$. Applying Bernstein's inequality, we obtain

$$P\{|Q_2| > \zeta\} = P\Big\{ \max_{1\le k\le L_n} n^{-1} \Big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u_k) - E[n^{-1} Z_i(u_k)] \big) \Big| > \zeta \Big\}$$
$$\le L_n \max_{1\le k\le L_n} P\Big\{ n^{-1} \Big| \sum_{i=1}^{n} \big( n^{-1} Z_i(u_k) - E[n^{-1} Z_i(u_k)] \big) \Big| > \zeta \Big\}$$
$$\le O(L_n)\, 2 \exp\Big\{ - \frac{n^2 \zeta^2}{2 C_2\, n^{-1} |H|^{-1} |G|^{-1} + \frac{2C_3}{3} |H|^{-1/2} |G|^{-1/2}\, \zeta} \Big\}$$
$$= O(L_n)\, 2 \exp\Big\{ - \frac{n^2 \zeta^2\, |H|^{1/2} |G|^{1/2}}{2 C_2\, n^{-1} |H|^{-1/2} |G|^{-1/2} + \frac{2C_3}{3}\, \zeta} \Big\}.$$

Under the hypothesis on the bandwidth matrices (A.10), and letting

$$\zeta = C \Big( \frac{\log n}{n^2\, |H|^{1/2} |G|^{1/2}} \Big)^{1/2},$$

we can bound the previous expression by

$$O(L_n)\, 2 \exp\Big\{ - \frac{(C')^2 \log n}{C_0'} \Big\} = O(L_n)\, 2\, n^{-(C')^2 / C_0'},$$

where $C_2$, $C_3$, $C_0'$ and $C'$ are positive constants. Hence

$$P\{|Q_2| > \zeta\} \le O(L_n)\, 2\, n^{-(C')^2 / C_0'} = O\Big( \Big( \frac{n}{\log n} \Big)^{d/2} 2\, n^{-(C')^2 / C_0'} \Big) \to 0;$$

taking $C$ sufficiently large and applying the Borel-Cantelli lemma, we obtain $Q_2 = o(1)$ a.s.

Proof of Lemma 3. We only give the proof for the first case; the other cases are analogous.

$$\sup_u \big| E[R(u)] - R_0(u) \big| = \sup_u \Big| n^{-1} |G|^{-1/2} \int K(v)\, L\big( G^{-1/2}(x + H^{1/2}v - u) \big)\, \frac{q(x + H^{1/2}v)}{p(x + H^{1/2}v)}\, dv - R_0(u) \Big|$$
$$\le n^{-1} |G|^{-1/2} \sup_u \int K(v)\, L\big( G^{-1/2}(x + H^{1/2}v - u) \big)\, \Big| \frac{q(x + H^{1/2}v)}{p(x + H^{1/2}v)} - \frac{q(x)}{p(x)} \Big|\, dv$$
$$\le n^{-1} |G|^{-1/2} \sup_u \int K(v)\, L\big( G^{-1/2}(x + H^{1/2}v - u) \big)\, W_{q/p}\big( \|H^{1/2} v\| \big)\, dv = o\big( n^{-1} |G|^{-1/2} \big),$$

where $W_{q/p}(\|H^{1/2}v\|)$ denotes the modulus of continuity of $q/p$.

Combining the results of Lemmas 2 and 3, we prove Lemma 1.

Next, we give the proof of the asymptotic distribution of $T_{n,I}$ in the case $G^{1/2} = \alpha H^{1/2}$. Through the construction of the imputed estimator, we have the following decomposition of $T_{n,I}$:

$$T_{n,I} = n |H|^{1/4} \int \big( \hat m_{I,H,G}(x) - m_{\theta_n}(x) \big)^2 w(x)\, dx = I_1 + I_2 + 2 I_{12},$$

where

$$I_1 = n |H|^{1/4} \int \Big( \sum_{i=1}^{n} l_i^{\delta}(x; H, G)\, (Y_i - m_{\theta_0}(X_i)) \Big)^2 w(x)\, dx,$$
$$I_2 = n |H|^{1/4} \int \big( m_{\theta_0}(x) - m_{\theta_n}(x) \big)^2 w(x)\, dx,$$
$$I_{12} = n |H|^{1/4} \int \sum_{i=1}^{n} l_i^{\delta}(x; H, G)\, (Y_i - m_{\theta_0}(X_i))\, \big( m_{\theta_0}(x) - m_{\theta_n}(x) \big)\, w(x)\, dx.$$

Under hypothesis (A.8) and the conditions on $s$, $f$ and $p$, it is easy to check that $I_2 = o_P(1)$. Furthermore, considerations similar to the complete-data case yield $I_{12} = o_P(1)$.
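A numerical version of this $L_2$ distance is easy to sketch; the following toy implementation (hypothetical names, one covariate so that $|H|^{1/4} = h^{1/2}$, and a Nadaraya-Watson pilot in place of the paper's local linear smoother) computes $n|H|^{1/4}\int(\hat m - m_{\theta_n})^2 w\,dx$ by the trapezoid rule with $w = 1_{[0.1,\,0.9]}$:

```python
import numpy as np

def nw(x0, X, Y, delta, h):
    """Complete-case Nadaraya-Watson smoother (stand-in for the local linear fit)."""
    K = np.exp(-0.5 * ((X - x0) / h) ** 2) * delta
    return (K * Y).sum() / K.sum()

def T_stat(X, Y, delta, h, a=0.1, b=0.9, n_grid=201):
    """n * h^{1/2} times the integrated squared distance between the
    nonparametric fit and the least squares straight line (trapezoid rule)."""
    n = len(X)
    Yc = np.where(delta == 1, Y, 0.0)           # missing responses get kernel weight 0
    D = np.column_stack([np.ones(n), X])[delta == 1]
    beta, *_ = np.linalg.lstsq(D, Yc[delta == 1], rcond=None)
    grid = np.linspace(a, b, n_grid)
    d2 = np.array([(nw(g, X, Yc, delta, h) - (beta[0] + beta[1] * g)) ** 2
                   for g in grid])
    return n * np.sqrt(h) * np.sum((d2[1:] + d2[:-1]) / 2.0 * np.diff(grid))
```

Under an exactly linear regression function the statistic reduces to smoothing bias, while a clearly nonlinear alternative inflates it; the Bootstrap calibration of the paper would supply the critical values.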

We have seen that the weight function $l_i^{\delta}(x; H, G)$ can be approximated by $\tilde l_i^{\delta}(x; H, G)$. Hence, the asymptotic distribution of $T_{n,I}$ is determined by the approximation $\tilde I_1$ of $I_1$, where

$$\tilde I_1 = n |H|^{1/4} \int \Big( \sum_{i=1}^{n} \tilde l_i^{\delta}(x; H, G)\, (c_n s(X_i) + \eta_i) \Big)^2 w(x)\, dx.$$

Developing $\tilde I_1$ we obtain the decomposition $\tilde I_1 = \Delta_1 + \Delta_2 + \Delta_3$, where

$$\Delta_1 = n |H|^{1/4} \int \Big( \sum_{i=1}^{n} \tilde l_i^{\delta}(x; H, G)\, \eta_i \Big)^2 w(x)\, dx, \qquad \Delta_2 = n |H|^{1/4} \int \Big( \sum_{i=1}^{n} \tilde l_i^{\delta}(x; H, G)\, c_n s(X_i) \Big)^2 w(x)\, dx,$$

and $\Delta_3$ is the cross product.

By a simple application of Markov's inequality, we have

$$\Delta_2 = \int \big( p(x)\, (K_H * s)(x) + q(x)\, \alpha^{-1} (A_H * s)(x) \big)^2 w(x)\, dx + o_P(1).$$

By straightforward calculations one gets $E[\Delta_3] = 0$ and $E[(\Delta_3)^2] = o(1)$, hence $\Delta_3 = o_P(1)$.

Let us now study the term $\Delta_1 = \Delta_{11} + \Delta_{12}$, where

$$\Delta_{11} = n |H|^{1/4} \int \sum_{i=1}^{n} \big( \tilde l_i^{\delta}(x; H, G)\, \eta_i \big)^2 w(x)\, dx$$

and

$$\Delta_{12} = n |H|^{1/4} \int \sum_{i \neq j} \big( \tilde l_i^{\delta}(x; H, G)\, \eta_i \big)\, \big( \tilde l_j^{\delta}(x; H, G)\, \eta_j \big)\, w(x)\, dx.$$
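The off-diagonal term just defined is a degenerate quadratic form in the errors, the object handled by de Jong's (1987) central limit theorem in this proof. A toy simulation (arbitrary symmetric coefficient matrix with zero diagonal, standard normal errors; not from the paper) illustrates the normal limit and the variance formula $2\sum_{i\neq j} a_{ij}^2$:

```python
import numpy as np

# Degenerate quadratic form sum_{i != j} a_ij * eta_i * eta_j with a_ii = 0:
# mean 0 and variance 2 * sum_{i != j} a_ij^2 for iid N(0,1) errors eta.
rng = np.random.default_rng(2)
n, reps = 50, 2000
A = rng.normal(size=(n, n))
A = (A + A.T) / 2.0                 # symmetric coefficients a_ij = a_ji
np.fill_diagonal(A, 0.0)            # zero diagonal: only the i != j terms remain
var_theory = 2.0 * np.sum(A ** 2)   # Var = 2 * tr(A^2) for symmetric, zero-diagonal A

vals = np.empty(reps)
for r in range(reps):
    eta = rng.normal(size=n)        # iid N(0,1) errors
    vals[r] = eta @ A @ eta         # the quadratic form (diagonal contributes 0)
z = vals / np.sqrt(var_theory)      # standardized: approximately N(0, 1)
```

The standardized values have mean close to 0 and variance close to 1, matching the normal limit that de Jong's theorem guarantees once its conditions hold.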

Now, expanding $\Delta_{11}$, it is clear that

$$\Delta_{11} = n |H|^{1/4} \int \sum_{j=1}^{n} \Big( \frac{\delta_j \eta_j}{n f(x)} \Big)^2 \Big( K_H(x - X_j) + \frac{|H|^{1/2}}{|G|^{1/2}}\, \frac{q(x)}{p(x)}\, A_H(x - X_j) \Big)^2 w(x)\, dx = \Delta_{111} + \Delta_{112} + \Delta_{113},$$

where

$$\Delta_{111} = n |H|^{1/4} \int \sum_{j=1}^{n} \Big( \frac{\delta_j \eta_j}{n f(x)} \Big)^2 K_H(x - X_j)^2\, w(x)\, dx,$$
$$\Delta_{112} = n |H|^{1/4} \int \sum_{j=1}^{n} \Big( \frac{\delta_j \eta_j}{n f(x)} \Big)^2 \frac{|H|}{|G|}\, \frac{q(x)^2}{p(x)^2}\, A_H(x - X_j)^2\, w(x)\, dx,$$
$$\Delta_{113} = 2 n |H|^{1/4} \int \sum_{j=1}^{n} \Big( \frac{\delta_j \eta_j}{n f(x)} \Big)^2 \frac{|H|^{1/2}}{|G|^{1/2}}\, \frac{q(x)}{p(x)}\, A_H(x - X_j)\, K_H(x - X_j)\, w(x)\, dx.$$

Arguing as in the article of Härdle and Mammen (1993), one sees that

$$\Delta_{111} = |H|^{-1/4} \int \frac{\sigma^2(x)\, p(x)\, w(x)}{f(x)} \Big( \int K^2(u)\, du \Big)\, dx + o_P(1),$$
$$\Delta_{112} = |H|^{-1/4} \int \frac{\sigma^2(x)\, q^2(x)\, w(x)}{f(x)\, p(x)}\, \alpha^{-2} \Big( \int A^2(u)\, du \Big)\, dx + o_P(1),$$
$$\Delta_{113} = 2 |H|^{-1/4} \int \frac{\sigma^2(x)\, q(x)\, w(x)}{f(x)}\, \alpha^{-1} \Big( \int K(u)\, A(u)\, du \Big)\, dx + o_P(1).$$

Hence

$$\Delta_{11} = |H|^{-1/4} \int \frac{\sigma^2(x)\, w(x)}{f(x)}\, v_1(x)\, dx + o_P(1),$$

where $v_1(x)$ has the expression (5). Finally,

$$\Delta_{12} = n |H|^{1/4} \int \sum_{i \neq j} \Big\{ \frac{\delta_i \eta_i}{n f(x)}\, B_H(x - X_i) \Big\} \Big\{ \frac{\delta_j \eta_j}{n f(x)}\, B_H(x - X_j) \Big\}\, w(x)\, dx,$$

with

$$B_H(x - X_j) = K_H(x - X_j) + \frac{|H|^{1/2}}{|G|^{1/2}}\, \frac{q(x)}{p(x)}\, A_H(x - X_j).$$

At this point we make use of Theorem 2.1 of de Jong (1987), applied to the quadratic form $\Delta_{12} = \sum_{i \neq j} k_{ij}$, with

$$k_{ij} = n |H|^{1/4} \int \frac{1}{(n f(x))^2}\, B_H(x - X_i)\, \delta_i \eta_i\, B_H(x - X_j)\, \delta_j \eta_j\, w(x)\, dx \ \text{ if } i \neq j, \qquad k_{ii} = 0.$$

It suffices to check the following conditions:

1) $E[k_{ij} \mid X_i, \delta_i, \eta_i] = 0$;
2) $E[k_{ij} \mid X_j, \delta_j, \eta_j] = 0$;
3) $\max_{1 \le i \le n} \sum_{j=1}^{n} \mathrm{Var}(k_{ij}) / \mathrm{Var}(\Delta_{12}) \to 0$;
4) $E[(\Delta_{12})^4] / \{ \mathrm{Var}(\Delta_{12}) \}^2 \to 3$.

It is obvious that the first two conditions are verified. Also,

$$\mathrm{Var}(\Delta_{12}) = 2 n (n-1)\, E[(k_{ij})^2].$$

But

$$E[(k_{12})^2] = n^2 |H|^{1/2} \iiiint \frac{w(x)}{(n f(x))^2}\, \frac{w(z)}{(n f(z))^2}\, p(x_2) f(x_2) \sigma^2(x_2)\, p(x_1) f(x_1) \sigma^2(x_1)\, B_H(x - x_1) B_H(z - x_1) B_H(x - x_2) B_H(z - x_2)\, dx\, dz\, dx_1\, dx_2;$$

using the continuity and boundedness properties, we obtain

$$E[(k_{12})^2] \le n^{-2} \int \frac{w(x)\, \sigma^2(x)\, p(x)}{f(x)}\, c_1(x)\, dx,$$

and this is sufficient to prove condition 3. It remains to prove condition 4; for this, we can repeat the steps in the proof of Theorem 2 in Härdle and Mammen (1993, p. 1943), taking into account the weights of the imputed estimator. Therefore, we obtain the asymptotic normality of $\Delta_{12}$. Combining the previous results, the proof is concluded.

The proof for the cases $H^{1/2}G^{-1/2} \to 0$ and $G^{1/2}H^{-1/2} \to 0$ as $n \to \infty$ (element by element) is similar; it suffices to consider the asymptotic representation of the weights in each case and to repeat the previous calculations.

Proof of Theorem 3. Under hypotheses A1-A12, we derive the asymptotic distribution of the Bootstrap versions of the statistics $T_{n,S}$ and $T_{n,I}$ following the same lines as Theorems 1 and 2.

ACKNOWLEDGEMENTS

Research supported in part by MCyT Grant BFM (European FEDER support included), by PGIDIT 03 PXIC 20702PN of the Dirección Xeral de Investigación e Desenvolvemento (Xunta de Galicia), and by the Vicerrectorado de Investigación of the Universidad de Vigo.

REFERENCES

J. T. Alcalá, J. A. Cristóbal & W. González-Manteiga (1999). Goodness-of-fit test for linear models based on local polynomials. Statistics & Probability Letters, 42.

P. Billingsley (1999). Convergence of probability measures. Second edition, John Wiley & Sons, Inc., New York.

P. E. Cheng (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, no. 425.

C. K. Chu & P. E. Cheng (1995). Nonparametric regression estimation with missing data. Journal of Statistical Planning and Inference, 48.

J. Fan & I. Gijbels (1996). Local polynomial modelling and its applications. Chapman and Hall, London.

W. González Manteiga & A. Pérez González (2003). Nonparametric mean estimation with missing data. To appear in Communications in Statistics.

W. Härdle & E. Mammen (1993). Comparing nonparametric versus parametric regression fits. The Annals of Statistics, 21.

P. de Jong (1987). A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields, 75, no. 2.

R. J. A. Little (1992). Regression with missing X's: a review. Journal of the American Statistical Association, 87.

R. J. A. Little & D. B. Rubin (2002). Statistical analysis with missing data. 2nd ed., J. Wiley & Sons, New York.

E. A. Nadaraya (1964). On estimating regression. Theory of Probability and its Applications, 10.

D. Ruppert & M. P. Wand (1994). Multivariate locally weighted least squares regression. The Annals of Statistics, 22, no. 3.

G. A. F. Seber (1977). Linear regression analysis. John Wiley and Sons, New York.

Q. Wang & J. N. K. Rao (2001). Empirical likelihood for linear regression models under imputation for missing responses. The Canadian Journal of Statistics, 29, no. 4.

Q. Wang & J. N. K. Rao (2002). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29, no. 3.

C. Y. Wang, J. C. Chen, S. M. Lee & S. T. Ou (2002). Joint conditional likelihood estimator in logistic regression with missing covariate data. Statistica Sinica, 12.

C. Y. Wang, S. Wang, R. J. Carroll & R. G. Gutierrez (1998). Local linear regression for generalized linear models with missing data. The Annals of Statistics, 26.

G. S. Watson (1964). Smooth regression analysis. Sankhya, The Indian Journal of Statistics, Ser. A, 26.

González Manteiga, W.: Departamento de Estadística e Investigación Operativa, Universidad de Santiago de Compostela, Facultad de Matemáticas, Campus Sur, Santiago de Compostela, Spain.

Pérez González, A.: anapg@uvigo.es. Departamento de Estadística e Investigación Operativa, Universidad de Vigo, Escuela Superior de Ingeniería Informática, Campus As Lagoas, Orense, Spain.


Nonparametric confidence intervals. for receiver operating characteristic curves Nonparametric confidence intervals for receiver operating characteristic curves Peter G. Hall 1, Rob J. Hyndman 2, and Yanan Fan 3 5 December 2003 Abstract: We study methods for constructing confidence

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY

PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 37 54 PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY Authors: Germán Aneiros-Pérez Departamento de Matemáticas,

More information

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística

More information

UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ESTATÍSTICA E INVESTIGACIÓN OPERATIVA

UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ESTATÍSTICA E INVESTIGACIÓN OPERATIVA UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ESTATÍSTICA E INVESTIGACIÓN OPERATIVA BOOSTING FOR REAL AND FUNCTIONAL SAMPLES. AN APPLICATION TO AN ENVIRONMENTAL PROBLEM B. M. Fernández de Castro

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

NONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION

NONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION NONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION R. HASHEMI, S. REZAEI AND L. AMIRI Department of Statistics, Faculty of Science, Razi University, 67149, Kermanshah, Iran. ABSTRACT

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O O Combining cross-validation and plug-in methods - for kernel density selection O Carlos Tenreiro CMUC and DMUC, University of Coimbra PhD Program UC UP February 18, 2011 1 Overview The nonparametric problem

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement

More information

Transformation and Smoothing in Sample Survey Data

Transformation and Smoothing in Sample Survey Data Scandinavian Journal of Statistics, Vol. 37: 496 513, 2010 doi: 10.1111/j.1467-9469.2010.00691.x Published by Blackwell Publishing Ltd. Transformation and Smoothing in Sample Survey Data YANYUAN MA Department

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

The EM Algorithm for the Finite Mixture of Exponential Distribution Models Int. J. Contemp. Math. Sciences, Vol. 9, 2014, no. 2, 57-64 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijcms.2014.312133 The EM Algorithm for the Finite Mixture of Exponential Distribution

More information

BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION

BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION Statistica Sinica 8(998), 07-085 BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION Jun Shao and Yinzhong Chen University of Wisconsin-Madison Abstract: The bootstrap

More information

. Find E(V ) and var(v ).

. Find E(V ) and var(v ). Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number

More information

Bootstrap, Jackknife and other resampling methods

Bootstrap, Jackknife and other resampling methods Bootstrap, Jackknife and other resampling methods Part III: Parametric Bootstrap Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD)

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

A comparison of different nonparametric methods for inference on additive models

A comparison of different nonparametric methods for inference on additive models A comparison of different nonparametric methods for inference on additive models Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik D - 44780 Bochum, Germany Carsten von Lieres und Wilkau Ruhr-Universität

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete Observations

MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete Observations Sankhyā : The Indian Journal of Statistics 2006, Volume 68, Part 3, pp. 409-435 c 2006, Indian Statistical Institute MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete

More information

The exact bootstrap method shown on the example of the mean and variance estimation

The exact bootstrap method shown on the example of the mean and variance estimation Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011

More information

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Weihua Zhou 1 University of North Carolina at Charlotte and Robert Serfling 2 University of Texas at Dallas Final revision for

More information

Asymptotic normality of conditional distribution estimation in the single index model

Asymptotic normality of conditional distribution estimation in the single index model Acta Univ. Sapientiae, Mathematica, 9, 207 62 75 DOI: 0.55/ausm-207-000 Asymptotic normality of conditional distribution estimation in the single index model Diaa Eddine Hamdaoui Laboratory of Stochastic

More information

Log-Density Estimation with Application to Approximate Likelihood Inference

Log-Density Estimation with Application to Approximate Likelihood Inference Log-Density Estimation with Application to Approximate Likelihood Inference Martin Hazelton 1 Institute of Fundamental Sciences Massey University 19 November 2015 1 Email: m.hazelton@massey.ac.nz WWPMS,

More information

A NON-PARAMETRIC TEST FOR NON-INDEPENDENT NOISES AGAINST A BILINEAR DEPENDENCE

A NON-PARAMETRIC TEST FOR NON-INDEPENDENT NOISES AGAINST A BILINEAR DEPENDENCE REVSTAT Statistical Journal Volume 3, Number, November 5, 155 17 A NON-PARAMETRIC TEST FOR NON-INDEPENDENT NOISES AGAINST A BILINEAR DEPENDENCE Authors: E. Gonçalves Departamento de Matemática, Universidade

More information