Robust Nonparametric Methods for Regression to the Mean Model
Western Michigan University
ScholarWorks at WMU
Dissertations, Graduate College

Robust Nonparametric Methods for Regression to the Mean Model

Therawat Wisadrattanapong, Western Michigan University

Recommended Citation: Wisadrattanapong, Therawat, "Robust Nonparametric Methods for Regression to the Mean Model" (2011). Dissertations.

This Dissertation-Open Access is brought to you for free and open access by the Graduate College at ScholarWorks at WMU. It has been accepted for inclusion in Dissertations by an authorized administrator of ScholarWorks at WMU.
ROBUST NONPARAMETRIC METHODS FOR REGRESSION TO THE MEAN MODEL

by Therawat Wisadrattanapong

A Dissertation Submitted to the Faculty of The Graduate College in partial fulfillment of the requirements for the Degree of Doctor of Philosophy, Department of Statistics

Advisor: Joseph W. McKean, Ph.D.
Western Michigan University
Kalamazoo, Michigan
August 2011
ROBUST NONPARAMETRIC METHODS FOR REGRESSION TO THE MEAN MODEL

Therawat Wisadrattanapong, Ph.D.
Western Michigan University, 2011

Regression to the mean is a statistical phenomenon that often confounds treatment effects in experiments. Consider an experiment involving a treatment, in which a response (baseline) is measured on a subject, a treatment is applied, and a second measurement is taken. Then, under many bivariate models for the pair of responses (including the bivariate normal), the predicted response of the second measurement will regress to the mean. In experiments where the second response is only taken for a select sample, say above a cutoff value, this regression to the mean effect may mistakenly be thought of as a treatment effect. In this investigation, we consider a model of the treatment effect which also takes into account this regression to the mean effect. In particular, we consider the multiplicative model of Naranjo and McKean (2001). Naranjo and McKean developed a bootstrap test for treatment effect based on least squares methods for bivariate normal distributions. We develop robust procedures to assess treatment effects for this multiplicative model. Our procedures are based on rank-based methods, for general score functions. Our preliminary Monte Carlo investigations show that our procedures are robust. We extend this robust and traditional development to models other than the bivariate normal, including the multivariate t distributions. We investigate the finite sample properties of these methods and compare their empirical behavior over a variety of models and situations.
UMI Number:

All rights reserved

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI Dissertation Publishing
Copyright 2011 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI
Copyright by Therawat Wisadrattanapong 2011
ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Joseph W. McKean, advisor and committee chair, for his patience and guidance. I also thank Dr. Joshua D. Naranjo, Dr. Bradley E. Huitema, and Dr. Jeffrey T. Terpstra, committee members, for their suggestions. Special thanks to my mother, Tarn Wisadrattanapong, my sisters, Marin Suksawat and Sineenad Vieuxtemps, and their families, and my brothers. I am deeply grateful to my father-in-law and mother-in-law, Dr. Sirichai and Suchin Watcharotone. I would also like to thank my wife, Dr. Kuanwong Watcharotone, and my sons, Sukda and Han Wisadrattanapong, for their constant encouragement and support.

Therawat Wisadrattanapong
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER
I. INTRODUCTION
   The Practical Problem of Interest
   Simple Linear Model
   Traditional Least Squares Fit
   Robust Fit
   High Breakdown Rank-Based (HBR) Estimates
II. THE MODEL AND ESTIMATES
   The Dual Effects Model
   Parameter Estimates
   R and HBR Estimations for Linear Regression Model
   Scale Functionals
   Scale Functions and Estimators
   Example of Scale Functionals and Estimators
   Dispersion Function Parameter $\tau_\varphi$
   Median Absolute Deviation (MAD)
   Example: Final and Midterm Exam Scores
III. MONTE CARLO STUDY FOR DUAL TREATMENT EFFECTS AND MULTIVARIATE NORMAL MODEL
   Bootstrap Confidence Intervals (BCI)
   Tests of Significance
   Simulation Study
   Dual Treatment Effects Model of Different Scenarios with 20% and 30% Regression to the Mean
   Multivariate Normal Model with 20% Regression toward the Mean
   Multivariate Normal Model with Different Scenarios
IV. MONTE CARLO STUDY FOR MULTIVARIATE T MODEL
   Elliptical Distribution
   Simulation Study
   Multivariate T Model Comparing between the LS and R Procedures
   Multivariate T Model Comparing between the R and HBR Procedures
   Multivariate T Model Using 20% Regression toward the Mean Comparing among the LS, R, and HBR Procedures
V. CONCLUSION
REFERENCES
LIST OF TABLES

1. Final and Midterm Exam Scores
2. Mean Parameters
3. % BCI
4. Empirical MSE when ρ =
5. Empirical MSE when ρ =
6. Empirical ARE when ρ =
7. Empirical ARE when ρ =
8. % Empirical CI when ρ =
9. % Empirical CI when ρ =
10. % Empirical CI when ρ =
11. % Empirical CI when ρ =
12. Empirical Level of Full Test when ρ =
13. Empirical Level of Full Test when ρ =
14. Empirical Level of Marginal Test when ρ =
15. Empirical Level of Marginal Test when ρ =
16. Empirical Means when a = 1 and ρ =
17. Empirical ARE when c = 1 and ρ =
18. Empirical MSE
19. Empirical ARE
20. Empirical MSE when df = 3 and ρ =
21. Empirical MSE when df = 3 and ρ =
22. Empirical ARE when df = 3 and ρ =
23. Empirical ARE when df = 3 and ρ =
24. % Empirical CI when df = 3 and ρ =
25. % Empirical CI when df = 3 and ρ =
26. % Empirical CI when df = 3 and ρ =
27. % Empirical CI when df = 3 and ρ =
28. Empirical Level of Full Test when df = 3 and ρ =
29. Empirical Level of Full Test when df = 3 and ρ =
30. Empirical Level of Marginal Test when df = 3 and ρ =
31. Empirical Level of Marginal Test when df = 3 and ρ =
32. Empirical MSE when df = 5 and ρ =
33. Empirical MSE when df = 5 and ρ =
34. Empirical MSE when df = 5 and ρ =
35. Empirical MSE when df = 5 and ρ =
36. % Empirical CI when df = 5 and ρ =
37. % Empirical CI when df = 5 and ρ =
38. % Empirical CI when df = 5 and ρ =
39. % Empirical CI when df = 5 and ρ =
40. Empirical Level of Full Test when df = 5 and ρ =
41. Empirical Level of Full Test when df = 5 and ρ =
42. Empirical Level of Marginal Test when df = 5 and ρ =
43. Empirical Level of Marginal Test when df = 5 and ρ =
44. Empirical MSE when df = 10 and ρ =
45. Empirical MSE when df = 10 and ρ =
46. Empirical MSE when df = 10 and ρ =
47. Empirical MSE when df = 10 and ρ =
48. % Empirical CI when df = 10 and ρ =
49. % Empirical CI when df = 10 and ρ =
50. % Empirical CI when df = 10 and ρ =
51. % Empirical CI when df = 10 and ρ =
52. Empirical Level of Full Test when df = 10 and ρ =
53. Empirical Level of Full Test when df = 10 and ρ =
54. Empirical Level of Marginal Test when df = 10 and ρ =
55. Empirical Level of Marginal Test when df = 10 and ρ =
56. Empirical MSE when df = 5 and ρ =
57. Empirical MSE when df = 5 and ρ =
58. Empirical MSE when df = 5 and ρ =
59. Empirical MSE when df = 5 and ρ =
60. % Empirical CI when df = 5 and ρ =
61. % Empirical CI when df = 5 and ρ =
62. % Empirical CI when df = 5 and ρ =
63. % Empirical CI when df = 5 and ρ =
64. Empirical Level of Full Test when df = 5 and ρ =
65. Empirical Level of Full Test when df = 5 and ρ =
66. Empirical Level of Marginal Test when df = 5 and ρ =
67. Empirical Level of Marginal Test when df = 5 and ρ =
68. Empirical Means
69. Empirical MSE
70. Empirical ARE
LIST OF FIGURES

1. Regression toward the Mean Plot
2. Regression toward the Mean Plot
3. ARE of Rho
4. ARE of Gamma
5. ARE of Rho
6. ARE of Gamma
CHAPTER I
INTRODUCTION

Regression comes from a Latin root which means going back. Francis Galton was the first to describe regression toward mediocrity in hereditary stature. In his famous study of the heights of fathers and first sons, Galton found that the heights of the sons regressed toward the mean; that is, taller fathers (above the mean height) tended to have shorter sons and shorter fathers tended to have taller sons. We now call this regression toward the mean or, briefly, the regression effect.

The regression effect is a statistical phenomenon. It may make the variation in repeated measurements look like real change. McDonald, Mazzuca, and McCabe (1983) note that very sick patients should feel better at the second measurement without treatment due to the effect of regression toward the mean. In sexual selection, males that remain unmated in the first year are less attractive than mated males. Here, the regression effect is that unmated males in the first year increase their attractiveness in the second year more than mated males (Kelly and Price, 2005). James (1973) states that even if the correlation between the pre- and posttreatment measurements is small, the regression effect may impact statistical significance. A set of informative examples on the effect is provided by Cutter (1976). The regression toward the mean (RTM) effect tends to move initial values closer to the mean on the second measurement.

The experimental design for this study is the following situation. Patients or subjects for the experiment are selected on an initial (pretest) response. Only patients whose response is above a specified cutoff value, however, are selected for the experiment. Those that are selected are then treated. It is thought that this treatment reduces the response; this is the hypothesis of interest. After a specified period
of time, a second (posttest) response is taken on the patient. An estimate of the treatment effect is the difference in the responses; i.e., pretest − posttest. Are these differences, however, due to the treatment or to the regression toward the mean effect? Unlike many experiments, there is no control group for comparison. There is extra information, though, in the initial responses that were smaller than the cutoff point. The models discussed next utilize this information.

Mee and Chua (1991), George, Johnson, and Nick (1997), and Ostermann, Willich, and Ludtke (2008) presented methods for estimating the treatment effect by using an additive model. The additive model, though, does not take into account possible interaction between the treatment and the initial response. James (1973), Senn, Brown, and James (1985), and Chen and Cox (1992) presented a multiplicative model which models this interaction, and they developed an inference using this model. Finally, Naranjo and McKean (2001) considered a dual model that models both the additive and interaction effects, and they proposed methods for estimating the treatment effect based on this dual model. All these models, though, assume a bivariate normal distribution on the measurements before and after treatment, (X, Y) respectively. The resulting inference is likelihood based, least squares (LS), which can be seriously impaired by only one outlier; i.e., the methods are not robust.

In this study we propose robust methods that are not tied to the assumption of bivariate normality. We develop robust procedures to assess treatment effects for the dual model of Naranjo and McKean (2001). Our procedures are based on rank-based methods for general score functions. In Chapter 2, we present these methods along with their corresponding theory. This includes a robust analog of the bootstrap test for treatment effect proposed by Naranjo and McKean (2001).
This theory is asymptotic, however, so in Chapter 3 we present the results of an extensive Monte Carlo study. This includes comparisons of empirical mean square errors and relative efficiencies between the robust and LS estimates. For the tests of treatment effect, empirical levels and power are obtained and compared. Our discussion in Chapters 2 and 3 is based on the bivariate normal distribution. Our robust theory, though, is for general bivariate distributions. In Chapter 4, we consider extensions to other bivariate distributions. In particular, we consider the popular elliptical bivariate distributions; see Muirhead (1982) for an informative discussion of these models. In Chapter 4, we present the results of a Monte Carlo investigation of our robust and LS methods for several of these elliptical distributions.

1.1 The Practical Problem of Interest

As discussed above, in this study we are concerned with the following practical problem. We are interested in how a subject with an infirmity responds to a certain treatment. As an illustration, we offer this simple example concerning cholesterol. The infirmity to treat is high cholesterol. Subjects are screened for high cholesterol; that is, only subjects who have high cholesterol (over a prespecified threshold) are admitted to the study, and only these are treated. Their cholesterol at admission is their baseline or pretreatment response. After a specified time of treatment, for those in the study, a second response, the posttreatment response, is taken. The difference in these responses, pretreatment − posttreatment, is the treatment effect for a subject. In saying this, however, we are ignoring the regression effect. In the absence of treatment, a subject with high cholesterol is expected to have a decline in cholesterol at the second reading. So, for our study, are we observing a treatment effect or a regression effect? We somehow have to tease out of the effect that portion due to treatment and that portion due to regression.

In many studies a control is used. In this case, for the cholesterol study, subjects with high cholesterol on the first reading would be randomly assigned to a treatment group
and a control (placebo) group. Then, if least squares procedures are used for inference, the mean difference of their differences (first and second responses) would be obtained. This would essentially remove the regression effect. In many studies, however, this may be impossible and it may even be unethical (knowingly withholding treatment).

For the practical problem of this study, there is no control group. Our data consist of the before and after responses of those admitted to the study. In addition, we have history on subjects who were below the threshold. This could be either a sample of responses of subjects who were not admitted to the study (i.e., their responses were below the threshold) or well known information on the distribution of the pretreatment variable. As we show, this additional information is needed in order to separate the treatment effect and the regression effect.

1.2 Simple Linear Model

A basic part of the models we discuss is the simple linear regression model. We write this model as
$$Y_i = \alpha + x_i'\beta + e_i, \quad (1.1)$$
where $Y_i$ is the response variable for the $i$th subject; $x_i'$ is the $i$th row of a known $n \times p$ matrix of independent variables ($p \geq 1$); $\beta$ is a $p$-vector of unknown slope parameters; $\alpha$ is the unknown intercept parameter; and $e_i$ is the $i$th error term. We discuss assumptions on the errors $e_i$ in Chapter 2. For now, we assume that $e_1, \ldots, e_n$ are independent and identically distributed (iid) with cumulative distribution function (cdf) $F(t)$ and probability density function (pdf) $f(t)$. For the traditional fit, we assume that $\sigma^2 = \mathrm{var}(e)$ exists.

The models discussed above are based on simple linear models. The methodology that we develop is based on robust fits of this model. For reference, we next discuss the traditional least squares (LS) and robust fits for the simple linear model.
1.3 Traditional Least Squares Fit

Let $Y = (y_1, \ldots, y_n)^T$ and $X = (x_1, \ldots, x_n)^T$. The LS estimates of slope ($\hat{\beta}_{LS}$) and intercept ($\hat{\alpha}_{LS}$) can be obtained in this way. First, center $X$; i.e., let $X_c = X - \bar{X}$. Then the LS estimate of slope is
$$\hat{\beta}_{LS} = \mathrm{Argmin} \, \|Y - X_c\beta\|^2,$$
where $\|\cdot\|$ is the Euclidean norm, and $\hat{\alpha}_{LS}$ is the average of the residuals $y_i - (x_i - \bar{x})\hat{\beta}_{LS}$, $i = 1, \ldots, n$. For the simple linear model, the estimates are
$$\hat{\beta}_{LS} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\alpha}_{LS} = \bar{y} - \hat{\beta}_{LS}\bar{x}.$$
The fitted values are then $\hat{y}_i = \hat{\alpha}_{LS} + \hat{\beta}_{LS} x_i$. Denote the residuals by $\hat{e}_{LS,i} = y_i - \hat{y}_i$. The LS estimate of the variance is $\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{e}_{LS,i}^2$.

1.4 Robust Fit

Consider the simple linear model (1.1). Assume that the random errors $e_1, \ldots, e_n$ are iid with cdf $F(t)$ and pdf $f(t)$. In particular, our robust methods do not require finite variance of the random errors. Consider the general R pseudo-norm,
$$\|v\|_\varphi = \sum_{i=1}^n a(R(v_i)) v_i,$$
where $R(v_i)$ denotes the rank of $v_i$ and $a(i) = \varphi[i/(n+1)]$, $i = 1, 2, \ldots, n$. We assume that $\varphi(u)$ is an increasing, square-integrable function defined on $(0,1)$. Without loss of generality, assume $\varphi$ is standardized;
i.e., $\int_0^1 \varphi(u)\,du = 0$ and $\int_0^1 \varphi^2(u)\,du = 1$. The dispersion function can be written as
$$D_\varphi(\beta) = \|Y - X\beta\|_\varphi = \sum_{i=1}^n a[R(y_i - x_i'\beta)](y_i - x_i'\beta).$$
It is easy to see that $D_\varphi(\beta)$ is piecewise linear; thus $D_\varphi(\beta)$ is a continuous and convex function of $\beta$ (Hettmansperger and McKean, 2011, p. 168). Define the R estimator as
$$\hat{\beta}_\varphi = \mathrm{Argmin} \, \|Y - X\beta\|_\varphi. \quad (1.2)$$
The negative of the gradient is given by
$$S(\beta) = -\frac{\partial D_\varphi(\beta)}{\partial \beta} = \sum_{i=1}^n a[R(y_i - x_i'\beta)]\, x_i.$$
Thus the R estimate of slope $\hat{\beta}_R$ also solves the equations
$$\sum_{i=1}^n x_i\, a[R(y_i - x_i'\beta)] \doteq 0.$$
The R estimate of $\alpha$ is the median of the residuals (Hettmansperger and McKean, 2011, p. 168); i.e.,
$$\hat{\alpha}_R = \mathrm{med}_i\,\{y_i - x_i'\hat{\beta}_R\}.$$
Under regularity conditions, Hettmansperger and McKean (2011, p. 189) show
$$\begin{pmatrix} \hat{\alpha}_R \\ \hat{\beta}_R \end{pmatrix} \;\dot\sim\; N\left( \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \begin{pmatrix} \kappa_n & -\tau_\varphi^2\, \bar{x}'(X'X)^{-1} \\ -\tau_\varphi^2\, (X'X)^{-1}\bar{x} & \tau_\varphi^2\, (X'X)^{-1} \end{pmatrix} \right),$$
where $\kappa_n = n^{-1}\tau_S^2 + \tau_\varphi^2\, \bar{x}'(X'X)^{-1}\bar{x}$, and $\tau_S$ and $\tau_\varphi$ are the scale parameters which, for Wilcoxon scores, are defined as
$$\tau_S = (2f(0))^{-1}, \qquad \tau_\varphi = \left(\sqrt{12}\int f^2(t)\,dt\right)^{-1}.$$
The Wilcoxon scores discussed in Chapter 3 of Hettmansperger and McKean (2011) are generated by $\varphi(u) = \sqrt{12}\,(u - \tfrac{1}{2})$. For Wilcoxon scores, if the errors have a normal pdf with variance $\sigma^2$, then Hettmansperger and McKean (2011) show that $\tau_\varphi = \sigma\sqrt{\pi/3}$ and $\tau_S = \sigma\sqrt{\pi/2}$. Another popular set of scores are the sign scores generated by $\varphi(u) = \mathrm{sgn}(u - \tfrac{1}{2})$.

The ARE of the R estimate relative to LS is the ratio $\sigma^2/\tau_\varphi^2$. For Wilcoxon scores, this is the familiar value $12\sigma^2\left(\int f^2(t)\,dt\right)^2$, which for normal errors is $3/\pi \approx 0.955$ (McKean and Vidmar, 1994). If the true distribution has tails heavier than the normal, then this efficiency is usually much larger than one (McKean, 2004).
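For a simple regression, the R fit above can be computed by minimizing $D_\varphi(\beta)$ directly over the slope: since the standardized Wilcoxon scores sum to zero, the dispersion is invariant to the intercept, which is then taken as the median of the residuals. The sketch below is a minimal illustration (the data, seed, and search bounds are assumptions for the example, not values from this study); note how a single gross outlier in the $Y$ space barely moves the fit:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def wilcoxon_fit(x, y, bounds=(-100.0, 100.0)):
    """Rank-based (Wilcoxon-score) fit of y = alpha + beta*x + e."""
    n = len(y)

    def dispersion(beta):
        e = y - beta * x
        # Wilcoxon scores a(i) = sqrt(12) * (i/(n+1) - 1/2), applied to ranks.
        a = np.sqrt(12.0) * (rankdata(e) / (n + 1) - 0.5)
        return np.sum(a * e)

    # D is convex and piecewise linear in beta, so a bounded scalar search works.
    beta = minimize_scalar(dispersion, bounds=bounds, method="bounded").x
    alpha = np.median(y - beta * x)   # R intercept: median of residuals
    return alpha, beta

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=40)
y[5] += 50.0                          # one gross outlier in Y

alpha_r, beta_r = wilcoxon_fit(x, y)
print(round(beta_r, 2))
```

An LS fit of the same data would be pulled noticeably toward the outlier, which is the robustness contrast this chapter develops.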
1.4.1 High Breakdown Rank-Based (HBR) Estimates

Consider the simple linear model (1.1) and the dispersion function
$$D_{HBR}(\beta) = \sum_{i=1}^n \sum_{j=1}^n b_{ij}\, |e_i - e_j|,$$
where $b_{ij}$ denotes the weight function and $e_i = y_i - x_i'\beta$. Note that when the weights are $b_{ij} \equiv 1$, these weights yield the well known rank-based Wilcoxon estimate (Terpstra and McKean, 2005). Writing the dispersion function as a function of $\beta$, we have
$$D_{HBR}(\beta) = \sum_{i=1}^n \sum_{j=1}^n b_{ij}\, |(y_i - y_j) - (x_i - x_j)'\beta|.$$
It follows that $D_{HBR}(\beta)$ is a nonnegative, continuous, and convex function of $\beta$. Define the HBR estimator as
$$\hat{\beta}_{HBR} = \mathrm{Argmin}\, \|Y - X\beta\|_{HBR}. \quad (1.3)$$
The negative of the gradient is given by
$$S_{HBR}(\beta) = \sum_{i=1}^n \sum_{j=1}^n b_{ij}\,(x_i - x_j)\,\mathrm{sgn}[(y_i - y_j) - (x_i - x_j)'\beta].$$
Thus, the HBR estimate of slope $\hat{\beta}_{HBR}$ also solves the equations $S_{HBR}(\beta) \doteq 0$. The intercept is estimated by the median of the residuals (Hettmansperger and McKean, 2011, p. 265); i.e.,
$$\hat{\alpha}_{HBR} = \mathrm{med}_i\,\{y_i - x_i'\hat{\beta}_{HBR}\}.$$
Under regularity conditions, Hettmansperger and McKean (2011, p. 265) show that $(\hat{\alpha}_{HBR}, \hat{\beta}_{HBR}')'$ is asymptotically normal with mean $(\alpha, \beta')'$, where the asymptotic covariance matrix of $\hat{\beta}_{HBR}$ is of the form $\tfrac{1}{4}C^{-1}\Sigma C^{-1}$ and $\tau_S$ is defined by
$$\tau_S = (2f(0))^{-1}.$$
As discussed in Hettmansperger and McKean (2011), there is a loss in efficiency when HBR estimates are used in place of the Wilcoxon estimates. However, the estimate $\hat{\beta}_R$ has breakdown $1/n$ due to its sensitivity to outliers in the $X$ space. But with a proper set of weights, the influence function of the HBR estimate is bounded in the $X$ and $Y$ spaces, and the estimate has 50% breakdown.
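In the $b_{ij} \equiv 1$ case noted above, the pairwise dispersion reduces to the Wilcoxon estimate, and for a simple regression its minimizer has a convenient closed form: a weighted median of the pairwise slopes $(y_i - y_j)/(x_i - x_j)$ with weights $|x_i - x_j|$. The following is a hedged sketch of that special case only (general HBR weights $b_{ij}$ are not implemented here; the data and seed are illustrative assumptions):

```python
import numpy as np

def pairwise_rank_slope(x, y):
    """Minimize sum_{i<j} |(y_i - y_j) - (x_i - x_j) * b| over b.

    This is the b_ij = 1 (Wilcoxon) case of the HBR dispersion; the
    minimizer is a weighted median of pairwise slopes, weights |x_i - x_j|.
    """
    i, j = np.triu_indices(len(x), k=1)
    dx, dy = x[i] - x[j], y[i] - y[j]
    keep = dx != 0
    slopes, w = dy[keep] / dx[keep], np.abs(dx[keep])
    order = np.argsort(slopes)
    slopes, w = slopes[order], w[order]
    cum = np.cumsum(w)
    # First index where cumulative weight reaches half the total weight.
    return slopes[np.searchsorted(cum, 0.5 * cum[-1])]

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2, size=30)
y[3] += 30.0                      # gross Y outlier
beta_w = pairwise_rank_slope(x, y)
print(round(beta_w, 2))
```

A genuine HBR fit would downweight pairs with high-leverage $x$ values through $b_{ij}$; this sketch only shows why the pairwise form is robust in the $Y$ space.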
CHAPTER II
THE MODEL AND ESTIMATES

The model of interest for the regression effect is a bivariate model for a random vector of responses $(X, Y)$. Let $X$ denote the initial response or measurement on a subject. We often refer to $X$ as the baseline measurement or the premeasurement value of the response before treatment or therapy. Let $Y$ denote the second value of the response on the subject. The variable $Y$ is realized after the treatment and, hence, is often called the posttreatment measurement or response. We are interested in the effect of the treatment, over and above the regression effect.

The null hypothesis is that there is no treatment effect. Hence, for our models, under the null hypothesis the marginal distributions of $X$ and $Y$ must be the same. The second requirement for our models is that the expected value of $Y$ should be a linear function of $X$. With this requirement, we can model both the regression effect and the treatment effect. There are several families of distributions which satisfy these requirements. For example, as we show later, the elliptical family of bivariate distributions is one such family. In practice, though, the most important member of this family is the bivariate normal distribution. Essentially all the literature on the regression effect is in terms of this model. For this and the next chapter, most of our discussion concerns the bivariate normal model.

For the normal null model, $(X, Y)$ can be written as bivariate normal:
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix} \right), \quad (2.1)$$
where $\rho$ denotes the correlation. With the regression towards the mean effect in mind, assume,
as usual, that $0 < \rho < 1$. Note that for this null model, both $X$ and $Y$ are distributed as $N(\mu, \sigma^2)$.

We can write model (2.1) from a regression point of view. Consider the model
$$Y = \mu + \rho(X - \mu) + e, \quad (2.2)$$
where $e$ is $N(0, \sigma^2(1 - \rho^2))$, $X$ is $N(\mu, \sigma^2)$, and $e$ and $X$ are independent. Hence
$$\begin{pmatrix} e \\ X \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2(1 - \rho^2) & 0 \\ 0 & \sigma^2 \end{pmatrix} \right),$$
so that
$$\begin{pmatrix} Y \\ X \end{pmatrix} = \begin{pmatrix} \mu - \rho\mu \\ 0 \end{pmatrix} + \begin{pmatrix} 1 & \rho \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e \\ X \end{pmatrix}.$$
It follows that
$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix} \right).$$
So models (2.1) and (2.2) are equivalent. It also follows that
$$E(Y|X = x) = \mu + \rho(x - \mu).$$
Also note that $\mathrm{var}(e)$ is free of $x$.

To illustrate the regression towards the mean effect, suppose that $x > \mu$ and $0 < \rho < 1$. Then we have
$$E(Y|X = x) = \mu + \rho(x - \mu) < \mu + (x - \mu) = x.$$
So, $E(Y|X = x) < x$. Also, since $\rho(x - \mu) > 0$, we have
$$E(Y|X = x) = \mu + \rho(x - \mu) > \mu.$$
Thus
$$\mu < E(Y|X = x) < x.$$
This is the regression towards the mean effect and it is illustrated in Figure 1.

Figure 1: Regression toward the Mean Plot (the lines $Y = \mu + (X - \mu)$ and $Y = \mu + \rho(X - \mu)$).
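Both the chain of inequalities and its practical consequence under screening can be checked numerically. In the hedged sketch below, all parameter values ($\mu = 100$, $\sigma = 15$, $\rho = 0.6$, cutoff 120) are illustrative choices, not values from this study; no treatment is applied anywhere, yet subjects selected above the cutoff show an apparent mean improvement at the second reading, which is pure regression effect:

```python
import numpy as np

# Deterministic check of mu < E(Y | X = x) < x.
mu, sigma, rho = 100.0, 15.0, 0.6
x0 = 130.0
cond_mean = mu + rho * (x0 - mu)      # E(Y | X = x0) = 118.0
assert mu < cond_mean < x0

# Simulation under the null model (2.2): select on a high first reading.
rng = np.random.default_rng(7)
cutoff = 120.0
x = rng.normal(mu, sigma, size=200_000)
e = rng.normal(0.0, sigma * np.sqrt(1 - rho**2), size=x.size)
y = mu + rho * (x - mu) + e           # second reading, no treatment effect

selected = x > cutoff
apparent_drop = np.mean(x[selected] - y[selected])   # positive "improvement"
print(round(cond_mean, 1), round(apparent_drop, 2))
```

The apparent drop among the selected subjects is roughly $(1 - \rho)\,E[X - \mu \mid X > \text{cutoff}]$, which is exactly what a treatment-effect analysis that ignores the regression effect would misattribute to the treatment.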
2.1 The Dual Effects Model

The dual effects model of Naranjo and McKean (2001) can be written as
$$Y = \mu - \delta + \gamma\rho(X - \mu) + e, \quad (2.3)$$
where $X$ is $N(\mu, \sigma^2)$, $e$ is $N(0, \sigma^2(1 - \rho^2))$, and $X$ and $e$ are independent. If the additive component $\delta = 0$, then model (2.3) reduces to the multiplicative model of Chen and Cox (1992), where the treatment effect is represented by $\gamma$. If $\gamma = 1$, then model (2.3) reduces to the additive model of Mee and Chua (1991), where the treatment effect is represented by $\delta > 0$. We then refer to $\delta$ and $\gamma$ as the additive and multiplicative components, respectively, of the treatment effect (Naranjo and McKean, 2001).

For the dual effects model, the bivariate normal distribution (2.3) is equivalent to the conditional model
$$Y|X = x \sim N\left(\mu - \delta + \gamma\rho(x - \mu),\; \sigma^2(1 - \rho^2)\right), \quad (2.4)$$
where $X$ is $N(\mu, \sigma^2)$. The hypothesis of interest is
$$H_0: \delta = 0 \text{ and } \gamma = 1 \quad \text{versus} \quad H_a: \delta \neq 0 \text{ or } \gamma \neq 1. \quad (2.5)$$
We next briefly derive the bivariate distribution of $(X, Y)$. This will help when we discuss distributions other than the normal in Chapter 4. A derivation similar to that showing the equivalence of models (2.1) and (2.2) can be used to show that model (2.3) is equivalent to saying that
$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim N\left( \begin{pmatrix} \mu - \delta \\ \mu \end{pmatrix}, \begin{pmatrix} \gamma^2\rho^2\sigma^2 + \sigma^2(1 - \rho^2) & \gamma\rho\sigma^2 \\ \gamma\rho\sigma^2 & \sigma^2 \end{pmatrix} \right), \quad (2.6)$$
where $X$ is $N(\mu, \sigma^2)$ and $Y$ is $N(\mu - \delta,\; \gamma^2\rho^2\sigma^2 + \sigma^2(1 - \rho^2))$. Note that under $H_0$, $Y$ is $N(\mu, \sigma^2)$; that is, $Y$ and $X$ have the same distribution.

For inference, we need estimates of the parameters of the model along with the standard errors of the estimates. We also need to discuss tests of $H_0$. We briefly review the inference developed in Naranjo and McKean (2001) and then develop our robust analogs.

2.2 Parameter Estimates

Recall from Section 1.1 the practical problem with which this study is concerned. Suppose we have $n$ pre and posttreatment responses. In the notation of Model (2.1), these responses would be the realizations of the random variables $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where $X_i$ and $Y_i$ denote the pre and posttreatment responses of the $i$th subject. We assume that these $n$ random vectors are iid with distribution (2.6). As discussed in Section 1.1, we also have at our disposal a sample $X_{n+1}, X_{n+2}, \ldots, X_{n+m}$ of pretreatment responses of subjects who were not treated. Note that $X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m}$ are iid with the marginal distribution of $X$ as their distribution. In practice, instead of this large sample from the marginal distribution of $X$, we may know characteristics of the distribution of $X$, such as the mean and variance of $X$.

Consider the case of selected sampling where the $X$ measurements are a random sample of size $n + m$. The $Y$ measurements are obtained for only a subset of size $n$ of the $n + m$. Therefore, the data we have are
$$(x_1, y_1), \ldots, (x_n, y_n), x_{n+1}, \ldots, x_{n+m}.$$
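This data structure, and the marginal moments of $Y$ implied by (2.6), can be checked by a small simulation. In the sketch below all parameter values are arbitrary illustrations (assumptions for the example, not values from this study):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, rho = 50.0, 10.0, 0.7
delta, gamma = 2.0, 0.8                      # additive and multiplicative effects
n = 400_000

x = rng.normal(mu, sigma, size=n)
e = rng.normal(0.0, sigma * np.sqrt(1 - rho**2), size=n)
y = mu - delta + gamma * rho * (x - mu) + e  # dual effects model (2.3)

# Marginal moments of Y implied by (2.6).
mean_y = mu - delta                                            # 48.0
var_y = gamma**2 * rho**2 * sigma**2 + sigma**2 * (1 - rho**2)  # 82.36

print(round(np.mean(y), 1), round(np.var(y), 1))
```

With $\delta = 0$ and $\gamma = 1$ the implied moments collapse to $E(Y) = \mu$ and $\mathrm{var}(Y) = \sigma^2$, which is the null-hypothesis requirement that $Y$ and $X$ share the same marginal distribution.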
As in Naranjo and McKean (2001), consider the likelihood of the dual effects model,
$$L(\rho, \gamma, \delta) = \prod_{i=1}^n f_{X,Y}(x_i, y_i) \prod_{j=n+1}^{n+m} h(x_j) = \prod_{i=1}^n f(y_i|x_i)\, h(x_i) \prod_{j=n+1}^{n+m} h(x_j),$$
where $h$ denotes the $N(\mu, \sigma^2)$ pdf of $X$ and, from (2.4),
$$f(y|x) = \frac{1}{(2\pi)^{1/2}(\sigma^2(1 - \rho^2))^{1/2}} \exp\left\{ -\frac{[y - \mu + \delta - \gamma\rho(x - \mu)]^2}{2\sigma^2(1 - \rho^2)} \right\}.$$
Taking the logarithm of the likelihood, we have
$$\ell(\rho, \gamma, \delta) = -\frac{n}{2}\ln(\sigma^2(1 - \rho^2)) - \frac{1}{2\sigma^2(1 - \rho^2)} \sum_{i=1}^n [y_i - \mu + \delta - \gamma\rho(x_i - \mu)]^2 + k,$$
where $k$ is free of the parameters. Taking partial derivatives and solving the resulting equations gives the MLEs. First,
$$\frac{\partial \ell(\rho, \gamma, \delta)}{\partial \delta} = -\frac{1}{\sigma^2(1 - \rho^2)} \sum_{i=1}^n [y_i - \mu + \delta - \gamma\rho(x_i - \mu)].$$
Setting $\partial \ell/\partial \delta = 0$ and solving, we have
$$\sum_{i=1}^n y_i - n\mu + n\delta - \gamma\rho\sum_{i=1}^n x_i + n\gamma\rho\mu = 0,$$
so that
$$\hat{\delta} = \gamma\rho\,(\bar{x} - \mu) - (\bar{y} - \mu).$$
Next, setting the partial derivative with respect to the product $\gamma\rho$ equal to zero gives
$$\sum_{i=1}^n [y_i - \mu + \delta - \gamma\rho(x_i - \mu)](x_i - \mu) = 0.$$
Substituting $\hat{\delta} = \gamma\rho(\bar{x} - \mu) - (\bar{y} - \mu)$, this equation becomes
$$\sum_{i=1}^n [(y_i - \bar{y}) - \gamma\rho(x_i - \bar{x})](x_i - \mu) = 0,$$
and, since $\sum_{i=1}^n (y_i - \bar{y})(x_i - \mu) = \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x})$ and similarly for the second term, we have
$$\widehat{\gamma\rho} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$
Note that $\gamma$ enters the mean only through the product $\gamma\rho$, so setting $\partial \ell/\partial \gamma = 0$ yields this same equation. Finally, setting $\partial \ell/\partial \rho = 0$, write $u_i = y_i - \mu + \delta - \gamma\rho(x_i - \mu)$; then
$$\frac{\partial \ell}{\partial \rho} = \frac{n\rho}{1 - \rho^2} + \frac{\gamma}{\sigma^2(1 - \rho^2)}\sum_{i=1}^n u_i(x_i - \mu) - \frac{\rho}{\sigma^2(1 - \rho^2)^2}\sum_{i=1}^n u_i^2 = 0.$$
At the solution of the $\gamma\rho$ equation, $\sum_{i=1}^n u_i(x_i - \mu) = 0$, so
$$\frac{n\rho}{1 - \rho^2} = \frac{\rho \sum_{i=1}^n u_i^2}{\sigma^2(1 - \rho^2)^2}, \quad \text{i.e.,} \quad 1 - \rho^2 = \frac{\sum_{i=1}^n \hat{u}_i^2}{n\sigma^2}.$$
Hence, the MLE of $\rho$ is
$$\hat{\rho} = \sqrt{1 - \frac{\sum_{i=1}^n [y_i - \bar{y} - \widehat{\gamma\rho}(x_i - \bar{x})]^2}{n\sigma^2}},$$
and the results of the MLEs from the dual effects model are
$$\hat{\delta} = \widehat{\gamma\rho}\,(\bar{x} - \mu) - (\bar{y} - \mu), \qquad \widehat{\gamma\rho} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\rho} = \sqrt{1 - \frac{\sum_{i=1}^n [y_i - \bar{y} - \widehat{\gamma\rho}(x_i - \bar{x})]^2}{n\sigma^2}}.$$

To compute the inverse of the information matrix, we can rewrite Model (2.4) as
$$f(y|x) = \frac{1}{(2\pi)^{1/2}(\sigma^2(1 - \rho^2))^{1/2}} \exp\left\{ -\frac{[y - \mu + \delta - \gamma\rho(x - \mu)]^2}{2\sigma^2(1 - \rho^2)} \right\}. \quad (2.7)$$
Taking the logarithm of (2.7),
$$\ln f(y|x) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma^2(1 - \rho^2)) - \frac{[y - \mu + \delta - \gamma\rho(x - \mu)]^2}{2\sigma^2(1 - \rho^2)}. \quad (2.8)$$
In the following, write $u = y - \mu + \delta - \gamma\rho(x - \mu)$; at the true parameters $u = e$, which is independent of $x$ with $E(u) = 0$, $E(u^2) = \sigma^2(1 - \rho^2)$, and $E(u^4) = 3\sigma^4(1 - \rho^2)^2$. Order the parameters as $(\rho, \gamma, \delta)$, let $J$ denote the matrix of second partial derivatives of (2.8), and let $I = -E[J]$ denote the information matrix for a single observation.

For $\delta$, the first partial derivative of (2.8) is
$$\frac{\partial \ln f(y|x)}{\partial \delta} = -\frac{u}{\sigma^2(1 - \rho^2)}, \quad (2.9)$$
so that
$$J_{33} = \frac{\partial^2 \ln f(y|x)}{\partial \delta^2} = -\frac{1}{\sigma^2(1 - \rho^2)}, \qquad I_{33} = -E(J_{33}) = \frac{1}{\sigma^2(1 - \rho^2)}.$$
Differentiating (2.9) with respect to $\gamma$,
$$J_{32} = \frac{\partial^2 \ln f(y|x)}{\partial \delta\, \partial \gamma} = \frac{\rho(x - \mu)}{\sigma^2(1 - \rho^2)}, \qquad I_{32} = -\frac{\rho\, E(X - \mu)}{\sigma^2(1 - \rho^2)} = 0 = I_{23}.$$
Similarly, differentiating (2.9) with respect to $\rho$ and using $E(u) = 0$ and $E(X) = \mu$, we obtain $I_{31} = I_{13} = 0$.

For $\gamma$, the first partial derivative of (2.8) is
$$\frac{\partial \ln f(y|x)}{\partial \gamma} = \frac{\rho(x - \mu)\, u}{\sigma^2(1 - \rho^2)}, \quad (2.17)$$
so that
$$J_{22} = -\frac{\rho^2(x - \mu)^2}{\sigma^2(1 - \rho^2)}, \qquad I_{22} = \frac{\rho^2\, E[(X - \mu)^2]}{\sigma^2(1 - \rho^2)} = \frac{\rho^2}{1 - \rho^2}.$$
Differentiating (2.17) with respect to $\rho$, taking expectations, and using $E[(X - \mu)u] = 0$ and $E[(X - \mu)^2] = \sigma^2$, gives
$$I_{21} = I_{12} = \frac{\gamma\rho}{1 - \rho^2}.$$

For $\rho$, the first partial derivative of (2.8) is
$$\frac{\partial \ln f(y|x)}{\partial \rho} = \frac{\rho}{1 - \rho^2} + \frac{\gamma(x - \mu)\, u}{\sigma^2(1 - \rho^2)} - \frac{\rho\, u^2}{\sigma^2(1 - \rho^2)^2}.$$
Differentiating once more with respect to $\rho$, taking expectations, and using the moments of $u$ listed above leads to
$$I_{11} = \frac{\gamma^2}{1 - \rho^2} + \frac{2\rho^2}{(1 - \rho^2)^2}.$$

Collecting these entries, the information matrix for a single observation, in the order $(\rho, \gamma, \delta)$, is
$$I = \begin{pmatrix} \dfrac{\gamma^2}{1 - \rho^2} + \dfrac{2\rho^2}{(1 - \rho^2)^2} & \dfrac{\gamma\rho}{1 - \rho^2} & 0 \\[1ex] \dfrac{\gamma\rho}{1 - \rho^2} & \dfrac{\rho^2}{1 - \rho^2} & 0 \\[1ex] 0 & 0 & \dfrac{1}{\sigma^2(1 - \rho^2)} \end{pmatrix}. \quad (2.27)$$
Multiplying by $n$ gives the information in the sample, $I_n = nI$. Inverting $I_n$, the asymptotic variances computed from the information matrix are
$$\mathrm{Var}(\hat{\rho}) = \frac{(1 - \rho^2)^2}{2n\rho^2}, \qquad \mathrm{Var}(\hat{\gamma}) = \frac{(1 - \rho^2)[\gamma^2(1 - \rho^2) + 2\rho^2]}{2n\rho^4}, \qquad \mathrm{Var}(\hat{\delta}) = \frac{\sigma^2(1 - \rho^2)}{n},$$
with corresponding standard errors
$$SE(\hat{\rho}) = \frac{1 - \rho^2}{\rho\sqrt{2n}}, \qquad SE(\hat{\gamma}) = \sqrt{\frac{(1 - \rho^2)[\gamma^2(1 - \rho^2) + 2\rho^2]}{2n\rho^4}}, \qquad SE(\hat{\delta}) = \sigma\sqrt{\frac{1 - \rho^2}{n}}.$$
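The closed-form MLEs above are straightforward to compute. The following hedged sketch (simulated data with $\mu$ and $\sigma^2$ treated as known; all parameter values are illustrative assumptions) evaluates $\hat\delta$, $\widehat{\gamma\rho}$, $\hat\rho$, and the standard error of $\hat\delta$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, rho, gamma, delta = 50.0, 10.0, 0.7, 0.8, 2.0
n = 5000

x = rng.normal(mu, sigma, size=n)
e = rng.normal(0.0, sigma * np.sqrt(1 - rho**2), size=n)
y = mu - delta + gamma * rho * (x - mu) + e   # dual effects model (2.3)

# MLEs from Section 2.2 (mu and sigma^2 known).
gr_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
delta_hat = gr_hat * (x.mean() - mu) - (y.mean() - mu)
resid = y - y.mean() - gr_hat * (x - x.mean())
rho_hat = np.sqrt(1.0 - np.sum(resid**2) / (n * sigma**2))

se_delta = sigma * np.sqrt((1.0 - rho_hat**2) / n)   # SE(delta_hat)

print(round(gr_hat, 2), round(delta_hat, 1), round(rho_hat, 2))
```

Here the true $\gamma\rho = 0.56$, $\delta = 2$, and $\rho = 0.7$, and the estimates should land close to those values; the residual variance recovers $\sigma^2(1 - \rho^2)$ regardless of $\gamma$, which is why $\hat\rho$ is identified separately from $\gamma$.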
2.3 R and HBR Estimation for the Linear Regression Model

The dual effects model can be written as

Y = μ + δ + γρ(X − μ) + e.

We can rewrite the dual effects model as a linear regression model

Y = α + βX + e,

where α = μ + δ − γρμ and β = γρ. The efficient estimators of α and β depend on the underlying distribution. The R estimators (1.2) are better than the LS estimators when the distribution has thicker tails or allows outliers in the Y space. However, the R estimator is not robust to outliers in the X space. The HBR estimator (1.3), on the other hand, is robust to outliers in both the X and Y spaces and, further, it has positive breakdown.

The estimates of γρ and δ based on the LS estimates of slope and intercept are

γρ̂ = β̂   and   δ̂ = α̂ + β̂μ − μ.

In particular, at the normal model δ̂ is unbiased, consistent, and has minimum variance. However, δ̂ is sensitive to outliers and to deviations from the normality assumption.

The estimates of δ and γρ based on the R estimates (1.2) of slope and intercept are

γρ̂_R = β̂_R
and

δ̂_R = α̂_R + β̂_R μ − μ,

and the estimates of δ and γρ based on the HBR estimates (1.3) of slope and intercept are

γρ̂_HBR = β̂_HBR   and   δ̂_HBR = α̂_HBR + β̂_HBR μ − μ.

Recall that the MLE of ρ is

ρ̂ = √(1 − σ̂_e²/σ²),

where σ̂_e² = (1/n) Σ_{i=1}^n ê_i². In practice we often do not know σ². We then estimate it by

σ̂² = (1/(n + m − 1)) Σ_{i=1}^{n+m} (x_i − x̄)²,

which is the nonparametric estimate of σ² based on the combined sample. Thus, in practice,

ρ̂ = √(1 − σ̂_e²/σ̂²).

For a robust estimate of ρ, it seems natural to use a ratio of robust scale estimators in place of σ̂_e²/σ̂². We first look at this generally, obtaining a general consistency result. This is followed by a discussion of explicit estimators.
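As a small sketch of the practical plug-in estimate ρ̂ = √(1 − σ̂_e²/σ̂²) described above (the simulated inputs and function name are illustrative, not the dissertation's code):

```python
import math
import random

def rho_hat(residuals, x_all):
    """Plug-in estimate rho_hat = sqrt(1 - sigma_e^2 / sigma_x^2),
    truncated at 0 when the variance ratio exceeds 1."""
    n = len(residuals)
    var_e = sum(e * e for e in residuals) / n           # (1/n) * sum of e_i^2
    xbar = sum(x_all) / len(x_all)
    var_x = sum((x - xbar) ** 2 for x in x_all) / (len(x_all) - 1)
    return math.sqrt(max(0.0, 1.0 - var_e / var_x))

# check on simulated variates: e = sqrt(1 - rho^2)*sigma*Z, X = mu + sigma*Z
rng = random.Random(1)
rho, sigma = 0.8, 20.0
e = [math.sqrt(1 - rho ** 2) * sigma * rng.gauss(0, 1) for _ in range(5000)]
x = [200.0 + sigma * rng.gauss(0, 1) for _ in range(5000)]
print(rho_hat(e, x))  # close to the true rho = 0.8
```

The truncation at zero anticipates the issue noted later in Chapter 3, where some scale estimators produce negative estimates of ρ².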
2.4 Scale Functionals

For this discussion, functional notation is convenient. Let W be a continuous random variable with cumulative distribution function (cdf) F(w) and probability density function (pdf) f(w). Then θ is a functional if it maps F into ℝ, where ℝ denotes the set of real numbers. We use the notation θ(F), θ(f), and θ(W) interchangeably. This is an abuse of notation because θ is a function of a cdf; i.e., θ(W) means θ(F_W).

Definition: Let W be a random variable with cdf F. θ is a scale functional if, for all a ∈ ℝ and b > 0,

θ(W + a) = θ(W)   and   θ(bW) = bθ(W).

Consider our regression dual effects normal model (2.3), which we rewrite as Y = μ + δ + γρ(X − μ) + e. Note that X is a continuous random variable.

Theorem: Let θ be a scale functional. Then θ(e) = √(1 − ρ²) θ(X).

Proof: For the normal model we have, in distribution, that e = √(1 − ρ²) σZ, where Z is N(0, 1). Since θ is a scale functional, we have

θ(e) = √(1 − ρ²) σθ(Z).
Also, since θ is location invariant, θ(X) = θ(X − μ), and in distribution X − μ = σZ, where Z is N(0, 1). Hence,

θ(X) = σθ(Z).

Thus,

θ(e) = √(1 − ρ²) σ[θ(X)/σ] = √(1 − ρ²) θ(X).

Therefore, θ²(e)/θ²(X) = 1 − ρ²; that is,

ρ² = 1 − θ²(e)/θ²(X).

2.5 Scale Functionals and Estimators

Returning to the robust estimation of the model (2.3), suppose we select θ as our scale functional.

(1) The estimates of δ and γρ based on the R and HBR estimates of slope and intercept are

γρ̂_R = β̂_R,   δ̂_R = α̂_R + β̂_R μ − μ,
and

γρ̂_HBR = β̂_HBR,   δ̂_HBR = α̂_HBR + β̂_HBR μ − μ.

(2) The residuals have the form

ê_i = y_i − (μ + δ̂) − γρ̂(x_i − μ).

Denote our estimate of θ by θ̂_{n,e} = θ̂(ê_1, ..., ê_n). Assume that θ̂_{n,e} is a consistent estimate of θ(e); i.e., θ̂_{n,e} → θ(e) in probability as n → ∞. In our problem, we sometimes have a value of σ² obtained from subject matter considerations. We can then use the normality assumption of the model to determine θ(X). Otherwise we will have to estimate θ(X) based on the large sample; i.e., θ̂_{n+m,x} = θ̂(x_1, ..., x_n, x_{n+1}, ..., x_{n+m}). Assume that θ̂_{n+m,x} → θ(X) in probability as (n + m) → ∞. Finally, let

ρ̂ = √(1 − θ̂²_{n,e}(ê_1, ..., ê_n) / θ̂²_{n+m,x}(x_1, ..., x_{n+m})).

Since θ̂_{n,e} and θ̂_{n+m,x} are consistent estimators, ρ̂ is a consistent estimator of ρ.

Theorem: Using the notation above, ρ̂ → ρ in probability.

Proof:

1 − θ̂²_{n,e}(ê_1, ..., ê_n) / θ̂²_{n+m,x}(x_1, ..., x_{m+n}) → 1 − θ²(e)/θ²(X) = 1 − (1 − ρ²) = ρ².
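A quick numerical check of the identity θ(e) = √(1 − ρ²) θ(X) behind this consistency argument, using the interquartile range as an example scale functional θ (an assumption for illustration; any scale functional works):

```python
import math
import random

def iqr(w):
    """Interquartile range: a location-invariant, scale-equivariant
    (hence 'scale') functional, used here as the example theta."""
    s = sorted(w)
    n = len(s)
    return s[(3 * n) // 4] - s[n // 4]

rng = random.Random(2)
rho, sigma = 0.7, 5.0
x = [sigma * rng.gauss(0, 1) for _ in range(20000)]
e = [math.sqrt(1 - rho ** 2) * sigma * rng.gauss(0, 1) for _ in range(20000)]

# theta(e)/theta(X) should approach sqrt(1 - rho^2), so the plug-in
# estimate sqrt(1 - theta(e)^2 / theta(X)^2) approaches rho
ratio = iqr(e) / iqr(x)
rho_plug_in = math.sqrt(max(0.0, 1.0 - ratio ** 2))
print(abs(rho_plug_in - rho) < 0.1)  # True
```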
2.6 Examples of Scale Functionals and Estimators

In this section we discuss several scale functionals and estimators that we will use in this study. As we note, all of these are consistent and, hence, lead to consistent estimators of ρ. In Chapter 3, we will investigate how well they do in practice.

2.6.1 Dispersion Function D_φ

Recall that our robust regression estimator of γρ minimizes the dispersion function D_φ; i.e.,

γρ̂_φ = Argmin D_φ(γρ),        (2.30)

where D_φ(γρ) = Σ_{i=1}^n a_φ[R(y_i − γρ x_i)](y_i − γρ x_i). Note that D_φ is location invariant and scale equivariant. As discussed in Hettmansperger and McKean (2011, p. 201), the functional corresponding to D_φ is

D_φ(F) = ∫ t φ(F(t)) f(t) dt.

Theorem: Let X be a random variable. Then D_φ(F) is a scale functional.

Proof: We want to show that for all a > 0 and for all b, D(ax) = aD(x) and D(x + b) = D(x). Let F_x be the distribution function of x. First consider D(e) for e = ax, a > 0. Then,
we have

D(e) = ∫ x φ(F_e(x)) f_e(x) dx.        (2.31)

Assume that X has cumulative distribution function (cdf) F_x and probability density function (pdf) f_x. Then

F_e(t) = P(e ≤ t) = P(ax ≤ t) = P(x ≤ t/a) = F_x(t/a).

From Model (2.31), with the substitution s = t/a,

D(e) = ∫ t φ(F_e(t)) f_e(t) dt
     = ∫ t φ(F_x(t/a)) (1/a) f_x(t/a) dt
     = ∫ a s φ(F_x(s)) f_x(s) ds
     = aD(x).

Next consider D(e) for e = x + b, −∞ < b < ∞. Again let X have cdf F_x and pdf f_x. Then

F_e(t) = P(e ≤ t) = P(x + b ≤ t) = P(x ≤ t − b) = F_x(t − b).
From Model (2.31), with the substitution s = t − b,

D(e) = ∫ t φ(F_e(t)) f_e(t) dt
     = ∫ t φ(F_x(t − b)) f_x(t − b) dt
     = ∫ (s + b) φ(F_x(s)) f_x(s) ds
     = D(x),

where the last step uses the standardization of the score function, ∫₀¹ φ(u) du = 0.

As discussed in Hettmansperger and McKean (2011, p. 201), the residual dispersion function converges in probability to D_φ(F):

D_φ(ê_n) → D_φ(F) in probability.

From Model (2.5), we can then write the dispersion-based estimator of ρ as

ρ̂_{D_φ} = √(1 − D²_φ(ê_1, ..., ê_n) / D²_φ(x_1, ..., x_{m+n})).

Theorem: Using the notation above, ρ̂_{D_φ} → ρ in probability.

Proof: Recall that D(e) = √(1 − ρ²) D(x). Plugging D(e) into the expression above,

ρ̂_{D_φ} → √(1 − [√(1 − ρ²) D(x)]² / D²(x)) = √(1 − (1 − ρ²)) = ρ.
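For concreteness, here is a small sketch of Jaeckel's dispersion function with Wilcoxon scores and a crude grid minimization (in Python rather than the R routines used in the study; the data and grid are invented for illustration):

```python
import math
import random

def wilcoxon_dispersion(beta, x, y):
    """Jaeckel's dispersion D(beta) = sum a(R(e_i)) e_i with Wilcoxon
    scores a(i) = sqrt(12)*(i/(n+1) - 1/2), where e_i = y_i - beta*x_i."""
    e = [yi - beta * xi for xi, yi in zip(x, y)]
    n = len(e)
    order = sorted(range(n), key=lambda i: e[i])   # ranks via sorting
    d = 0.0
    for rank0, idx in enumerate(order):
        a = math.sqrt(12) * ((rank0 + 1) / (n + 1) - 0.5)
        d += a * e[idx]
    return d

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

# the dispersion is convex in beta; a coarse grid search finds the minimizer
grid = [b / 100 for b in range(100, 301)]
beta_hat = min(grid, key=lambda b: wilcoxon_dispersion(b, x, y))
print(beta_hat)  # close to the true slope 2.0
```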
Therefore, we conclude that ρ̂_{D_φ} → ρ in probability.

2.6.2 Parameter τ_φ

Define the parameter τ by

τ⁻¹ = ∫ φ(u) φ_f(u) du,        (2.32)

where

φ_f(u) = −f′(F⁻¹(u)) / f(F⁻¹(u)).

Hettmansperger and McKean (2011, p. 178) showed that τ_φ is a scale parameter. To verify this, let u = F(x), so du = f(x) dx. We then have

τ⁻¹ = −∫ φ[F(x)] f′(x) dx.

Let y = a + bx with b > 0. Hence,

F_y(y) = P(X ≤ (y − a)/b) = F((y − a)/b).

Let f_y(y) = (1/b) f((y − a)/b), so that f′_y(y) = (1/b²) f′((y − a)/b). We have

τ_y⁻¹ = −∫ φ[F_y(y)] f′_y(y) dy.

Therefore,

τ_y⁻¹ = −(1/b²) ∫ φ[F((y − a)/b)] f′((y − a)/b) dy.        (2.33)
Let z = (y − a)/b, so dz = (1/b) dy. Plugging z and dz into Model (2.33), we get

τ_y⁻¹ = −(1/b) ∫ φ[F(z)] f′(z) dz = (1/b) τ⁻¹.

We then have τ_y = bτ; together with location invariance, this means that τ is a scale functional.

As an estimate of τ, we use the estimator developed by Koul, Sievers, and McKean (1987). This estimator is a consistent estimator of τ under both symmetrical and asymmetrical errors; see also Hettmansperger, McKean, and Sheather (2003). An informative discussion of this estimator can be found in Hettmansperger and McKean (2011).

2.6.3 Median Absolute Deviation (MAD)

The median absolute deviation from the median, called the MAD, is a common resistant measure of scale (Mosteller and Tukey, 1977). Let w_1, ..., w_n be a sample. The MAD is defined as the median of the absolute deviations of the w_i from the median of the w_j (Lax, 1985); that is,

MAD = med_i |w_i − med_j w_j|.        (2.34)
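A minimal sketch of (2.34), rescaled by the factor 1.4826 that makes the estimate consistent for σ at the normal distribution, together with a check of the two scale-functional properties (the sample values are invented):

```python
import statistics

def mad(w, scale=1.4826):
    """Median absolute deviation from the median (2.34); the factor
    1.4826 makes the estimate consistent for sigma at the normal."""
    m = statistics.median(w)
    return scale * statistics.median([abs(wi - m) for wi in w])

w = [2.0, 4.0, 7.0, 1.0, 9.0, 3.0, 5.0]
print(mad(w))                                   # 1.4826 * 2.0 = 2.9652
print(mad([wi + 10.0 for wi in w]) == mad(w))   # shift invariance: True
print(abs(mad([3.0 * wi for wi in w]) - 3.0 * mad(w)) < 1e-9)  # True
```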
The functional version of the MAD is

θ_MAD = 1.4826 med|W − med(W)|.        (2.35)

It is easy to see that θ_MAD is a scale functional; that is, for all a, θ_MAD(W + a) = θ_MAD(W), and for all b > 0, θ_MAD(bW) = bθ_MAD(W). The sample MAD is a consistent estimator of θ_MAD (Mosteller and Tukey, 1977).

2.6.4 Example: Final and Midterm Exam Scores

The data consist of final and midterm exam scores from 20 students. Let X and Y denote, respectively, the midterm and final exam scores of a student. Choose the cutoff point 70 for X. Then the select sample consists of the pairs (x_i, y_i) with x_i > 70; here n = 10 and m = 13. We use 70.1 for μ and the corresponding sample value for σ (these are the sample mean and standard deviation of all the x's). The midterm and final scores are shown in Table 1.

The regression towards the mean plot (with X centered) is displayed in Figure 2, which contains a dotted line, whose slope equals 1, and a fitted line, whose slope is less than 1; the latter illustrates the regression towards the mean. We investigate the LS and Wilcoxon dispersion procedures. Terpstra and McKean (2005) have written a weighted Wilcoxon routine for the R statistical software package (R Development Core Team, 2005). For this study, we use this R program to perform our computations; we refer to the procedure as WD.
Table 1: Final and Midterm Exam Scores (columns: Student, Midterm Score, Final Score; the individual scores are not recoverable from the scan)

Table 2: Mean Parameters (columns: Procedure, γρ, ρ, γ, δ; the numerical entries are not recoverable from the scan)
Figure 2: Regression toward the Mean Plot (x-axis: Midterm Scores)
The estimate of γ from the LS method is higher than that of the WD procedure, while the estimates of ρ and δ from the WD procedure are higher than those from the LS procedure (Table 2). For either the LS or Wilcoxon procedure, based on the results in Table 3, we reject the overall null hypothesis, since 0 ∉ I*_δ or 1 ∉ I*_γ. Based on the confidence intervals, there are both additive and multiplicative effects for this data set. Hence, there is no simple interpretation.

Table 3: 95% Bootstrap Confidence Intervals (BCI)

Procedure   γρ                 ρ                  γ                  δ
LS          (1.0761, 2.2708)   (0.6522, 0.9532)   (1.2827, 2.9679)   ( , )
WD          (0.9375, 2.3214)   (0.6857, 0.9632)   (1.0596, 2.8563)   ( , )

(The endpoints of the δ intervals are not recoverable from the scan.)
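The rejection rule just applied — check whether the hypothesized value falls inside the bootstrap interval — can be sketched directly from the Table 3 entries (the LS/WD row labels are assumed from the order in which the two procedures were introduced):

```python
def contains(interval, value):
    """Does a bootstrap percentile interval cover the hypothesized value?"""
    lo, hi = interval
    return lo <= value <= hi

# 95% BCIs for gamma from Table 3
gamma_ci = {"LS": (1.2827, 2.9679), "WD": (1.0596, 2.8563)}
for proc, ci in gamma_ci.items():
    # H0: gamma = 1 is rejected when 1 falls outside the interval
    decision = "reject" if not contains(ci, 1.0) else "retain"
    print(proc, decision)  # both rows print "reject": 1 lies below each CI
```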
CHAPTER III

MONTE CARLO STUDY FOR DUAL TREATMENT EFFECTS AND MULTIVARIATE NORMAL MODEL

In this chapter, we consider a simulation study to investigate the traditional LS and R procedures for the dual effects model. Our summary of this study includes the empirical means, the empirical MSEs, the empirical AREs, the empirical confidence coefficients, and the empirical levels for the four parameters γρ, ρ, γ, and δ. Both pretreatment and posttreatment variables are generated from dual treatment effects models using multivariate normal variates. We specify two settings for ρ, 0.7 and 0.8, so the regression effect is on for all situations. We also consider different scenarios for the additive (δ) and the multiplicative (γ) effects.

For our Monte Carlo study, we simulate a clinical study in which only a patient whose response exceeds a prespecified threshold is treated. In our case, we set the threshold at the third quartile of the initial response. Hence, for a given bivariate distribution of a random vector (X, Y), a simulated sample is obtained as follows. Generate a realization (x, y) from the bivariate distribution. If x > q_{x,3}, where q_{x,3} is the third quartile of X, then retain (x, y); otherwise, retain only x. In our simulation studies, we set the size of the bivariate sample at n = 50. Hence, for each simulation we have the bivariate sample (x_1, y_1), ..., (x_n, y_n) and, in addition, the sample of x's below q_{x,3}, which we label x_{n+1}, ..., x_{n+m} (as in Chapter 2). We expect m to be 150, but it may not be. This adds some variation into the process, but it is similar to what occurs in clinical practice. Further, in practice the mean (μ) and standard deviation (σ) of X may be known; i.e., estimated from a very large sample. If not, then they are often estimated from the combined sample of
x's; i.e., all n + m realizations of X. In our simulation studies, we use the latter approach; that is, the combined sample of the x's is used to estimate μ and σ.

In this chapter, the simulated model is the bivariate normal under the dual treatment effects model. The bivariate pdf is given in expression (2.2). The parameters δ, γ, and ρ are set at different values depending on the situation. For all simulations, we set μ = 200 and hold σ fixed.

3.1 Bootstrap Confidence Intervals (BCI)

Our model for the study is the dual effects model, which was discussed in Chapter 2. For convenience, we rewrite it as

Y = μ + δ + γρ(X − μ) + e.

In Chapter 2 we discussed traditional and robust estimation of the parameters of the model. In practice, though, we are often interested in the hypotheses of no treatment effect beyond the regression towards the mean effect; that is, the hypotheses

H₀: δ = 0 and γ = 1   versus   Hₐ: δ ≠ 0 or γ ≠ 1.

We consider testing these hypotheses using the following bootstrap confidence intervals.

The bootstrap confidence intervals are calculated as follows. A sample of size n is drawn from (x_1, y_1), ..., (x_n, y_n) with replacement. This gives new estimates (δ̂*, γ̂*). The bootstrap joint distribution of (δ̂, γ̂) is obtained by repeating the procedure B times to get (δ̂*_1, γ̂*_1), ..., (δ̂*_B, γ̂*_B). A 95% confidence interval for, say, δ is given by the 2.5th and 97.5th percentiles of the bootstrap distribution of δ̂. These bootstrap-percentile confidence intervals are denoted 95% BCI (Naranjo and McKean, 2001).
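A minimal sketch of the percentile bootstrap just described, with a toy estimator standing in for (δ̂, γ̂) — the resampling and percentile steps are the point; the function names and data are invented:

```python
import random

def bootstrap_percentile_ci(pairs, estimator, b=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample the (x, y) pairs with replacement,
    re-estimate, and take the (alpha/2, 1 - alpha/2) percentiles."""
    rng = random.Random(seed)
    n = len(pairs)
    boot_stats = []
    for _ in range(b):
        boot = [pairs[rng.randrange(n)] for _ in range(n)]
        boot_stats.append(estimator(boot))
    boot_stats.sort()
    lo = boot_stats[int((1 - level) / 2 * b)]
    hi = boot_stats[int((1 + level) / 2 * b) - 1]
    return lo, hi

# toy estimator: mean of y - x (a stand-in for a treatment-effect estimate)
random.seed(5)
pairs = [(float(x), x + 1.0 + random.gauss(0, 0.5)) for x in range(30)]
lo, hi = bootstrap_percentile_ci(pairs,
                                 lambda p: sum(y - x for x, y in p) / len(p))
print(lo, hi)  # an interval around the true shift of 1.0
```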
3.2 Tests of Significance

The treatment effect components δ and γ may be tested by using the bootstrap confidence intervals. However, as in estimation, Naranjo and McKean (2001) recommend the use of bootstrap-based inference over the likelihood-based standard errors. A marginal test with a 5% level of significance against the alternative hypothesis δ ≠ 0 can be conducted by constructing a 95% bootstrap-percentile confidence interval I*_δ for δ and checking whether it contains the value 0. Similarly, the alternative hypothesis γ ≠ 1 is detected if the 95% confidence interval I*_γ does not contain 1.

An overall test against the general alternative (δ ≠ 0 or γ ≠ 1) is conducted as follows. Let I*_δ and I*_γ be bootstrap-percentile 97.5% confidence intervals for δ and γ, respectively. A Bonferroni test with at most a 5% level of significance for testing H₀: δ = 0, γ = 1 versus Hₐ: δ ≠ 0 or γ ≠ 1 is given by: Reject H₀ if 0 ∉ I*_δ or 1 ∉ I*_γ.

3.3 Simulation Study

Dual Treatment Effects Model of Different Scenarios with 20% and 30% Regression to the Mean

Our model for this simulation study is the dual effects model (2.3). As noted, in this chapter we use the bivariate normal distribution to generate the variates. We investigate the procedures for the following four situations:

(H1) δ = 0, γ = 1
(H2) δ = 0, γ = 0.6
(H3) δ = 0.7, γ = 1
(H4) δ = 0.7, γ = 0.6

Note that H1 is the overall null hypothesis, while H2 and H3 are marginal null situations. As stated before, all four situations are simulated with 20% (ρ = 0.8) and 30% (ρ = 0.7) regression to the mean. We run 1000 simulations for each situation. Further, for each simulation the bootstrap size is held fixed.

Our simulation study focuses on the comparison of the behavior of the traditional (LS based) and the robust procedures based on the Wilcoxon fit. In Chapter 2, we presented robust estimates of ρ based on the Wilcoxon dispersion functional (2.30), the functional τ (2.32), and the functional MAD. The estimates of each of these functionals, along with the Wilcoxon estimates of the other parameters, constitute a procedure. Early investigations, though, showed that the estimates based on τ and MAD led to too frequent negative estimates of ρ². This was not the case for the dispersion function. So for our study we only consider the procedure based on the dispersion function. We label this Wilcoxon dispersion procedure WD.

We summarize the results for the estimation of the parameters in the tables that follow. The summaries include the empirical mean squared errors (MSE), empirical asymptotic relative efficiencies (the ratios of MSEs), and the empirical confidences for nominal 95% and 97.5% confidence intervals. There are two testing situations:

1. Full test (α̂_F): H₀: δ = 0, γ = 1 versus Hₐ: δ ≠ 0 or γ ≠ 1. Reject H₀ if 0 ∉ I*_δ or 1 ∉ I*_γ. Recall that we use the Bonferroni procedure for this hypothesis with separate nominal levels of 0.025. Hence, the nominal level is at most α = 0.05. We label the empirical level by α̂_F.
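The selection scheme of this chapter — retain the pair only when x exceeds the third quartile of X — can be sketched as follows; σ = 20 is an assumed illustrative value (only μ = 200 is fixed by the text), and the function name is invented:

```python
import math
import random

def simulate_select_sample(n_total, mu, sigma, rho, delta, gamma, seed=0):
    """One Monte Carlo draw for the dual effects model: keep (x, y) when
    x exceeds the third quartile of X; otherwise keep x only."""
    rng = random.Random(seed)
    q3 = mu + 0.6745 * sigma          # third quartile of N(mu, sigma^2)
    pairs, x_only = [], []
    for _ in range(n_total):
        x = mu + sigma * rng.gauss(0, 1)
        if x > q3:
            e = math.sqrt(1 - rho ** 2) * sigma * rng.gauss(0, 1)
            y = mu + delta + gamma * rho * (x - mu) + e
            pairs.append((x, y))
        else:
            x_only.append(x)
    return pairs, x_only

pairs, x_only = simulate_select_sample(200, mu=200, sigma=20, rho=0.8,
                                       delta=0.0, gamma=1.0, seed=7)
print(len(pairs), len(x_only))  # roughly a quarter of the 200 draws are pairs
```

As in the text, the number of retained pairs is random (Binomial(n_total, 1/4)), which mimics the variation seen in clinical practice.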
2. Marginal tests: H₀: δ = 0 versus Hₐ: δ ≠ 0, and H₀: γ = 1 versus Hₐ: γ ≠ 1. For the δ marginal test (α̂_{m,δ}): Reject H₀ if 0 ∉ I*_δ. For the γ marginal test (α̂_{m,γ}): Reject H₀ if 1 ∉ I*_γ. The marginal tests have nominal level 0.05.

Table 4: Empirical MSE when ρ = 0.8 (rows: situations H1–H4; columns: procedure and the empirical MSEs of γρ, ρ, γ, and δ; the numerical entries are not recoverable from the scan)

The empirical MSEs over all situations for ρ = 0.8 are less than those for ρ = 0.7, for both the LS and WD methods. In the case of γρ, the MSEs of the WD estimates are greater than those of the LS estimates in the H1, H2, H3, and H4 situations, for both ρ = 0.8 and ρ = 0.7 (Table 4 and Table 5). This last result is not surprising because the LS procedure is more efficient at the normal model.
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing
More informationRandom Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R
In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample
More informationA Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University
A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationRank-Based Estimation and Associated Inferences. for Linear Models with Cluster Correlated Errors
Rank-Based Estimation and Associated Inferences for Linear Models with Cluster Correlated Errors John D. Kloke Bucknell University Joseph W. McKean Western Michigan University M. Mushfiqur Rashid FDA Abstract
More informationA L A BA M A L A W R E V IE W
A L A BA M A L A W R E V IE W Volume 52 Fall 2000 Number 1 B E F O R E D I S A B I L I T Y C I V I L R I G HT S : C I V I L W A R P E N S I O N S A N D TH E P O L I T I C S O F D I S A B I L I T Y I N
More informationLifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with an Arrhenius rate relationship
Scholars' Mine Doctoral Dissertations Student Research & Creative Works Spring 01 Lifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with
More informationACM 116: Lectures 3 4
1 ACM 116: Lectures 3 4 Joint distributions The multivariate normal distribution Conditional distributions Independent random variables Conditional distributions and Monte Carlo: Rejection sampling Variance
More informationMultivariate Distributions
IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate
More informationDETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics
DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationApplied Multivariate and Longitudinal Data Analysis
Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference
More informationMeasure-theoretic probability
Measure-theoretic probability Koltay L. VEGTMAM144B November 28, 2012 (VEGTMAM144B) Measure-theoretic probability November 28, 2012 1 / 27 The probability space De nition The (Ω, A, P) measure space is
More informationChapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments
Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More informationThe extreme points of symmetric norms on R^2
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2008 The extreme points of symmetric norms on R^2 Anchalee Khemphet Iowa State University Follow this and additional
More information, 0 x < 2. a. Find the probability that the text is checked out for more than half an hour but less than an hour. = (1/2)2
Math 205 Spring 206 Dr. Lily Yen Midterm 2 Show all your work Name: 8 Problem : The library at Capilano University has a copy of Math 205 text on two-hour reserve. Let X denote the amount of time the text
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationYour use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
Regression Analysis when there is Prior Information about Supplementary Variables Author(s): D. R. Cox Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 22, No. 1 (1960),
More informationIndependent Component (IC) Models: New Extensions of the Multinormal Model
Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research
More informationHypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA
Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in
More informationSTAT Chapter 11: Regression
STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationRegression analysis based on stratified samples
Biometrika (1986), 73, 3, pp. 605-14 Printed in Great Britain Regression analysis based on stratified samples BY CHARLES P. QUESENBERRY, JR AND NICHOLAS P. JEWELL Program in Biostatistics, University of
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationHANDBOOK OF APPLICABLE MATHEMATICS
HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester
More informationWhat to do today (Nov 22, 2018)?
What to do today (Nov 22, 2018)? Part 1. Introduction and Review (Chp 1-5) Part 2. Basic Statistical Inference (Chp 6-9) Part 3. Important Topics in Statistics (Chp 10-13) Part 4. Further Topics (Selected
More information2 (Statistics) Random variables
2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes
More informationBootstrapping the Confidence Intervals of R 2 MAD for Samples from Contaminated Standard Logistic Distribution
Pertanika J. Sci. & Technol. 18 (1): 209 221 (2010) ISSN: 0128-7680 Universiti Putra Malaysia Press Bootstrapping the Confidence Intervals of R 2 MAD for Samples from Contaminated Standard Logistic Distribution
More informationON THE DISTRIBUTION OF RESIDUALS IN FITTED PARAMETRIC MODELS. C. P. Quesenberry and Charles Quesenberry, Jr.
.- ON THE DISTRIBUTION OF RESIDUALS IN FITTED PARAMETRIC MODELS C. P. Quesenberry and Charles Quesenberry, Jr. Results of a simulation study of the fit of data to an estimated parametric model are reported.
More informationGROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION
FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89
More informationColby College Catalogue
Colby College Digital Commons @ Colby Colby Catalogues College Archives: Colbiana Collection 1871 Colby College Catalogue 1871-1872 Colby College Follow this and additional works at: http://digitalcommonscolbyedu/catalogs
More informationCh. 1: Data and Distributions
Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and
More informationFormulas for probability theory and linear models SF2941
Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms
More information
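The regression-to-the-mean effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not any procedure from the dissertation: the population parameters (mu, sigma, rho) and the one-standard-deviation cutoff are hypothetical choices, and the draw is from a bivariate normal, the baseline case the abstract mentions. With no treatment applied at all, subjects selected for a high baseline score still show a follow-up mean that falls back toward the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mu, sigma, rho = 50.0, 10.0, 0.7  # hypothetical population parameters

# Draw correlated (baseline, follow-up) pairs from a bivariate normal.
cov = [[sigma**2, rho * sigma**2],
       [rho * sigma**2, sigma**2]]
baseline, followup = rng.multivariate_normal([mu, mu], cov, size=n).T

# Select only subjects whose baseline exceeds a cutoff; no treatment is applied.
cutoff = mu + sigma  # one standard deviation above the mean (arbitrary choice)
selected = baseline > cutoff

# Under the bivariate normal, E[Y | X = x] = mu + rho * (x - mu), so the
# selected group's follow-up mean sits between its baseline mean and mu.
print("selected baseline mean: ", baseline[selected].mean())
print("selected follow-up mean:", followup[selected].mean())
```

A spurious "treatment effect" in such a design is exactly the gap between the selected baseline mean and mu, shrunk by the factor rho; the multiplicative model of Naranjo and McKean (2001) discussed in the abstract is one way to separate a genuine treatment effect from this artifact.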