RANKED SET SAMPLING FOR ENVIRONMENTAL STUDIES
JAYANT V. DESHPANDE, Indian Institute of Science Education and Research, Pune, India. Talk delivered at the INDO-US Workshop on Environmental Statistics held at Duke University, NC, U.S.A., from March 4 to March 6,
ABSTRACT
Here the fact that Ranked Set Sampling (RSS) procedures were initially developed for environmental studies is emphasized. These procedures envisage ranking a large number of observations (in small groups) and actually measuring only a few of them. The basic procedure to estimate the mean, and its higher efficiency over the SRS procedure, is demonstrated. Then we discuss the modification to be carried out for developing nonparametric confidence intervals for quantiles. Lastly we discuss some tests for perfect judgment ranking of observations, which should be used as preliminary procedures before the RSS methodology is adopted.
Outline
- Examples from Environmental Studies
- Introduction to Ranked Set Sampling
- Estimation of Mean
- Confidence Intervals for Quantiles
- Tests for Perfect Ranking
- Conclusions
1. Introduction and Summary
This paper reviews the use of ranked set sampling methodology for statistical problems arising in environmental studies. These studies typically require high-cost observations, so it is important that inferences be based on the smallest possible number of observations. Ranked set sampling is a methodology that can improve the efficiency of techniques such as estimation and confidence intervals without increasing the number of measured observations, provided additional information on their relative ranking is available (or can be obtained without significant expense).
In the second section we discuss the modes of obtaining environmental data and the reasons why this can be very expensive, bringing out the need for parsimony in the number of observations. We introduce ranked set sampling (RSS) and explain why it leads to greater efficiency of statistical procedures than the usual simple random sampling (SRS). We exhibit the gain in efficiency of the mean based on RSS over SRS. We also explicitly bring out the increase in coverage probabilities of distribution-free confidence intervals based on order statistics in similar circumstances.
In the operation of RSS procedures one needs to rank a larger number of observations than those measured for the characteristic of interest. This ordering is often done on the basis of some surrogate variable or covariate whose values are available practically free of cost. Optimal procedures presuppose perfect rankings, but whether the rankings are in fact perfect usually remains unanswered. We therefore discuss tests of perfect ranking against the extreme alternative of random rankings. We see that reasonable test procedures allow us to decide either in favour of perfect ranking or against it.
In the last section we discuss some aspects of the large body of work on RSS methodology, giving references to a recent book and some review papers. It may be observed that after introducing the well-known properties of the RSS estimator of the mean, we discuss some newer work not yet reported in, say, the book of Chen, Bai and Sinha (2003).
2. Problems Arising in Environmental Studies where Ranked Set Sampling is Useful
Observations in environmental studies are notoriously difficult and/or expensive to obtain. Study after study has confirmed this. Hence the RSS methodology, which has higher efficiency for the same number of measured (as opposed to ranked) observations, is the preferred one. In fact, its origins and much of its development lie in the domain of environmental studies.
McIntyre (1952), who initially proposed this methodology, did so in the context of estimating mean pasture and forage yields. In order to obtain observations one must harvest the forage. This involves mowing and clipping the browse and then weighing it after drying. All this is a laborious, expensive and time-consuming process, so all avenues towards reducing the number of observations are explored. Now one can rank the observational units (called quadrats) more or less by just inspecting them visually. Thus ranking can be accomplished in a large number of quadrats without much expense and included in the RSS procedures.
Another example often quoted is that of assessing the status of hazardous waste sites. The radiochemical analysis of the soil samples is expensive as well as hazardous. But the samples can be ranked according to the value of a surrogate variable, viz., the Field Instrument for the Determination of Low Energy Radiation (FIDLER) counts per minute taken at the location from which the soil samples for the radiochemical analysis would be obtained. These counts can be obtained at relatively low cost and then used for ranking a large number of soil samples. The actual measurements, which are expensive, are of plutonium concentration per square metre of surface soil (Yu and Lam (1997)).
Another experiment is described in detail in Murray, Ridout and Cross (2000). Leaves on standing trees were sprayed with a fluorescent, water-soluble tracer at 2% concentration in water. Then a large number of leaves were harvested and ranked according to the surface area covered with the chemical. Once ranking was complete, the deposits on a smaller number of selected ranked leaves were washed off and the wash collected in a test tube. The relevant concentration of the tracer was then measured using a spectrometer. The data were collected in this way. It may be noted that the ranking stage is much less laborious, time-consuming and expensive than the measuring stage.
Another example concerns estimation of mercury contamination in fish, as discussed in Murff and Sager (2006). Ten appropriate live or freshly dead fish were selected from a catch. Their length was measured for ranking. Then in the laboratory a fillet was removed from each fish, homogenized with a blender and analyzed for mercury concentration (in mg/kg) with a gas chromatograph. This is an expensive process, besides resulting in destruction of the sample. RSS methodology was suggested for use in such an experiment. However, in this case it was found that, although RSS was more efficient, the gain was small for the ordinary least squares regression technique. Hence it is relevant to see how much gain in efficiency is actually made by the RSS methodology over the SRS methodology.
3. Ranked Set Sampling Methodology and Estimation of the Mean
The methodology of Ranked Set Sampling was introduced in 1952 by McIntyre. For a while it did not attract the attention it deserved, but there has been a surge of interest in it over the last twenty years or so. Patil, Sinha and Taillie (1994), Barnett (1999), Chen, Bai and Sinha (2003) and Wolfe (2004) may be seen as some of the landmark contributions. Let us first describe the basic framework of this methodology. Suppose we are interested in a characteristic represented by the values of a random variable $X$ taken by each of the units in the population. Let $X$ have a continuous probability distribution with c.d.f. $F$ and p.d.f. $f$. Let $\mu$ be the expectation of $X$ and $\sigma^2$ its variance. The methodology consists of the following stages.
(i) First choose $k$ units from the population as a simple random sample. By some (hopefully inexpensive) procedure select the unit with the smallest value of $X$. Let it be $X_{[1]}$.
(ii) A second independent SRS of size $k$ is chosen and the unit with the second smallest value of the character, $X_{[2]}$, is selected.
Similarly, continue this process until one has $X_{[1]}, X_{[2]}, \ldots, X_{[k]}$, a collection of independent order statistics, one from each of $k$ disjoint simple random samples of size $k$.
This constitutes the basic balanced Ranked Set Sample. The $X_{[i]}$ are independently distributed with respective p.d.f.s
\[ f_{(i)}(x_{[i]}) = \frac{k!}{(i-1)!\,(k-i)!}\,\bigl(F(x_{[i]})\bigr)^{i-1}\bigl(1-F(x_{[i]})\bigr)^{k-i} f(x_{[i]}), \qquad -\infty < x_{[i]} < \infty. \]
Note that we drew $k^2$ units from the population, in $k$ simple random samples of size $k$, but retained only one unit (the one with the appropriate rank) from each sample. So the total effort has been to rank $k$ random samples of size $k$ each and to retain only $k$ units, the $i$-th order statistic from the $i$-th group ($i = 1, 2, \ldots, k$), which are the ones actually measured. It is expected that ranking is cheap and measuring is expensive. Ranking $k^2$ observations and measuring $k$ of them is thus a way of boosting the efficiency of statistical procedures beyond that of procedures based on $k$ SRS measurements alone.
In order to retain the reliability of the rankings, it is usually suggested that ranking be carried out in small groups, say of size up to 4, 5 or 6. However, this would limit both the versatility and the efficiency of the procedures. Hence recourse is taken to replicating the whole process $m$ times. Thus $mk^2$ units are examined, each group of $k$ is ranked within itself and the $i$-th order statistic is obtained from $m$ groups, leading to the data $X_{[i]j}$, $i = 1, \ldots, k$, $j = 1, \ldots, m$. This constitutes balanced RSS. In unbalanced sampling, instead of $m$ replications for each order statistic, one has $m_i$ replications of the $i$-th order statistic.
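The stages described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's code: the function name `ranked_set_sample`, the `population_draw` callback and the `rank_key` argument are hypothetical names introduced here, and ranking is assumed to be perfect unless a surrogate `rank_key` is supplied.

```python
import random


def ranked_set_sample(population_draw, k, m, rank_key=None, rng=random):
    """Draw a balanced ranked set sample with set size k and m replications.

    population_draw(rng) returns one unit from the population.
    rank_key maps a unit to the (cheap) value used for ranking; by default
    the unit itself is used, which corresponds to perfect ranking.
    Returns a list of (rank i, measured unit) pairs of length m * k.
    """
    if rank_key is None:
        rank_key = lambda unit: unit
    sample = []
    for _cycle in range(m):                  # m replications of the whole scheme
        for i in range(k):                   # one fresh set of k units per rank
            units = [population_draw(rng) for _ in range(k)]
            units.sort(key=rank_key)         # inexpensive ranking step
            sample.append((i + 1, units[i]))  # only this unit is measured
    return sample


rng = random.Random(0)
sample = ranked_set_sample(lambda r: r.gauss(0.0, 1.0), k=3, m=4, rng=rng)
rss_mean = sum(x for _, x in sample) / len(sample)
print(len(sample), rss_mean)
```

Here $mk^2 = 36$ units are drawn and ranked, but only $mk = 12$ of them enter the sample as measured values.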
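4. Estimation of Mean
Let us now consider the estimation of $\mu$, the mean of $F$. If we use $k$ SRS observations then their sample mean $\bar{X}$ is an unbiased estimator with variance $\sigma^2/k$. The mean $\bar{X}^{*} = \frac{1}{k}\sum_{i=1}^{k} X_{[i]}$ of the $k$ RSS observations is also unbiased for $\mu$, as seen below.
\[ E(\bar{X}^{*}) = \frac{1}{k}\sum_{i=1}^{k} E(X_{[i]}) = \frac{1}{k}\sum_{i=1}^{k}\int x\,\frac{k!}{(i-1)!\,(k-i)!}\,(F(x))^{i-1}(1-F(x))^{k-i} f(x)\,dx \]
\[ = \int x\Bigl[\sum_{i=1}^{k}\binom{k-1}{i-1}(F(x))^{i-1}(1-F(x))^{k-i}\Bigr] f(x)\,dx = \int x f(x)\,dx = \mu. \]
The bracketed sum equals 1 by the binomial theorem.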
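Further we see that, for the RSS mean $\bar{X}^{*} = \frac{1}{k}\sum_{i=1}^{k} X_{[i]}$ and with $\mu_{(i)} = E(X_{[i]})$,
\[ V(\bar{X}^{*}) = \frac{1}{k^2}\sum_{i=1}^{k} V(X_{[i]}) = \frac{1}{k^2}\sum_{i=1}^{k} E\bigl(X_{[i]} - \mu_{(i)}\bigr)^2 = \frac{1}{k^2}\Bigl[k\sigma^2 - \sum_{i=1}^{k}(\mu_{(i)} - \mu)^2\Bigr] \]
\[ = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}(\mu_{(i)} - \mu)^2 = V(\bar{X}) - \frac{1}{k^2}\sum_{i=1}^{k}(\mu_{(i)} - \mu)^2 \le V(\bar{X}), \]
where $\bar{X}$ is the mean of $k$ SRS observations. The third equality uses $\sum_{i=1}^{k} E(X_{[i]} - \mu)^2 = k\sigma^2$, which holds because the p.d.f.s of the $k$ order statistics average to $f$. Hence, if the rankings are perfect, the RSS estimator of $\mu$ has a smaller variance than the SRS estimator. This increase in efficiency comes at the cost (if any) of ranking the $k^2$ observations, in addition to measuring the $k$ selected order statistics.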
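As a worked illustration of the variance formula, the following sketch evaluates it exactly when $F$ is Uniform(0, 1). The uniform distribution is an assumption made here purely for convenience (it is not an example from the talk); it gives $\mu = 1/2$, $\sigma^2 = 1/12$ and $\mu_{(i)} = i/(k+1)$. The function name is hypothetical.

```python
from fractions import Fraction


def rss_vs_srs_variance_uniform(k):
    """Exact variances of the SRS and RSS means of k measurements, F = Uniform(0, 1)."""
    mu = Fraction(1, 2)
    sigma2 = Fraction(1, 12)
    # Expectation of the i-th order statistic of k uniforms.
    mu_i = [Fraction(i, k + 1) for i in range(1, k + 1)]
    var_srs = sigma2 / k
    # V(RSS mean) = sigma^2 / k - (1 / k^2) * sum_i (mu_(i) - mu)^2
    var_rss = var_srs - sum((m - mu) ** 2 for m in mu_i) / k**2
    return var_srs, var_rss


for k in (2, 3, 4, 5):
    var_srs, var_rss = rss_vs_srs_variance_uniform(k)
    print(k, var_srs, var_rss, var_srs / var_rss)
```

For these set sizes the ratio of the two variances comes out as $(k+1)/2$, so under perfect ranking the gain grows with the set size in this uniform case.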
5. Confidence Intervals for Quantiles
If $F$ is a continuous c.d.f. with quantile of order $p$, $q_p = \inf\{x : F(x) \ge p\}$, then a standard nonparametric confidence interval for $q_p$ is provided by the order statistics. Let $X_1, \ldots, X_n$ be a random sample (SRS) of size $n$ and $X_{1:n} \le \cdots \le X_{n:n}$ the order statistics from it. If we choose $r < s$ such that
\[ P[X_{r:n} \le q_p \le X_{s:n}] = \sum_{j=r}^{s-1}\binom{n}{j} p^j (1-p)^{n-j} = 1 - \alpha, \]
then $[X_{r:n}, X_{s:n}]$ provides a $(1-\alpha)100\%$ confidence interval for $q_p$.
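The binomial sum above is easy to evaluate directly. The sketch below does so with exact rational arithmetic; the function name and the choice $n = 10$, $p = 1/2$, $r = 2$, $s = 9$ are illustrative assumptions, not values from the talk.

```python
from fractions import Fraction
from math import comb


def srs_quantile_ci_coverage(n, p, r, s):
    """P[X_{r:n} <= q_p <= X_{s:n}] = sum_{j=r}^{s-1} C(n, j) p^j (1 - p)^(n - j)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, s))


coverage = srs_quantile_ci_coverage(10, Fraction(1, 2), 2, 9)
print(coverage, float(coverage))  # 501/512, about 0.9785
```

Because the coverage depends only on $n$, $p$, $r$ and $s$, it can be tabulated once and used for any continuous $F$.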
Let us now adopt the RSS methodology. We select $n^2$ independent observations. These are ranked in groups of $n$ each. Then let $X_{[1]}, X_{[2]}, \ldots, X_{[n]}$ be the 1st, 2nd, ..., $n$-th order statistics from these disjoint groups. The marginal distribution of $X_{[i]}$ is the same as that of $X_{i:n}$, but in addition the $X_{[i]}$ are independent. One then takes the interval $[X_{[r]}, X_{[s]}]$, $r < s$, as a confidence interval for $q_p$. Due to the independence of $X_{[r]}$ and $X_{[s]}$, the confidence coefficient is
\[ P[X_{[r]} \le q_p \le X_{[s]}] = P[X_{[r]} \le q_p]\,\bigl[1 - P(X_{[s]} \le q_p)\bigr] = \Bigl[\sum_{j=r}^{n}\binom{n}{j} p^j (1-p)^{n-j}\Bigr]\Bigl[1 - \sum_{j=s}^{n}\binom{n}{j} p^j (1-p)^{n-j}\Bigr]. \]
One notes two properties here.
(i) $E(X_{s:n} - X_{r:n}) = E(X_{[s]} - X_{[r]})$. Hence the SRS-based and the RSS-based confidence intervals have the same expected length.
(ii) If $r$ and $s$ are chosen so that the confidence coefficient of the traditional SRS confidence interval is $1 - \alpha$, with probability $\alpha/2$ in each tail, then the confidence coefficient of the RSS-based confidence interval is
\[ \Bigl(1 - \frac{\alpha}{2}\Bigr)^2 = 1 - \alpha + \frac{\alpha^2}{4}, \]
giving an increase of $\alpha^2/4$ in coverage probability.
It is recognized that $X_{[r]}$ and $X_{[s]}$, being order statistics from independent SRSs, do not necessarily obey $X_{[r]} < X_{[s]}$, so in some cases we may not have a proper interval at all. It is therefore suggested that we order the two statistics $X_{[r]}$ and $X_{[s]}$ as $X_{(rs1)} \le X_{(rs2)}$ and use $[X_{(rs1)}, X_{(rs2)}]$ as the confidence interval. The confidence coefficient of this modified interval is
\[ P[X_{(rs1)} \le q_p \le X_{(rs2)}] = P[\{X_{[r]} \le q_p \le X_{[s]}\} \text{ or } \{X_{[s]} \le q_p \le X_{[r]}\}] \]
\[ = P[X_{[r]} \le q_p \le X_{[s]}] + P[X_{[s]} \le q_p \le X_{[r]}] = P(X_{[r]} \le q_p)\,P(q_p \le X_{[s]}) + P(X_{[s]} \le q_p)\,P(q_p \le X_{[r]}). \]
Again, if $r$ and $s$ are chosen so that the SRS confidence interval leaves probability $\alpha/2$ in each tail, then the probability of coverage of $[X_{(rs1)}, X_{(rs2)}]$ is
\[ \Bigl(1 - \frac{\alpha}{2}\Bigr)^2 + \Bigl(\frac{\alpha}{2}\Bigr)^2 = 1 - \alpha + \frac{\alpha^2}{2}. \]
This C.I. thus adds a further $\alpha^2/4$ to the confidence coefficient, for a total gain of $\alpha^2/2$ over the SRS C.I. However, this comes at the cost of some increase in the expected length of the new C.I. It can be seen that
\[ E(X_{(rs2)} - X_{(rs1)}) = E_{F_{[s]}}\{2XF_{[r]}(X) - X\} + E_{F_{[r]}}\{2XF_{[s]}(X) - X\}, \]
where $F_{[i]}$ denotes the c.d.f. of $X_{[i]}$; this can be calculated for specific distributions.
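The three confidence coefficients discussed in this section (SRS, RSS, and RSS with the two end points ordered) can be compared numerically. The following is a minimal sketch with hypothetical function names; the values $n = 10$, $p = 1/2$, $r = 2$, $s = 9$ are chosen here only for illustration and give $\alpha/2 = 11/1024$ in each tail.

```python
from fractions import Fraction
from math import comb


def upper_binomial_tail(n, p, r):
    """P[Bin(n, p) >= r], i.e. P[r-th order statistic of n observations <= q_p]."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, n + 1))


def quantile_ci_coverages(n, p, r, s):
    """Confidence coefficients of the SRS, RSS and ordered-RSS intervals."""
    a = upper_binomial_tail(n, p, r)   # P[X_[r] <= q_p]
    b = upper_binomial_tail(n, p, s)   # P[X_[s] <= q_p]
    srs = a - b                        # sum_{j=r}^{s-1} of the binomial terms
    rss = a * (1 - b)                  # independence of X_[r] and X_[s]
    rss_ordered = a * (1 - b) + b * (1 - a)
    return srs, rss, rss_ordered


srs, rss, rss_ordered = quantile_ci_coverages(10, Fraction(1, 2), 2, 9)
half_alpha = Fraction(11, 1024)
print(float(srs), float(rss), float(rss_ordered))
```

In this symmetric case the RSS interval gains $(\alpha/2)^2$ over the SRS interval and the ordered version gains $(\alpha/2)^2$ more, in agreement with the expressions above.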
To increase the flexibility of the confidence intervals, it is suggested that $m$ independent groups of $n^2$ observations be obtained for ranking purposes. From these, replicates $X_{[i]j}$, $j = 1, 2, \ldots, m$, $i = 1, \ldots, n$, of the $i$-th order statistic are obtained. These $N = mn$ independent order statistics are ordered from lowest to highest as $X_{1:N} \le \cdots \le X_{N:N}$. Then $1 \le r < s \le N$ can be appropriately chosen to obtain nonparametric confidence intervals with confidence coefficient $(1-\alpha)100\%$. For details see Ozturk and Deshpande (2006). Similar RSS-based confidence intervals for quantiles of finite populations have been discussed in Deshpande, Frey and Ozturk (2006).
6. Judgment Rankings and Tests for Perfect Judgment
The ranked set sampling protocol depends heavily on the ability to rank observations which are not measured. As the groups in which ranking is to be made become large, such rankings become more difficult and thus more unreliable. So it is always preferred to rank observations only in small groups (usually restricted to 4, 5 or 6 units). There are other practical considerations as well. If two quadrats are adjacent then it is easy to compare them visually for their prospective yields. But if two or more quadrats are far flung, then making comparative judgments is far more difficult.
The extensive ranked set sampling literature includes optimal procedures which rely on the assumption of perfect rankings and also procedures which are more robust to violation of this assumption. An applied statistician who assumes that the rankings are perfect and chooses the correspondingly optimal procedure therefore faces the possibility that, if the rankings are imperfect, the procedure is not optimal and perhaps not even valid. So as a preliminary procedure we suggest that a test for the perfectness of rankings be used.
In Frey, Ozturk and Deshpande (2007) the following approach has been suggested. Let $X_{[i]j}$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$, be a ranked set sample. Here $m$ observations are ranked in each set and $X_{[i]j}$ is the $i$-th order statistic, which is measured. This is done $n_i$ times, $j = 1, 2, \ldots, n_i$, giving us a full set of $N = n_1 + n_2 + \cdots + n_m$ observations. Let $R_{[i]j}$ be the rank of $X_{[i]j}$ among these $N$ observations. In the case of perfect rankings one can find the probabilities of each of the $N!$ possible rank vectors, which are the permutations of $\{1, 2, \ldots, N\}$. The null hypothesis says that the ranks follow the distribution of the ranks of independent order statistics.
This distribution, although theoretically known, is not the usual one giving equal probability to all possible rankings. In a very small example with $m = 3$, $n_1 = n_2 = n_3 = 1$, one can obtain the probabilities under the null hypothesis of the vector $R = (R_{[1]1}, R_{[2]1}, R_{[3]1})$ as follows.
R          Prob
(1 2 3)    64/105
(1 3 2)    143/840
(2 1 3)    143/840
(2 3 1)    17/840
(3 1 2)    17/840
(3 2 1)    1/105
A possible way to test $H_0$ is to form the critical region as the union of the least likely (under $H_0$) outcomes. For example, the union $\{(2\ 3\ 1), (3\ 1\ 2), (3\ 2\ 1)\}$ provides a critical region with .05 as the probability of type I error. This is in consonance with Fisher's approach of constructing tests of significance.
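The null probabilities of the rank vectors in the small example ($m = 3$, one measured observation per rank) can be computed exactly by iterated polynomial integration. The sketch below is an illustration rather than the method of the cited paper: it assumes perfect ranking and, because ranks are distribution-free for continuous $F$, takes $F$ to be Uniform(0, 1). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_integral(a):
    """Antiderivative vanishing at 0."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(a)]


def order_stat_density(i, k):
    """Density of the i-th order statistic of k uniforms, as a polynomial in u."""
    poly = [Fraction(0)] * (i - 1) + [Fraction(k * comb(k - 1, i - 1))]
    for _ in range(k - i):
        poly = poly_mul(poly, [Fraction(1), Fraction(-1)])  # multiply by (1 - u)
    return poly


def rank_vector_probability(rank_vector):
    """P[R = rank_vector] where R[i-1] is the rank of X_[i] among X_[1..k]."""
    k = len(rank_vector)
    # Indices of the order statistics, listed from smallest to largest value.
    order = sorted(range(1, k + 1), key=lambda i: rank_vector[i - 1])
    running = [Fraction(1)]
    for i in order:
        running = poly_integral(poly_mul(order_stat_density(i, k), running))
    return sum(running)  # value of the final polynomial at u = 1


probs = {r: rank_vector_probability(r) for r in permutations((1, 2, 3))}
for r, pr in sorted(probs.items()):
    print(r, pr)
critical = probs[(2, 3, 1)] + probs[(3, 1, 2)] + probs[(3, 2, 1)]
print("size of critical region:", critical)
```

Under these assumptions the three least likely rank vectors together have probability 1/20, which is the size of the critical region described above.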
Another approach, due to Neyman and Pearson, requires first of all an alternative hypothesis. Heuristically, one may say that the alternative to perfect ranking is totally random ranking, i.e., one giving equal probability to each rank vector. This would indicate that the rankings are arbitrary and devoid of any information regarding the sizes of the observations. The Neyman-Pearson approach rejects $H_0$ when the ratio of the probabilities of $R$ under $H_1$ and under $H_0$ exceeds a threshold value chosen so that the probability of type I error is the specified $\alpha$. Since the probability under $H_1$ is the same for every rank vector, the ratio is large exactly when the probability under $H_0$ is small, so this approach leads to exactly the same critical region as the Fisherian approach described above. Also see Frey and Wang (2013).
However, the distribution theory of the ranks becomes too complicated for even moderately sized $m$ and $n_i$. Frey (2007) has provided a recursion formula for this purpose. But, as is usual in nonparametric tests, one investigates functions (linear or otherwise) of ranks whose exact distributions may be tabulated for small samples and whose asymptotic distributions may be obtained by appropriate versions of the central limit theorem. Two such statistics have been proposed by Frey, Ozturk and Deshpande (2007). Let
\[ \bar{R}_{[i]\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} R_{[i]j}, \qquad T_i = \frac{1}{N}\bigl(\bar{R}_{[i]\cdot} - E(\bar{R}_{[i]\cdot})\bigr), \qquad K = T'QT, \]
where $T = (T_1, \ldots, T_m)'$ and $Q$ is the Moore-Penrose inverse of the asymptotic covariance matrix of $T$. The expectations and the covariances are under $H_0$. Then $K$ has asymptotically, under $H_0$, the chi-squared distribution with $m - 1$ degrees of freedom. Large values, indicating large departures of the observed $R$ from its null expectation, indicate evidence against $H_0$ and in favour of some alternative hypothesis.
Another statistic proposed in the same paper is
\[ W = \sum_{i=1}^{m}\sum_{j=1}^{n_i} i\,R_{[i]j}, \]
the test rejecting for small values of the statistic. After standardization, its asymptotic distribution is the standard normal. We consider the performance of these tests through simulated power for alternatives which are convex combinations of the probabilities under $H_0$ and under the extreme random rankings. We find the test based on $W$ consistently outperforming the one based on $K$. Similar work was undertaken by Vock and Balakrishnan (2011, 2013). In the first paper they propose a Jonckheere-Terpstra type test for perfect rankings, and in the second paper they find that it is essentially the Frey, Ozturk and Deshpande (2007) test, the test statistics being linear functions of each other.
Example: One of the examples introduced earlier was about ranking and measuring the percent cover of leaves under various sprayer settings. The experiment had $m = n = 5$, giving a total of 25 observations. The ranks observed in the 5 groups of order statistics were (1, 5, 4, 6, 3), (2, 11, 10, 8, 15), (22, 12, 14, 18, 13), (7, 9, 20, 19, 23) and (16, 25, 24, 17, 21). The statistic $W$ computed from these ranks has a (simulated) p-value above the usual significance levels. Hence it may be concluded that there is insufficient evidence to reject the null hypothesis of perfect orderings.
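The calculation in this example can be retraced with a short simulation. This is a sketch under stated assumptions, not the computation used in the talk: $W$ is taken to be the sum of $i\,R_{[i]j}$ over all observations, the null distribution is simulated under perfect ranking with uniform data (ranks are distribution-free for continuous $F$), and small values of $W$ count against $H_0$. The names `groups`, `w_statistic` and `simulate_null_w`, the seed and the number of replications are choices made here.

```python
import random

# Observed ranks, keyed by the order statistic i that each group represents.
groups = {
    1: [1, 5, 4, 6, 3],
    2: [2, 11, 10, 8, 15],
    3: [22, 12, 14, 18, 13],
    4: [7, 9, 20, 19, 23],
    5: [16, 25, 24, 17, 21],
}


def w_statistic(groups):
    """W = sum over i and j of i * R_[i]j."""
    return sum(i * sum(ranks) for i, ranks in groups.items())


def simulate_null_w(m, n, reps, rng):
    """Simulate W under perfect ranking: n copies of each of the m order statistics."""
    values = []
    for _rep in range(reps):
        obs = []
        for i in range(1, m + 1):
            for _j in range(n):
                ranked_set = sorted(rng.random() for _ in range(m))
                obs.append((ranked_set[i - 1], i))   # measure the i-th smallest
        obs.sort()                                   # rank all m * n measurements
        values.append(sum(i * rank for rank, (_, i) in enumerate(obs, start=1)))
    return values


w_obs = w_statistic(groups)
rng = random.Random(1)
null_w = simulate_null_w(m=5, n=5, reps=2000, rng=rng)
p_value = sum(w <= w_obs for w in null_w) / len(null_w)
print("W =", w_obs, "simulated p-value =", p_value)
```

With the definition assumed here the observed ranks give $W = 1175$; the simulated p-value depends on the seed and the number of replications.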
7. Conclusions
In this review we have only provided an introduction to the ranked set sampling methodology as applicable to environmental studies. Since its introduction in 1952 it has taken great strides in the analysis of parametric and nonparametric models such as those introduced here. Further problems, like optimal estimation in the context of lognormal, extreme value and other distributions, are discussed by Barnett (1999). An easy-to-read introduction is available in Patil (2002). Chen, Bai and Sinha (2003) have an entire book on ranked set sampling. Ozturk and Deshpande (2004) have proposed a new test for the nonparametric two-sample scale problem. Newer contributions include more detailed power studies. Murff and Sager (2006) have shown that the use of this methodology in ordinary least squares regression does improve the efficiency, but only marginally.
However, the basic result stands: if inexpensive (or cost-free) ranking is incorporated along with measurements which are expensive, then RSS does provide some increase in the efficiency of the procedure over its SRS version.
References
[1] Barnett V. (1999), Ranked set sample design for environmental investigations, Environ. Ecol. Statist., 6.
[2] Chen Z., Bai Z., Sinha B. K. (2003), Ranked Set Sampling, Springer.
[3] Deshpande J. V., Frey J., Ozturk O. (2006), Nonparametric ranked set sampling confidence intervals for quantiles of a finite population, Environ. Ecol. Statist., 13.
[4] Frey J., Ozturk O., Deshpande J. V. (2007), Nonparametric tests for perfect judgment rankings, J. Am. Statist. Assoc., 102.
[5] Frey J. (2007), A note on probability involving independent order statistics, J. Statist. Comp. Sim., 77.
[6] Frey J., Wang L. (2013), Most powerful tests for perfect rankings, Comp. Statist. Data An., 60.
[7] McIntyre G. A. (1952), A method for unbiased selective sampling, using ranked sets, Aust. J. Agr. Res., 3.
[8] Murff E., Sager T. (2006), The relative efficiency of ranked set sampling in ordinary least squares regression, Environ. Ecol. Statist., 13.
[9] Murray J. A., Ridout M. S., Cross J. V. (2000), The use of ranked set sampling in spray deposit assessment, Aspects of App. Bio., 57.
[10] Ozturk O., Deshpande J. V. (2004), A new nonparametric test using ranked set data for a two sample scale problem, Sankhya, 66.
[11] Ozturk O., Deshpande J. V. (2006), Ranked set sample nonparametric quantile confidence intervals, J. Statist. Planning Inf., 136.
[12] Patil G. P., Sinha A. K., Taillie C. (1994), Ranked set sampling, in Handbook of Statistics, 12, Ed. G. P. Patil and C. R. Rao.
[13] Patil G. P. (2002), Ranked set sampling, Ency. Environmetrics, Ed. A. H. El-Shaarawi, W. W. Piegorsch, 3.
[14] Vock M., Balakrishnan N. (2011), A Jonckheere-Terpstra type test for perfect ranking in balanced ranked set sampling, J. Statist. Planning Inf., 141.
[15] Vock M., Balakrishnan N. (2013), A connection between two nonparametric tests for perfect ranking in balanced ranked set sampling, Comm. Statist. (Th. Methods), 42.
[16] Wolfe D. A. (2004), Ranked set sampling: An approach to more efficient data collection, Statist. Sc., 19.
[17] Yu P. L. H., Lam K. (1997), Regression estimator in ranked set sampling, Biometrics, 53.
PARAMETRIC TESTS OF PERFECT JUDGMENT RANKING BASED ON ORDERED RANKED SET SAMPLES
REVSTAT Statistical Journal Volume 16, Number 4, October 2018, 463 474 PARAMETRIC TESTS OF PERFECT JUDGMENT RANKING BASED ON ORDERED RANKED SET SAMPLES Authors: Ehsan Zamanzade Department of Statistics,
More informationExact two-sample nonparametric test for quantile difference between two populations based on ranked set samples
Ann Inst Stat Math (2009 61:235 249 DOI 10.1007/s10463-007-0141-5 Exact two-sample nonparametric test for quantile difference between two populations based on ranked set samples Omer Ozturk N. Balakrishnan
More informationOn Ranked Set Sampling for Multiple Characteristics. M.S. Ridout
On Ranked Set Sampling for Multiple Characteristics M.S. Ridout Institute of Mathematics and Statistics University of Kent, Canterbury Kent CT2 7NF U.K. Abstract We consider the selection of samples in
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationRANK-SUM TEST FOR TWO-SAMPLE LOCATION PROBLEM UNDER ORDER RESTRICTED RANDOMIZED DESIGN
RANK-SUM TEST FOR TWO-SAMPLE LOCATION PROBLEM UNDER ORDER RESTRICTED RANDOMIZED DESIGN DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate
More informationFundamental Probability and Statistics
Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationStatistical Inference with Randomized Nomination Sampling
Statistical Inference with Randomized Nomination Sampling by Mohammad Nourmohammadi A Thesis submitted to the Faculty of Graduate Studies of The University of Manitoba in partial fulfilment of the requirements
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationTest of the Correlation Coefficient in Bivariate Normal Populations Using Ranked Set Sampling
JIRSS (05) Vol. 4, No., pp -3 Test of the Correlation Coefficient in Bivariate Normal Populations Using Ranked Set Sampling Nader Nematollahi, Reza Shahi Department of Statistics, Allameh Tabataba i University,
More informationConfidence intervals for quantiles and tolerance intervals based on ordered ranked set samples
AISM (2006) 58: 757 777 DOI 10.1007/s10463-006-0035-y N. Balakrishnan T. Li Confidence intervals for quantiles and tolerance intervals based on ordered ranked set samples Received: 1 October 2004 / Revised:
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationThree Non-parametric Control Charts Based on Ranked Set Sampling Waliporn Tapang, Adisak Pongpullponsak* and Sukuman Sarikavanij
Chiang Mai J. Sci. 16; 43(4) : 914-99 http://epg.science.cmu.ac.th/ejournal/ Contributed Paper Three Non-parametric s Based on Ranked Set Sampling Waliporn Tapang, Adisak Pongpullponsak* and Sukuman Sarikavanij
More informationApplication of Variance Homogeneity Tests Under Violation of Normality Assumption
Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com
More informationParameter Estimation, Sampling Distributions & Hypothesis Testing
Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation & Hypothesis Testing In doing research, we are usually interested in some feature of a population distribution (which
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationDISTRIBUTION FREE TWO-SAMPLE METHODS FOR JUDGMENT POST-STRATIFIED DATA
Statistica Sinica 25 (2015), 1691-1712 doi:http://dx.doi.org/10.5705/ss.2014.163 DISTRIBUTION FREE TWO-SAMPLE METHODS FOR JUDGMENT POST-STRATIFIED DATA Omer Ozturk The Ohio State University Abstract: A
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationContents 1. Contents
Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample
More informationContents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47
Contents 1 Non-parametric Tests 3 1.1 Introduction....................................... 3 1.2 Advantages of Non-parametric Tests......................... 4 1.3 Disadvantages of Non-parametric Tests........................
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x
More informationBIOL 4605/7220 CH 20.1 Correlation
BIOL 4605/70 CH 0. Correlation GPT Lectures Cailin Xu November 9, 0 GLM: correlation Regression ANOVA Only one dependent variable GLM ANCOVA Multivariate analysis Multiple dependent variables (Correlation)
More informationPOWER AND TYPE I ERROR RATE COMPARISON OF MULTIVARIATE ANALYSIS OF VARIANCE
POWER AND TYPE I ERROR RATE COMPARISON OF MULTIVARIATE ANALYSIS OF VARIANCE Supported by Patrick Adebayo 1 and Ahmed Ibrahim 1 Department of Statistics, University of Ilorin, Kwara State, Nigeria Department
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46