6 Single Sample Methods for a Location Parameter
|
|
- Amelia McKenzie
- 6 years ago
- Views:
Transcription
1 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually the median) are used. Recall: M is a median of a random variable X if P (X M) = P (X M) =.5. The distribution of X is symmetric about c if P (X c x) = P (X c + x) for all x. For symmetric continuous distributions, the median M = the mean µ. Thus, all conclusions about the median can also be applied to the mean. If X be a binomial ( ) random variable with parameters n and p (denoted X B(n, p)) then n P (X = x) = p x (1 p) n x for x = 0, 1,..., n x where ( ) n = x n! x!(n x)! and k! = k(k 1)(k 2) 2 1. Tables exist for the cdf P (X x) for various choices of n and p. The probabilities and cdf values are also easy to produce using SAS or R. Thus, if X B(n,.5), we have ( ) n P (X = x) = (.5) n x x ( ) n P (X x) = (.5) n k k=0 P (X x) = P (X n x) because the B(n,.5) distribution is symmetric. For sample sizes n > 20 and p =.5, a normal approximation (with continuity correction) to the binomial probabilities is often used instead of binomial tables. Calculate z = (x ±.5).5n.5. Use x+.5 when x <.5n and use x.5 when x >.5n. n The value of z is compared to N(0, 1), the standard normal distribution. For example: P (X x) P (Z z) and P (X x) P (Z z) = 1 P (Z z) 6.1 Ordinary Sign Test Assumptions: Given a random sample of n independent observations The measurement scale is at least nominal. Observations can be classified into 2 nonoverlapping categories whose union exhausts all possibilities. The categories will be labeled + and. 88
2 Hypotheses: The inference involves comparing probabilities P (+) and P ( ) for outcomes + and. (A) Two-sided: H 0 : P (+) = P ( ) vs H 1 : P (+) P ( ) (B) Upper one-sided: H 0 : P (+) P ( ) vs H 1 : P (+) < P ( ) (C) Lower one-sided: H 0 : P (+) P ( ) vs H 1 : P (+) > P ( ) Note: H 0 is true only if P (+) = P ( ) =.5 Method: For a given α Let T + = the number of + observations. Let T = the number of observations. If H 0 is true, then we would expect T + and T to be nearly equal ( n/2). In other words, if H 0 is true, T + and T are binomial B(n,.5) random variables. For alternative hypothesis (A) H 1 : P (+) P ( ). Let T = min(t +, T ). Then find the largest t such that B(n,.5) probability P (X t) α/2. (B) H 1 : P (+) < P ( ). Let T = T +. Then find the largest t such that B(n,.5) probability P (X t) α. (C) H 1 : P (+) > P ( ). Let T = T. Then find the largest t such that B(n,.5) probability P (X t) α. Decision Rule For (A), (B), or (C), if T is too small, then we will reject H 0. That is, If T t, Reject H 0. If T > t, Fail to Reject H 0. Large Sample Approximation 1. For the one-sided H 1, calculate z = T n.5 n z = T +.5.5n.5 n for (B) if T + <.5n z = T +.5.5n.5 n for (C) if T <.5n z = T.5.5n.5 n for (B) if T + >.5n for (C) if T >.5n 2. For the two-sided H 1, take the smaller of the two z-values in (1.). 3. Find Φ(z) = P (Z z) from the standard normal distribution. 4. Reject H 0 if (i) if P (Z z) α for either 1-sided test or (ii) P (Z z) α/2 for the 2-sided test. 89
3 90 44
4 Example: (From Gibbons, Nonparametric Methods for Quantitative Analysis). An oil company is considering the following procedures for training prospective service station managers: 1. On-the-job training under actual working conditions for three months. 2. A company-run school training program concentrated over one month. They plan to compare the two procedures in an experiment. No training program can be the only determining factor for the success of a manager. Success is also affected by other factors such as age, intelligence, and previous experience. In order to eliminate the effects of these factors as much as possible, each trainee is matched with another trainee that has similar attributes (such as similar age and previous experience). If a good match does not exist for a trainee, then the trainee is not included in the experiment. Once pairs are determined, one member of each pair is randomly selected to receive the on-the-job training, while the other is assigned to the company school. After completing the assigned training program, the personnel manager assesses each trainee and judges which member of each pair has done a better job of managing the service station. In total, 13 pairs had completed the training programs. The personnel manager stated that for 10 of the 13 pairs, the better manager received the company school training. Is there sufficient evidence to claim that the company-run school training program is more effective? Table of Binomial Probabilities and Binomial CDF for n=13, p=.5 n p x f(x) = Pr(X=x) F(x) = Pr(X<=x) < <.025 < > Sign (Binomial) Test for Location Assumptions: Given a random sample of n independent observations x 1, x 2,..., x n : The variable of interest is continuous, and the measurement scale is at least ordinal. Hypotheses: The inference concerns a hypothesis about the median M of a single population. (A) Two-sided: H 0 : M = M o vs H 1 : M M o (B) Upper one-sided: H 0 : M = M o vs H 1 : M > M o (C) Lower one-sided: H 0 : M = M o vs H 1 : M < M o 91
5 Method: For a given α Let T + = the number of observations > M o. Let T = the number of observations < M o. Delete any x i = M o and adjust the sample size n accordingly. If H 0 is true, then T + and T are binomial B(n,.5) random variables. Thus, we would expect T + and T to be approximately equal ( n/2). For alternative hypothesis (A) H 1 : M M o. Let T = min(t +, T ). Then find the largest t such that B(n,.5) probability P (X t) α/2. (B) H 1 : M > M o. Let T = T. Then find the largest t such that B(n,.5) probability P (X t) α. (C) H 1 : M < M o. Let T = T +. Then find the largest t such that B(n,.5) probability P (X t) α. Perform the Ordinary Sign Test based on T and t. Decision Rule For (A), (B), or (C), if T is too small, then we will reject H 0. That is, If T t, Reject H 0. If T > t, Fail to Reject H 0. Large Sample Approximation Same as for the Ordinary Sign Test. Example 2.1 from Applied Nonparametric Statistics by W. Daniel. In a study of heart disease, a researcher measured the blood s transit time in subjects with healthy right coronary arteries. The median transit time was 3.50 seconds. In another study, the researchers repeated the transit time study but on a sample of 11 patients with significantly blocked right coronary arteries. The results (in seconds) were Can these researchers conclude (using α =.05) that the median transit time in the population of patients with significantly blocked right coronary arteries is different than 3.50 seconds? 2. Can these researchers conclude (α =.05) that the median transit time in the population of patients with significantly blocked right coronary arteries is less than 3.50 seconds? 92
6 Table of Binomial Probabilities and Binomial CDF for n=10, p=.5 n p x f(x) = Pr(X=x) F(x) = Pr(X<=x) < <-- < Special Case: Paired Data Assumptions: Given a random sample of n independent pairs of observations (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ) : Both variables X and Y are continuous, and the measurement scales are at least ordinal. Testing Procedure: Calculate all differences D i = y i x i for i = 1,..., n. Use the median difference M D in the hypotheses. Typically, M D = 0. Run the Sign Test based on the differences (the D i values). Example 4.1 from Applied Nonparametric Statistics by W. Daniel. Researchers studied the effects of togetherness on the heart rate in rats. They recorded the heart rates of 10 rats while they were alone and while in the presence of another rat. The results are shown below. Using an α =.05 significance level for the Sign Test, can we conclude that togetherness increases the heart rate in rats? For this data, the ten D i values are
7 6.2.2 Sign Test Examples using R and SAS R Output for Sign (Binomial Test Examples) > # Sign Test Example from Gibbons > binom.test(10,13) Exact binomial test number of successes = 10, number of trials = 13, p-value = <-- Fail to reject alternative hypothesis: true probability of success is not equal to percent confidence interval: <-- The CI contains.5 so we fail to reject Ho sample estimates: probability of success > # Sign (Binomial) Test for Location -- Daniel Ex. 2.1 > time <- c(1.80,3.30,5.65,2.25,2.50,3.50,2.75,3.25,3.10,2.70,3.00) > time = time > time [1] > ties = sum(time==0) > ties [1] 1 > binom.test(sum(time>0),length(time)-ties) Exact binomial test Reject Ho ^^^ number of successes = 1, number of trials = 10, p-value = alternative hypothesis: true probability of success is not equal to percent confidence interval: <--.5 is not in the CI, so reject Ho sample estimates: probability of success 0.1 > # Sign (Binomial) Test for Location -- Paired Data, Daniel Ex. 4.1 > alone <- c(463,462,462,456,450,426,418,415,409,402) > together <- c(523,494,461,535,476,454,448,408,470,437) > diff <- together - alone > ties = sum(diff==0) > ties [1] 0 > binom.test(sum(diff>0),length(diff)-ties) 94
8 Exact binomial test Fail to reject Ho ^^^^^^^ number of successes = 8, number of trials = 10, p-value = alternative hypothesis: true probability of success is not equal to percent confidence interval: sample estimates: probability of success 0.8 R Code for Sign (Binomial Test Examples) # Sign Test Example from Gibbons binom.test(10,13) # Sign (Binomial) Test for Location -- Daniel Ex. 2.1 time <- c(1.80,3.30,5.65,2.25,2.50,3.50,2.75,3.25,3.10,2.70,3.00) time = time time ties = sum(time==0) ties binom.test(sum(time>0),length(time)-ties) # Sign (Binomial) Test for Location -- Paired Data, Daniel Ex. 4.1 alone <- c(463,462,462,456,450,426,418,415,409,402) together <- c(523,494,461,535,476,454,448,408,470,437) diff <- together - alone ties = sum(diff==0) ties binom.test(sum(diff>0),length(diff)-ties) SAS Output for Sign (Binomial) Test Examples In SAS, the Sign (Binomial) Test statistic is denoted M, and it represents the deviation in the observed count T + from the expected count.5n when the null hypothesis is true. Ordinary Sign Test for Training Program Example The UNIVARIATE Procedure Variable: level Tests for Location: Mu0=0 Test -Statistic p Value Student s t t Pr > t Sign M -3.5 Pr >= M <-- Fail to Signed Rank S Pr >= S reject Ho 95
9 Sign (Binomial) Test for Example 2.1 Variable: diff Basic Statistical Measures Location Variability Mean Std Deviation Median Variance Tests for Location: Mu0=0 Test -Statistic p Value Student s t t Pr > t Sign M -4 Pr >= M <-- Reject Signed Rank S Pr >= S Ho Sign (Binomial) Test for Paired Differences -- Example 4.1 Obs alone together diff Variable: diff Basic Statistical Measures Location Variability Mean Std Deviation Median Variance Tests for Location: Mu0=0 Test -Statistic p Value Student s t t Pr > t Sign M 3 Pr >= M <--- Fail to Signed Rank S 24.5 Pr >= S reject Ho 96
10 SAS Code for Sign (Binomial) Test Examples DM LOG; CLEAR; OUT; CLEAR; ; OPTIONS NODATE NONUMBER LS=76 PS=54; ****************************************************************; *** Ordinary Sign Test: Let -1,1 represent the 2 categories. ***; *** The frq values are the category frequencies ***; ****************************************************************; DATA in; INPUT level LINES; DATA signtest (DROP=i); SET in; IF level = -1 THEN DO i = 1 TO frq; OUTPUT; end; IF level = 1 THEN DO i = 1 TO frq; OUTPUT; end; PROC UNIVARIATE DATA=signtest; VAR level; TITLE Ordinary Sign Test for Training Program Example ; ******************************************; *** Sign (Binomial) Test for Location: ***; **** Example 2.1 in course notes ***; ******************************************; DATA in2; med_time = 3.50; INPUT diff = time - med_time; OUTPUT; LINES; PROC UNIVARIATE DATA=in2; VAR diff; TITLE Sign (Binomial) Test for Example 2.1 ; RUN; ****************************************************; *** Sign (Binomial) Test for Paired Differences: ***; *** Example 4.1 in course notes ***; ****************************************************; DATA in3; INPUT alone diff = together - alone; OUTPUT; LINES; ; PROC PRINT DATA=in3; TITLE Sign (Binomial) Test for Paired Differences -- Example 4.1 ; PROC UNIVARIATE DATA=in3; VAR diff; RUN; 97
11 6.3 Wilcoxon Signed Rank Test Assumptions: Given a random sample of n independent observations X 1,..., X n : Each X i was drawn from a symmetric and continuous population. Each X i has the same median M for i = 1,..., n). The measurement scale is at least on the interval scale. Hypotheses: The inference concerns a hypothesis about the median M of a single population. Given M o, a hypothesized value of the median, we have: (A) Two-sided: H 0 : M = M o vs H 1 : M M o (B) Lower one-sided: H 0 : M = M o vs H 1 : M < M o (C) Upper one-sided: H 0 : M = M o vs H 1 : M > M o Because of the symmetry assumption, we can replace the median M with the mean µ in the hypotheses. Method: For a given α Calculate all differences D i = X i M o. Remove all cases having D i = 0 and adjust the sample size n accordingly. Assign ranks 1, 2,..., n to the D i. For tied D i values, assign average ranks. If H 0 is true, then the D i are symmetrically distributed about 0. That is, we expect (Ranks where Di > 0) (Ranks where D i < 0). Let T + = (R i when D i > 0) and T = (R i when D i < 0). Under H 0, the sampling distributions of T + and T are symmetric about n(n + 1)/4 and can assume integer values from 0 to n(n + 1)/2. Note that T = n(n + 1)/2 T +. For alternative hypothesis: (A) (B) (C) H 1 : M M o, let T = min(t +, T ). Let w be the largest value from the Wilcoxon Signed Rank Test Table such that P (T w) α/2. H 1 : M < M o, let T = T +. Let w be the largest value from the Wilcoxon Signed Rank Test Table such that P (T w) α. H 1 : M > M o, let T = T. Let w be the largest value from the Wilcoxon Signed Rank Test Table such that P (T w) α. Decision Rule If T w, Reject H 0. If T > w, Fail to Reject H 0. 98
12 Large Sample Approximation (with continuity correction) (n > 30) Computer packages like R and SAS calculate approximate p-values for the two-sided alternative H 1 based on large sample normal distribution approximations. The normalizing formula is: T T n(n + 1)/4 = = T E(T ). n(n + 1)(2n + 1)/24 V ar(t ) Daniel (Applied Nonparametric Statistics, page 42) describes an adjustment to this formula in the event of ties. Example of Wilcoxon Signed Rank Test A random sample of 12 fish was taken and the bodyweights recorded. Test the null hypothesis H 0 : µ = 3.0 against the alternative H 1 : µ < 3.0 pounds. (Assume they are sampled from the same symmetric distribution.) R code for Wilcoxon Signed Rank Test with Confidence Interval # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12) wilcox.test(fish,mu=3,conf.int=true) R output for Wilcoxon Signed Rank Test with Confidence Interval > # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data > fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12) > wilcox.test(fish,mu=3,conf.int=true) Wilcoxon signed rank test with continuity correction data: fish V = 8, p-value = alternative hypothesis: true location is not equal to 3 95 percent confidence interval: sample estimates: (pseudo)median SAS code and selected output: SIGN TEST AND WILCOXON SIGNED RANK TEST The UNIVARIATE Procedure Basic Statistical Measures Location Variability Mean Std Deviation Median Variance Mode Range Interquartile Range
13 Tests for Location: Mu0=0 Test -Statistic p Value Student s t t Pr > t Sign M -3 Pr >= M Signed Rank S -31 Pr >= S <-- p-value The Signed Rank statistics S in SAS = T (n(n + 1)/4 = 8 (12)(13)/4 = 8 39 = 31. The p-value is based on the normal approximation and is for a two-sided alternative. Thus, for one-sided H 1 : M < 3.0, the approximate p-value =.0112/2 = OPTIONS LS=72 PS=60 NONUMBER NODATE; DATA IN; INPUT X=X-3; CARDS; PROC UNIVARIATE DATA=IN; VAR X; TITLE ONE SAMPLE TESTS FOR LOCATION: ; TITLE2 SIGN TEST AND WILCOXON SIGNED RANK TEST ; RUN; Reference Distribution for the Signed Rank Test (n = 5) If H 0 is true, then any random D i has a probability 1/2 of being > M o or < M o. Without loss of generality, let D 1, D 2, D 3, D 4, D 5 be ordered from smallest to largest. Then when n = 5, every possible ranking of D 1, D 2, D 3, D 4, D 5 has a (1/2) 5 = 1/32 chance of occurring. Possible Ranks Cumulative T + With D i > 0 Probability Probability 0 None 1/32 1/32 = /32 2/32 = /32 3/32 = or 1,2 2/32 5/32 = or 1,3 2/32 7/32 = or 1,4 or 2,3 3/32 10/32 = ,5 or 2,4 or 1,2,3 3/32 13/32 = ,5 or 3,4 or 1,2,4 3/32 16/32 = ,5 or 1,2,5 or 1,3,4 3/32 19/32 = ,5 or 1,3,5 or 2,3,4 3/32 22/32 = ,4,5 or 2,3,5 or 1,2,3,4 3/32 25/32 = ,4,5 or 1,2 3,5 2/32 27/32 = ,4,5 or 1,2 4,5 2/32 29/32 = ,3,4,5 1/32 30/32 = ,3,4,5 1/32 31/32 = ,2,3,4,5 1/32 32/32 = 1 100
14 101 56
15 57102
16 6.3.2 Special Case: Paired Data Assumptions: Given a random sample of n pairs of observations (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ). Let D i = y i x i for i = 1,..., n. The D i s are independent. The measurement scale is at least on the interval scale. The distribution of the differences D i = y i x i for i = 1,..., n is symmetric. Testing Procedure: Calculate all differences D i = y i x i for i = 1,..., n. Use the median difference M D in the hypotheses. Typically, M D = 0. Because of the symmetry assumption, we can replace the median M D with the mean µ D in the hypotheses. Run the Wilcoxon Signed Test based on the D i. Example of Wilcoxon Signed Rank Test for Paired Data Two judges were asked to independently rate the rehabilitative potential for each of 22 male prison inmates. The following table contains the ratings: Inmate (i) Judge 1 Judge 2 D i D i Sign R i remove tie remove tie
17 SAS code and output: DATA IN; DO INMATE=1 TO 22; INPUT JUDGE1 JUDGE2 DIFF = JUDGE1 - JUDGE2; OUTPUT; END; CARDS; ; PROC UNIVARIATE DATA=IN; VAR DIFF; TITLE WILCOXON SIGNED RANK TEST FOR PAIRED DATA ; RUN; ==================================================================== WILCOXON SIGNED RANK TEST FOR PAIRED DATA The UNIVARIATE Procedure Variable: DIFF Tests for Location: Mu0=0 Test -Statistic p Value Student s t t Pr > t Sign M 5 Pr >= M Signed Rank S 75 Pr >= S <-- Reject Ho The approximate p-value is Thus, we would reject the null hypothesis H o : M D = Confidence Interval for the Median Based on the Wilcoxon Signed Rank Test To find the point estimate for the median M: Calculate all paired averages u ij allowing replication: u ij = x i + x j. 2 There are ( ) ( n 2 + n = n+1 ) 2 such averages. Arrange the u ij in increasing order. The point estimate for M is M = the median of the {u ij }. Method: For an approximate confidence level 100(1 α)% : Use the Wilcoxon Signed Rank Test Table to find the largest t such that P (T t) α/2. Let M L = (t + 1) st u ij observation from the beginning and M U = (t + 1) st u ij observation from the end of the set of ordered u ij values. Statistically, P (M L M M U ) = P (t + 1 T n(n + 1)/2 t) where T is the Wilcoxon signed rank statistic. The approximate 100(1 α)% confidence interval is (M L, M U ). The exact confidence level for (M L, M U ) is determined by the distribution given in the Wilcoxon Signed Rank Test Table. That is, if p = P (X t), then (M L, M U ) is an exact 100(1 2p)% confidence interval for M. 104
18 Note: You do not need to calculate all of the u ij values, but only the (t + 1) st largest and smallest. This procedure is also known as the Hodges-Lehmann estimates of shift. Example of Hodges-Lehmann confidence interval for M A random sample of 12 fish was taken and the body weights were recorded Calculate an approximate 95% confidence interval for the median bodyweight M. For n = 12, the largest value in the Wilcoxon Signed Rank Test Table with P (T t).025 is t = 13. Note: P (T 13) = Then t + 1 = 14 and n(n + 1)/2 t = = 65. Now find the 14 th and 65 th values in the list of the 78 u ij values. HODGES-LEHMANN CONFIDENCE INTERVAL Obs x1 x2 u Obs x1 x2 u <-- lower endpoint <-- upper endpoint Thus, the confidence interval is (2.42,2.93). 105
19 R code for Wilcoxon Signed Rank Test with Confidence Interval # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12) wilcox.test(fish,mu=3,conf.int=true) R output for Wilcoxon Signed Rank Test with Confidence Interval Wilcoxon signed rank test with continuity correction data: fish V = 8, p-value = alternative hypothesis: true location is not equal to 3 95 percent confidence interval: <--- Approximate 95% confidence interval for mu is (2.42, 2.93) sample estimates: (pseudo)median Asymptotic Relative Efficiency (A.R.E.) One way to compare properties of statistical tests is to compare the efficiency properties. The definition of efficiency can vary but, generally speaking, it is used to compare the sample size required of one test with that of another test under similar conditions. Suppose that two tests may be used to test a particular H 0 against a particular H 1, and both tests have the same specified α and β errors. These tests are therefore comparable under conditions related to the level of significance α and power (1 β). Thus, the test requiring the smaller sample size to satisfy these conditions will have the smaller sampling cost and effort. That is, the test with the smaller required sample size is more efficient than the other test, and its relative efficiency is greater than one. Let T 1 and T 2 represent two tests that test the same H 0 against the same H 1 with the same specified α and β values. For example, T 1 is the Sign Test and T 2 is the Wilcoxon Signed Rank Test which are used to test H 0 : µ = µ 0 with α =.05 and power 1 β =.90. The relative efficiency of test T 1 with respect to test T 2 is the ratio n 2 /n 1, where n 1 is the required sample size of T 1 to equal the power of test T 2 which has sample size n 2 (assuming the same H 0 and significance level α). Thus, there is a relative efficiency of T 1 with respect to T 2 for each choice of α and n 2. A more general measure of efficiency (asymptotic relative efficiency) was developed. Consider the situation of letting sample size n 1 increase for T 1 with specified α and β. Then there exists a sequence of n 2 values, such that for each value of n 1 (n 1 ), T 2 has the same α and β values. In other words, there is a sequence of relative efficiency values n 2 /n 1. If n 2 /n 1 approaches a constant value as n 1, and, if that constant is the same for all choices of α and β, then the constant is called the asymptotic relative efficiency of T 1 with respect to T
20 Note that if the A.R.E. exists for T 1 and T 2, then the limiting A.R.E. value is independent of the choice of α and β. To select a test with superior power, we generally select the test with the greatest A.R.E. because the power depends on many factors such as the maximum number of observations that can be collected given experimental or sampling resources and the type of distribution that generates the data (normal?, weibull?, gamma?,...) which is usually unknown. The A.R.E. is, in general, difficult to calculate. In this course, we will only consider A.R.E. results for various pairs of tests and for several choices of distributions. Note that A.R.E. assumes that an infinite sample size can be taken. Thus, a natural question arises: How good is a measure assuming an infinitely large sample when most practical situations involve relatively small sample sizes? In an attempt to answer this question, studies of exact relative efficiency values for very small samples have shown that A.R.E. provides a good approximation to the relative efficiency in many situations of practical interest A.R.E. Comparison for Three Single-Sample Tests of Location We will compare the t-test, Sign test, and the Wilcoxon Signed Rank test using the A.R.E. values. To do this we will consider three situations involving symmetric distributions. Under symmetry assumptions, H 0 and H 1 are identical for all three tests. (I) The sample was randomly sampled from a normal distribution having density function ] 1 f(x; µ, σ) = [ σ 2π exp (x µ)2 for < x < 2σ 2 Without loss of generality, we can assume it is a standard normal N(0, 1) having density function φ(x) = 1 2π exp ( x 2 /2 ) for < x < (II) The sample was randomly sampled from a uniform distribution having density function: f(x; a, b) = 1 (b a) for a < x < b = 0 otherwise Without loss of generality, we can assume a uniform U(0, 1) having density function f(x) = 1 for 0 < x < 1 = 0 otherwise The uniform distribution is considered a light-tailed symmetric distribution. 107
21 f(x) = 1 for 0 < x < 1 = 0 otherwise The uniform distribution is considered a light-tailed symmetric distribution. (III) The sample was randomly sampled from a double exponential distribution (DE) having Thedensity samplefunction was randomly sampled from a double exponential distribution (DE) hav- (III) ing f(x; a, b) = 1 ( ) 2b exp x a for < x < f(x; a, b) = 1 ( ) 2b exp x a for < x < b Without loss of generality, we can assume DE(0, 1) having density function Without loss of generality, assume it is DE(0, 1) having density function f(x) f(x) = exp ( x ) 1 exp ( x ) for 2 for < x < The DE distribution is is a heavy-tailed symmetric distribution. Table of A.R.E. Values 61 Test (I) Normal (II) Uniform (III) Double Exponential Comparison Distribution Distribution Distribution Sign test 2/π / / vs t-test (t) (t) (Sign) Wilcoxon Signed Rank test 3/2 = /4 = vs Sign test (Wlcxn) (Wlcxn) (Sign) Wilcoxon Signed Rank test 3/π /2 = vs t-test (t) (=) (Wlcxn) Bold-face letters indicate which test is more efficient 6.5 Introduction to the One-Sample Randomization Test Paired Data Example: An experimental drug was tested on 7 subjects. Blood level measurements were taken before (X) and after (Y ) administering the drug. In this situation, we have paired data. The difference (D = Y X) in blood level measurements for each subject were: Di Patient i D Difference D i The goal is to test the hypothesis that there is no change in blood level measurement after taking the drug. Statistically, we will assume that the distribution of the difference in blood levels measurements under the null hypothesis H 0 is symmetric about
22 Thus, H 0 : µ D = 0. We will consider two possible alternatives: (1) H 1 : µ D 0 and (2) H 1 : µ D < 0. If H 0 is true (and assuming symmetry about 0), the signs (+ or ) of the 7 measurements can be considered random. For example, we could have just as likely observed Di Patient i D Difference D i or Di Patient i D Difference D i or any other randomization of the signs. The Randomization Reference Distribution 1. Consider all possible sign assignments or randomizations of signs for the seven differences. 2. Calculate D i for each randomization. In this example, there are 2 7 = 128 different randomizations of the seven signs. This yields the randomization distribution of D i. In terms of testing, it is statistically equivalent to use the randomization distribution of the mean D. 3. Now compare the OBSERVED D i =.664 to the randomization distribution to find the probability (p-value) associated with the test H 0 : µ D = 0 against the alternative H 1 hypothesis. Case 1: For alternative H 1 : µ D 0, from the randomization reference distribution we see the p-value = P ( D i.664 ) = P (D i.664 ) + P (D i.664 ) = (6 + 6)/128 = Case 2: For alternative H 1 : µ D < 0, from the randomization reference distribution we see the p-value = P (D i.664 ) = 6/128 =
23 RANDOMIZATION TEST #1 Obs ID1 ID2 ID3 ID4 ID5 ID6 ID7 D_SUM CDF < RANDOMIZATION TEST #1 Obs ID1 ID2 ID3 ID4 ID5 ID6 ID7 D_SUM CDF <
24 SAS Code to Generate the Randomization Distribution for the Change in Blood Level Measurements DM LOG;CLEAR;OUT;CLEAR; ; OPTIONS LS=72 PS=68 NONUMBER NODATE; DATA IN; INPUT CARDS; DATA IN; SET IN; DO I1=-1 TO 1 BY 2; ID1=I1*D1; DO I2=-1 TO 1 BY 2; ID2=I2*D2; DO I3=-1 TO 1 BY 2; ID3=I3*D3; DO I4=-1 TO 1 BY 2; ID4=I4*D4; DO I5=-1 TO 1 BY 2; ID5=I5*D5; DO I6=-1 TO 1 BY 2; ID6=I6*D6; DO I7=-1 TO 1 BY 2; ID7=I7*D7; D_SUM = SUM(OF ID1-ID7); OUTPUT; END; END; END; END; END; END; END; KEEP ID1-ID8 D_SUM; PROC SORT DATA=IN; BY D_SUM; DATA IN; SET IN; CDF=_N_/128; PROC PRINT DATA=IN; TITLE RANDOMIZATION TEST #1 ; RUN; Randomization Test for Paired Data Assumptions: Given a random sample of n pairs of observations (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ). The differences D i = y i x i are independent. The distribution of each D i is symmetric and has the same mean. The measurement scale for the D i s is at least interval. Hypotheses: The inference concerns a hypotheses about whether or not the mean difference µ D = 0: (A) Two-sided: H 0 : µ D = 0 vs H 1 : µ D 0 (B) Lower one-sided: H 0 : µ D = 0 vs H 1 : µ D < 0 (C) Upper one-sided: H 0 : µ D = 0 vs H 1 : µ D > 0 Because of the symmetry assumption, we can replace the mean difference µ D with the median difference M D in the hypotheses. Method: For a given α Calculate the sum D i for each of the possible 2 n sign randomizations. Order these values to form the randomization distribution for the D i. 111
25 Decision Rule For (A) H 1 : µ D 0, If ACTUAL D i < 0, let D = D i. If ACTUAL D i > 0, let D = D i. For (B) H 1 : µ D < 0, let D = D i. For (C) H 1 : µ D > 0, let D = D i. p value = 2 Number of D is D 2 n p value = Number of D is D 2 n p value = Number of D is D 2 n = Number of D i s D 2 n Reject H 0 if p-value α. Otherwise, we fail to reject H 0. Without loss of generality, you can replace D i with D in the preceding arguments. Note that the number of sign randomizations (2 n ) forming the randomization distribution grows rapidly. For example, when n = 20, there are over 1 million randomizations. In such cases, it is generally not feasible to generate the entire randomization distribution. To handle this problem, a large number of randomizations of the signs are randomly taken. Then, an approximate randomization distribution is generated from this large subset of possible randomizations. Approximate p-values can then be determined from this distribution. This is known as the monte-carlo approach to generating approximate p-values. The following R code will generate p-values using the monte-carlo approach for the single-sample Randomization Test for Location. R Code for Randomization Test on Paired Data (Differences) # Single Sample Randomization Test for Location # Enter the number of permutations to take Prep = Prep # Enter vector of differences D <- c(-.187,.011,-.250,.034,-.137,-.112,-.023) D # Calculate the mean difference meand <- mean(d) meand sgnd <- sign(meand) Fp = 0 112
26 upper <- 0 lower <- 0 n = length(d) # Begin sign randomizations meanpermd <- 1:Prep for (i in 1:Prep){ sgnvec <- sign(runif(n)-.5) permvec <- sgnvec*d # random vector with 1 or -1 values } # Calculate the mean difference for the i_th randomization vector meanpermd[i] <- mean(permvec) if(meanpermd[i]>=meand) upper = upper+1 if(meanpermd[i]<=meand) lower = lower+1 # Calculate p-values: # for lower one-sided Ho pval_lower <- lower/prep pval_lower # for upper one-sided Ho p-val_upper <- upper/prep pval_upper # for two-sided Ho if(sgnd < 0) pval_two_sided = if(sgnd > 0) pval_two_sided = pval_two_sided 2*pval_lower 2*pval_upper hist(meanpermd) R Output for Randomization Test on Paired Data > meand > # Calculate p-values: > # for lower one-sided Ho [1] > # for upper one-sided H0 [1] > # for two-sided Ho [1] Note that the p-values from the monte-carlo approach (.04644,.96076,.09288) approximate the exact p-values of ( , ,.09375) from the true randomization distribution. Without loss of generality, I used D instead of D i in my R code. 113
27 Histogram ofhistogram 50,000 Values of meanpermd using the Monte-Carlo Approach Frequency meanpermd Single Sample Randomization Test for H o : µ = µ 0 Suppose we want to perform a randomization test if our inference concerns a hypotheses about whether or not µ = µ0 for some specified value µ 0 against one of three alternatives: (A) Two-sided: H 0 : µ = µ 0 vs H 1 : µ µ 0 (B) Lower one-sided: H 0 : µ = µ 0 vs H 1 : µ < µ 0 (C) Upper one-sided: H 0 : µ = µ 0 vs H 1 : µ > µ 0 To perform a randomization test, simply subtract µ 0 from each observation, and then run the randomization test as you would for paired data. Example: A random sample of 12 fish was taken and the bodyweights recorded. Test the null hypothesis H 0 : µ = 3.0 against the alternative H 1 : µ < 3.0 pounds Based on the randomization test, the p-value is approximately Therefore, we would reject H 0 : µ = 3.0 and conclude that µ < 3.0 pounds. 114
28 R Code for Randomization Test for Fish Weight Data # Single Sample Randomization Test for Location # Enter the number of randomizations to take Prep = Prep fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12) # Enter the hypothesized mean mu0 = 3 # Enter vector of differences D <- fish - mu0 D <-- enter mu_0 <- subtract mu_0 from the data # Calculate the mean difference meand <- mean(d) meand sgnd <- sign(meand) Fp = 0 upper <- 0 lower <- 0 n = length(d) # Begin sign randomizations meanpermd <- 1:Prep for (i in 1:Prep){ sgnvec <- sign(runif(n)-.5) permvec <- sgnvec*d # random vector with 1 or -1 values # Calculate the mean difference for the i_th randomization vector meanpermd[i] <- mean(permvec) if(meanpermd[i]>=meand) upper = upper+1 if(meanpermd[i]<=meand) lower = lower+1 } # Calculate p-values: # for lower one-sided Ho pval_lower <- lower/prep pval_lower # for upper one-sided H0 pval_upper <- upper/prep pval_upper # for two-sided Ho if(sgnd < 0) pval_two_sided = if(sgnd > 0) pval_two_sided = pval_two_sided 2*pval_lower 2*pval_upper hist(meanpermd) 115
29 R Output for Randomization Test on Fish Weight Data > # Enter vector of differences [1] > # Calculate the mean difference [1] > # Calculate p-values: > # for lower one-sided Ho [1] > # for upper one-sided H0 [1] > # for two-sided Ho [1] Histogram of 50,000 Values using the Monte-Carlo Approach Histogram of meanpermd Frequency meanpermd 116
Version 1: Equality of Distributions. 3. F (x) and G(x) represent the distribution functions corresponding to the Xs and Y s, respectively.
4 Two-Sample Methods 4.1 The (Mann-Whitney) Wilcoxon Rank Sum Test Version 1: Equality of Distributions Assumptions: Given two independent random samples X 1, X 2,..., X n and Y 1, Y 2,..., Y m : 1. The
More informationNonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I
1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal
More informationSTAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.
STAT 518 --- Section 3.4: The Sign Test The sign test, as we will typically use it, is a method for analyzing paired data. Examples of Paired Data: Similar subjects are paired off and one of two treatments
More informationData are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)
BSTT523 Pagano & Gauvreau Chapter 13 1 Nonparametric Statistics Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) In particular, data
More informationContents 1. Contents
Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationInference for Binomial Parameters
Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for
More informationChapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics
Section 15.1: An Overview of Nonparametric Statistics Understand Difference between Parametric and Nonparametric Statistical Procedures Parametric statistical procedures inferential procedures that rely
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationDr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines)
Dr. Maddah ENMG 617 EM Statistics 10/12/12 Nonparametric Statistics (Chapter 16, Hines) Introduction Most of the hypothesis testing presented so far assumes normally distributed data. These approaches
More informationIntroduction to Statistical Data Analysis III
Introduction to Statistical Data Analysis III JULY 2011 Afsaneh Yazdani Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Preface What is Inferential Statistics? The
More informationComparison of Two Population Means
Comparison of Two Population Means Esra Akdeniz March 15, 2015 Independent versus Dependent (paired) Samples We have independent samples if we perform an experiment in two unrelated populations. We have
More informationModule 9: Nonparametric Statistics Statistics (OA3102)
Module 9: Nonparametric Statistics Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 15.1-15.6 Revision: 3-12 1 Goals for this Lecture
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationFrequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=
A frequency distribution is a kind of probability distribution. It gives the frequency or relative frequency at which given values have been observed among the data collected. For example, for age, Frequency
More informationNon-parametric methods
Eastern Mediterranean University Faculty of Medicine Biostatistics course Non-parametric methods March 4&7, 2016 Instructor: Dr. Nimet İlke Akçay (ilke.cetin@emu.edu.tr) Learning Objectives 1. Distinguish
More informationUnit 14: Nonparametric Statistical Methods
Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based
More informationContents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47
Contents 1 Non-parametric Tests 3 1.1 Introduction....................................... 3 1.2 Advantages of Non-parametric Tests......................... 4 1.3 Disadvantages of Non-parametric Tests........................
More informationSTAT Section 5.8: Block Designs
STAT 518 --- Section 5.8: Block Designs Recall that in paired-data studies, we match up pairs of subjects so that the two subjects in a pair are alike in some sense. Then we randomly assign, say, treatment
More informationLecture 25: Models for Matched Pairs
Lecture 25: Models for Matched Pairs Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture
More informationSign test. Josemari Sarasola - Gizapedia. Statistics for Business. Josemari Sarasola - Gizapedia Sign test 1 / 13
Josemari Sarasola - Gizapedia Statistics for Business Josemari Sarasola - Gizapedia 1 / 13 Definition is a non-parametric test, a special case for the binomial test with p = 1/2, with these applications:
More informationNonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.
Nonparametric s Mark Muldoon School of Mathematics, University of Manchester Mark Muldoon, November 8, 2005 Nonparametric s - p. 1/31 Overview The sign, motivation The Mann-Whitney Larger Larger, in pictures
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationLecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks
More informationNonparametric Location Tests: k-sample
Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)
More informationLecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks
More informationSection 3 : Permutation Inference
Section 3 : Permutation Inference Fall 2014 1/39 Introduction Throughout this slides we will focus only on randomized experiments, i.e the treatment is assigned at random We will follow the notation of
More informationEpidemiology Principle of Biostatistics Chapter 11 - Inference about probability in a single population. John Koval
Epidemiology 9509 Principle of Biostatistics Chapter 11 - Inference about probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is
More informationDesign of the Fuzzy Rank Tests Package
Design of the Fuzzy Rank Tests Package Charles J. Geyer July 15, 2013 1 Introduction We do fuzzy P -values and confidence intervals following Geyer and Meeden (2005) and Thompson and Geyer (2007) for three
More informationNONPARAMETRIC TESTS. LALMOHAN BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-12
NONPARAMETRIC TESTS LALMOHAN BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-1 lmb@iasri.res.in 1. Introduction Testing (usually called hypothesis testing ) play a major
More informationS D / n t n 1 The paediatrician observes 3 =
Non-parametric tests Paired t-test A paediatrician measured the blood cholesterol of her patients and was worried to note that some had levels over 00mg/100ml To investigate whether dietary regulation
More informationB.N.Bandodkar College of Science, Thane. Random-Number Generation. Mrs M.J.Gholba
B.N.Bandodkar College of Science, Thane Random-Number Generation Mrs M.J.Gholba Properties of Random Numbers A sequence of random numbers, R, R,., must have two important statistical properties, uniformity
More informationUQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables
UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1
More informationSection 3: Permutation Inference
Section 3: Permutation Inference Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A September 26, 2015 1 / 47 Introduction Throughout this slides we will focus only on randomized experiments, i.e
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationA Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.
A Probability Primer A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes. Are you holding all the cards?? Random Events A random event, E,
More informationANOVA - analysis of variance - used to compare the means of several populations.
12.1 One-Way Analysis of Variance ANOVA - analysis of variance - used to compare the means of several populations. Assumptions for One-Way ANOVA: 1. Independent samples are taken using a randomized design.
More informationDistribution-Free Procedures (Devore Chapter Fifteen)
Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal
More informationResampling Methods. Lukas Meier
Resampling Methods Lukas Meier 20.01.2014 Introduction: Example Hail prevention (early 80s) Is a vaccination of clouds really reducing total energy? Data: Hail energy for n clouds (via radar image) Y i
More informationMATH Notebook 3 Spring 2018
MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................
More informationEXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"
EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically
More informationEXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS
EXAM Exam # Math 3342 Summer II, 2 July 2, 2 ANSWERS i pts. Problem. Consider the following data: 7, 8, 9, 2,, 7, 2, 3. Find the first quartile, the median, and the third quartile. Make a box and whisker
More information9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.
Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationEpidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being
More informationInferential Statistics
Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free
More informationConditioning Nonparametric null hypotheses Permutation testing. Permutation tests. Patrick Breheny. October 5. STA 621: Nonparametric Statistics
Permutation tests October 5 The conditioning idea In most hypothesis testing problems, information can be divided into portions that pertain to the hypothesis and portions that do not The usual approach
More informationExam details. Final Review Session. Things to Review
Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationBio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems
Bio 183 Statistics in Research A. Research designs B. Cleaning up your data: getting rid of problems C. Basic descriptive statistics D. What test should you use? What is science?: Science is a way of knowing.(anon.?)
More informationE509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.
E509A: Principle of Biostatistics (Week 11(2): Introduction to non-parametric methods ) GY Zou gzou@robarts.ca Sign test for two dependent samples Ex 12.1 subj 1 2 3 4 5 6 7 8 9 10 baseline 166 135 189
More informationIntroduction to Nonparametric Statistics
Introduction to Nonparametric Statistics by James Bernhard Spring 2012 Parameters Parametric method Nonparametric method µ[x 2 X 1 ] paired t-test Wilcoxon signed rank test µ[x 1 ], µ[x 2 ] 2-sample t-test
More informationAnalysis of 2x2 Cross-Over Designs using T-Tests
Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous
More informationGlossary for the Triola Statistics Series
Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling
More informationNon-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationNon-Parametric Statistics: When Normal Isn t Good Enough"
Non-Parametric Statistics: When Normal Isn t Good Enough" Professor Ron Fricker" Naval Postgraduate School" Monterey, California" 1/28/13 1 A Bit About Me" Academic credentials" Ph.D. and M.A. in Statistics,
More informationStatistical Methods I
tatistical Methods I EXT 7005 Course notes James P Geaghan Louisiana tate University Copyright 010 James P. Geaghan Copyright 010 tatistical Methods I (EXT 7005) Page 101 The two-sample t-test H: 0 μ1
More informationMitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.
Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D. The number of cells in various stages of mitosis in your treatment and control onions are your raw
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationPermutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods
Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of
More informationTHE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future
More information4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures
Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures
More informationNonparametric Statistics
Nonparametric Statistics Nonparametric or Distribution-free statistics: used when data are ordinal (i.e., rankings) used when ratio/interval data are not normally distributed (data are converted to ranks)
More information5.3 Three-Stage Nested Design Example
5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens
More informationWeek 7.1--IES 612-STA STA doc
Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ
More informationAnswer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)
Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text) 1. A quick and easy indicator of dispersion is a. Arithmetic mean b. Variance c. Standard deviation
More informationLecture 7: Hypothesis Testing and ANOVA
Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis
More information20 Hypothesis Testing, Part I
20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationProbability and Probability Distributions. Dr. Mohammed Alahmed
Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about
More informationContents 1. Contents
Contents 1 Contents 4 Paired Comparisons & Block Designs 3 4.1 Paired Comparisons.................... 3 4.1.1 Paired Data.................... 3 4.1.2 Existing Approaches................ 6 4.1.3 Paired-comparison
More informationE509A: Principle of Biostatistics. GY Zou
E509A: Principle of Biostatistics (Week 4: Inference for a single mean ) GY Zou gzou@srobarts.ca Example 5.4. (p. 183). A random sample of n =16, Mean I.Q is 106 with standard deviation S =12.4. What
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationProbability and Statistics. Terms and concepts
Probability and Statistics Joyeeta Dutta Moscato June 30, 2014 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution
More informationMAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics
MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1 MAT 2379, Introduction to Biostatistics Sample Calculator Problems for the Final Exam Note: The exam will also contain some problems
More informationHypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true
Hypothesis esting Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Statistical Hypothesis: conjecture about a population parameter
More information4.8 Alternate Analysis as a Oneway ANOVA
4.8 Alternate Analysis as a Oneway ANOVA Suppose we have data from a two-factor factorial design. The following method can be used to perform a multiple comparison test to compare treatment means as well
More informationChapter 9 Inferences from Two Samples
Chapter 9 Inferences from Two Samples 9-1 Review and Preview 9-2 Two Proportions 9-3 Two Means: Independent Samples 9-4 Two Dependent Samples (Matched Pairs) 9-5 Two Variances or Standard Deviations Review
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationStatistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data
Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data 999 Prentice-Hall, Inc. Chap. 9 - Chapter Topics Comparing Two Independent Samples: Z Test for the Difference
More informationHYPOTHESIS TESTING. Hypothesis Testing
MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.
More informationPermutation tests. Patrick Breheny. September 25. Conditioning Nonparametric null hypotheses Permutation testing
Permutation tests Patrick Breheny September 25 Patrick Breheny STA 621: Nonparametric Statistics 1/16 The conditioning idea In many hypothesis testing problems, information can be divided into portions
More informationHYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă
HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and
More informationMATH4427 Notebook 5 Fall Semester 2017/2018
MATH4427 Notebook 5 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 5 MATH4427 Notebook 5 3 5.1 Two Sample Analysis: Difference
More informationFish SR P Diff Sgn rank Fish SR P Diff Sng rank
Nonparametric tests Distribution free methods require fewer assumptions than parametric methods Focus on testing rather than estimation Not sensitive to outlying observations Especially useful for cruder
More information3. Nonparametric methods
3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests
More information5 Introduction to the Theory of Order Statistics and Rank Statistics
5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order
More informationThis is particularly true if you see long tails in your data. What are you testing? That the two distributions are the same!
Two sample tests (part II): What to do if your data are not distributed normally: Option 1: if your sample size is large enough, don't worry - go ahead and use a t-test (the CLT will take care of non-normal
More informationAMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015
AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationHotelling s One- Sample T2
Chapter 405 Hotelling s One- Sample T2 Introduction The one-sample Hotelling s T2 is the multivariate extension of the common one-sample or paired Student s t-test. In a one-sample t-test, the mean response
More informationWhat to do today (Nov 22, 2018)?
What to do today (Nov 22, 2018)? Part 1. Introduction and Review (Chp 1-5) Part 2. Basic Statistical Inference (Chp 6-9) Part 3. Important Topics in Statistics (Chp 10-13) Part 4. Further Topics (Selected
More informationRelating Graph to Matlab
There are two related course documents on the web Probability and Statistics Review -should be read by people without statistics background and it is helpful as a review for those with prior statistics
More informationSample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA
Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider
More informationInferential statistics
Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,
More informationBIO5312 Biostatistics Lecture 6: Statistical hypothesis testings
BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings Yujin Chung October 4th, 2016 Fall 2016 Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30 Previous Two types of statistical
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More information