6 Single Sample Methods for a Location Parameter


If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually the median) are used.

Recall:

$M$ is a median of a random variable $X$ if $P(X \le M) = P(X \ge M) = .5$.

The distribution of $X$ is symmetric about $c$ if $P(X \le c - x) = P(X \ge c + x)$ for all $x$.

For symmetric continuous distributions, the median $M$ = the mean $\mu$. Thus, all conclusions about the median can also be applied to the mean.

If $X$ is a binomial random variable with parameters $n$ and $p$ (denoted $X \sim B(n, p)$), then

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \quad \text{for } x = 0, 1, \ldots, n$$

where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ and $k! = k(k-1)(k-2)\cdots 2 \cdot 1$.

Tables exist for the cdf $P(X \le x)$ for various choices of $n$ and $p$. The probabilities and cdf values are also easy to produce using SAS or R.

Thus, if $X \sim B(n, .5)$, we have

$$P(X = x) = \binom{n}{x} (.5)^n$$
$$P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} (.5)^n$$
$$P(X \le x) = P(X \ge n - x)$$

The last identity holds because the $B(n, .5)$ distribution is symmetric.

For sample sizes $n > 20$ and $p = .5$, a normal approximation (with continuity correction) to the binomial probabilities is often used instead of binomial tables. Calculate

$$z = \frac{(x \pm .5) - .5n}{.5\sqrt{n}}$$

Use $x + .5$ when $x < .5n$ and use $x - .5$ when $x > .5n$. The value of $z$ is compared to $N(0, 1)$, the standard normal distribution. For example: $P(X \le x) \approx P(Z \le z)$ and $P(X \ge x) \approx P(Z \ge z) = 1 - P(Z \le z)$.

6.1 Ordinary Sign Test

Assumptions: Given a random sample of $n$ independent observations:

The measurement scale is at least nominal.

Observations can be classified into 2 nonoverlapping categories whose union exhausts all possibilities. The categories will be labeled + and −.

Hypotheses: The inference involves comparing probabilities $P(+)$ and $P(-)$ for outcomes + and −.

(A) Two-sided: $H_0: P(+) = P(-)$ vs $H_1: P(+) \ne P(-)$
(B) Upper one-sided: $H_0: P(+) \ge P(-)$ vs $H_1: P(+) < P(-)$
(C) Lower one-sided: $H_0: P(+) \le P(-)$ vs $H_1: P(+) > P(-)$

Note: The reference distribution is computed at $P(+) = P(-) = .5$, which is $H_0$ in (A) and the boundary of $H_0$ in (B) and (C).

Method: For a given $\alpha$:

Let $T^+$ = the number of + observations. Let $T^-$ = the number of − observations.

If $H_0$ is true, then we would expect $T^+$ and $T^-$ to be nearly equal ($\approx n/2$). In other words, if $H_0$ is true, $T^+$ and $T^-$ are binomial $B(n, .5)$ random variables.

For alternative hypothesis

(A) $H_1: P(+) \ne P(-)$. Let $T = \min(T^+, T^-)$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha/2$.
(B) $H_1: P(+) < P(-)$. Let $T = T^+$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha$.
(C) $H_1: P(+) > P(-)$. Let $T = T^-$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha$.

Decision Rule

For (A), (B), or (C), if $T$ is too small, then we will reject $H_0$. That is,

If $T \le t$, Reject $H_0$.
If $T > t$, Fail to Reject $H_0$.

Large Sample Approximation

1. For the one-sided $H_1$, calculate

$$z = \frac{T^+ + .5 - .5n}{.5\sqrt{n}} \ \text{for (B) if } T^+ < .5n \qquad z = \frac{T^- + .5 - .5n}{.5\sqrt{n}} \ \text{for (C) if } T^- < .5n$$
$$z = \frac{T^+ - .5 - .5n}{.5\sqrt{n}} \ \text{for (B) if } T^+ > .5n \qquad z = \frac{T^- - .5 - .5n}{.5\sqrt{n}} \ \text{for (C) if } T^- > .5n$$

2. For the two-sided $H_1$, take the smaller of the two z-values in (1.).
3. Find $\Phi(z) = P(Z \le z)$ from the standard normal distribution.
4. Reject $H_0$ if (i) $P(Z \le z) \le \alpha$ for either 1-sided test or (ii) $P(Z \le z) \le \alpha/2$ for the 2-sided test.
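The exact rule and the large-sample approximation above can be sketched in R as follows. The function name `sign_test` and its arguments are hypothetical (they do not come from the notes), the continuity correction is written for the case where the test statistic is below $.5n$, and the counts in the usage line are illustrative only.

```r
# Minimal sketch of the Ordinary Sign Test (exact and large-sample versions).
# `sign_test`, `t_obs`, and `two_sided` are illustrative names, not from the notes.
sign_test <- function(t_obs, n, alpha = 0.05, two_sided = FALSE) {
  # Compare against alpha/2 for the two-sided test, alpha otherwise
  level <- if (two_sided) alpha / 2 else alpha

  # Critical value: largest t with P(X <= t) <= level under B(n, .5)
  cdf <- pbinom(0:n, size = n, prob = 0.5)
  ok <- which(cdf <= level)
  t_crit <- if (length(ok) > 0) max(ok) - 1 else NA_integer_

  # Exact lower-tail probability of the observed statistic
  p_exact <- pbinom(t_obs, size = n, prob = 0.5)

  # Normal approximation with continuity correction (assumes t_obs < .5n)
  z <- (t_obs + 0.5 - 0.5 * n) / (0.5 * sqrt(n))

  list(t_crit = t_crit, p_exact = p_exact, z = z,
       p_normal = pnorm(z), reject = p_exact <= level)
}

# Illustrative one-sided use: T = 3 observations in the rarer category out of n = 13
res <- sign_test(t_obs = 3, n = 13)
res
```

For the two-sided test, pass `two_sided = TRUE` and use the smaller of the two counts as `t_obs`. With a sample as small as 13 the exact probability is the one to report; the normal approximation is shown only for comparison.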


Example: (From Gibbons, Nonparametric Methods for Quantitative Analysis). An oil company is considering the following procedures for training prospective service station managers:

1. On-the-job training under actual working conditions for three months.
2. A company-run school training program concentrated over one month.

They plan to compare the two procedures in an experiment. No training program can be the only determining factor for the success of a manager. Success is also affected by other factors such as age, intelligence, and previous experience. In order to eliminate the effects of these factors as much as possible, each trainee is matched with another trainee who has similar attributes (such as similar age and previous experience). If a good match does not exist for a trainee, then the trainee is not included in the experiment.

Once pairs are determined, one member of each pair is randomly selected to receive the on-the-job training, while the other is assigned to the company school. After the trainees complete their assigned training programs, the personnel manager assesses each trainee and judges which member of each pair has done a better job of managing the service station.

In total, 13 pairs completed the training programs. The personnel manager stated that for 10 of the 13 pairs, the better manager received the company school training. Is there sufficient evidence to claim that the company-run school training program is more effective?

Table of Binomial Probabilities and Binomial CDF for n=13, p=.5 (columns: n, p, x, f(x) = Pr(X=x), F(x) = Pr(X<=x); the numeric entries did not survive transcription)

6.2 Sign (Binomial) Test for Location

Assumptions: Given a random sample of $n$ independent observations $x_1, x_2, \ldots, x_n$:

The variable of interest is continuous, and the measurement scale is at least ordinal.

Hypotheses: The inference concerns a hypothesis about the median $M$ of a single population.

(A) Two-sided: $H_0: M = M_o$ vs $H_1: M \ne M_o$
(B) Upper one-sided: $H_0: M = M_o$ vs $H_1: M > M_o$
(C) Lower one-sided: $H_0: M = M_o$ vs $H_1: M < M_o$

Method: For a given $\alpha$:

Let $T^+$ = the number of observations $> M_o$. Let $T^-$ = the number of observations $< M_o$. Delete any $x_i = M_o$ and adjust the sample size $n$ accordingly.

If $H_0$ is true, then $T^+$ and $T^-$ are binomial $B(n, .5)$ random variables. Thus, we would expect $T^+$ and $T^-$ to be approximately equal ($\approx n/2$).

For alternative hypothesis

(A) $H_1: M \ne M_o$. Let $T = \min(T^+, T^-)$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha/2$.
(B) $H_1: M > M_o$. Let $T = T^-$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha$.
(C) $H_1: M < M_o$. Let $T = T^+$. Then find the largest $t$ such that the $B(n, .5)$ probability $P(X \le t) \le \alpha$.

Perform the Ordinary Sign Test based on $T$ and $t$.

Decision Rule

For (A), (B), or (C), if $T$ is too small, then we will reject $H_0$. That is,

If $T \le t$, Reject $H_0$.
If $T > t$, Fail to Reject $H_0$.

Large Sample Approximation

Same as for the Ordinary Sign Test.

Example 2.1 from Applied Nonparametric Statistics by W. Daniel. In a study of heart disease, a researcher measured the blood's transit time in subjects with healthy right coronary arteries. The median transit time was 3.50 seconds. In another study, the researchers repeated the transit time study but on a sample of 11 patients with significantly blocked right coronary arteries. The results (in seconds) were

1.80, 3.30, 5.65, 2.25, 2.50, 3.50, 2.75, 3.25, 3.10, 2.70, 3.00

1. Can these researchers conclude (using $\alpha = .05$) that the median transit time in the population of patients with significantly blocked right coronary arteries is different from 3.50 seconds?
2. Can these researchers conclude ($\alpha = .05$) that the median transit time in the population of patients with significantly blocked right coronary arteries is less than 3.50 seconds?
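Both questions in Example 2.1 can be sketched with `binom.test`, using the transit times that appear in the notes' own R listing. The variable names (`M0`, `d`, `T_plus`, `p_two`, `p_less`) are illustrative. The two-sided call answers question 1; for question 2 the alternative $H_1: M < 3.50$ corresponds to few + signs, so the lower tail of $T^+$ is requested with `alternative = "less"`.

```r
# Sign (Binomial) Test for Location, Daniel Ex. 2.1: two-sided and lower one-sided
time <- c(1.80, 3.30, 5.65, 2.25, 2.50, 3.50, 2.75, 3.25, 3.10, 2.70, 3.00)
M0 <- 3.50                      # hypothesized median

d <- time - M0
d <- d[d != 0]                  # delete observations equal to M0
n <- length(d)                  # adjusted sample size
T_plus <- sum(d > 0)            # number of observations above M0
T_minus <- sum(d < 0)           # number of observations below M0

# Question 1: H1: M != 3.50
p_two <- binom.test(T_plus, n, p = 0.5, alternative = "two.sided")$p.value

# Question 2: H1: M < 3.50, i.e. P(X <= T_plus) under B(n, .5)
p_less <- binom.test(T_plus, n, p = 0.5, alternative = "less")$p.value

c(n = n, T_plus = T_plus, T_minus = T_minus, p_two = p_two, p_less = p_less)
```

Because the $B(n, .5)$ distribution is symmetric, the two-sided p-value here is exactly twice the one-sided one.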

Table of Binomial Probabilities and Binomial CDF for n=10, p=.5 (columns: n, p, x, f(x) = Pr(X=x), F(x) = Pr(X<=x); the numeric entries did not survive transcription)

6.2.1 Special Case: Paired Data

Assumptions: Given a random sample of $n$ independent pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

Both variables $X$ and $Y$ are continuous, and the measurement scales are at least ordinal.

Testing Procedure:

Calculate all differences $D_i = y_i - x_i$ for $i = 1, \ldots, n$.

Use the median difference $M_D$ in the hypotheses. Typically, the hypothesized value is $M_D = 0$.

Run the Sign Test based on the differences (the $D_i$ values).

Example 4.1 from Applied Nonparametric Statistics by W. Daniel. Researchers studied the effects of togetherness on the heart rate in rats. They recorded the heart rates of 10 rats while they were alone and while in the presence of another rat. The results are shown below. Using an $\alpha = .05$ significance level for the Sign Test, can we conclude that togetherness increases the heart rate in rats?

Rat        1    2    3    4    5    6    7    8    9   10
Alone    463  462  462  456  450  426  418  415  409  402
Together 523  494  461  535  476  454  448  408  470  437

For this data, the ten $D_i$ (together − alone) values are 60, 32, −1, 79, 26, 28, 30, −7, 61, 35.

6.2.2 Sign Test Examples using R and SAS

R Output for Sign (Binomial) Test Examples

    > # Sign Test Example from Gibbons
    > binom.test(10,13)

            Exact binomial test

    number of successes = 10, number of trials = 13, p-value = 0.09229   <-- Fail to reject
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
                                    <-- The CI contains .5 so we fail to reject Ho
    sample estimates:
    probability of success
                 0.7692308

    > # Sign (Binomial) Test for Location -- Daniel Ex. 2.1
    > time <- c(1.80,3.30,5.65,2.25,2.50,3.50,2.75,3.25,3.10,2.70,3.00)
    > time = time - 3.50
    > time
     [1] -1.70 -0.20  2.15 -1.25 -1.00  0.00 -0.75 -0.25 -0.40 -0.80 -0.50
    > ties = sum(time==0)
    > ties
    [1] 1
    > binom.test(sum(time>0),length(time)-ties)

            Exact binomial test                                          Reject Ho
                                                                            ^^^
    number of successes = 1, number of trials = 10, p-value = 0.02148
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
                                    <-- .5 is not in the CI, so reject Ho
    sample estimates:
    probability of success
                       0.1

    > # Sign (Binomial) Test for Location -- Paired Data, Daniel Ex. 4.1
    > alone <- c(463,462,462,456,450,426,418,415,409,402)
    > together <- c(523,494,461,535,476,454,448,408,470,437)
    > diff <- together - alone
    > ties = sum(diff==0)
    > ties
    [1] 0
    > binom.test(sum(diff>0),length(diff)-ties)

            Exact binomial test                                    Fail to reject Ho
                                                                       ^^^^^^^
    number of successes = 8, number of trials = 10, p-value = 0.1094
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:

    sample estimates:
    probability of success
                       0.8

R Code for Sign (Binomial) Test Examples

    # Sign Test Example from Gibbons
    binom.test(10,13)

    # Sign (Binomial) Test for Location -- Daniel Ex. 2.1
    time <- c(1.80,3.30,5.65,2.25,2.50,3.50,2.75,3.25,3.10,2.70,3.00)
    time = time - 3.50      # subtract the hypothesized median
    time
    ties = sum(time==0)     # observations equal to the hypothesized median
    ties
    binom.test(sum(time>0),length(time)-ties)

    # Sign (Binomial) Test for Location -- Paired Data, Daniel Ex. 4.1
    alone <- c(463,462,462,456,450,426,418,415,409,402)
    together <- c(523,494,461,535,476,454,448,408,470,437)
    diff <- together - alone
    ties = sum(diff==0)
    ties
    binom.test(sum(diff>0),length(diff)-ties)

SAS Output for Sign (Binomial) Test Examples

In SAS, the Sign (Binomial) Test statistic is denoted M, and it represents the deviation of the observed count $T^+$ from the expected count $.5n$ when the null hypothesis is true.

    Ordinary Sign Test for Training Program Example

    The UNIVARIATE Procedure
    Variable: level

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t               Pr > |t|
    Sign           M     -3.5      Pr >= |M|             <-- Fail to reject Ho
    Signed Rank    S               Pr >= |S|

    Sign (Binomial) Test for Example 2.1

    Variable: diff

    Basic Statistical Measures
    Location                  Variability
    Mean                      Std Deviation
    Median                    Variance

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t               Pr > |t|
    Sign           M     -4        Pr >= |M|             <-- Reject Ho
    Signed Rank    S               Pr >= |S|

    Sign (Binomial) Test for Paired Differences -- Example 4.1

    Obs    alone    together    diff

    Variable: diff

    Basic Statistical Measures
    Location                  Variability
    Mean                      Std Deviation
    Median                    Variance

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t               Pr > |t|
    Sign           M     3         Pr >= |M|             <--- Fail to reject Ho
    Signed Rank    S     24.5      Pr >= |S|

SAS Code for Sign (Binomial) Test Examples

    DM 'LOG; CLEAR; OUT; CLEAR;';
    OPTIONS NODATE NONUMBER LS=76 PS=54;

    ****************************************************************;
    *** Ordinary Sign Test: Let -1,1 represent the 2 categories. ***;
    *** The frq values are the category frequencies              ***;
    ****************************************************************;
    * Data lines restored from the example: 10 of 13 pairs in one category (M = -3.5);
    DATA in;
      INPUT level frq;
      LINES;
    -1 10
     1  3
    ;
    DATA signtest (DROP=i);
      SET in;
      IF level = -1 THEN DO i = 1 TO frq; OUTPUT; END;
      IF level =  1 THEN DO i = 1 TO frq; OUTPUT; END;
    PROC UNIVARIATE DATA=signtest;
      VAR level;
      TITLE 'Ordinary Sign Test for Training Program Example';
    RUN;

    ******************************************;
    *** Sign (Binomial) Test for Location: ***;
    *** Example 2.1 in course notes        ***;
    ******************************************;
    DATA in2;
      med_time = 3.50;
      INPUT time @@;
      diff = time - med_time;
      OUTPUT;
      LINES;
    1.80 3.30 5.65 2.25 2.50 3.50 2.75 3.25 3.10 2.70 3.00
    ;
    PROC UNIVARIATE DATA=in2;
      VAR diff;
      TITLE 'Sign (Binomial) Test for Example 2.1';
    RUN;

    ****************************************************;
    *** Sign (Binomial) Test for Paired Differences: ***;
    *** Example 4.1 in course notes                  ***;
    ****************************************************;
    DATA in3;
      INPUT alone together @@;
      diff = together - alone;
      OUTPUT;
      LINES;
    463 523  462 494  462 461  456 535  450 476
    426 454  418 448  415 408  409 470  402 437
    ;
    PROC PRINT DATA=in3;
      TITLE 'Sign (Binomial) Test for Paired Differences -- Example 4.1';
    PROC UNIVARIATE DATA=in3;
      VAR diff;
    RUN;

6.3 Wilcoxon Signed Rank Test

Assumptions: Given a random sample of $n$ independent observations $X_1, \ldots, X_n$:

Each $X_i$ was drawn from a symmetric and continuous population.

Each $X_i$ has the same median $M$ (for $i = 1, \ldots, n$).

The measurement scale is at least interval.

Hypotheses: The inference concerns a hypothesis about the median $M$ of a single population. Given $M_o$, a hypothesized value of the median, we have:

(A) Two-sided: $H_0: M = M_o$ vs $H_1: M \ne M_o$
(B) Lower one-sided: $H_0: M = M_o$ vs $H_1: M < M_o$
(C) Upper one-sided: $H_0: M = M_o$ vs $H_1: M > M_o$

Because of the symmetry assumption, we can replace the median $M$ with the mean $\mu$ in the hypotheses.

Method: For a given $\alpha$:

Calculate all differences $D_i = X_i - M_o$.

Remove all cases having $D_i = 0$ and adjust the sample size $n$ accordingly.

Assign ranks $1, 2, \ldots, n$ to the $|D_i|$. For tied $|D_i|$ values, assign average ranks.

If $H_0$ is true, then the $D_i$ are symmetrically distributed about 0. That is, we expect $\sum(\text{Ranks where } D_i > 0) \approx \sum(\text{Ranks where } D_i < 0)$.

Let $T^+ = \sum (R_i \text{ when } D_i > 0)$ and $T^- = \sum (R_i \text{ when } D_i < 0)$.

Under $H_0$, the sampling distributions of $T^+$ and $T^-$ are symmetric about $n(n+1)/4$ and can assume integer values from 0 to $n(n+1)/2$. Note that $T^- = n(n+1)/2 - T^+$.

For alternative hypothesis:

(A) $H_1: M \ne M_o$, let $T = \min(T^+, T^-)$. Let $w$ be the largest value from the Wilcoxon Signed Rank Test Table such that $P(T \le w) \le \alpha/2$.
(B) $H_1: M < M_o$, let $T = T^+$. Let $w$ be the largest value from the Wilcoxon Signed Rank Test Table such that $P(T \le w) \le \alpha$.
(C) $H_1: M > M_o$, let $T = T^-$. Let $w$ be the largest value from the Wilcoxon Signed Rank Test Table such that $P(T \le w) \le \alpha$.

Decision Rule

If $T \le w$, Reject $H_0$.
If $T > w$, Fail to Reject $H_0$.
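The steps of the method can be traced by hand in R. This is a minimal sketch that uses the twelve fish weights from the R listing later in these notes with $M_o = 3$; the names `M0`, `R`, `T_plus`, `T_minus`, and `z` are illustrative. The standardization at the end uses the mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$ of $T$ under $H_0$, without any tie adjustment.

```r
# Wilcoxon signed rank statistic computed step by step
fish <- c(2.11, 2.22, 2.23, 2.41, 2.54, 2.73, 2.80, 2.80, 2.92, 3.06, 3.12, 3.12)
M0 <- 3                          # hypothesized median

D <- fish - M0                   # differences D_i = X_i - M0
D <- D[D != 0]                   # remove zero differences
n <- length(D)                   # adjusted sample size

R <- rank(abs(D))                # ranks of |D_i|, average ranks for ties
T_plus <- sum(R[D > 0])          # sum of ranks with D_i > 0
T_minus <- sum(R[D < 0])         # sum of ranks with D_i < 0

# For H1: M < M0 the test statistic is T = T+
z <- (T_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)

c(n = n, T_plus = T_plus, T_minus = T_minus, z = z)
```

The value of `T_plus` matches the statistic `V` that `wilcox.test` reports for these data, and `T_plus + T_minus` equals $n(n+1)/2$ as noted in the method.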

Large Sample Approximation (with continuity correction) (n > 30)

Computer packages like R and SAS calculate approximate p-values for the two-sided alternative $H_1$ based on large sample normal distribution approximations. The normalizing formula is:

$$T^* = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{T - E(T)}{\sqrt{Var(T)}}$$

Daniel (Applied Nonparametric Statistics, page 42) describes an adjustment to this formula in the event of ties.

Example of Wilcoxon Signed Rank Test

A random sample of 12 fish was taken and the bodyweights recorded:

2.11, 2.22, 2.23, 2.41, 2.54, 2.73, 2.80, 2.80, 2.92, 3.06, 3.12, 3.12

Test the null hypothesis $H_0: \mu = 3.0$ against the alternative $H_1: \mu < 3.0$ pounds. (Assume they are sampled from the same symmetric distribution.)

R code for Wilcoxon Signed Rank Test with Confidence Interval

    # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data
    fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12)
    wilcox.test(fish,mu=3,conf.int=TRUE)

R output for Wilcoxon Signed Rank Test with Confidence Interval

    > # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data
    > fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12)
    > wilcox.test(fish,mu=3,conf.int=TRUE)

            Wilcoxon signed rank test with continuity correction

    data:  fish
    V = 8, p-value =
    alternative hypothesis: true location is not equal to 3
    95 percent confidence interval:

    sample estimates:
    (pseudo)median

SAS code and selected output:

    SIGN TEST AND WILCOXON SIGNED RANK TEST

    The UNIVARIATE Procedure

    Basic Statistical Measures
    Location                  Variability
    Mean                      Std Deviation
    Median                    Variance
    Mode                      Range
                              Interquartile Range

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t               Pr > |t|
    Sign           M     -3        Pr >= |M|
    Signed Rank    S     -31       Pr >= |S|             <-- p-value

The Signed Rank statistic S in SAS is $S = T^+ - n(n+1)/4 = 8 - (12)(13)/4 = 8 - 39 = -31$.

The p-value is based on the normal approximation and is for a two-sided alternative. Thus, for the one-sided $H_1: M < 3.0$, the approximate p-value = .0112/2 = .0056.

    OPTIONS LS=72 PS=60 NONUMBER NODATE;
    DATA IN;
      INPUT X @@;
      X=X-3;
      CARDS;
    2.11 2.22 2.23 2.41 2.54 2.73 2.80 2.80 2.92 3.06 3.12 3.12
    ;
    PROC UNIVARIATE DATA=IN;
      VAR X;
      TITLE 'ONE SAMPLE TESTS FOR LOCATION: ';
      TITLE2 'SIGN TEST AND WILCOXON SIGNED RANK TEST';
    RUN;

Reference Distribution for the Signed Rank Test (n = 5)

If $H_0$ is true, then any random $D_i$ has a probability 1/2 of being $> 0$ or $< 0$ (equivalently, $X_i$ is above or below $M_o$ with probability 1/2). Without loss of generality, let $D_1, D_2, D_3, D_4, D_5$ be ordered from smallest to largest in absolute value, so that $|D_i|$ has rank $i$. Then when $n = 5$, every possible assignment of signs to the ranked $D_1, D_2, D_3, D_4, D_5$ has a $(1/2)^5 = 1/32$ chance of occurring.

    T+   Possible Ranks With D_i > 0         Probability   Cumulative Probability
     0   None                                1/32           1/32 = .03125
     1   1                                   1/32           2/32 = .0625
     2   2                                   1/32           3/32 = .09375
     3   3 or 1,2                            2/32           5/32 = .15625
     4   4 or 1,3                            2/32           7/32 = .21875
     5   5 or 1,4 or 2,3                     3/32          10/32 = .3125
     6   1,5 or 2,4 or 1,2,3                 3/32          13/32 = .40625
     7   2,5 or 3,4 or 1,2,4                 3/32          16/32 = .5
     8   3,5 or 1,2,5 or 1,3,4               3/32          19/32 = .59375
     9   4,5 or 1,3,5 or 2,3,4               3/32          22/32 = .6875
    10   1,4,5 or 2,3,5 or 1,2,3,4           3/32          25/32 = .78125
    11   2,4,5 or 1,2,3,5                    2/32          27/32 = .84375
    12   3,4,5 or 1,2,4,5                    2/32          29/32 = .90625
    13   1,3,4,5                             1/32          30/32 = .9375
    14   2,3,4,5                             1/32          31/32 = .96875
    15   1,2,3,4,5                           1/32          32/32 = 1
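The n = 5 reference distribution can be reproduced by brute force: list all $2^5$ ways of marking ranks 1 to 5 as belonging to positive differences, sum the marked ranks, and tabulate. The sketch below does this and compares the result with R's built-in `dsignrank`; the object names (`signs`, `T_plus`, `dist`, `ref_table`) are illustrative.

```r
# Enumerate the null distribution of T+ for n = 5
n <- 5
N <- n * (n + 1) / 2                      # largest possible value of T+

# Each row marks which ranks have D_i > 0 (1 = positive, 0 = negative)
signs <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))
T_plus <- drop(signs %*% (1:n))           # T+ for each of the 2^n sign patterns

# Probability of each value 0..N; every pattern has probability 1/2^n
dist <- table(factor(T_plus, levels = 0:N)) / 2^n
cdf <- cumsum(dist)

ref_table <- data.frame(T_plus = 0:N,
                        prob = as.numeric(dist),
                        cum_prob = as.numeric(cdf))
ref_table

# Cross-check against the built-in signed rank distribution
all.equal(ref_table$prob, dsignrank(0:N, n))
```

The same enumeration works for any small $n$, but the number of sign patterns doubles with each added observation, which is why tables or `psignrank` are used in practice.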



6.3.2 Special Case: Paired Data

Assumptions: Given a random sample of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Let $D_i = y_i - x_i$ for $i = 1, \ldots, n$.

The $D_i$'s are independent.

The measurement scale is at least interval.

The distribution of the differences $D_i = y_i - x_i$ for $i = 1, \ldots, n$ is symmetric.

Testing Procedure:

Calculate all differences $D_i = y_i - x_i$ for $i = 1, \ldots, n$.

Use the median difference $M_D$ in the hypotheses. Typically, the hypothesized value is $M_D = 0$. Because of the symmetry assumption, we can replace the median $M_D$ with the mean $\mu_D$ in the hypotheses.

Run the Wilcoxon Signed Rank Test based on the $D_i$.

Example of Wilcoxon Signed Rank Test for Paired Data

Two judges were asked to independently rate the rehabilitative potential for each of 22 male prison inmates. The following table contains the ratings:

Table columns: Inmate (i), Judge 1, Judge 2, $D_i$, $|D_i|$, Sign, $R_i$. Two rows are marked "remove tie" (pairs with $D_i = 0$). The numeric entries did not survive transcription.

SAS code and output:

    DATA IN;
      DO INMATE=1 TO 22;
        INPUT JUDGE1 JUDGE2 @@;
        DIFF = JUDGE1 - JUDGE2;
        OUTPUT;
      END;
      CARDS;
    (the 22 pairs of ratings did not survive transcription)
    ;
    PROC UNIVARIATE DATA=IN;
      VAR DIFF;
      TITLE 'WILCOXON SIGNED RANK TEST FOR PAIRED DATA';
    RUN;

    ====================================================================
    WILCOXON SIGNED RANK TEST FOR PAIRED DATA

    The UNIVARIATE Procedure
    Variable: DIFF

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t               Pr > |t|
    Sign           M     5         Pr >= |M|
    Signed Rank    S     75        Pr >= |S|             <-- Reject Ho

Based on the approximate p-value Pr >= |S| in the output, we would reject the null hypothesis $H_o: M_D = 0$.

Confidence Interval for the Median Based on the Wilcoxon Signed Rank Test

To find the point estimate for the median $M$:

Calculate all paired averages $u_{ij}$, allowing replication (that is, including $i = j$):

$$u_{ij} = \frac{x_i + x_j}{2}$$

There are $\binom{n}{2} + n = \binom{n+1}{2}$ such averages.

Arrange the $u_{ij}$ in increasing order.

The point estimate for $M$ is $\hat{M}$ = the median of the $\{u_{ij}\}$.

Method: For an approximate confidence level $100(1-\alpha)\%$:

Use the Wilcoxon Signed Rank Test Table to find the largest $t$ such that $P(T \le t) \le \alpha/2$.

Let $M_L$ = the $(t+1)$st $u_{ij}$ observation from the beginning and $M_U$ = the $(t+1)$st $u_{ij}$ observation from the end of the set of ordered $u_{ij}$ values.

Statistically, $P(M_L \le M \le M_U) = P(t + 1 \le T < n(n+1)/2 - t)$, where $T$ is the Wilcoxon signed rank statistic.

The approximate $100(1-\alpha)\%$ confidence interval is $(M_L, M_U)$.

The exact confidence level for $(M_L, M_U)$ is determined by the distribution given in the Wilcoxon Signed Rank Test Table. That is, if $p = P(T \le t)$, then $(M_L, M_U)$ is an exact $100(1-2p)\%$ confidence interval for $M$.

Note: You do not need to calculate all of the $u_{ij}$ values, but only the $(t+1)$st largest and smallest.

The resulting point estimate and interval are also known as the Hodges-Lehmann estimates.

Example of Hodges-Lehmann confidence interval for M

A random sample of 12 fish was taken and the body weights were recorded:

2.11, 2.22, 2.23, 2.41, 2.54, 2.73, 2.80, 2.80, 2.92, 3.06, 3.12, 3.12

Calculate an approximate 95% confidence interval for the median bodyweight $M$.

For $n = 12$, the largest value in the Wilcoxon Signed Rank Test Table with $P(T \le t) \le .025$ is $t = 13$. Note: $P(T \le 13) \approx .0212$.

Then $t + 1 = 14$ and $n(n+1)/2 - t = 78 - 13 = 65$.

Now find the 14th and 65th values in the ordered list of the 78 $u_{ij}$ values.

HODGES-LEHMANN CONFIDENCE INTERVAL (listing of the 78 averages with columns Obs, x1, x2, u, with the 14th row marked as the lower endpoint and the 65th row marked as the upper endpoint; the numeric rows did not survive transcription)

Thus, the confidence interval is (2.42, 2.93).
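The Hodges-Lehmann interval for the fish data can be rebuilt directly from the 78 pairwise averages. This is a minimal sketch: the object names (`walsh`, `t_crit`, `ci`, `hl`) are illustrative, and `psignrank` stands in for the Wilcoxon Signed Rank Test Table when locating the largest $t$ with $P(T \le t) \le \alpha/2$.

```r
# Hodges-Lehmann point estimate and confidence interval from pairwise averages
fish <- c(2.11, 2.22, 2.23, 2.41, 2.54, 2.73, 2.80, 2.80, 2.92, 3.06, 3.12, 3.12)
n <- length(fish)
alpha <- 0.05
N <- n * (n + 1) / 2                       # number of averages u_ij with i <= j

# All averages (x_i + x_j)/2 with i <= j, sorted in increasing order
w <- outer(fish, fish, "+") / 2
walsh <- sort(w[upper.tri(w, diag = TRUE)])

# Largest t with P(T <= t) <= alpha/2 under the null signed rank distribution
t_crit <- max(which(psignrank(0:N, n) <= alpha / 2)) - 1

# (t+1)st average from the beginning and (t+1)st from the end
ci <- c(lower = walsh[t_crit + 1], upper = walsh[N - t_crit])
hl <- median(walsh)                        # point estimate of M

t_crit
ci
hl
```

Sorting all 78 averages is convenient for a sample this small; for large $n$ only the $(t+1)$st smallest and largest values are needed, as the note above points out.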

R code for Wilcoxon Signed Rank Test with Confidence Interval

    # Wilcoxon Signed Rank Test with Confidence Interval for Fish Data
    fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12)
    wilcox.test(fish,mu=3,conf.int=TRUE)

R output for Wilcoxon Signed Rank Test with Confidence Interval

            Wilcoxon signed rank test with continuity correction

    data:  fish
    V = 8, p-value =
    alternative hypothesis: true location is not equal to 3
    95 percent confidence interval:
                          <--- Approximate 95% confidence interval for mu is (2.42, 2.93)
    sample estimates:
    (pseudo)median

6.4 Asymptotic Relative Efficiency (A.R.E.)

One way to compare properties of statistical tests is to compare their efficiency. The definition of efficiency can vary but, generally speaking, it is used to compare the sample size required of one test with that of another test under similar conditions.

Suppose that two tests may be used to test a particular $H_0$ against a particular $H_1$, and both tests have the same specified $\alpha$ and $\beta$ errors. These tests are therefore comparable under conditions related to the level of significance $\alpha$ and power $(1 - \beta)$.

Thus, the test requiring the smaller sample size to satisfy these conditions will have the smaller sampling cost and effort. That is, the test with the smaller required sample size is more efficient than the other test, and its relative efficiency is greater than one.

Let $T_1$ and $T_2$ represent two tests that test the same $H_0$ against the same $H_1$ with the same specified $\alpha$ and $\beta$ values. For example, $T_1$ is the Sign Test and $T_2$ is the Wilcoxon Signed Rank Test, which are used to test $H_0: \mu = \mu_0$ with $\alpha = .05$ and power $1 - \beta = .90$.

The relative efficiency of test $T_1$ with respect to test $T_2$ is the ratio $n_2/n_1$, where $n_1$ is the sample size $T_1$ requires to equal the power of test $T_2$ with sample size $n_2$ (assuming the same $H_0$ and significance level $\alpha$).

Thus, there is a relative efficiency of $T_1$ with respect to $T_2$ for each choice of $\alpha$ and $n_2$.

A more general measure of efficiency (asymptotic relative efficiency) was developed. Consider the situation of letting sample size $n_1$ increase for $T_1$ with specified $\alpha$ and $\beta$. Then there exists a sequence of $n_2$ values such that, for each value of $n_1$ ($n_1 \to \infty$), $T_2$ has the same $\alpha$ and $\beta$ values. In other words, there is a sequence of relative efficiency values $n_2/n_1$.

If $n_2/n_1$ approaches a constant value as $n_1 \to \infty$, and if that constant is the same for all choices of $\alpha$ and $\beta$, then the constant is called the asymptotic relative efficiency of $T_1$ with respect to $T_2$.

Note that if the A.R.E. exists for $T_1$ and $T_2$, then the limiting A.R.E. value is independent of the choice of $\alpha$ and $\beta$.

To select a test with superior power, we generally select the test with the greatest A.R.E. because the power depends on many factors such as the maximum number of observations that can be collected given experimental or sampling resources and the type of distribution that generates the data (normal?, Weibull?, gamma?, ...), which is usually unknown.

The A.R.E. is, in general, difficult to calculate. In this course, we will only consider A.R.E. results for various pairs of tests and for several choices of distributions.

Note that A.R.E. assumes that an infinite sample size can be taken. Thus, a natural question arises: How good is a measure assuming an infinitely large sample when most practical situations involve relatively small sample sizes?

In an attempt to answer this question, studies of exact relative efficiency values for very small samples have shown that A.R.E. provides a good approximation to the relative efficiency in many situations of practical interest.

A.R.E. Comparison for Three Single-Sample Tests of Location

We will compare the t-test, Sign test, and the Wilcoxon Signed Rank test using the A.R.E. values. To do this we will consider three situations involving symmetric distributions. Under symmetry assumptions, $H_0$ and $H_1$ are identical for all three tests.

(I) The sample was randomly sampled from a normal distribution having density function

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \quad \text{for } -\infty < x < \infty$$

Without loss of generality, we can assume it is a standard normal $N(0, 1)$ having density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-x^2/2\right) \quad \text{for } -\infty < x < \infty$$

(II) The sample was randomly sampled from a uniform distribution having density function

$$f(x; a, b) = \frac{1}{b-a} \ \text{for } a < x < b, \qquad = 0 \ \text{otherwise}$$

Without loss of generality, we can assume a uniform $U(0, 1)$ having density function

$$f(x) = 1 \ \text{for } 0 < x < 1, \qquad = 0 \ \text{otherwise}$$

The uniform distribution is considered a light-tailed symmetric distribution.

(III) The sample was randomly sampled from a double exponential distribution (DE) having density function

$$f(x; a, b) = \frac{1}{2b} \exp\left(-\frac{|x-a|}{b}\right) \quad \text{for } -\infty < x < \infty$$

Without loss of generality, we can assume it is DE(0, 1) having density function

$$f(x) = \frac{1}{2} \exp\left(-|x|\right) \quad \text{for } -\infty < x < \infty$$

The DE distribution is a heavy-tailed symmetric distribution.

Table of A.R.E. Values

    Test Comparison               (I) Normal          (II) Uniform        (III) Double Exponential
                                  Distribution        Distribution        Distribution
    Sign test                     2/π ≈ .637          1/3 ≈ .333          2
      vs t-test                   (t)                 (t)                 (Sign)
    Wilcoxon Signed Rank test     3/2 = 1.5           3                   3/4 = .75
      vs Sign test                (Wlcxn)             (Wlcxn)             (Sign)
    Wilcoxon Signed Rank test     3/π ≈ .955          1                   3/2 = 1.5
      vs t-test                   (t)                 (=)                 (Wlcxn)

The test named in parentheses is the more efficient one.

6.5 Introduction to the One-Sample Randomization Test

Paired Data Example: An experimental drug was tested on 7 subjects. Blood level measurements were taken before ($X$) and after ($Y$) administering the drug. In this situation, we have paired data. The differences ($D = Y - X$) in blood level measurements for the subjects were:

    Patient i           1       2       3       4       5       6       7
    Difference D_i   -.187    .011   -.250    .034   -.137   -.112   -.023

The goal is to test the hypothesis that there is no change in blood level measurement after taking the drug. Statistically, we will assume that the distribution of the difference in blood level measurements under the null hypothesis $H_0$ is symmetric about 0.

Thus, $H_0: \mu_D = 0$. We will consider two possible alternatives: (1) $H_1: \mu_D \ne 0$ and (2) $H_1: \mu_D < 0$.

If $H_0$ is true (and assuming symmetry about 0), the signs (+ or −) of the 7 measurements can be considered random. For example, we could have just as likely observed the same seven magnitudes $|D_i|$ with any other randomization of the signs.

The Randomization Reference Distribution

1. Consider all possible sign assignments or randomizations of signs for the seven differences.

2. Calculate $\sum D_i$ for each randomization. In this example, there are $2^7 = 128$ different randomizations of the seven signs. This yields the randomization distribution of $\sum D_i$. In terms of testing, it is statistically equivalent to use the randomization distribution of the mean $\bar{D}$.

3. Now compare the OBSERVED $\sum D_i = -.664$ to the randomization distribution to find the probability (p-value) associated with the test of $H_0: \mu_D = 0$ against the alternative hypothesis $H_1$.

Case 1: For alternative $H_1: \mu_D \ne 0$, from the randomization reference distribution we see the p-value = $P(|\sum D_i| \ge .664) = P(\sum D_i \le -.664) + P(\sum D_i \ge .664) = (6 + 6)/128 = .09375$.

Case 2: For alternative $H_1: \mu_D < 0$, from the randomization reference distribution we see the p-value = $P(\sum D_i \le -.664) = 6/128 = .046875$.
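With only $2^7 = 128$ sign assignments, the whole randomization reference distribution can be enumerated in a few lines of R. This is a sketch under two choices that are not in the notes: the object names are illustrative, and the differences are converted to integer thousandths so that the comparisons with the observed sum are not affected by floating-point rounding.

```r
# Exact randomization distribution of sum(D_i) for the blood level differences
D <- c(-.187, .011, -.250, .034, -.137, -.112, -.023)
n <- length(D)

mag <- round(abs(D) * 1000)        # magnitudes |D_i| in integer thousandths
obs <- sum(round(D * 1000))        # observed sum, -664 thousandths

# All 2^n assignments of -1/+1 signs to the magnitudes
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
sums <- drop(signs %*% mag)        # sum(D_i) for each randomization

# Case 2, H1: mu_D < 0  -> proportion of sums at or below the observed sum
p_lower <- mean(sums <= obs)
# Case 1, H1: mu_D != 0 -> proportion of sums at least as extreme in absolute value
p_two <- mean(abs(sums) >= abs(obs))

c(randomizations = length(sums), p_lower = p_lower, p_two = p_two)
```

Sorting `sums` and dividing the row index by 128 gives the same cumulative probabilities that a sorted listing of the randomizations would show.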

RANDOMIZATION TEST #1 (listing of the 128 sign randomizations sorted by D_SUM, with columns Obs, ID1-ID7, D_SUM, CDF; the numeric rows did not survive transcription)

SAS Code to Generate the Randomization Distribution for the Change in Blood Level Measurements

    DM 'LOG;CLEAR;OUT;CLEAR;';
    OPTIONS LS=72 PS=68 NONUMBER NODATE;
    DATA IN;
      INPUT D1-D7;
      CARDS;
    -.187 .011 -.250 .034 -.137 -.112 -.023
    ;
    DATA IN; SET IN;
      DO I1=-1 TO 1 BY 2; ID1=I1*D1;
      DO I2=-1 TO 1 BY 2; ID2=I2*D2;
      DO I3=-1 TO 1 BY 2; ID3=I3*D3;
      DO I4=-1 TO 1 BY 2; ID4=I4*D4;
      DO I5=-1 TO 1 BY 2; ID5=I5*D5;
      DO I6=-1 TO 1 BY 2; ID6=I6*D6;
      DO I7=-1 TO 1 BY 2; ID7=I7*D7;
        D_SUM = SUM(OF ID1-ID7);
        OUTPUT;
      END; END; END; END; END; END; END;
      KEEP ID1-ID7 D_SUM;
    PROC SORT DATA=IN; BY D_SUM;
    DATA IN; SET IN;
      CDF=_N_/128;
    PROC PRINT DATA=IN;
      TITLE 'RANDOMIZATION TEST #1';
    RUN;

Randomization Test for Paired Data

Assumptions: Given a random sample of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

The differences $D_i = y_i - x_i$ are independent.

The distribution of each $D_i$ is symmetric and has the same mean.

The measurement scale for the $D_i$'s is at least interval.

Hypotheses: The inference concerns a hypothesis about whether or not the mean difference $\mu_D = 0$:

(A) Two-sided: $H_0: \mu_D = 0$ vs $H_1: \mu_D \ne 0$
(B) Lower one-sided: $H_0: \mu_D = 0$ vs $H_1: \mu_D < 0$
(C) Upper one-sided: $H_0: \mu_D = 0$ vs $H_1: \mu_D > 0$

Because of the symmetry assumption, we can replace the mean difference $\mu_D$ with the median difference $M_D$ in the hypotheses.

Method: For a given $\alpha$:

Calculate the sum $\sum D_i$ for each of the possible $2^n$ sign randomizations.

Order these values to form the randomization distribution for the $\sum D_i$.

Decision Rule

For (A) $H_1: \mu_D \ne 0$: If the ACTUAL $\sum D_i < 0$, let $D^* = \sum D_i$. If the ACTUAL $\sum D_i > 0$, let $D^* = -\sum D_i$.

$$p\text{-value} = 2 \times \frac{\text{Number of } \sum D_i\text{'s} \le D^*}{2^n}$$

For (B) $H_1: \mu_D < 0$, let $D^* = \sum D_i$.

$$p\text{-value} = \frac{\text{Number of } \sum D_i\text{'s} \le D^*}{2^n}$$

For (C) $H_1: \mu_D > 0$, let $D^* = -\sum D_i$.

$$p\text{-value} = \frac{\text{Number of } \sum D_i\text{'s} \le D^*}{2^n} = \frac{\text{Number of } \sum D_i\text{'s} \ge -D^*}{2^n}$$

Reject $H_0$ if p-value $\le \alpha$. Otherwise, we fail to reject $H_0$.

Without loss of generality, you can replace $\sum D_i$ with $\bar{D}$ in the preceding arguments.

Note that the number of sign randomizations ($2^n$) forming the randomization distribution grows rapidly. For example, when $n = 20$, there are over 1 million randomizations. In such cases, it is generally not feasible to generate the entire randomization distribution.

To handle this problem, a large number of randomizations of the signs are randomly taken. Then, an approximate randomization distribution is generated from this large subset of possible randomizations. Approximate p-values can then be determined from this distribution. This is known as the Monte Carlo approach to generating approximate p-values.

The following R code will generate p-values using the Monte Carlo approach for the single-sample Randomization Test for Location.

R Code for Randomization Test on Paired Data (Differences)

    # Single Sample Randomization Test for Location

    # Enter the number of permutations to take
    # (50000 matches the histogram of 50,000 values shown with the output)
    Prep = 50000
    Prep

    # Enter vector of differences
    D <- c(-.187,.011,-.250,.034,-.137,-.112,-.023)
    D

    # Calculate the mean difference
    meand <- mean(D)
    meand
    sgnd <- sign(meand)

    Fp = 0
    upper <- 0
    lower <- 0
    n = length(D)

    # Begin sign randomizations
    meanpermd <- 1:Prep
    for (i in 1:Prep){
      sgnvec <- sign(runif(n)-.5)    # random vector with 1 or -1 values
      permvec <- sgnvec*D

      # Calculate the mean difference for the i_th randomization vector
      meanpermd[i] <- mean(permvec)
      if(meanpermd[i]>=meand) upper = upper+1
      if(meanpermd[i]<=meand) lower = lower+1
    }

    # Calculate p-values:
    # for lower one-sided H1
    pval_lower <- lower/Prep
    pval_lower
    # for upper one-sided H1
    pval_upper <- upper/Prep
    pval_upper
    # for two-sided H1
    if(sgnd < 0) pval_two_sided = 2*pval_lower
    if(sgnd > 0) pval_two_sided = 2*pval_upper
    pval_two_sided

    hist(meanpermd)

R Output for Randomization Test on Paired Data

    > meand
    [1] -0.09485714
    > # Calculate p-values:
    > # for lower one-sided H1
    [1] 0.04644
    > # for upper one-sided H1
    [1] 0.96076
    > # for two-sided H1
    [1] 0.09288

Note that the p-values from the Monte Carlo approach (.04644, .96076, .09288) approximate the exact p-values of (.046875, .9609375, .09375) from the true randomization distribution.

Without loss of generality, I used $\bar{D}$ instead of $\sum D_i$ in my R code.

[Figure: Histogram of 50,000 values of meanpermd using the Monte Carlo approach (x-axis: meanpermd; y-axis: Frequency)]

Single Sample Randomization Test for H0: µ = µ0

Suppose we want to perform a randomization test when our inference concerns a hypothesis about whether or not µ = µ0 for some specified value µ0, against one of three alternatives:

(A) Two-sided: H0: µ = µ0 vs H1: µ ≠ µ0
(B) Lower one-sided: H0: µ = µ0 vs H1: µ < µ0
(C) Upper one-sided: H0: µ = µ0 vs H1: µ > µ0

To perform the randomization test, simply subtract µ0 from each observation, and then run the randomization test on these differences as you would for paired data.

Example: A random sample of 12 fish was taken and the body weights (in pounds) recorded:

2.11 2.22 2.23 2.41 2.54 2.73 2.80 2.80 2.92 3.06 3.12 3.12

Test the null hypothesis H0: µ = 3.0 against the alternative H1: µ < 3.0 pounds.

Based on the randomization test, the p-value is approximately .005. Therefore, we would reject H0: µ = 3.0 and conclude that µ < 3.0 pounds.

R Code for Randomization Test for Fish Weight Data

    # Single Sample Randomization Test for Location

    # Enter the number of randomizations to take
    Prep = 50000

    fish <- c(2.11,2.22,2.23,2.41,2.54,2.73,2.80,2.80,2.92,3.06,3.12,3.12)

    # Enter the hypothesized mean
    mu0 = 3                # <-- enter mu_0

    # Enter vector of differences
    D <- fish - mu0        # <-- subtract mu_0 from the data
    D

    # Calculate the mean difference
    meand <- mean(D)
    meand
    sgnd <- sign(meand)

    Fp = 0
    upper <- 0
    lower <- 0
    n = length(D)

    # Begin sign randomizations
    meanpermd <- 1:Prep
    for (i in 1:Prep){
      sgnvec <- sign(runif(n)-.5)    # random vector with 1 or -1 values
      permvec <- sgnvec*D

      # Calculate the mean difference for the i_th randomization vector
      meanpermd[i] <- mean(permvec)
      if(meanpermd[i]>=meand) upper = upper+1
      if(meanpermd[i]<=meand) lower = lower+1
    }

    # Calculate p-values:
    # for lower one-sided H0
    pval_lower <- lower/Prep
    pval_lower

    # for upper one-sided H0
    pval_upper <- upper/Prep
    pval_upper

    # for two-sided H0
    if(sgnd < 0) pval_two_sided = 2*pval_lower
    if(sgnd > 0) pval_two_sided = 2*pval_upper
    pval_two_sided

    hist(meanpermd)

R Output for Randomization Test on Fish Weight Data

    > # Enter vector of differences
    [1] -0.89 -0.78 -0.77 -0.59 -0.46 -0.27 -0.20 -0.20 -0.08  0.06  0.12  0.12
    > # Calculate the mean difference
    [1] -0.3283333
    > # Calculate p-values:
    > # for lower one-sided H0
    [1]
    > # for upper one-sided H0
    [1]
    > # for two-sided H0
    [1]

[Figure: Histogram of meanpermd, 50,000 values using the Monte Carlo approach (x-axis: meanpermd; y-axis: Frequency)]
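The subtract-µ0-then-randomize recipe can also be wrapped in a single reusable function. The sketch below is an illustration under stated assumptions rather than the notes' own code: the function name `rand_test_location`, its arguments, the vectorized sign matrix used in place of the loop, the comparison tolerance, and the seed are all hypothetical choices made for this example.

```r
# Monte Carlo sign-randomization test of H0: mu = mu0 (illustrative sketch).
# `rand_test_location` is a hypothetical helper, not a function from any package.
rand_test_location <- function(x, mu0, nrep = 50000) {
  D <- x - mu0                     # subtract mu0 from the data
  n <- length(D)
  obs_mean <- mean(D)

  # nrep random sign vectors, one per row (entries are -1 or +1)
  signs <- matrix(sample(c(-1, 1), nrep * n, replace = TRUE), nrow = nrep)
  perm_means <- drop(signs %*% D) / n

  # Tolerance guards against floating-point rounding in the comparisons
  tol <- 1e-9
  lower <- mean(perm_means <= obs_mean + tol)
  upper <- mean(perm_means >= obs_mean - tol)
  two_sided <- if (obs_mean < 0) 2 * lower else 2 * upper

  c(mean_diff = obs_mean, lower = lower, upper = upper, two_sided = two_sided)
}

set.seed(425)   # arbitrary seed, only so the sketch is reproducible
fish <- c(2.11, 2.22, 2.23, 2.41, 2.54, 2.73, 2.80, 2.80, 2.92, 3.06, 3.12, 3.12)
res <- rand_test_location(fish, mu0 = 3)
res
```

For these data, 21 of the 2^12 = 4096 possible sign assignments give a mean at or below the observed one, so the lower one-sided estimate should land close to 21/4096 ≈ .0051. The exact digits vary with the seed and the number of randomizations.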


More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

NONPARAMETRICS. Statistical Methods Based on Ranks E. L. LEHMANN HOLDEN-DAY, INC. McGRAW-HILL INTERNATIONAL BOOK COMPANY

NONPARAMETRICS. Statistical Methods Based on Ranks E. L. LEHMANN HOLDEN-DAY, INC. McGRAW-HILL INTERNATIONAL BOOK COMPANY NONPARAMETRICS Statistical Methods Based on Ranks E. L. LEHMANN University of California, Berkeley With the special assistance of H. J. M. D'ABRERA University of California, Berkeley HOLDEN-DAY, INC. San

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2 Review 6 Use the traditional method to test the given hypothesis. Assume that the samples are independent and that they have been randomly selected ) A researcher finds that of,000 people who said that

More information

Measuring relationships among multiple responses

Measuring relationships among multiple responses Measuring relationships among multiple responses Linear association (correlation, relatedness, shared information) between pair-wise responses is an important property used in almost all multivariate analyses.

More information

Chapter 11. Analysis of Variance (One-Way)

Chapter 11. Analysis of Variance (One-Way) Chapter 11 Analysis of Variance (One-Way) We now develop a statistical procedure for comparing the means of two or more groups, known as analysis of variance or ANOVA. These groups might be the result

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic

Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic BSTT523: Pagano & Gavreau, Chapter 7 1 Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic Random Variable (R.V.) X Assumes values (x) by chance Discrete R.V.

More information

Chapter 16. Nonparametric Tests

Chapter 16. Nonparametric Tests Chapter 16 Nonparametric Tests The statistical tests we have examined so far are called parametric tests, because they assume the data have a known distribution, such as the normal, and test hypotheses

More information

Using Tables and Graphing Calculators in Math 11

Using Tables and Graphing Calculators in Math 11 Using Tables and Graphing Calculators in Math 11 Graphing calculators are not required for Math 11, but they are likely to be helpful, primarily because they allow you to avoid the use of tables in some

More information