Introduction to Statistical Analysis. Cancer Research UK, 12th of February 2018. D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core]


2 Timeline
9:30 Morning
- 45 min Lecture: data types, summary statistics and graphical displays
- 15 min Quiz
10:30 Coffee & tea break (15 min)
- 60 min Lecture: some statistical distributions + CLT
- 15 min Exercises & discussion
12:00 Lunch break
13:00 Afternoon
- 45 min Lecture: Parametric tests for the mean
- 30 min Exercises with Shiny apps & discussion
- 30 min Lecture: Non-parametric tests for the mean
- 15 min Exercises & discussion
15:00 Coffee & tea break (15 min)
- 15 min Lecture: Tests for categorical variables
- 15 min Exercises & discussion
16:00 Group-based exercises (60 min)

3 Grand Picture of Statistics
Population: parameters $\mu$, $\sigma^2$.
Sample: data $(x_1, x_2, \ldots, x_n)$, statistics $\hat{\mu}$, $\hat{\sigma}^2$.

4 Data Types
Examples of variables observed on units $x_1, x_2, x_3, \ldots, x_n$:
- Cancer status
- Nucleic acid sequence (C, T, T, A)
- 5-level pain score
- Number of daily admissions at A&E
- Gene expression intensity

5 Summary statistics and plots for qualitative data
5-level answers of 21 patients to the question "How much did pain due to your ureteric stones interfere with your day to day activities?":
3, 1, 5, 3, 1, 1, 1, 5, 1, 3, 4, 1, 1, 4, 5, 5, 5, 5, 5, 4, 4,
where
- 1 = Not at all,
- 2 = A little bit,
- 3 = Somewhat,
- 4 = Quite a bit,
- 5 = Very much.
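As a rough illustration of the summary statistics meant here, the sketch below tabulates the 21 answers listed on the slide. The variable names are ours; only the data and the level labels come from the slide.

```python
from collections import Counter
from statistics import median

# The 21 pain-score answers listed on the slide.
scores = [3, 1, 5, 3, 1, 1, 1, 5, 1, 3, 4, 1, 1, 4, 5, 5, 5, 5, 5, 4, 4]
labels = {1: "Not at all", 2: "A little bit", 3: "Somewhat",
          4: "Quite a bit", 5: "Very much"}

n = len(scores)
counts = Counter(scores)

# Absolute and relative frequencies for every level, including unused ones.
freq_table = {level: counts.get(level, 0) for level in labels}
rel_freq = {level: freq_table[level] / n for level in labels}

# The median is meaningful for ordinal data; the mean would not be.
median_score = median(scores)

for level, label in labels.items():
    print(f"{level} {label:<12} {freq_table[level]:>2}  {rel_freq[level]:.3f}")
print("median:", median_score)
```

A bar chart of `freq_table` would be the natural graphical display for these data.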

6 Summary statistics and plots for quantitative data
Gene expression values of gene CCND3 Cyclin D3 from 27 patients diagnosed with acute lymphoblastic leukaemia, shown as the ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(27)}$.

9 Two-sample case: independent versus paired samples
- Independent samples: permeability constants of a placental membrane at term (X) and between 12 and 26 weeks gestational age (Y).
- Paired samples: Hamilton depression scale factor measurements in 9 patients with mixed anxiety and depression, taken at the first (X) and second (Y) visit after initiation of a therapy (administration of a tranquilizer).

10 8 Quiz Time Sections 1 to 4 -ISfSCGp_EIVPPI_mnrJHttaKxln8vVoyjJFvS8BL1w/viewform

11 Some parametric distributions: Bernoulli distribution
Example: cancer status observed on $x_1, x_2, x_3, \ldots, x_n$.
If
- there are n independent experiments,
- the outcome of each experiment is dichotomous (success/failure),
- the probability of success is the same for all experiments,
then each dichotomous experiment, $X_i$, follows a Bernoulli distribution with parameter $\pi$:
$X_i \sim \text{Bernoulli}(\pi)$, $P(X_i = 1) = \pi$, $P(X_i = 0) = 1 - \pi$.

12 Some parametric distributions: Binomial distribution
If
- there are n independent experiments,
- the outcome of each experiment is dichotomous (success/failure),
- the probability of success is the same for all experiments,
then
- the number of successes out of n trials (experiments), $Y = \sum_{i=1}^{n} X_i$, follows a binomial distribution with parameters n and $\pi$: $Y \sim \text{Bin}(n, \pi)$,
- the probability of observing exactly y successes out of n experiments is given by
$P(Y = y \mid n, \pi) = \frac{n!}{(n-y)!\,y!}\,\pi^{y}(1-\pi)^{n-y}$.
[Figure: binomial probabilities against the number of successes out of 50 experiments.]
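The binomial probability formula can be checked numerically with a few lines of Python. This is a minimal sketch: n = 50 matches the figure on the slide, while the success probability 0.2 is a hypothetical value chosen only for illustration.

```python
from math import comb

def binom_pmf(y, n, pi):
    """P(Y = y) for Y ~ Bin(n, pi): n! / ((n-y)! y!) * pi^y * (1-pi)^(n-y)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

n = 50      # number of experiments, as in the slide's figure
pi = 0.2    # hypothetical success probability (not given in the slides)

pmf = [binom_pmf(y, n, pi) for y in range(n + 1)]
total = sum(pmf)                               # probabilities sum to 1
mean = sum(y * p for y, p in enumerate(pmf))   # equals n * pi

print(f"sum of probabilities = {total:.6f}, mean = {mean:.4f}")
```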

14 Some parametric distributions: Poisson distribution
If, during a time interval or in a given area,
- events occur independently,
- at the same rate,
- and the probability of an event occurring in a small interval (area) is proportional to the length of the interval (size of the area),
then
- the number of events occurring in a fixed time interval or in a given area, X, may be modelled by means of a Poisson distribution with parameter $\lambda$: $X \sim \text{Poisson}(\lambda)$,
- the probability of observing x events during a fixed time interval or in a given area is given by
$P(X = x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$.
[Figure: probabilities for the number of chronic conditions per patient (US National Medical Expenditure Survey).]
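The Poisson probabilities can be evaluated in the same way. In this sketch the rate 1.55 is an assumption: it is the sample mean the slides report later for the chronic-conditions data, used here as a plug-in value rather than as a fitted model.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): lam^x * exp(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

lam = 1.55  # assumed rate: the sample mean quoted for the survey data

probs = [poisson_pmf(x, lam) for x in range(50)]
total = sum(probs)                              # close to 1 (tail is negligible)
mean = sum(x * p for x, p in enumerate(probs))  # close to lam

for x in range(6):
    print(f"P(X = {x}) = {probs[x]:.4f}")
```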

16 Some parametric distributions: Continuous distributions
[Figure: density curves of the ex-Gaussian, skew t, Gamma, Inverse Gamma and Gaussian distributions.]

17 Some parametric distributions: Normal distribution
$X \sim N(\mu, \sigma^2)$, $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $E[X] = \mu$, $\mathrm{Var}[X] = \sigma^2$,
$Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$, $f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$.
[Figure: probability density function, $f_Z(z)$, of a standard normal. The areas within $\mu \pm \sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ are 68.27%, 95.45% and 99.73%.]
(i) Suitable model for many variables, for example IQ.
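The three percentages attached to the normal density can be reproduced from the standard normal distribution function. A minimal sketch, using the identity $P(|Z| < k) = \mathrm{erf}(k/\sqrt{2})$:

```python
from math import erf, sqrt

def coverage(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X; free of mu and sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * coverage(k):.2f}%")
```

Because the probability depends only on k, the same three figures apply to any normal variable, whatever its mean and variance.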

22 Some parametric distributions: Normal distribution
(ii) Central limit theorem (Lindeberg-Lévy CLT). Let $(X_1, \ldots, X_n)$ be n independent and identically distributed (iid) random variables with expected value $\mu$ and finite variance $\sigma^2$. Then
$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{d} N\left(\mu, \frac{\sigma^2}{n}\right)$.
If $X_i \sim N(\mu, \sigma^2)$, this result is true for all sample sizes.
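A small simulation gives a feel for the theorem. This sketch assumes exponential data with rate 1 (so $\mu = 1$ and $\sigma = 1$), a sample size of 30 and 5000 replications; all three choices are ours, not the slides'.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)          # fixed seed so the run is reproducible

n, reps = 30, 5000      # hypothetical sample size and number of replications
mu, sigma = 1.0, 1.0    # mean and sd of an Exponential(1) variable

# One sample mean per replication, from a clearly non-normal distribution.
sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = mean(sample_means)   # should be close to mu
sd_of_means = stdev(sample_means)    # should be close to sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (theory {mu:.3f})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {sigma / sqrt(n):.3f})")
```

A histogram of `sample_means` would look close to a normal curve even though the individual observations are strongly skewed.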

23 14 Central limit theorem shiny app: Distribution of the mean

24 95% confidence interval for $\mu$, the population mean, when $X_i \sim N(\mu, \sigma^2)$
- if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$,
- if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$,
- if $\sigma$ is unknown, then $T = \frac{\bar{X}-\mu}{s/\sqrt{n}} \sim St_{n-1}$.
With $\alpha = 0.05$,
$P\left(\bar{X} - t_{1-\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{1-\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}}\right) = 0.95$,
with $z_{1-\frac{\alpha}{2}}$ and $\sigma$ in place of $t_{1-\frac{\alpha}{2},n-1}$ and $s$ when $\sigma$ is known.
Student density with n − 1 degrees of freedom:
$f_T(t \mid n-1) = \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)\left[(n-1)\pi\right]^{1/2}} \left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}$.
[Figure: densities of $N(0, 1)$, $St_{30}$, $St_{10}$ and $St_{2}$.]

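27 95% confidence interval for $\mu$, the population mean, when $X_i \sim iid(\mu, \sigma^2)$
- CLT: $\bar{X} \xrightarrow{d} N\left(\mu, \frac{\sigma^2}{n}\right)$,
- then $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$,
- if $\sigma$ is unknown, then $T = \frac{\bar{X}-\mu}{s/\sqrt{n}}$ is approximately $St_{n-1}$.
With $\alpha = 0.05$,
$P\left(\bar{X} - t_{1-\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{1-\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}}\right) \approx 0.95$.
[Figure: probabilities for the number of chronic conditions per patient (US National Medical Expenditure Survey: n = 4406, $\hat{\mu}$ = 1.55).]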
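For a sample as large as the survey data, the interval can be sketched with the normal quantile in place of the Student quantile, since the two are nearly equal for n in the thousands. In the example below n and the sample mean are taken from the slide, whereas the standard deviation 1.35 is a hypothetical value, because the slides do not report it.

```python
from math import sqrt
from statistics import NormalDist

n = 4406         # sample size, from the slide
mu_hat = 1.55    # sample mean, from the slide
s = 1.35         # hypothetical sample standard deviation (not in the slides)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # large-sample stand-in for t_{1-alpha/2, n-1}
half_width = z * s / sqrt(n)
ci = (mu_hat - half_width, mu_hat + half_width)

print(f"z = {z:.4f}")
print(f"95% CI for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For small samples the Student quantile should be used instead; the standard library has no function for it, which is why this sketch is restricted to the large-sample case.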

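28 95% confidence interval for $\mu_Y - \mu_X$, the difference between population means
If we have
- $X_i \sim iid(\mu_X, \sigma_X^2)$, $i = 1, \ldots, n_X$,
- $Y_i \sim iid(\mu_Y, \sigma_Y^2)$, $i = 1, \ldots, n_Y$,
then, with $\alpha = 0.05$,
- if $\sigma_X^2 = \sigma_Y^2$ [Student's t-test equation],
$CI(\mu_Y - \mu_X, 0.95) = (\bar{Y} - \bar{X}) \pm t_{1-\frac{\alpha}{2},\,n_X+n_Y-2}\; s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$,
where $s_p = \sqrt{\frac{(n_X-1)s_X^2 + (n_Y-1)s_Y^2}{n_X+n_Y-2}}$,
- if $\sigma_X^2 \neq \sigma_Y^2$ [Welch-Satterthwaite's t-test equation],
$CI(\mu_Y - \mu_X, 0.95) = (\bar{Y} - \bar{X}) \pm t_{1-\frac{\alpha}{2},\,df} \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$,
where $df = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{\left(s_X^2/n_X\right)^2}{n_X-1} + \frac{\left(s_Y^2/n_Y\right)^2}{n_Y-1}}$.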
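The two ingredients that differ between the Student and Welch intervals, the pooled standard deviation and the Welch-Satterthwaite degrees of freedom, are easy to compute from summary statistics. In this sketch the function names and the sample variances are hypothetical; the group sizes 11 and 27 are those of the AML/ALL example used later in the slides.

```python
from math import sqrt

def pooled_sd(s2_x, n_x, s2_y, n_y):
    """Pooled standard deviation s_p used by Student's t-test."""
    return sqrt(((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2))

def welch_df(s2_x, n_x, s2_y, n_y):
    """Welch-Satterthwaite degrees of freedom."""
    v_x, v_y = s2_x / n_x, s2_y / n_y
    return (v_x + v_y) ** 2 / (v_x ** 2 / (n_x - 1) + v_y ** 2 / (n_y - 1))

# Group sizes from the AML/ALL example; the variances are made up.
n_x, n_y = 11, 27
s2_x, s2_y = 0.12, 0.24

s_p = pooled_sd(s2_x, n_x, s2_y, n_y)
df = welch_df(s2_x, n_x, s2_y, n_y)
se_student = s_p * sqrt(1 / n_x + 1 / n_y)       # standard error, equal variances
se_welch = sqrt(s2_x / n_x + s2_y / n_y)         # standard error, unequal variances

print(f"s_p = {s_p:.4f}, Welch df = {df:.2f}")
print(f"SE (Student) = {se_student:.4f}, SE (Welch) = {se_welch:.4f}")
```

The Welch degrees of freedom always lie between min(n_X, n_Y) − 1 and n_X + n_Y − 2, so the Welch interval is never based on more degrees of freedom than the Student one.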

29 Central limit theorem shiny app: Coverage of Student's asymptotic confidence intervals

30 Quiz Time Practical IntroductionToStats/practical.html

31 PART II: Parametric tests Cancer Research UK 24 th of April 2017 D.-L. Couturier / M. Dunning / M. Eldridge [Bioinformatics core]

32 Grand Picture of Statistics
Population: parameters $\mu$, $\sigma^2$.
Sample: data $(x_1, x_2, \ldots, x_n)$, statistics $\hat{\mu}$, $\hat{\sigma}^2$.

33 Statistical hypothesis testing
A hypothesis test describes a phenomenon by means of two non-overlapping idealised models/descriptions:
- the null hypothesis H0, generally assumed to be true until evidence indicates otherwise,
- the alternative hypothesis H1.
The aim of the test is to reject the null hypothesis in favour of the alternative hypothesis and to conclude, with a probability of being wrong, that the idealised model/description of H1 is true.
- Theory 1: Dieters lose more fat than the exercisers
- Theory 2: There is no majority for Brexit now
- Theory 3: Serum vitamin C is reduced in patients

34 Statistical hypothesis testing
Several-step process:
- Define H0 and H1 according to a theory,
- Set $\alpha$, the probability of rejecting H0 when it is true (type I error),
- Define n, the sample size, allowing you to reject H0 when H1 is true with a probability $1 - \beta$ (power),
- Determine the test statistic to be used,
- Collect the data,
- Perform the statistical test, define the p-value, and reject (or not) the null hypothesis.

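35 Statistical hypothesis testing
Example: One-sample two-sided t-test
We test: H0: $\mu_{IQ} = 100$ against H1: $\mu_{IQ} \neq 100$.
We have $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$.
We know
- $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$,
- $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Thus, if H0 is true, we have:
- $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Define the p-value:
- p-value $= P(|T| > |T_{obs}|)$.
[Figure: density of the test statistic under H0, with the central 68.27%, 95.45% and 99.73% areas marked.]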
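The p-value computation in this example can be sketched as follows when the population standard deviation is treated as known. The value $\sigma = 15$ (the conventional IQ scale), the sample size and the sample mean are all assumptions made for illustration; with an estimated standard deviation the Student distribution would replace the normal one.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 100      # value of the mean under H0, from the slide
sigma = 15      # assumed known population sd (conventional IQ scale)
n = 25          # hypothetical sample size
x_bar = 106     # hypothetical observed sample mean

# Standardised test statistic under H0.
z_obs = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a statistic at least as extreme as z_obs.
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"z = {z_obs:.2f}, two-sided p-value = {p_value:.4f}")
```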

36 Statistical hypothesis testing: 4 possible outcomes
Conclude:
- if p-value > $\alpha$: do not reject H0,
- if p-value < $\alpha$: reject H0 in favour of H1.
Probability of each test outcome, given the unknown truth:
Unknown truth    H0 not rejected    H1 accepted
H0 true          $1 - \alpha$       $\alpha$
H1 true          $\beta$            $1 - \beta$
where
- $\alpha$ is the type I error,
- $\beta$ is the type II error.

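37 Statistical hypothesis testing
Example: One-sided binomial exact test
We test: H0: $\pi = 5\%$ against H1: $\pi > 5\%$.
We have $X_i \sim \text{Bernoulli}(\pi)$, $i = 1, \ldots, n$.
We know
- $Y = \sum_{i=1}^{n} X_i \sim \text{Binomial}(\pi, n)$.
Thus, if H0 is true, we have:
- $Y = \sum_{i=1}^{n} X_i \sim \text{Binomial}(5\%, n)$.
Define the p-value:
- p-value $= P(Y \geq Y_{obs})$.
[Figure: probabilities of Y under H0 for two sample sizes, one of them n = 25.]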
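A minimal sketch of the exact one-sided p-value, computed as the upper tail of the binomial distribution under H0. The sample size 25 appears in the slide's figure; the observed count of 4 successes is hypothetical.

```python
from math import comb

def binom_upper_tail(y_obs, n, pi_0):
    """P(Y >= y_obs) for Y ~ Binomial(pi_0, n)."""
    return sum(comb(n, y) * pi_0**y * (1 - pi_0)**(n - y)
               for y in range(y_obs, n + 1))

n = 25        # sample size, as in one panel of the slide's figure
pi_0 = 0.05   # success probability under H0, from the slide
y_obs = 4     # hypothetical observed number of successes

p_value = binom_upper_tail(y_obs, n, pi_0)
print(f"P(Y >= {y_obs}) under H0 = {p_value:.4f}")
```

Including the observed count in the tail (a greater-or-equal sign) matters for discrete statistics: dropping it would make the p-value too small.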

38 Two-sample two-sided Student's & Welch's t-tests
[Figure: intensity expression of gene 'CCND3 Cyclin D3' for acute myeloid leukemia (AML), n = 11, and acute lymphoblastic leukemia (ALL), n = 27.]
We test H0: $\mu_Y - \mu_X = 0$ against H1: $\mu_Y - \mu_X \neq 0$.
We know:
- Student's t-test [assume $\sigma_X^2 = \sigma_Y^2$]:
$\frac{(\bar{Y} - \bar{X}) - (\mu_Y - \mu_X)}{s_p\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \sim t_{n_X+n_Y-2}$,
- Welch's t-test [assume $\sigma_X^2 \neq \sigma_Y^2$]:
$\frac{(\bar{Y} - \bar{X}) - (\mu_Y - \mu_X)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \sim t_{df}$.
R output, Student's t-test:
    Two Sample t-test
    data: golub[1042, gol.fac == "ALL"] and golub[1042, gol.fac == "AML"]
    t = , df = 36, p-value = 6.046e-08
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
    sample estimates: mean of x mean of y
R output, Welch's t-test:
    Welch Two Sample t-test
    data: golub[1042, gol.fac == "ALL"] and golub[1042, gol.fac == "AML"]
    t = , df = , p-value = 9.871e-06
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
    sample estimates: mean of x mean of y

40 F-test of equality of variances
[Figure: intensity expression of gene 'CCND3 Cyclin D3' for acute myeloid leukemia (AML), n = 11, and acute lymphoblastic leukemia (ALL), n = 27.]
We test H0: $\sigma_Y^2 = \sigma_X^2$ against H1: $\sigma_Y^2 \neq \sigma_X^2$.
We know:
- F-test [assume $X_i \sim N(\mu_X, \sigma_X^2)$ and $Y_i \sim N(\mu_Y, \sigma_Y^2)$]: $\frac{s_Y^2}{s_X^2} \sim F_{n_Y-1,\,n_X-1}$ under H0.
R output:
    F test to compare two variances
    data: golub[1042, gol.fac == "ALL"] and golub[1042, gol.fac == "AML"]
    F = , num df = 26, denom df = 10, p-value =
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
    sample estimates: ratio of variances

41 Multiplicity correction
For each test, the probability of rejecting H0 (and accepting H1) when H0 is true equals $\alpha$.
For k independent tests, the probability of rejecting H0 (and accepting H1) at least once when H0 is true, $\alpha_k$, is given by
$\alpha_k = 1 - (1 - \alpha)^k$.
Thus, for $\alpha = 0.05$,
- if k = 1, $\alpha_1 = 1 - (1 - \alpha)^1 = 0.05$,
- if k = 2, $\alpha_2 = 1 - (1 - \alpha)^2 = 0.0975$,
- if k = 10, $\alpha_{10} = 1 - (1 - \alpha)^{10} = 0.4013$.
Idea: change the level of each test so that $\alpha_k = 0.05$:
- Bonferroni correction: $\alpha = \frac{\alpha_k}{k}$,
- Dunn-Šidák correction: $\alpha = 1 - (1 - \alpha_k)^{1/k}$.
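The arithmetic on this slide can be verified directly. The sketch below assumes independent tests, which is what the formula $1 - (1 - \alpha)^k$ requires; the function names are ours.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection among k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_level(alpha_k, k):
    """Per-test level under the Bonferroni correction."""
    return alpha_k / k

def sidak_level(alpha_k, k):
    """Per-test level under the Dunn-Sidak correction."""
    return 1 - (1 - alpha_k) ** (1 / k)

for k in (1, 2, 10):
    print(f"k = {k:>2}: alpha_k = {familywise_error(0.05, k):.4f}")

k = 10
print(f"Bonferroni level: {bonferroni_level(0.05, k):.5f}")
print(f"Dunn-Sidak level: {sidak_level(0.05, k):.5f}")
```

The Dunn-Šidák level restores a familywise error of exactly 0.05 under independence, while Bonferroni is slightly more conservative and needs no independence assumption.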

42 Introduction to Shiny Apps and Exercises 30

43 PART III: Non-parametric tests

44 Parametric or non-parametric?
[Figure: chart on the use of the t-test according to whether the outcome(s) are normally distributed (yes, mildly, no) and to the sample size (small, medium, large).]
Situations which may suggest the use of non-parametric statistics:
- when there is a small sample size or very unequal groups,
- when the data have notable outliers,
- when one outcome has a distribution other than normal,
- when the data are ordered with many ties or are rank ordered.

45 Sign test
A location model is assumed for $X_i$, $i = 1, \ldots, n$:
$X_i = \theta + e_i$, where $e_i \sim iid(\mu_e = 0, \sigma_e^2)$.
Interest for H0: $\theta = \theta_0$ against H1: $\theta < \theta_0$ or $\theta \neq \theta_0$ or $\theta > \theta_0$.
Test statistic: $S = \sum_{i=1}^{n} \iota(X_i - \theta_0 > 0)$, where $\iota(\cdot)$ is the indicator function.
Distribution of S under H0: $S \sim \text{Binomial}(0.5, n)$.
R output:
    Exact binomial test
    data: 21 and 27
    number of successes = 21, number of trials = 27, p-value =
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
    sample estimates: probability of success
[Figure: probabilities of the number of successes out of 27 experiments under H0.]
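The two-sided p-value of the sign test can be sketched with the binomial distribution under H0. The counts, 21 successes out of 27 trials, are those shown in the output on the slide; the function name and the rule used for the two-sided p-value (summing the probabilities of all outcomes no more likely than the observed one) are our assumptions about a reasonable implementation.

```python
from math import comb

def sign_test_two_sided(s_obs, n):
    """Two-sided exact p-value for S ~ Binomial(0.5, n) under H0."""
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    p_obs = pmf[s_obs]
    # Sum the probabilities of all outcomes at most as likely as the observed
    # one; the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= p_obs * (1 + 1e-7)))

s_obs, n = 21, 27   # number of successes and trials from the slide's output
p_value = sign_test_two_sided(s_obs, n)
print(f"two-sided p-value = {p_value:.6f}")
```

Because the null distribution is symmetric around n/2, this equals twice the upper-tail probability P(S ≥ 21).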

46 Wilcoxon signed-rank test
A location model is assumed for $X_i$, $i = 1, \ldots, n$:
$X_i = \theta + e_i$, where $e_i \sim iid(\mu_e = 0, \sigma_e^2)$.
Interest for H0: $\theta = \theta_0$ against H1: $\theta < \theta_0$ or $\theta \neq \theta_0$ or $\theta > \theta_0$.
Test statistic: $W^+ = \sum_{i=1}^{n} \iota(X_i - \theta_0 > 0)\,\text{Rank}(|X_i - \theta_0|)$.
Distribution of $W^+$ under H0: $W^+$ has no closed-form distribution.
R output:
    Wilcoxon signed rank test
    data: golub[1042, gol.fac == "ALL"]
    V = 268, p-value =
    alternative hypothesis: true location is not equal to
    95 percent confidence interval:
    sample estimates: (pseudo)median
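The statistic $W^+$ is simple to compute by hand or in code. The sketch below ranks the absolute deviations from $\theta_0$, gives tied values their average rank, and sums the ranks of the positive deviations. The data are hypothetical, and dropping observations equal to $\theta_0$ is one common convention rather than something the slide specifies.

```python
def signed_rank_statistic(xs, theta_0):
    """W+ : sum of the ranks of |x - theta_0| over observations above theta_0."""
    # Observations equal to theta_0 carry no sign information and are dropped.
    diffs = sorted((x - theta_0 for x in xs if x != theta_0), key=abs)
    w_plus = 0.0
    i = 0
    while i < len(diffs):
        # Find the block of tied absolute deviations starting at position i.
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2   # average of ranks i+1, ..., j
        w_plus += avg_rank * sum(1 for d in diffs[i:j] if d > 0)
        i = j
    return w_plus

# Hypothetical data, tested against theta_0 = 0.
xs = [1.2, -0.4, 2.5, 0.9, -1.7, 3.1]
w_plus = signed_rank_statistic(xs, 0.0)
w_minus = signed_rank_statistic([-x for x in xs], 0.0)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

The two sums always add up to n(n + 1)/2, which gives a quick sanity check on any implementation.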

47 Mann-Whitney-Wilcoxon test: Shift in location
Let
- $X_i \sim iid(\mu_X, \sigma^2)$, $i = 1, \ldots, n_X$,
- $Y_i \sim iid(\mu_X + \delta, \sigma^2)$, $i = 1, \ldots, n_Y$.
[Figure: intensity expression of gene 'CCND3 Cyclin D3' for acute myeloid leukemia (AML), n = 11, and acute lymphoblastic leukemia (ALL), n = 27.]
Interest for H0: $\delta = 0$ against H1: $\delta < 0$ or $\delta \neq 0$ or $\delta > 0$.
Standardised test statistic:
$z = \frac{\sum_{i=1}^{n_Y} R(Y_i) - n_Y(n_X+n_Y+1)/2}{\sqrt{n_X n_Y (n_X+n_Y+1)/12}}$,
where $R(Y_i)$ denotes the rank of $Y_i$ amongst the combined samples, i.e., amongst $(X_1, \ldots, X_{n_X}, Y_1, \ldots, Y_{n_Y})$.
Distribution of Z under H0: $Z \sim N(0, 1)$.
R output:
    Implementation 1: statistic = , p-value = e
    Implementation 2: W = 284, p-value = 6.15e-07
    alternative hypothesis: true location shift is not equal to 0
    95 percent confidence interval:
    sample estimates: difference in location
[Figure: density of the standardised test statistic under H0.]
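The standardised statistic can be sketched in a few lines. This illustration assumes there are no ties in the combined sample (ties would need average ranks and a variance correction) and uses small hypothetical data sets; the normal approximation is rough at such sample sizes and is shown only to make the formula concrete.

```python
from math import sqrt
from statistics import NormalDist

def mww_z(x, y):
    """Standardised rank-sum statistic of y within the combined sample (no ties)."""
    n_x, n_y = len(x), len(y)
    combined = sorted(x + y)
    rank = {value: r for r, value in enumerate(combined, start=1)}
    rank_sum_y = sum(rank[value] for value in y)
    expected = n_y * (n_x + n_y + 1) / 2          # mean of the rank sum under H0
    sd = sqrt(n_x * n_y * (n_x + n_y + 1) / 12)   # sd of the rank sum under H0
    return (rank_sum_y - expected) / sd

# Hypothetical samples without ties.
x = [1.1, 1.9, 2.3, 2.8]
y = [2.5, 3.0, 3.4, 3.9, 4.2]

z = mww_z(x, y)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```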

48 Non-parametric is not assumption free: Shift in location tests when H0 is true
Simulate 2500 samples with
- $X_i \sim \text{Uniform}(1.5, 2.5)$, $i = 1, \ldots, n_X$,
- $Y_i \sim \text{Uniform}(0, 4)$, $i = 1, \ldots, n_Y$,
so that $E[X_i] = E[Y_i] = 2$ (i.e., same mean, same median).
Assume
- $X_i \sim iid(\mu_X, \sigma^2)$, $i = 1, \ldots, n_X$,
- $Y_i \sim iid(\mu_X + \delta, \sigma^2)$, $i = 1, \ldots, n_Y$.
Test H0: $\delta = 0$ against H1: $\delta \neq 0$, at the 5% level, by means of
- Mann-Whitney-Wilcoxon test (MWW),
- Student's t-test,
- Welch's test.
[Table: estimated type I error $\hat{\alpha}$ of the MWW, Student's t-test and Welch's test for two sample-size settings, one with $n_X = 200$ and one with $n_X = 20$.]

49 Exercises 37

50 PART IV: Tests for categorical variables

51 $\chi^2$ goodness-of-fit test
A trial to assess the effectiveness of a new treatment versus a placebo in reducing tumour size in patients with ovarian cancer.
Observed frequencies (binary outcome by group; margins in parentheses):
Group        Tumour did not shrink    Tumour did shrink    Total
Treatment                                                  (84)
Placebo                                                    (40)
Total        (68)                     (56)                 (124)
- H0: No association between treatment group and tumour shrinkage,
- H1: Some association.
Expected frequencies under H0 (row total × column total / grand total):
Group        Tumour did not shrink    Tumour did shrink        Total
Treatment    84 × 68 / 124 = 46.06    84 × 56 / 124 = 37.94    (84)
Placebo      40 × 68 / 124 = 21.94    40 × 56 / 124 = 18.06    (40)
Total        (68)                     (56)                     (124)
We have 2 categorical variables with a total of J = 4 cells (categories).
- H0: $\pi_j = \pi_{j0}$, $j = 1, \ldots, J$,
- H1: $\pi_j \neq \pi_{j0}$ for at least one j.
$\chi^2$-test: $\sum_{j=1}^{J} \frac{(O_j - E_j)^2}{E_j} \sim \chi^2(J-1)$.
In the 2 × 2 test of association the expected counts are computed from the observed margins, which leaves (2 − 1)(2 − 1) = 1 degree of freedom, the value reported in the output.
R output:
    Pearson's Chi-squared test with Yates' continuity correction
    data: M
    X-squared = , df = 1, p-value =
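The computation of expected counts and of the test statistic can be sketched as follows. The observed cell counts are hypothetical: they were chosen only so that they reproduce the margins quoted on the slide (84, 40, 68, 56), because the actual cells do not survive in the transcript. The sketch omits the Yates continuity correction that the R output applies, and the p-value formula holds for 1 degree of freedom only.

```python
from math import erfc, sqrt

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_statistic(table):
    """Pearson statistic: sum over cells of (O - E)^2 / E (no continuity correction)."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def chi2_pvalue_df1(x2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(x2 / 2))

# Hypothetical cells consistent with the slide's margins.
# Rows: Treatment, Placebo. Columns: did not shrink, did shrink.
observed = [[50, 34],
            [18, 22]]

expected = expected_counts(observed)
x2 = chi2_statistic(observed)
p_value = chi2_pvalue_df1(x2)
print("expected:", [[round(e, 2) for e in row] for row in expected])
print(f"X-squared = {x2:.4f}, p-value = {p_value:.4f}")
```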

53 Fisher's exact test of independence
The $\chi^2$ goodness-of-fit test is not suitable when
- n is small,
- $E_j < 5$ for at least one cell.
Observed frequencies (binary outcome by group; margins in parentheses), with cell counts a, b in the Treatment row and c, d in the Placebo row:
Group        Tumour did not shrink    Tumour did shrink    Total
Treatment    a                        b                    (84)
Placebo      c                        d                    (40)
Total        (68)                     (56)                 (124)
Fisher showed that, under H0 (independence), $P(\text{observed table} \mid H0) = P(X = a)$ with $X \sim \text{Hypergeometric}(n, a + c, a + b)$.
To compute Fisher's test:
- Define $P(X = a)$ for all possible tables having the observed marginal counts,
- Calculate the p-value by summing the probabilities of the tables whose probability is equal to or smaller than that of the observed table.
R output:
    Fisher's Exact Test for Count Data
    data: M
    p-value =
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
    sample estimates: odds ratio
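The two steps of Fisher's test can be sketched with exact integer arithmetic. The function names are ours, and the 2 × 2 cell counts in the example are hypothetical values compatible with the margins quoted on the slide, since the observed cells are not available in the transcript.

```python
from math import comb

def hypergeom_pmf(a, row1, row2, col1):
    """P(X = a): probability of the table with top-left cell a, given the margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_two_sided(a, b, c, d):
    """Two-sided p-value: sum of the probabilities of all tables with the same
    margins that are at most as likely as the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    probs = [hypergeom_pmf(x, row1, row2, col1) for x in range(lowest, highest + 1)]
    # The small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Hypothetical cells with margins 84, 40 (rows) and 68, 56 (columns).
p_value = fisher_two_sided(50, 34, 18, 22)
print(f"two-sided p-value = {p_value:.4f}")

# Classic small example (tea-tasting layout): every margin equals 4.
p_small = fisher_two_sided(3, 1, 1, 3)
print(f"small-table p-value = {p_small:.4f}")
```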

Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes

Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes 2 Quick review: Normal distribution Y N(µ, σ 2 ), f Y (y) = 1 2πσ 2 (y µ)2 e 2σ 2 E[Y ] =

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Statistics Applied to Bioinformatics. Tests of homogeneity

Statistics Applied to Bioinformatics. Tests of homogeneity Statistics Applied to Bioinformatics Tests of homogeneity Two-tailed test of homogeneity Two-tailed test H 0 :m = m Principle of the test Estimate the difference between m and m Compare this estimation

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample

More information

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p. Nonparametric s Mark Muldoon School of Mathematics, University of Manchester Mark Muldoon, November 8, 2005 Nonparametric s - p. 1/31 Overview The sign, motivation The Mann-Whitney Larger Larger, in pictures

More information

Comparison of Two Population Means

Comparison of Two Population Means Comparison of Two Population Means Esra Akdeniz March 15, 2015 Independent versus Dependent (paired) Samples We have independent samples if we perform an experiment in two unrelated populations. We have

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Inferential Statistics

Inferential Statistics Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<= A frequency distribution is a kind of probability distribution. It gives the frequency or relative frequency at which given values have been observed among the data collected. For example, for age, Frequency

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou. E509A: Principle of Biostatistics (Week 11(2): Introduction to non-parametric methods ) GY Zou gzou@robarts.ca Sign test for two dependent samples Ex 12.1 subj 1 2 3 4 5 6 7 8 9 10 baseline 166 135 189

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

An introduction to biostatistics: part 1

An introduction to biostatistics: part 1 An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random

More information

16.400/453J Human Factors Engineering. Design of Experiments II

16.400/453J Human Factors Engineering. Design of Experiments II J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

More information

Comparison of Two Samples

Comparison of Two Samples 2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

Resampling Methods. Lukas Meier

Resampling Methods. Lukas Meier Resampling Methods Lukas Meier 20.01.2014 Introduction: Example Hail prevention (early 80s) Is a vaccination of clouds really reducing total energy? Data: Hail energy for n clouds (via radar image) Y i

More information

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

Solutions exercises of Chapter 7

Solutions exercises of Chapter 7 Solutions exercises of Chapter 7 Exercise 1 a. These are paired samples: each pair of half plates will have about the same level of corrosion, so the result of polishing by the two brands of polish are

More information

Non-parametric tests, part A:

Non-parametric tests, part A: Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

More information

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1 PHP2510: Principles of Biostatistics & Data Analysis Lecture X: Hypothesis testing PHP 2510 Lec 10: Hypothesis testing 1 In previous lectures we have encountered problems of estimating an unknown population

More information

Inference for Distributions Inference for the Mean of a Population

Inference for Distributions Inference for the Mean of a Population Inference for Distributions Inference for the Mean of a Population PBS Chapter 7.1 009 W.H Freeman and Company Objectives (PBS Chapter 7.1) Inference for the mean of a population The t distributions The

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

One-sample categorical data: approximate inference

One-sample categorical data: approximate inference One-sample categorical data: approximate inference Patrick Breheny October 6 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction It is relatively easy to think about the distribution

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

S D / n t n 1 The paediatrician observes 3 =

S D / n t n 1 The paediatrician observes 3 = Non-parametric tests Paired t-test A paediatrician measured the blood cholesterol of her patients and was worried to note that some had levels over 00mg/100ml To investigate whether dietary regulation

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

Statistics Handbook. All statistical tables were computed by the author.

Statistics Handbook. All statistical tables were computed by the author. Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance

APPENDIX B Sample-Size Calculation Methods: Classical Design

APPENDIX B Sample-Size Calculation Methods: Classical Design APPENDIX B Sample-Size Calculation Methods: Classical Design One/Paired-Sample Hypothesis Test for the Mean Sign test for median difference for a paired sample Wilcoxon signed-rank test for one or

Lecture 7: Confidence interval and Normal approximation

Lecture 7: Confidence interval and Normal approximation Lecture 7: Confidence interval and Normal approximation 26th of November 2015 Confidence interval 26th of November 2015 1 / 23 Random sample and uncertainty Example: we aim at estimating the average height

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

Small n, σ known or unknown, underlying nongaussian

Small n, σ known or unknown, underlying nongaussian READY GUIDE Summary Tables SUMMARY-1: Methods to compute some confidence intervals Parameter of Interest Conditions 95% CI Proportion (π) Large n, p 0 and p 1 Equation 12.11 Small n, any p Figure 12-4

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Lecture 9 Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Univariate categorical data Univariate categorical data are best summarized in a one way frequency table.

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N − m are black; the balls are otherwise indistinguishable. Let X denote

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers

Analyzing Small Sample Experimental Data

Analyzing Small Sample Experimental Data Analyzing Small Sample Experimental Data Session 2: Non-parametric tests and estimators I Dominik Duell (University of Essex) July 15, 2017 Pick an appropriate (non-parametric) statistic 1. Intro to non-parametric

1 Hypothesis testing for a single mean

1 Hypothesis testing for a single mean This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 17 pages including

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

Multiple Sample Categorical Data

Multiple Sample Categorical Data Multiple Sample Categorical Data paired and unpaired data, goodness-of-fit testing, testing for independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

Chapter 7 Comparison of two independent samples

Chapter 7 Comparison of two independent samples Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N

MATH4427 Notebook 5 Fall Semester 2017/2018

MATH4427 Notebook 5 Fall Semester 2017/2018 MATH4427 Notebook 5 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo © Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 5 MATH4427 Notebook 5 3 5.1 Two Sample Analysis: Difference

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

Nonparametric hypothesis tests and permutation tests

Nonparametric hypothesis tests and permutation tests Nonparametric hypothesis tests and permutation tests 1.7 & 2.3. Probability Generating Functions 3.8.3. Wilcoxon Signed Rank Test 3.8.2. Mann-Whitney Test Prof. Tesler Math 283 Fall 2018 Prof. Tesler Wilcoxon

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics Wen-Xin Zhou Department of Mathematics and Statistics University of Melbourne Joint work with Prof. Qi-Man

Non-parametric (Distribution-free) approaches p188 CN

Non-parametric (Distribution-free) approaches p188 CN Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman's rank correlation; the Bootstrap. (ch14

Nonparametric Location Tests: k-sample

Nonparametric Location Tests: k-sample Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8

Biostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015

Biostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015 Biostatistics for biomedical profession BIMM34 Karin Källen & Linda Hartman November-December 2015 12015-11-02 Who needs a course in biostatistics? - Anyone who uses quantitative methods to interpret biological

Y i = η + ɛ i, i = 1,...,n.

Y i = η + ɛ i, i = 1,...,n. Nonparametric tests If data do not come from a normal population (and if the sample is not large), we cannot use a t-test. One useful approach to creating test statistics is through the use of rank statistics.

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

Name: Biostatistics 1st year Comprehensive Examination: Applied in-class exam. June 8th, 2016: 9am to 1pm

Name: Biostatistics 1st year Comprehensive Examination: Applied in-class exam. June 8th, 2016: 9am to 1pm Name: Biostatistics 1st year Comprehensive Examination: Applied in-class exam June 8th, 2016: 9am to 1pm Instructions: 1. This exam is to be completed independently. Do not discuss your work with

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I.

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I. Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/46 Overview 1 Basics on t-tests 2 Independent Sample t-tests 3 Single-Sample

= 1 i. normal approximation to χ 2 df > df

= 1 i. normal approximation to χ 2 df > df χ tests 1) 1 categorical variable χ test for goodness-of-fit ) categorical variables χ test for independence (association, contingency) 3) categorical variables McNemar's test for change χ df k (O i 1

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

Business Statistics MEDIAN: NON-PARAMETRIC TESTS

Business Statistics MEDIAN: NON-PARAMETRIC TESTS Business Statistics MEDIAN: NON-PARAMETRIC TESTS CONTENTS Hypotheses on the median The sign test The Wilcoxon signed ranks test Old exam question HYPOTHESES ON THE MEDIAN The median is a central value

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches. Laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

STATISTICS ( CODE NO. 08 ) PAPER I PART - I

STATISTICS ( CODE NO. 08 ) PAPER I PART - I STATISTICS ( CODE NO. 08 ) PAPER I PART - I 1. Descriptive Statistics Types of data - Concepts of a Statistical population and sample from a population ; qualitative and quantitative data ; nominal and

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing ECO22Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants 18.650 Statistics for Applications Chapter 5: Parametric hypothesis testing 1/37 Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

CDA Chapter 3 part II

CDA Chapter 3 part II CDA Chapter 3 part II Two-way tables with ordered classifications Let u 1 u 2... u I denote scores for the row variable X, and let ν 1 ν 2... ν J denote column Y scores. Consider the hypothesis H 0 : X

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let $X_1, X_2, \ldots$ be a sequence of independent random variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of the partial sum $S_n = \sum_{i=1}^{n} X_i$ becomes approximately

MATH Notebook 3 Spring 2018

MATH Notebook 3 Spring 2018 MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo © Copyright 2010-2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

Selection should be based on the desired biological interpretation!

Selection should be based on the desired biological interpretation! Statistical tools to compare levels of parasitism Jenő Reiczigel, Lajos Rózsa Hungary What to compare? The prevalence? The mean intensity? The median intensity? Or something else? And which statistical

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

Turning a research question into a statistical question.

Turning a research question into a statistical question. Turning a research question into a statistical question. ORIGINAL QUESTION: Concept Concept Concept ABOUT ONE CONCEPT ABOUT RELATIONSHIPS BETWEEN CONCEPTS TYPE OF QUESTION: DESCRIBE what's going on? DECIDE

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

Sampling Distributions: Central Limit Theorem

Sampling Distributions: Central Limit Theorem Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics SEVERAL μs AND MEDIANS: MORE ISSUES Business Statistics CONTENTS Post-hoc analysis ANOVA for 2 groups The equal variances assumption The Kruskal-Wallis test Old exam question Further study POST-HOC ANALYSIS

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Transformations (pg 382): The shape of the distribution

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test la Contents The two sample t-test generalizes into Analysis of Variance. In analysis of variance ANOVA the population consists

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials. One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Nonparametric statistic methods Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Measurement What are the 4 levels of measurement discussed? 1. Nominal or Classificatory Scale Gender,

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

Fish SR P Diff Sgn rank Fish SR P Diff Sng rank

Fish SR P Diff Sgn rank Fish SR P Diff Sng rank Nonparametric tests Distribution free methods require fewer assumptions than parametric methods Focus on testing rather than estimation Not sensitive to outlying observations Especially useful for cruder
