Exam details
Final Review Session
- Short answer, similar to book problems
- Formulae and tables will be given
- You CAN use a calculator
- Date and Time: Dec. 7, 2006, 1-1:30 pm
- Location: Osborne Centre, Unit 1 ( A )

Things to Review
- Concepts
- Basic formulae
- Statistical tests
First Half
- Populations, samples, random sample
- Parameters, estimates
- Mean, median, mode
- Variance, standard deviation
- Categorical (nominal, ordinal) and numerical (discrete, continuous) variables
- Alternative hypothesis, P-value
- Type I error, Type II error
- Sampling distribution, standard error
- Central limit theorem, normal distribution
- Quantile plot, Shapiro-Wilk test
- Data transformations
- Nonparametric tests
- Independent contrasts

Second Half
- Observations vs. experiments
- Confounding variables, control group
- Replication and pseudoreplication
- Blocking
- Factorial design
- Power analysis
- Simulation, randomization, bootstrap
- Likelihood

Example Conceptual Questions (you've just done a two-sample t-test comparing body size of lizards on islands and the mainland):
- What is the probability of committing a type I error with this test?
- State an example of a confounding variable that may have affected this result.
- State one alternative statistical technique that you could have used to test the null hypothesis, and describe briefly how you would have carried it out.
Randomization test (diagram): Sample → Randomized data → Calculate the same test statistic on the randomized data.
Statistical tests
- Binomial test
- Chi-squared goodness-of-fit (proportional, binomial, Poisson)
- Chi-squared contingency test
- t-tests: one-sample t-test, paired t-test, two-sample t-test
- F-test for comparing variances
- Welch's t-test
- Sign test
- Mann-Whitney U
- Correlation, Spearman's r
- Regression
- ANOVA

Quick reference summary: Binomial test
- What is it for? Compares the proportion of successes in a sample to a hypothesized value, p0.
- What does it assume? Individual trials are randomly sampled and independent.
- Test statistic: X, the number of successes.
- Distribution under H0: binomial with parameters n and p0.
- Formula: P(x) = (n choose x) p^x (1 − p)^(n−x), where P(x) = probability of a total of x successes, p = probability of success in each trial, and n = total number of trials. The P-value is P = Σ Pr[x′ ≥ x], summed over outcomes at least as extreme as the observed number of successes.
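As a worked sketch of the formula above (hypothetical numbers, standard library only), the two-sided binomial P-value can be computed directly; doubling the smaller tail probability is one common convention for the two-sided test:

```python
from math import comb

def binomial_test(x, n, p0):
    """Two-sided binomial P-value: double the smaller tail
    probability (a common convention), capped at 1."""
    pr = lambda i: comb(n, i) * p0**i * (1 - p0)**(n - i)
    lower = sum(pr(i) for i in range(0, x + 1))   # Pr[X <= x]
    upper = sum(pr(i) for i in range(x, n + 1))   # Pr[X >= x]
    return min(1.0, 2 * min(lower, upper))

# Hypothetical data: x = 14 successes in n = 18 trials, H0: p0 = 0.5
p = binomial_test(14, 18, 0.5)
```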
Binomial test (diagram): Sample with Pr[success] = p0 → x = number of successes → compare to Binomial(n, p0).
H0: The relative frequency of successes in the population is p0.
HA: The relative frequency of successes in the population is not p0.

Quick reference summary: χ² Goodness-of-Fit test
- What is it for? Compares observed frequencies in categories of a single variable to the expected frequencies under a random model.
- What does it assume? Random samples; no expected values < 1; no more than 20% of expected values < 5.
- Test statistic: χ²
- Distribution under H0: χ² with df = # categories − # parameters − 1.
- Formula: χ² = Σ over all classes of (Observed_i − Expected_i)² / Expected_i
" goodness of fit test Sample Calculate expected values : Data fit a particular Discrete distribution " Goodness-of-Fit test " = ( Observed i # Expected i ) $ all classes Expected i : " With N-1-param. d.f. H 0 : The data come from a certain distribution H A : The data do not come from that distrubition Possible distributions " Pr[x] = $ n% ' p x 1( p # x& ( ) n(x Pr[ X ] = e"µ µ X X! Pr[x] = n * frequency of occurrence Proportional Binomial Poisson Given a number of categories Probability proportional to number of opportunities Days of the week, months of the year Number of successes in n trials Have to know n, p under the null hypothesis Punnett square, many p=0.5 examples Number of events in interval of space or time n not fixed, not given p Car wrecks, flowers in a field
Quick reference summary: χ² Contingency Test
- What is it for? Tests the null hypothesis of no association between two categorical variables.
- What does it assume? Random samples; no expected values < 1; no more than 20% of expected values < 5.
- Test statistic: χ²
- Distribution under H0: χ² with df = (r − 1)(c − 1), where r = # rows, c = # columns.
- Formulae: χ² = Σ over all classes of (Observed_i − Expected_i)² / Expected_i, with Expected = (RowTotal × ColTotal) / GrandTotal.

χ² contingency test (diagram): Sample → calculate expected values → χ² = Σ (Observed_i − Expected_i)² / Expected_i → compare to χ² with (r − 1)(c − 1) df.
H0: There is no association between these two variables.
HA: There is an association between these two variables.
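The expected-value rule above can be sketched on a hypothetical 2×2 table (counts invented for illustration):

```python
# Hypothetical 2x2 table: rows = two groups, columns = two outcomes.
table = [[30, 70],
         [45, 55]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected = RowTotal * ColTotal / GrandTotal
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)   # (r-1)(c-1)
```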
Quick reference summary: One-sample t-test
- What is it for? Compares the mean of a numerical variable to a hypothesized value, µ0.
- What does it assume? Individuals are randomly sampled from a population that is normally distributed.
- Test statistic: t
- Distribution under H0: t-distribution with n − 1 degrees of freedom.
- Formula: t = (Ȳ − µ0) / SE_Ȳ

One-sample t-test (diagram): Sample → t = (Ȳ − µ0)/(s/√n) → compare to t with n − 1 df.
H0: The population mean is equal to µ0.
HA: The population mean is not equal to µ0.
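A one-sample t statistic is quick to compute by hand; here is a sketch with a hypothetical sample of six values:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample; H0: population mean equals mu0 = 10
y = [11.2, 9.8, 10.5, 12.1, 10.9, 11.4]
mu0 = 10.0

se = stdev(y) / sqrt(len(y))    # SE of the mean = s / sqrt(n)
t = (mean(y) - mu0) / se        # compare to t with n - 1 = 5 df
```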
Paired vs. 2-sample comparisons

Quick reference summary: Paired t-test
- What is it for? To test whether the mean difference in a population equals a null hypothesized value, µd0.
- What does it assume? Pairs are randomly sampled from a population. The differences are normally distributed.
- Test statistic: t
- Distribution under H0: t-distribution with n − 1 degrees of freedom, where n is the number of pairs.
- Formula: t = (d̄ − µd0) / SE_d̄

Paired t-test (diagram): Sample → t = (d̄ − µd0)/SE_d̄ → compare to t with n − 1 df (n is the number of pairs).
H0: The mean difference is equal to 0.
HA: The mean difference is not equal to 0.
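The paired test is just a one-sample t-test on the differences; a sketch with hypothetical before/after data:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after measurements on the same five subjects.
before = [140, 135, 150, 128, 144]
after  = [135, 136, 142, 123, 140]

d = [b - a for b, a in zip(before, after)]    # one difference per pair
t = mean(d) / (stdev(d) / sqrt(len(d)))       # H0: mean difference = 0
# Compare to t with n - 1 = 4 df, where n is the number of pairs.
```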
Quick reference summary: Two-sample t-test
- What is it for? Tests whether two groups have the same mean.
- What does it assume? Both samples are random samples. The numerical variable is normally distributed within both populations. The variance of the distribution is the same in the two populations.
- Test statistic: t
- Distribution under H0: t-distribution with n1 + n2 − 2 degrees of freedom.
- Formulae: t = (Ȳ1 − Ȳ2) / SE_{Ȳ1−Ȳ2}, where SE_{Ȳ1−Ȳ2} = √( s_p² (1/n1 + 1/n2) ) and s_p² = (df1·s1² + df2·s2²) / (df1 + df2).

Two-sample t-test (diagram): Samples → t = (Ȳ1 − Ȳ2)/SE_{Ȳ1−Ȳ2} → compare to t with n1 + n2 − 2 df.
H0: The means of the two populations are equal (µ1 = µ2).
HA: The means of the two populations are not equal.
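The pooled-variance formulae above can be sketched with hypothetical samples of unequal size:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical samples from two groups.
y1 = [12.1, 11.4, 13.0, 12.6, 11.9]
y2 = [10.8, 11.1, 10.2, 11.5]
n1, n2 = len(y1), len(y2)
df1, df2 = n1 - 1, n2 - 1

# Pooled variance: df-weighted average of the two sample variances.
sp2 = (df1 * variance(y1) + df2 * variance(y2)) / (df1 + df2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t = (mean(y1) - mean(y2)) / se    # compare to t with n1 + n2 - 2 = 7 df
```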
F-test for comparing the variance of two groups.
F-test (diagram): Samples → F = s1²/s2² → compare to F with n1 − 1, n2 − 1 df.
H0: The two populations have the same variance (σ1² = σ2²).
HA: σ1² ≠ σ2².

Welch's t-test (diagram): Samples → t = (Ȳ1 − Ȳ2) / √( s1²/n1 + s2²/n2 ) → compare to t with df from formula.
H0: The two populations have the same mean (µ1 = µ2).
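A minimal F-ratio sketch with hypothetical samples (the group with the larger variance goes in the numerator here only because of how the data were listed):

```python
from statistics import variance

# Hypothetical samples; F compares the two sample variances.
y1 = [4.1, 5.3, 6.0, 4.8, 5.5]
y2 = [5.0, 5.2, 4.9, 5.1]

F = variance(y1) / variance(y2)
# Compare to the F distribution with n1 - 1 = 4 and n2 - 1 = 3 df.
```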
Parametric tests and their nonparametric counterparts:
- Parametric: one-sample and paired t-test; two-sample t-test.
- Nonparametric: sign test; Mann-Whitney U-test.

Quick Reference Summary: Sign Test
- What is it for? A non-parametric test to compare the median of a group to some constant.
- What does it assume? Random samples.
- Formula: Identical to a binomial test with p0 = 0.5. Uses the number of subjects with values greater than and less than a hypothesized median as the test statistic. P(x) = (n choose x) p^x (1 − p)^(n−x), with P = Σ Pr[x′ ≥ x]; P(x) = probability of a total of x successes, p = probability of success in each trial, n = total number of trials.

Sign test (diagram): Sample with median = m0 → x = number of values greater than m0 → compare to Binomial(n, 0.5).
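Because the sign test is a binomial test with p0 = 0.5, the P-value reduces to a tail sum of binomial coefficients; a sketch with hypothetical counts:

```python
from math import comb

# Hypothetical sample: of n = 12 values, x = 10 lie above the
# hypothesized median m0 (2 below, none exactly equal to m0).
n, x = 12, 10

# Identical to a binomial test with p0 = 0.5.
upper = sum(comb(n, i) for i in range(x, n + 1)) / 2 ** n   # Pr[X >= 10]
p = min(1.0, 2 * upper)                                     # two-sided P-value
```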
Sign Test
H0: The median is equal to some value m0.
HA: The median is not equal to m0.

Quick Reference Summary: Mann-Whitney U Test
- What is it for? A non-parametric test to compare the central tendencies of two groups.
- What does it assume? Random samples.
- Test statistic: U
- Distribution under H0: U distribution, with sample sizes n1 and n2.
- Formulae: U1 = n1·n2 + n1(n1 + 1)/2 − R1 and U2 = n1·n2 − U1, where n1 = sample size of group 1, n2 = sample size of group 2, and R1 = sum of ranks of group 1. Use the larger of U1 or U2 for a two-tailed test.

Mann-Whitney U test (diagram): Samples → U1 or U2 (use the largest) → compare to U with n1, n2.
H0: The two groups have the same median.
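The U formulae above can be traced through a small hypothetical example with no tied values (ties would need mid-ranks, omitted here for simplicity):

```python
# Hypothetical groups with no tied values.
g1 = [1.1, 2.5, 3.0, 4.2]
g2 = [3.6, 4.8, 5.1, 5.9, 6.3]
n1, n2 = len(g1), len(g2)

combined = sorted(g1 + g2)
R1 = sum(combined.index(v) + 1 for v in g1)   # sum of ranks of group 1

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 - U1
U = max(U1, U2)    # use the larger of U1, U2 for a two-tailed test
```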
Quick Reference Guide - Correlation Coefficient
- What is it for? Measuring the strength of a linear association between two numerical variables.
- What does it assume? Bivariate normality and random sampling.
- Parameter: ρ. Estimate: r.
- Formulae: r = Σ(Xi − X̄)(Yi − Ȳ) / √( Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ), and SE_r = √( (1 − r²) / (n − 2) ).

Quick Reference Guide - t-test for zero linear correlation
- What is it for? To test the null hypothesis that the population parameter, ρ, is zero.
- What does it assume? Bivariate normality and random sampling.
- Test statistic: t
- Distribution under H0: t with n − 2 degrees of freedom.
- Formula: t = r / SE_r

t-test for correlation (diagram): Sample → t = r/SE_r → compare to t with n − 2 df. H0: ρ = 0.
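Both formulae (r and its t-test) can be sketched on hypothetical paired data:

```python
from math import sqrt
from statistics import mean

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.1, 4.8, 6.2]
n = len(x)
xb, yb = mean(x), mean(y)

sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
sxx = sum((xi - xb) ** 2 for xi in x)
syy = sum((yi - yb) ** 2 for yi in y)

r = sxy / sqrt(sxx * syy)              # correlation coefficient
se_r = sqrt((1 - r ** 2) / (n - 2))    # SE of r
t = r / se_r                           # compare to t with n - 2 = 3 df
```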
Quick Reference Guide - Spearman's Rank Correlation
- What is it for? To test zero correlation between the ranks of two variables.
- What does it assume? Linear relationship between ranks and random sampling.
- Test statistic: r_s
- Distribution under H0: see table; if n > 100, use the t-distribution.
- Formulae: same as linear correlation but based on ranks.

Spearman's rank correlation (diagram): Sample → r_s → compare to Spearman's rank Table H. H0: ρ = 0.

Assumptions of Regression
- At each value of X, there is a population of Y values whose mean lies on the true regression line.
- At each value of X, the distribution of Y values is normal.
- The variance of Y values is the same at all values of X.
- At each value of X, the Y measurements represent a random sample from the population of Y values.
Residual-plot diagnostics (figures): OK; non-linear; non-normal; unequal variance.

Quick Reference Summary: Confidence Interval for Regression Slope
- What is it for? Estimating the slope of the linear equation Y = α + βX between an explanatory variable X and a response variable Y.
- What does it assume? The relationship between X and Y is linear; each Y at a given X is a random sample from a normal distribution with equal variance.
- Parameter: β. Estimate: b. Degrees of freedom: n − 2.
- Formulae: b − t_{α(2),df}·SE_b < β < b + t_{α(2),df}·SE_b, where SE_b = √( MS_residual / Σ(Xi − X̄)² ) and MS_residual = [ Σ(Yi − Ȳ)² − b·Σ(Xi − X̄)(Yi − Ȳ) ] / (n − 2).

Quick Reference Summary: t-test for Regression Slope
- What is it for? To test the null hypothesis that the population parameter β equals a null hypothesized value, usually 0.
- What does it assume? Same as the regression slope C.I.
- Test statistic: t
- Distribution under H0: t with n − 2 d.f.
- Formula: t = b / SE_b
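The slope, its standard error, and the t statistic can be traced with hypothetical data (all sums computed exactly as in the formulae above):

```python
from math import sqrt
from statistics import mean

# Hypothetical data; least-squares fit of Y = a + b*X.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)
xb, yb = mean(x), mean(y)

sxx = sum((xi - xb) ** 2 for xi in x)
sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
syy = sum((yi - yb) ** 2 for yi in y)

b = sxy / sxx                               # slope estimate
ms_residual = (syy - b * sxy) / (n - 2)     # residual mean square
se_b = sqrt(ms_residual / sxx)              # SE of the slope
t = b / se_b                                # H0: beta = 0; t with n - 2 df
# CI: b +/- t_crit * se_b, with t_crit from the t table with n - 2 df.
```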
t-test for regression slope (diagram): Sample → t = b/SE_b → compare to t with n − 2 df. H0: β = 0.

Quick Reference Summary: ANOVA (analysis of variance)
- What is it for? Testing the difference among k means simultaneously.
- What does it assume? The variable is normally distributed with equal standard deviations (and variances) in all k populations; each sample is a random sample.
- Test statistic: F
- Distribution under H0: F distribution with k − 1 and N − k degrees of freedom.
- Formulae: F = MS_group / MS_error, where
  MS_group = SS_group / df_group = SS_group / (k − 1), with SS_group = Σ n_i (Ȳ_i − Ȳ)²;
  MS_error = SS_error / df_error = SS_error / (N − k), with SS_error = Σ s_i² (n_i − 1);
  Ȳ_i = mean of group i, Ȳ = overall mean, n_i = size of sample i, N = total sample size.
ANOVA (diagram): k samples → F = MS_group/MS_error → compare to F with k − 1, N − k df.
H0: All of the groups have the same mean.
HA: At least one of the groups has a mean that differs from the others.

ANOVA Table
Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | SS_group = Σ n_i (Ȳ_i − Ȳ)² | k − 1 | MS_group = SS_group/df_group | F = MS_group/MS_error |
Error | SS_error = Σ s_i² (n_i − 1) | N − k | MS_error = SS_error/df_error | |
Total | SS_group + SS_error | N − 1 | | |

Picture of ANOVA terms (figure): SS_Total/MS_Total partition into SS_Group/MS_Group and SS_Error/MS_Error.
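Filling in the ANOVA table can be sketched end-to-end with hypothetical data from k = 3 groups:

```python
from statistics import mean, variance

# Hypothetical data from k = 3 groups.
groups = [[4.2, 5.1, 4.8, 5.5],
          [6.0, 6.4, 5.9, 6.8],
          [5.0, 4.7, 5.3, 4.9]]
k = len(groups)
N = sum(len(g) for g in groups)
grand = mean(v for g in groups for v in g)   # overall mean

ss_group = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_error = sum((len(g) - 1) * variance(g) for g in groups)

ms_group = ss_group / (k - 1)
ms_error = ss_error / (N - k)
F = ms_group / ms_error    # compare to F with k - 1 = 2 and N - k = 9 df
```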
Two-factor ANOVA Table
Source of variation | Sum of Squares | df | Mean Square | F ratio | P
Treatment 1 | SS_1 | k_1 − 1 | SS_1/(k_1 − 1) | MS_1/MSE |
Treatment 2 | SS_2 | k_2 − 1 | SS_2/(k_2 − 1) | MS_2/MSE |
Treatment 1 × Treatment 2 | SS_1×2 | (k_1 − 1)(k_2 − 1) | SS_1×2/[(k_1 − 1)(k_2 − 1)] | MS_1×2/MSE |
Error | SS_error | XXX | SS_error/XXX | |
Total | SS_total | N − 1 | | |

Interpretations of 2-way ANOVA terms (figures): effect of Temperature, not pH; effect of pH, not Temperature.
Interpretations of 2-way ANOVA terms (figures): effect of pH and Temperature with no interaction; effect of pH and Temperature with interaction.

Quick Reference Summary: 2-Way ANOVA
- What is it for? Testing the difference among means from a 2-way factorial experiment.
- What does it assume? The variable is normally distributed with equal standard deviations (and variances) in all populations; each sample is a random sample.
- Test statistic: F (for three different hypotheses).
- Distribution under H0: F distribution.
- Formulae: just need to know how to fill in the table.
2-way ANOVA (diagram): Samples → null hypotheses (three of them) → F = MS_group/MS_error for each → Treatment 1 F, Treatment 2 F, and Interaction F.
General Linear Models
- First step: formulate a model statement. Example: Y = µ + TREATMENT
- Second step: make an ANOVA table. Example:

Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | SS_group = Σ n_i (Ȳ_i − Ȳ)² | k − 1 | MS_group = SS_group/df_group | F = MS_group/MS_error |
Error | SS_error = Σ s_i² (n_i − 1) | N − k | MS_error = SS_error/df_error | |
Total | SS_group + SS_error | N − 1 | | |

Which test do I use?
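The randomization test sketched in the earlier diagram (shuffle the data, recompute the same test statistic, compare to the observed value) can be written in a few lines; the data here are hypothetical and the test statistic is the difference in group means:

```python
import random
from statistics import mean

random.seed(1)   # for reproducibility

# Hypothetical two-group data; test statistic = difference in means.
g1 = [12.1, 11.4, 13.0, 12.6, 11.9]
g2 = [10.8, 11.1, 10.2, 11.5]
observed = mean(g1) - mean(g2)

# Shuffle the pooled values, split into groups of the original sizes,
# and recompute the same test statistic each time.
pooled = g1 + g2
n_iter = 10000
extreme = 0
for _ in range(n_iter):
    random.shuffle(pooled)
    stat = mean(pooled[:len(g1)]) - mean(pooled[len(g1):])
    if abs(stat) >= abs(observed):
        extreme += 1

p = extreme / n_iter    # approximate two-sided P-value
```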
1. How many variables am I comparing?
- 1 → methods for a single variable
- 2 → methods for comparing two variables
- 3 or more → methods for comparing three or more variables

Methods for one variable: is the variable categorical or numerical?
- Categorical, compared to a single proportion p0: binomial test.
- Categorical, compared to a distribution: χ² goodness-of-fit test.
- Numerical: one-sample t-test.

Methods for two variables (X = explanatory variable, Y = response variable):
Response \ Explanatory | Categorical | Numerical
Categorical | Contingency table, grouped bar graph, mosaic plot → contingency analysis | Logistic regression
Numerical | Multiple histograms, cumulative frequency distributions → t-test, ANOVA | Scatter plot → correlation, regression