STATISTICS REVIEW

I. Why do we need statistics?
A. As human beings, we consciously and unconsciously evaluate whether variables affect phenomena of interest, but sometimes our common-sense reasoning is fallacious and based on coincidence. Further, most biological phenomena are affected by many causal factors, and it is usually difficult to identify which among a collection of variables most significantly predict a pattern of interest.
1. Statistics are used to summarize masses of data into simpler, more interpretable forms (e.g., mean, coefficient of variation, 90th percentile).
2. Statistics are used to explore data with the goal of finding interesting patterns (e.g., identifying risk factors for lung cancer by analyzing the habits of people who have and have not developed the disease). These are also known as exploratory tests.
3. Statistical tests are used to calculate the likelihood of finding a particular magnitude of association between variables (e.g., age and reproductive success) or difference between samples (e.g., weight of males vs. females) just by chance. This probability can then be used to make a decision (e.g., A is/isn't correlated with B, A causes/doesn't cause B, A is the same as/different from B). A statistical test of an a priori hypothesis is also called a confirmatory test.
B. The "statistical frame of mind"
1. Experiments, sampling regimes, and behavioral measurements must be designed with the statistical tests to be undertaken in mind.
2. The most common error in data collection is the failure to collect a control or random database for comparison with the treatment or question of interest.
3. The next most common error is insufficient or no replication; you cannot do a statistical test with only one treatment and one control sample because there is no estimate of the inherent variability of each.
II. Some basic definitions
A.
Observation: The data in a biometric study are based on individual observations; they are the observations or measurements taken on the smallest sampling unit. (Note: "data" is a plural word.)
B. Sample: a collection of individual observations, symbolized by "n".
C. Population: the totality of individual observations about which inferences are to be made; the universe from which we sample.
D. Parameter: a constant for the case or population under consideration.
E. Variable: the actual property measured by the individual observations, also called the character.
1. Interval variables (the difference between two scores is meaningful; for example, a one-inch difference already tells us something about the distance between two objects)
a. Continuous variables (the character can assume an infinite number of values between any two points), e.g., length, weight.

b. Discontinuous or discrete variables (the character can have only certain fixed numerical or integer values), e.g., number of offspring.
2. Ordinal variables (some variables cannot be quantitatively measured but can be ranked), e.g., fat stores, aggressiveness.
3. Nominal or categorical variables (mutually exclusive classes), e.g., color, habitat, year.
4. Derived variables (quantitative variables which are computed from two or more measured variables), e.g., ratios, percents, rates.
III. Descriptive statistical techniques
A. Before conducting any statistical tests, it is essential to summarize and graphically examine your data to: 1) be sure you haven't made any data entry errors which are obvious outliers, 2) determine whether you can use parametric tests, 3) decide whether some variables need to be transformed, and 4) look for general patterns, relationships, and differences. These descriptive techniques include computing means and standard deviations, making frequency histograms of each variable, and making bivariate plots of variables that may be associated.
B. Central tendency parameters
1. Arithmetic mean: X or µ (mu), is simply the sum of all observations divided by the number of observations in your sample. You typically round the mean to one digit beyond the original data; for example, if your data were measured to two significant digits then the mean is reported using three significant digits.
2. Median: the value of the variable in an ordered array that has an equal number of observations on either side of it (also called the 50th percentile).
3. Mode: the most frequently occurring measurement. There may be more than one mode; in this case, find them by looking at the histogram.
C. Parameters of dispersion and variability among the points in a sample
1. Range: the difference between the highest and lowest measurements in a sample, reported as ("minimum value" to "maximum value").
2. Variance: s² (or σ²) = Σ(xi − X)² / (n − 1)
3. Standard deviation (SD), reported as X ± SD: s = √s²
4. Coefficient of variation, reported as a percentage: CV = (s / X) × 100
D. Parameters of confidence in our estimate of the population mean (or any other index) if repeated samples were made
1. Standard error of the mean, reported as X ± SE: SE = s / √n
2. 95% confidence interval, reported as X ± CI: CI = 1.96 s / √n
3. NOTE: because SD, SE, and CI all follow the mean with the ± sign, it is important to specify which one you are reporting. A convenient graphical means of displaying descriptive statistics for several

samples together is the Box Plot. The horizontal line shows the median; the top and bottom of the box show the 75th and 25th quartiles, respectively (i.e., the box represents the middle half of the data); the whiskers indicate adjacent values which are within 1.5 times the upper and lower box edges; and the asterisks show the outlying points (range). The plot below compares the start weight and end weight of male sage grouse that were monitored for 1-2 weeks with doubly labeled water.
[Box plot: weight (g) of male sage grouse, start wt. vs. end wt.]
F. Frequency plot or histogram: a plot of the number of observations versus the magnitude of the variable or character.
[Histogram: frequency vs. male weight (kg)]
IV. The normal distribution
A. The mean, standard deviation, and variance are only meaningful when the frequency distribution of values follows a normal, or Gaussian, curve. A normal distribution is defined as one in which the frequency (fi) of an observation of size xi is:
fi = (1 / (σ√(2π))) e^(−(xi − µ)² / (2σ²))
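To make the density formula concrete, here is a minimal sketch in Python (illustrative values only) that evaluates it directly and checks two familiar properties: the standard normal curve peaks at 1/√(2π) ≈ 0.399, and the total area under it is 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Frequency (density) of an observation x under a normal curve
    with mean mu and standard deviation sigma."""
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * \
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The density of the standard normal peaks at the mean...
peak = normal_pdf(0.0, 0.0, 1.0)

# ...and integrates to 1 over the whole axis (approximated here with a
# simple Riemann sum from -6 to +6 standard deviations).
step = 0.001
area = sum(normal_pdf(k * step, 0.0, 1.0) * step for k in range(-6000, 6000))
```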

1. The smoothed plot of this function is called a probability density function.
2. The mean, median, and mode are equal in a normal distribution. Two normal distributions can differ in their mean (location) and variance (width) and still be normal. A and B differ in location but have the same variance; B and C have the same mean but differ in variance.
B. Deviations from normality
1. Distributions may be skewed left or right; the mean, mode, and median are no longer equal.
2. Distributions may be kurtotic (too peaked or too flat).

3. When distributions deviate significantly from normal, they should be described with the median, mode, and range (as opposed to the mean and SD).
C. Proportions of the normal curve
1. The area under a normal curve between the mean ± one standard deviation includes about 68% of observations in the sample; the area between the mean ± two standard deviations includes about 95% of the observations.
2. For any xi value from a normal population with mean µ and standard deviation σ, the value
Z = (xi − µ) / σ
is a measure of how many standard deviations from the mean the xi value is located. Z is called a normal deviate or standard score, and the process of converting all xi values into Z scores is called normalizing or standardizing. A variable that has been standardized has a mean of zero and a standard deviation of 1.
V. General rules for drawing conclusions using statistical tests
A. Most comparisons one makes fall into one of three categories:
1. Comparisons between two or more samples: Are the differences significant?
2. Comparisons between a sample and a theoretical distribution: Is the sample different from the theoretical expectation? These are also called goodness of fit tests.
3. Measures of association between two or more variables: Are two variables positively or negatively correlated?
B. The null hypothesis is that the samples do not differ significantly (i.e., H0: X1 = X2) or that the two variables are not associated (i.e., H0: rA,B = 0).
1. Statistical tests give us the probability that the observed difference could arise by chance. Depending on the type of comparison, there is a theoretical distribution and corresponding probability distribution from which we can compute the cumulative probability of our observed difference occurring by chance.
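The idea that a p-value is "the probability of the observed difference arising by chance" can be made concrete with a small permutation simulation. The sketch below (plain Python with hypothetical data; a shuffling simulation, not any particular named test) pools two samples, shuffles the group labels many times, and counts how often chance alone produces a difference at least as large as the observed one.

```python
import random
import statistics

# Two small hypothetical samples (e.g., weights of two groups of birds).
group1 = [5.1, 5.4, 5.8, 6.0, 6.2, 6.5]
group2 = [4.2, 4.5, 4.6, 4.8, 5.0, 5.2]
observed = statistics.mean(group1) - statistics.mean(group2)

# Under the null hypothesis the group labels are arbitrary, so shuffle the
# pooled data repeatedly and count how often a difference at least as
# extreme as the observed one appears by chance alone.
random.seed(1)                       # fixed seed for reproducibility
pooled = group1 + group2
extreme = 0
trials = 10000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:6]) - statistics.mean(pooled[6:])
    if abs(diff) >= abs(observed):
        extreme += 1
p_value = extreme / trials           # small p -> reject the null hypothesis
```

With these (deliberately well-separated) samples the simulated p-value comes out far below 0.05, so the null hypothesis would be rejected.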

2. The smaller the probability, the greater is our confidence that the difference between our samples (or sample and theory) represents a true difference.
3. Usually, for a probability of 0.05 or less, we reject the null hypothesis and conclude that the differences are real; for probabilities greater than 0.05 we accept the null hypothesis. The probability is symbolized by "p" or, in some cases, by "α" (alpha).
4. Statistical tests are sensitive to sample size!!! In other words... the smaller your sample size, the lower your confidence that you will get a right answer to your question.
5. All tests assume that samples are random representations of the original population.
6. Because our conclusions are based on a probabilistic cut-off point, there is a possibility of making the wrong conclusion.
a. Type I error: we reject the null hypothesis when the populations from which we took the samples are not in fact different (or associated). In other words, there is a coincidence. At p = 0.05, there is a 1 in 20 chance of getting a difference this extreme from two samples of the exact same population.
b. Type II error: we accept the null hypothesis when in fact the populations are different (or associated). This is typically the result of having too small a sample size. A certain sample size is required if one is likely to detect a true difference of a particular magnitude; smaller differences require bigger sample sizes to detect (the calculation of this sample size is called a power analysis). The possibility of a type II error is why we generally do not say we accept the null hypothesis. If a result is not significant, it is better to think of the relationship between the samples as ambiguous (maybe they are exactly the same, or maybe the difference is too small to detect with the sample size) and hence conclude that the null hypothesis cannot be rejected.
7.
IMPORTANT NOTE: When reporting the results of a statistical test in your lab write-ups, always give the unit of measure, either the sample means ± SD or medians, the sample sizes or DF, the value of the statistic, and the p level. Example: male weight = 2.3 ± 0.8, female weight = 1.8 ± 0.7; t-test: t = 1.2, df = 21, p > 0.2.
C. There are often two choices of statistical tests available, depending on the nature of the variable.
1. Parametric tests generally require that the observations of the samples being compared are normally distributed. Certain types of analyses have additional requirements.
2. Non-parametric tests make no assumptions about the distribution of samples but tend to have less power than their parametric counterparts.
D. Testing for normality
1. You can inspect whether a variable is likely to be normally distributed using visual tools. Plot histograms and look for the iconic 'bell curve'. You may also plot quantile-quantile plots to compare the expected values under a normal distribution with the values you see in your sample (samples from a normal distribution should produce q-q plots that resemble a fairly straight line).
2. Multiple statistical tests are available to test for significant deviations from normality. The Kolmogorov-Smirnov goodness of fit test is a generalized non-parametric test that detects any type of deviation from normality (skew and kurtosis). The Shapiro-Wilk test accomplishes the same goal. If the p-value of these tests is less than 0.05, the distribution is considered to be significantly non-normal.
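Before reaching for a formal test, the skew that such tests detect can be screened numerically. The sketch below (plain Python with made-up samples; this is the moment-based skewness statistic, not the Shapiro-Wilk or Kolmogorov-Smirnov procedure itself) returns roughly zero for a symmetric sample and a positive value for a right-skewed one.

```python
import statistics

def skewness(xs):
    """Third standardized moment: about 0 for a symmetric distribution,
    positive for right skew, negative for left skew."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

# A symmetric sample has skewness of essentially zero...
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# ...while a sample with a long right tail has positive skewness.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]
```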

E. Transforming skewed distributions to make them normal
1. When a frequency distribution looks skewed, it is sometimes necessary to 'transform' the raw variable using mathematical tools. Below are some guidelines for transforming skewed distributions.
a. Left skew: x², x^n (n > 1), e^x
b. Right skew: ln x, log x, x^n (n < 1, e.g., sqrt x), −1/x
c. Percentages or proportions: arcsine (this is a bit of a misleading name; the arcsine transformation actually involves computing the arcsine of the square root of each observation)
2. Biological data are frequently right-skewed, so the most common transformation is ln(x).
VI. Comparing sample means
A. Parametric test: the t-test
1. The t-test is a general test for comparing two indices of any type for which we have standard errors or standard deviations; it can also be used to compare a sample mean index to a theoretical expectation. When used to compare the means of two samples, t is computed as the difference of the two means divided by the weighted standard deviations of the two samples. (There are many other versions of the formula for t depending on the precise hypothesis.)
t = (X1 − X2) / √(s1²/n1 + s2²/n2)
2. If the sample variances are high, or the sample sizes small, the two means will have to be farther apart to be significantly different.
3. Degrees of freedom = n1 + n2 − 2. The t distribution with infinite degrees of freedom (DF) is equal to a normal distribution of Z scores, but with smaller sample sizes it becomes more leptokurtic to account for the increased error in estimating SE from the sample variance.
4. Usually one uses a two-tailed test because there is no a priori reason to suspect that sample 1 should have the higher (or lower) mean, i.e., the alternative hypothesis is Ha: X1 ≠ X2. If there is some very compelling reason for hypothesizing that sample 1 is greater than sample 2, i.e., the alternative hypothesis is Ha: X1 > X2, then a one-tailed test should be done (this is rare).
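The t formula above can be computed directly. A minimal Python sketch with hypothetical male and female weights (the data are invented for illustration):

```python
import math
import statistics

def t_statistic(x1, x2):
    """Two-sample t: the difference of the means divided by the standard
    error of that difference, sqrt(s1^2/n1 + s2^2/n2)."""
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # n - 1 denominators
    return (m1 - m2) / math.sqrt(v1 / len(x1) + v2 / len(x2))

# Hypothetical male vs. female weights (kg):
males = [2.9, 3.1, 3.3, 3.0, 3.2]
females = [2.5, 2.7, 2.6, 2.8, 2.4]
t = t_statistic(males, females)
```

For these numbers the means are 3.1 and 2.6 with equal variances of 0.025, giving t = 0.5 / 0.1 = 5.0, a large value that would be highly significant at df = 8.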

Note that the distributions plotted below are for the difference between the means of the two samples.
[Figure: two-tailed t distribution with cumulative p = 0.025 in each tail]
5. t-tests can be 'paired' when each observation in sample 1 can be matched with a unique observation in sample 2. For example, when comparing before and after samples for a collection of individuals, you should pair each before sample with the after sample for the corresponding individual.
B. Non-parametric tests
1. The Mann-Whitney U test computes the probability that the central tendencies of two distributions are significantly different. This and many other non-parametric tests are "distribution-free" because they first convert the observations to ranks, then compute probabilities using Z scores.
2. The non-parametric equivalent of a paired t-test is the Wilcoxon paired signed-ranks test.
VII. Comparing sample variances
A. F variance ratio test
1. Take the ratio of the variances of the two samples, with the largest variance in the numerator: F = s1² / s2²
2. In R, use var.test() to run an F test.
VIII. Non-parametric goodness of fit tests for event-type data in categories: comparing a sample distribution to a theoretical or independently derived distribution
A. Ho: The sample distribution is the same as the theoretical distribution.
B. The Chi-square goodness-of-fit test
1. Example: Do all of the birds in a communally nesting group contribute equally to the care of nestlings, or is there a significant bias?

a. Set up a file with the observed number of cases for each category listed in the first column, and the corresponding expected number of cases for each category in the second column. The totals for the observed and expected columns must be equal.
[Table: columns are Ind. #, Obs. trips, Exp. trips, and (O − E)²/E]
b. Computation of the Chi-square statistic:
χ² = Σ (i = 1 to k) (Oi − Ei)² / Ei
where Oi is the observed number of cases in the i-th category, Ei is the expected number of cases in the i-th category, and k is the total number of observed categories.
c. The Chi-square distribution varies with the degrees of freedom. [Figure: Chi-square distributions for different degrees of freedom]
d. The magnitude of the Chi-square subcomponent for each category indicates which category deviates the most, but never use these subcomponents as a statistical test by themselves.
2. Rules for Chi-square
a. No category should have an expected value of zero.
b. No more than 20% of categories should have an expected value of less than 5.
c. Lump categories together to meet the above conditions, but be careful that you don't end up with zero DF.
d. Use Yates' correction (also called the continuity correction) for any tests involving two categories where the total sample is between 25 and 200.
e. Whenever the total sample in a goodness of fit test is less than 25 and there are two categories, use the binomial test instead of chi-square.
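The computation above can be sketched in a few lines of Python (hypothetical trip counts for four birds, with equal contribution as the null expectation):

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all k categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical feeding trips by 4 birds; H0: all contribute equally,
# so each expected value is the total divided by 4.
obs = [30, 25, 20, 5]
total = sum(obs)               # 80 trips in all
exp = [total / 4] * 4          # 20 expected trips per bird
stat = chi_square(obs, exp)    # compare to chi-square with df = k - 1 = 3
```

Here the observed and expected totals are both 80, as the rule above requires, and the large subcomponent for the fourth bird (15²/20) is what drives the statistic up.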

C. The G test
1. The G test can be used in the same situations as the Chi-square test, and differs primarily in computational method.
G = 2 Σ (i = 1 to k) Oi ln(Oi / Ei)
2. This test is less inflated by small sample sizes, but still follows the same rules as for Chi-square (e.g., use the continuity correction for k = 2 and 25 < n < 200). Because one takes the log of the observed values, observed values of zero are not allowed.
3. G is distributed as Chi-square.
IX. Contingency tables: non-parametric tests for comparing two or more sample distributions
A. H0: There are no differences among the sample distributions (always two-tailed).
B. Example: Do communal groups of birds suffer differential nest predation losses compared to single pairs?
[Table: rows are # predated nests, # successful nests, row subtotals, and % predated; columns are single pairs and communal groups; expected values are shown in parentheses.]
1. Degrees of freedom = (rows − 1) × (columns − 1).
2. For the Chi-square test, expected values are computed for each cell by multiplying its row subtotal by its column subtotal and dividing this product by the grand total. The same Chi-square formula, (O − E)²/E, is computed and summed over all cells.
C. Rules for Chi-square are the same as before
1. For a 2 × 2 table with a grand total of 25 or less, it is best to use Fisher's exact probability test.
2. No cell should have an expected value of zero, and no more than 20% of cells should have an expected value less than 5. Lump categories to avoid these problems, and lump each sample in the same way.
[Table: score frequencies in year 1 vs. year 2, with low-frequency scores lumped]
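The expected-value rule above (row subtotal × column subtotal / grand total) can be sketched as follows. The 2 × 2 counts are hypothetical, invented only to show the arithmetic:

```python
def expected_counts(table):
    """Expected value for each cell of a contingency table:
    (row subtotal * column subtotal) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = predated / successful nests,
# columns = single pairs / communal groups.
obs = [[12, 28], [8, 52]]
exp = expected_counts(obs)

# Same chi-square formula as before, summed over all cells;
# df = (rows - 1) * (columns - 1) = 1.
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(obs, exp)
           for o, e in zip(orow, erow))
```

With these counts the expected values are [[8, 32], [12, 48]], giving chi² ≈ 4.17 at 1 df.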

X. Introduction to parametric models
A. Cause and effect: We often want to test whether some environmental, external, or social variable affects or determines some behavioral or fitness variable of interest. The potential causative variable is called an independent, predictor, or X variable, and the potentially affected variable is called a dependent, outcome, or Y variable. We construct a model to be tested of the form Y = X + E, where E = the remaining unexplained error. When we graph the potential relationship, the dependent or Y variable usually appears on the vertical axis and the independent or X variable appears on the horizontal axis.
B. Confounding effects: In any natural system, there are often several potential predictor variables, and relationships among these predictor variables may confuse or confound the effect of any single variable. If we can measure these other variables, they can be included in our model: Y = X1 + X2 + ... + Xn + E. This is called a multivariable linear model, and such multivariable techniques allow us to examine more complex hypotheses and control statistically for variables we cannot control with experiments or field studies. NOTE: a common mistake is to refer to models like this as 'multivariate' models. However, in proper statistical parlance, multivariate models are actually those with multiple response variables (i.e., multiple Y's on the left-hand side of the equation).
C. Overview of multivariable and multivariate approaches: The type of statistical test we perform depends on how many dependent (Y) and independent (X) variables we have, and whether they are categorical or continuous variables.
Name of statistical test | Dependent (Y) variable | Independent (X) variable(s)
*Simple linear regression | 1 continuous Y | 1 continuous X
Polynomial regression | 1 continuous Y | 1 continuous X (+ X², X³, etc.)
*Multiple regression | 1 continuous Y | 2 or more continuous X's
*One-way ANOVA | 1 continuous Y | 1 categorical X
*Two-way ANOVA | 1 continuous Y | 2 categorical X's
*ANCOVA and GLM | 1 continuous Y | 1+ continuous and 1+ categorical X's
MANOVA & Discriminant function analysis | 2+ continuous Y's | 1 categorical X
Canonical correlation | 2+ continuous Y's | 2+ continuous X's
Logistic regression | 1 categorical Y | any mix of categorical/continuous X's
PCA, Factor analysis | none | 2 or more continuous X's
XI. Analysis of variance (ANOVA)
A. When the dependent variable is a continuous measure and the independent variable(s) is (are) categorical, we use some type of ANOVA to examine the effects of different categories, called levels, on the dependent variable. Examples of categorical independent variables include experimental treatments, behavioral contexts, different sites or years, different individuals or litters, etc. There can be one or more such categorical independent variables.

B. One-way analysis of variance is an extension of the t-test for comparing the means of more than two groups. It is a parametric test requiring that all samples have normal distributions and similar variances. Ho: Y1 = Y2 = Y3 = Y4, etc.
1. Nature of the data: in the generic example below, a series of individual measurements have been taken from five groups or treatments. The measurement variable Y is continuous, and the categorical X variable has 5 levels. Sample sizes need not be equal for all of the groups but should be similar (here, 6 individuals per group, 30 observations total). Each group has its associated mean, and the grand mean is computed over all values in all groups.
[Table: six observations in each of five treatment groups (X); group means Y1-Y5 and the grand mean Y]
2. In any analysis of variance, the overall variance in the entire data set can be partitioned into additive components, in this case the within-group component and the between-group component. It is the between-group component that we are interested in here. To determine the significance of the differences between groups, one subtracts the variance within groups from the overall variance.
3. ANOVA is a parametric test, and therefore all samples should be independent of each other and possess similar variances. The key assumption that ALWAYS needs to be checked is that the residuals should be normally distributed.
4. The result of the analysis is an ANOVA table, which shows the total sum of squares (the squared difference between each observation and the grand mean), the within-group sum of squares (called the error, computed from the difference between each observation and its group mean), and the between-group sum of squares (group differences, which is what we are interested in) computed as the total sum of squares minus the within-group SS. The mean squares (MS) are computed for the group and error components as SS/DF, and the F test takes the ratio of the between-group variance to the within-group variance.
5. The ratio of the between-group SS to the total SS tells us the proportion of the total variance in the data set that is due to the treatments or groups.
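The sums-of-squares partition described above can be sketched in Python (three small hypothetical groups; SS total = SS between + SS within, and F = MS between / MS within):

```python
import statistics

def one_way_anova(groups):
    """Partition the total SS into between-group and within-group (error)
    components and form F = MS_between / MS_within."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups)
    ss_between = ss_total - ss_within        # the component of interest
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, f

# Three hypothetical treatment groups with well-separated means:
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
ss_total, ss_between, ss_within, f = one_way_anova(groups)
```

For these data SS total = 60, of which 54 (90%) is between groups, so F = (54/2)/(6/6) = 27 at 2 and 6 df.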

6. BOXPLOTS often are quite helpful in this context to visualize variation among treatments and differences among the means.
[Figure: boxplots of response by group; means are indicated by solid circles]
7. ANOVA can tell you when there is reason to suspect differences among treatments but will not be able to tell you directly which treatments are actually different from each other (e.g., treatment 1 may differ from treatments 2 and 3, whereas the latter two may not vary significantly from each other). You will need a post hoc test like Tukey's test to do just that.
C. Non-parametric analysis of variance: The Kruskal-Wallis test is a non-parametric equivalent of the one-way analysis of variance.
D. Two-way (and multiway) ANOVA
1. In 2-way ANOVA, there are two grouping variables (or two factors), and observations are made in all combinations of the levels of the two categorical variables to yield a full factorial design. If the number of observations is the same in all cells, the design is also balanced (desirable).
2. Interaction effects: The effect of a variable may depend on that of another. For example, an experimental treatment may have positive effects on males and negative effects on females. We call these situations 'interactions' and can explore them in any linear model (like ANOVA) by including the product of the two variables as another independent variable in the right-hand side of our equation. Please note that whenever an interaction has a significant effect, then, by definition, the two or three (or however many) variables involved in the interaction are considered to also have a significant effect (regardless of their individual p-values).
3. By the way, the interaction between three or more factors can be examined in the same way, but be aware that multidimensional interactions are not only harder to interpret but also tend to suffer from low numbers of observations in each category.
E.
Fixed versus Random Effects: Independent categorical variables (i.e., predictors) are considered fixed effects when we've sampled all the possible categories there are to sample. In contrast, whenever a categorical factor includes levels that were randomly sampled from a much larger set of potential levels, we call these random effects. For example, individual ID or group ID tend to be considered random effects because when you set up an experiment you typically get a (semi) random sample of

individuals and groups on which to apply the treatments (but you would like to know whether what you see would apply to other individuals or groups as well). Linear models estimate the effect of random effects differently from that of fixed effects. Put simply, a fixed effect is estimated as a 'coefficient' in the equation (that is, as a constant that you multiply with the value of the predictor), whereas a random effect is computed as the SD of a normal distribution with mean zero from which you've sampled at random a set of values to be added to the intercept. Some more elaborate models also estimate a potential random effect on the slope of a particular fixed effect, but we won't deal with those in this course.
XII. Correlation
Correlation tests measure the strength of association, or degree of covariance, between two continuous variables, without assuming a causal relationship between them.
A. Parametric test: Pearson correlation coefficient
1. Assumes that both variables are more or less normally distributed but, more importantly, that their relationship is linear and there are no outliers.
2. Pearson's correlation coefficient r ranges from -1 to +1.
a. Positive r indicates a positive (direct) relationship, negative r indicates a negative (inverse) relationship, and zero indicates no relationship.
b. r² × 100 is the percent of the variation explained by the correlation.
B. Non-parametric test: The Spearman rank correlation is a distribution-free test of association. This correlation coefficient has the same meaning as Pearson's r but cannot be squared to determine the percent variance explained.
XIII. Regression and Multivariable Models
A. When we want to examine the correlation between two (or more) continuous variables, but have some reason to believe that one variable is dependent on the other, or that there is a cause and effect relationship, a regression analysis or a general linear model is appropriate.
1.
By definition, Y is dependent on X, so Y is the "dependent" variable and X is the "independent" variable (also sometimes called the "predictor"). Data are similar in nature to those described above for correlation.
2. Y = f(X), or for the typical linear model, y = a + bx, where a is the y intercept and b is the slope of the line (also called the regression coefficient).
B. Some key assumptions of linear models (which are parametric tests) are:
1. The residuals are normally distributed.
2. The variance of the Y's for each X value is similar.
3. X and Y are linearly related.
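The Pearson correlation and the least-squares line are built from the same sums of squared deviations. A minimal Python sketch with hypothetical, exactly linear data (so the answers can be checked by eye):

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: the covariance of X and Y divided by the
    product of their variabilities; ranges from -1 to +1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def least_squares(xs, ys):
    """Slope b and intercept a of the regression line y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x:
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
r = pearson_r(xs, ys)        # perfect positive correlation
a, b = least_squares(xs, ys) # intercept 1, slope 2
```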

C. The significance of the entire regression model is often determined by an F-test, which is the ratio of the variance explained by the regression line over the residual variance. The greater the scatter around the line, the lower the F.
D. The significance of a specific regression coefficient is determined by a t-test, where the null hypothesis is b = 0. For a simple regression (only one predictor), the results of the F-test and the t-test should be identical.
E. Examining the residuals of your models (remember... these should be normally distributed)
1. Look at your histogram of residuals. It should be roughly normal. Use quantile-quantile plots and normality tests like Shapiro-Wilk or Kolmogorov-Smirnov to confirm this intuition.
2. It also often helps to look at a plot of residuals versus X values. Check for deviations in the residual plot pattern (see figure below) and transform as necessary. An ideal residual plot should show your points evenly spaced in a horizontal rectangular cloud. It is best if you can solve any residual problems with a transformation of one or more X variables (it should be fairly evident from individual histograms and bivariate correlation plots which X variables might need transformation); transform the Y variable only as a last resort. Log transformations of X variables are common in models with biological data because large values tend to be rarer than smaller ones, and therefore the distributions of these predictors are often skewed to the right.
3. If the relationship between your X and Y appears to be non-linear, there are several ways to go. If it's U-shaped or strongly humped, you probably need to use a quadratic or cubic model (e.g., Y = a + b1x + b2x²). If you only have a slight bend in your relationship, or an asymptotic curve, transformations of X and/or Y should solve the problem.
If your transformed variables are approximately normal, you are likely to find that your residuals are also normal and the resulting associations are linear.
4. Here are some common residual patterns and suggestions for transformations:
a. Properly transformed variables should show an even horizontal band of points. [Figure: residuals vs. x variable]
b. Needs transformation with e^x, x², or other power function.

c. Use ln, log, sqrt x, or 1/x.
d. A linear term is missing from the equation; incorrect model.
e. Extra quadratic terms are required; use polynomial regression or a transformation of Y.
F. Multiple regression: when there are several continuous independent (predictor) variables that may affect the dependent (outcome) variable, multiple regression is the appropriate test. The important concept in multiple regression is the notion of statistical control of confounding effects, which arises when some of the independent variables are correlated with each other. Multiple regression examines the effect of each independent variable on the dependent variable while controlling for (removing the effect of) the confounding effects of all the other independent variables in the regression model.
1. An example: In cooperatively breeding meerkats, we want to determine whether group size affects reproductive success. However, we find that larger groups occur on higher-quality territories, so the possibility exists that groups are more successful merely because they have better territories. Is the primary determinant of reproductive success group size, territory quality, or both? Separate simple regression analyses of each variable on the dependent variable, reproductive success, show that both group size and territory quality are significantly correlated with success. In a multiple regression analysis, however, group size is still significantly correlated with success after controlling for territory quality, but territory quality is no longer significant after correcting for group size. We can conclude that group size has a significant and independent effect on reproductive success.
[Diagrams: simple regressions of RS on group size and on territory quality (both significant), alongside the multiple regression of RS on both predictors (only group size remains significant); the r and p values are not reproduced here.]
Regression Analysis: RS versus TerrQual, GpSize

RS = b0 + b1 TerrQual + b2 GpSize
[Minitab output: a Predictor / Coef / SE Coef / T / P table for Constant, TerrQual, and GpSize, followed by the analysis of variance table (Source, DF, SS, MS, F, P). R-Sq = 23.5% and R-Sq(adj) = 22.0%; the other numeric values did not survive transcription.]
Interpretation of a multiple regression result is always facilitated by a thorough understanding of the pairwise correlations among all of the independent variables, as well as the simple relationship between each of the independent variables and the dependent variable. It is always a good idea to inspect the correlation matrix of all variables involved in the analysis. When two or more of your predictors are highly correlated, you run the risk of unstable coefficient estimates and 'weird' results in your models due to collinearity (or multicollinearity if we are talking about three or more highly correlated predictors). The reason: saying that variables are highly correlated is almost the same as saying that they measure the same general phenomenon on different scales. Thus, when your predictors are highly collinear, your model is essentially trying to estimate the effect of a predictor after correcting for its own effect! Solutions: either remove one of the highly correlated predictors from your regression or use a data reduction technique such as PCA.
[Correlation matrix of Group size, Territory quality, and RS: most numeric entries were lost in transcription, but the surviving significance stars indicate that every pairwise correlation is significant; one surviving value is 0.485***.]
F. Stepwise regression: When there are many possible independent variables, a stepwise analysis is often useful for finding the best set of informative variables. You often begin with a fully parameterized model and remove non-significant terms one by one (beginning with interactions where available) until achieving the simplest possible model.
G. Polynomial regression: When there is a non-linear relationship between X and Y, especially one that is non-monotonic, the relationship can still be modeled with a linear model. The trick is to include one or more power terms of X, e.g. Y = b0 + b1 X + b2 X^2 + b3 X^3.
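The polynomial trick in section G can be sketched with synthetic data (the coefficients 1, 2, and -1 and the noise level are invented). The model remains linear in the coefficients; the design matrix simply gains a column for each power of X:

```python
import random

def ols(X, y):
    """Least-squares coefficients via Gauss-Jordan solution of X'X b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):
        A[i] = [v / A[i][i] for v in A[i]]
        for j in range(k):
            if j != i:
                A[j] = [vj - A[j][i] * vi for vi, vj in zip(A[i], A[j])]
    return [row[k] for row in A]

random.seed(11)
xs = [random.uniform(-3, 3) for _ in range(400)]
# Hump-shaped (non-monotonic) response: Y = 1 + 2X - X^2 plus noise
ys = [1 + 2 * x - x * x + random.gauss(0, 0.5) for x in xs]

# Still a *linear* model: the columns of the design matrix are 1, X, X^2
design = [[1, x, x * x] for x in xs]
b0, b1, b2 = ols(design, ys)
print(f"Y = {b0:.2f} + {b1:.2f} X + {b2:.2f} X^2")  # close to 1, 2, -1
```

Because a straight line cannot track a hump, a simple regression here would leave the curved residual pattern of case (e); the quadratic term absorbs it.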

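The advice to inspect the correlation matrix before trusting a multiple regression can also be sketched in code (predictor names GpSize, TerrQual, and Rainfall and all values are invented): compute every pairwise Pearson correlation and flag pairs whose variance inflation factor is large.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

random.seed(5)
n = 1000
# Invented predictors: GpSize and TerrQual built to correlate at about 0.9,
# Rainfall independent of both.
gpsize = [random.gauss(0, 1) for _ in range(n)]
terrqual = [0.9 * g + math.sqrt(1 - 0.81) * random.gauss(0, 1) for g in gpsize]
rainfall = [random.gauss(0, 1) for _ in range(n)]

preds = {"GpSize": gpsize, "TerrQual": terrqual, "Rainfall": rainfall}
names = list(preds)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(preds[a], preds[b])
        vif = 1 / (1 - r * r)  # variance inflation factor for a predictor pair
        warn = "  <- collinear: drop one or use PCA" if abs(r) > 0.8 else ""
        print(f"{a:8s} vs {b:8s}  r = {r:+.2f}  VIF = {vif:5.1f}{warn}")
```

The GpSize/TerrQual pair gets flagged; its inflated VIF is exactly the instability in coefficient estimates that the collinearity discussion above warns about.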

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Analysis of Covariance (ANCOVA) with Two Groups

Analysis of Covariance (ANCOVA) with Two Groups Chapter 226 Analysis of Covariance (ANCOVA) with Two Groups Introduction This procedure performs analysis of covariance (ANCOVA) for a grouping variable with 2 groups and one covariate variable. This procedure

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh Introduction to inferential statistics Alissa Melinger IGK summer school 2006 Edinburgh Short description Prereqs: I assume no prior knowledge of stats This half day tutorial on statistical analysis will

More information

An introduction to biostatistics: part 1

An introduction to biostatistics: part 1 An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

Introduction to Statistics with GraphPad Prism 7

Introduction to Statistics with GraphPad Prism 7 Introduction to Statistics with GraphPad Prism 7 Outline of the course Power analysis with G*Power Basic structure of a GraphPad Prism project Analysis of qualitative data Chi-square test Analysis of quantitative

More information