Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh


1 Introduction to inferential statistics Alissa Melinger IGK summer school 2006 Edinburgh

2 Short description Prereqs: I assume no prior knowledge of stats. This half-day tutorial on statistical analysis will introduce the most common statistical approaches used for analyzing behavioral data, including t-tests, ANOVAs, chi-squared tests, correlations, regressions and non-parametric tests.

3 The focus will be on the appropriate application of these methods (when to use which test) rather than on their underlying mathematical foundations. The introduction will be presented in a very pragmatic, problem-oriented manner, with examples of how to conduct the analyses using SPSS, interpret the program output and report the results. One goal of the tutorial is to generalize the use of these tests. You won't learn how to conduct the analyses here, but hopefully you'll get enough background to help you bootstrap up (with help from an R tutorial or Dr. Google).

4 Structure of Tutorial Hour 1: Background and fundamental underpinnings to inferential statistics Hour 2: Tests for evaluating differences Hour 3: Tests for evaluating relations between two (or more) variables

5 Background and fundamental underpinnings to inferential statistics Hour 1

6 The Experiment An experiment should allow for a systematic observation of a particular behavior under controlled circumstances. If we properly control the experiment, there should be minimal difference between the situations we create. Therefore any observed quantitative or qualitative difference must be due to our manipulation.

7 Two Types of Variables You manipulate the situations under which the behavior is observed and measured. The variables you manipulate are your independent variables. You observe and measure a particular behavior. This measurement is your dependent variable.

8 Two Types of Variables The independent variable is the variable that will be manipulated and compared or that conditions or governs the behavior. Word length, sentence type, frequency, semantic relationship, instruction types, age, sex, task, etc. The dependent variable is the behavior that will be observed and measured. Reaction times, accuracy, weight, ratings, etc.

9 IV Terminology Factor: another word for independent variable, e.g., word frequency or length. Level: a value or state of the factor. Condition (treatment): all the different situations you create by combining the levels of your different factors.

10 IV Terminology: Example Question: Does the frequency of an antecedent influence the ease of processing an anaphor? IV (factor) 1: antecedent frequency, with two levels (frequent vs. infrequent). Alone, this would produce two conditions. IV (factor) 2: anaphor type, with two levels (repeated NP vs. pronoun).

11 Combining levels factorially Each level of each factor is combined with each level of each other factor, so we get 4 conditions: Condition 1 = repeated NP - infrequent antecedent; Condition 2 = repeated NP - frequent antecedent; Condition 3 = pronoun - infrequent antecedent; Condition 4 = pronoun - frequent antecedent. 2 levels x 2 levels = 4 conditions; 2 levels x 3 levels = 6 conditions.

12 How to choose a stat 3 issues determine what statistical test is appropriate for you and your data: What type of design do you have? What type of question do you want to ask? What type of data do you have?

13 Design Issues How many independent variables do you have? How many levels of the IVs? Is your comparison within (related/dependent/repeated measures) or between (unrelated/independent)? Within: subjects (items) are tested in all levels of an IV. Between: subjects (items) are tested in only one level of an IV.

14 Types of Questions Different questions are addressed with different tests. Are my two conditions different from one another? Is there a relationship between my two factors? Which combination of factors best explains the patterns in my data?

15 What type of data do you have? Parametric tests make assumptions about your data: normally distributed; independent **; homogeneity of variance; at least interval scale **. If your data violate these assumptions, consider using a non-parametric test.

16 Normality Your data should be from a normally distributed population. Normal distributions are symmetrical bell-shaped distributions with the majority of scores around the center. If you collect enough data, it should be normally distributed (Central Limit Theorem). Evaluate normality with a histogram, the Kolmogorov-Smirnov test, or the Shapiro-Wilk test. Data Assumptions

17 Homogeneity of Variance The variance should not change systematically throughout the data, especially not between groups of subjects or items. When you test different groups of subjects (monolinguals vs. bilinguals; test vs. control; trained vs. untrained), their variances should not differ. If you test two corpora, the variance should not differ. Evaluate with Levene's test for between-subjects designs and Mauchly's test of sphericity in repeated measures. Data Assumptions

18 Independence Data from different subjects (items) are independent. The observations within each treatment condition must be independent. Independence is violated when subjects are not randomly assigned to a group, or when the observations in one condition are a subset of (or related to) the observations in another condition; correlated samples, such as a set of pre- and post-test observations on the same subjects, are not independent. Some tests are specifically designed to deal with dependent (within) data. Data Assumptions

19 Types of Data Nominal scale: numbers represent qualitative features, not quantitative; 1 is not bigger than 2, just different (1 = masculine, 2 = feminine). Ordinal scale: rankings, 1 < 2 < 3 < 4, but the differences between values are not important or constant; Likert scale data. Interval scale: like ordinal, but the distances are equal; differences make sense, but ratios don't (30 - 20 = 20 - 10, but 20 degrees is not twice as hot as 10 degrees); e.g., temperature, dates. Ratio scale: interval, plus a meaningful 0 point; weight, length, reaction times, age. Data Assumptions

20 What type of data do you have? Parametric tests make assumptions about your data: normally distributed; independent **; homogeneity of variance; at least interval scale **. Parametric tests are not robust to violations of the starred assumptions. Data Assumptions

21 Miller, 1984: type of statistical test by research design and type of data.
- One-sample: parametric - one-sample Z, one-sample t; non-parametric - one-sample proportions.
- Two-sample, related: parametric - related t; non-parametric - Wilcoxon, Sign.
- Two-sample, independent: parametric - independent Z, independent t; non-parametric - Mann-Whitney, χ2.
- K-sample, related: parametric - variance ratio (F); non-parametric - Page's L trend.
- K-sample, independent: parametric - variance ratio (F); non-parametric - Jonckheere trend.
- Correlation: parametric - product-moment correlation coefficient (Pearson's r), linear regression; non-parametric - Spearman's rank correlation coefficient.
Picking a test

22 Between-subjects analyses
Two conditions: parametric scores - independent samples t-test; non-parametric ordinal - Mann-Whitney; non-parametric nominal - χ2.
Three or more conditions: parametric scores - between-subjects ANOVA; non-parametric ordinal - Kruskal-Wallis; non-parametric nominal - χ2.
Picking a test

23 Within-subjects analyses
Two conditions: parametric scores - dependent samples t-test; non-parametric ordinal - Wilcoxon; non-parametric nominal - none.
Three or more conditions: parametric scores - repeated measures ANOVA; non-parametric ordinal - Friedman; non-parametric nominal - none.
Picking a test

24 Decision Tree (Howell, 2004)
Qualitative (categorical) data: one categorical variable - goodness-of-fit chi-squared; two categorical variables - contingency table chi-squared.
Quantitative (measurement) data, questions about relationships: one predictor with continuous measurement - if the primary interest is the degree of relationship, Pearson correlation; if the primary interest is the form of relationship, regression; ranked data - Spearman's rs; multiple predictors - multiple regression.
Quantitative data, questions about differences: two groups - independent: two-sample t or Mann-Whitney; dependent: related-sample t or Wilcoxon. Multiple groups - independent with one independent variable: one-way ANOVA or Kruskal-Wallis; multiple independent variables: factorial ANOVA; dependent (repeated measures): repeated measures ANOVA or Friedman.
Picking a test

25 How do stats work and why do we use them? We observe some behavior, of individuals, the economy, our computer models, etc. We want to say something about this behavior. We'd like to say something that extends beyond just the observations we made, to future behaviors, past behaviors, unobserved behaviors.

26 Types of Statistical Analyses Descriptive statistics: summarizing and describing the important characteristics of the data. Inferential statistics: deciding whether a pattern, difference or relation found in a sample is representative and true of the population.

27 Why go beyond descriptives? Q: Are monks taller than brides? Monks' mean height: 190. Brides' mean height: 170. The mean might make it look like monks are taller, but maybe that doesn't represent the truth.

28

29 Is this difference REAL? Descriptive difference: milk costs .69 at Plus and .89 at Mini-mall. Is this a real difference? Subjective difference: is the difference important enough to me? Is it worth my while to travel farther to pay less? Statistical difference: is Plus generally cheaper than Mini-mall? How representative of the prices is milk? Are all Plus stores cheaper than all Mini-malls? How representative of all Plus stores is my Plus? Inferential statistics help us answer these questions without the need for an exhaustive survey.

30 How do inferential statistics work? Different statistical methods attempt to build a model of the data using hypothesized factors to account for the characteristics of the observed pattern. One simple model of the data is the MEAN.

31 The mean (Table: subjects A-F and their number of siblings.) How well does the mean model the data? ERROR. Mean # siblings = 1.83

32 Variance Variance is an index of error between the mean and the individual observations. The sum of the errors, Σ(xᵢ − x̄), is offset by positive and negative numbers, so take the square of each error value. The sum of squared errors, SS = Σ(xᵢ − x̄)², will increase the more data you collect, so a large number by itself is a poor index of error. Divide the sum of squared errors by N − 1: variance (s²) = SS / (N − 1).

33 Standard Deviation Variance gives us a measure in squared units, so it is not directly comparable to the units measured. If your data have a range from 1-100, you could easily have a variance > 100, which is not that informative an index of error. The standard deviation is a measure of how well the mean represents the data: s = √(SS / (N − 1)).
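A minimal sketch of slides 31-33 in Python with NumPy (an assumption; the tutorial itself uses SPSS). The sibling counts are made-up values chosen so that the mean comes out at the slide's 1.83.

```python
# Hand-computed mean, SS, variance and SD, checked against NumPy's built-ins.
import numpy as np

siblings = np.array([1, 3, 0, 2, 4, 1])    # hypothetical counts for subjects A-F

mean = siblings.mean()                     # the "model" of the data
errors = siblings - mean                   # deviations of each observation from the mean
ss = np.sum(errors ** 2)                   # sum of squared errors (SS)
variance = ss / (len(siblings) - 1)        # s^2 = SS / (N - 1)
sd = np.sqrt(variance)                     # s = sqrt(SS / (N - 1))

print(mean, ss, variance, sd)
print(siblings.var(ddof=1), siblings.std(ddof=1))   # same values via NumPy
```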

34 Sampling Sampling is a random selection of representative members of a population. If you had access to all members of a population then you would not need to conduct inferential statistics to see whether some observation generalizes to the whole population. Normally, we only have access to a (representative) subset. Most random samples tend to be fairly typical of the population, but there is always variation.

35 Standard Error Variance and standard deviation index the relationship between the mean of the observations and the individual observations: they comment on the SAMPLE. The standard error is similar to the SD, but it applies to the relationship between a sample mean and the population: SE = s / √N. Standard errors give you a measure of how representative your sample is of the population. A large standard error means your sample is not very representative of the population; a small SE means it is representative.
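A short sketch of the standard error formula on this slide, again in Python (assumed tooling, not from the slides), using the same hypothetical sample as above.

```python
# SE of the mean: sample SD divided by the square root of the sample size.
import numpy as np
from scipy import stats

sample = np.array([1, 3, 0, 2, 4, 1])              # hypothetical sample
se_manual = sample.std(ddof=1) / np.sqrt(len(sample))
se_scipy = stats.sem(sample)                       # SciPy computes the same quantity
print(se_manual, se_scipy)                         # larger SE -> less representative sample
```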

36 Normality and SD The combination of measures of variance and assumptions of normality underlies hypothesis testing and inferential statistics. Statistical tests output a probability that an observed pattern (difference or relationship) is true of the population. How does this work?

37 Normal Curves Normal curves can be defined by their mean and variance. The Z-distribution has mean = 0 and SD = 1: x ~ N(0, 1). 95% of data points lie within about 2 SD of the mean.

38 Normal Curves Given these characteristics of normal curves, you can calculate what percentage of the data points are above or below any value. This is a basic principle behind most of the common inferential tests.

39 Example: IQ scores X ~ N(x̄, s²), with x̄ = 100 and s² = 256, so s = √256 = 16. What proportion of the population has an IQ of less than 84?

40 Standard normal transformation You can calculate the proportion above or below, or between any two points, for any normal curve. First, calculate how many SD a value is from the mean: z(x) = (x − x̄) / s. Then look up the value in a table. For the IQ example: z(108) = (108 − 100) / 16 = 0.5.
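A minimal sketch of the z-transformation and the "table lookup" step in Python with SciPy (an assumption; the slide uses a printed table). It also answers the question from slide 39: the proportion of the population with an IQ below 84.

```python
# z-score plus the cumulative normal distribution in place of a z-table.
from scipy.stats import norm

mean, sd = 100, 16                        # IQ example: x-bar = 100, s = 16
z = (108 - mean) / sd                     # z(108) = (108 - 100) / 16 = 0.5
print(z, norm.cdf(z))                     # 0.5, ~0.69 of scores fall below 108

# Equivalently, pass the mean and SD directly:
print(norm.cdf(84, loc=mean, scale=sd))   # ~0.16: proportion with an IQ below 84
```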

41

42 Normal curves as a gateway to inferential statistics Given the distribution of normal curves, you can now identify the likelihood that two means come from the same population. Imagine you want to know if jetlag results in poor test-taking ability. You test some kids who just flew from Edinburgh to Chicago. You know the population mean is 100 with SD = 16. The sample you draw has a mean test result of 84. What can we conclude? We know that 16% of the general population has an IQ of 84 or below, so you want to know the chances of drawing, at random, a sample with a mean IQ of 84. If the chances are p > .05 (or whatever cut-off you want to use), you'd conclude that jetlag doesn't have a reliable effect on IQ. If p < .05, you would conclude that it does. That is statistical significance!!!

43 Sources of Variability There are two sources of variability in your data: variability caused by subjects and items (individual differences), and variability induced by your manipulation. To find a statistically significant effect, you want the variation between conditions to be greater than the variation within conditions. Note: if you don't have variance in your data (e.g., because the program is deterministic and always behaves the same way), inferential stats might not be for you.

44 Variability of results (Figure: two conditions with the same medians shown under high, medium, and low variability; only with low variability is the difference between conditions reliable and can be generalized.)

45 Tests for finding differences Hour 2.2

46 Comparing two means If you are interested in knowing whether your two conditions differ from one another AND you have only 2 conditions. Evaluates the influence of your Independent Variable on your Dependent Variable.

47 Test options: 2 conditions Parametric data: the popular t-test (1-sample, independent pairs, related pairs). Non-parametric equivalents: related pairs, independent pairs.

48 1-sample t-test Compares the mean of sample data to a theoretical population mean. The standard error is used to gauge the variability between sample and population means. The difference between the sample mean and the hypothesized population mean must be greater than the normal variance found within the population to conclude that you have a significant effect of the IV. If the standard error is small, samples should have similar means; if the SE is large, large differences in means by chance are possible. t = (X̄ − µ) / standard error of the mean, where µ is the estimated population mean. Parametric tests of differences
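A minimal one-sample t-test sketch in Python with SciPy (assumed tooling; the tutorial demonstrates SPSS). The scores are made up.

```python
# One-sample t-test: does the sample mean differ from a hypothesized population mean?
import numpy as np
from scipy import stats

scores = np.array([88, 92, 79, 85, 90, 83, 95, 87])   # hypothetical sample
t, p = stats.ttest_1samp(scores, popmean=100)          # H0: population mean = 100
print(t, p)   # a large |t| and a small p suggest the sample mean differs from 100
```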

49 How big is your t? How large should the t-statistic be? The probability of a difference reflecting a REAL effect is determined by the characteristics of the particular curve. Z and t curves are normal-shaped curves; F and χ2 are positively skewed curves. P-values are sensitive to the size of the sample, so a t of 2.5 might be significant for a large N but not for a small N. Tables or statistical programs relate t-values and degrees of freedom to the probability of the test statistic.

50 Independent samples t-test Compares the means of two independent groups (between design). Same underlying principle (signal-to-noise) based on standard errors. Bigger t-value --> larger effect. t = ((X̄ − Ȳ) − (µ₁ − µ₂)) / estimate of the SE of the difference between the two sample means. Parametric tests of differences

51 Related t-test Compares the means of two related groups: either matched pairs or within-subjects designs. You test the same person in both conditions, which reduces the amount of unsystematic variation introduced into the experiment. Parametric tests of differences
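A minimal sketch of both two-sample t-tests from slides 50-51 in Python with SciPy (assumed tooling; all reaction times are made up).

```python
# Independent-samples vs. related-samples (paired) t-tests.
import numpy as np
from scipy import stats

# Between-subjects comparison (two different groups of subjects):
group_a = np.array([512, 480, 530, 495, 521, 508])   # e.g., monolinguals
group_b = np.array([545, 560, 533, 572, 551, 540])   # e.g., bilinguals
print(stats.ttest_ind(group_a, group_b))

# Within-subjects comparison (the same subjects in both conditions):
cond_1 = np.array([410, 455, 398, 470, 432])
cond_2 = np.array([392, 440, 385, 451, 420])
print(stats.ttest_rel(cond_1, cond_2))
```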

52 Non-parametric tests Most non-parametric tests work on the principle of ranking the data. The analysis is conducted on the ranks, rather than on the data itself. We lose information on effect magnitude, so non-parametric tests are less powerful than their parametric counterparts: increased chance of a type-II error (false negative). Non-parametric tests of differences

53 Ranking the Responses (Table: for subjects 1-4, the responses in Condition 1 and Condition 2 together with their ranks.) Non-parametric tests of differences

54 Non-parametric tests Mann-Whitney: test for 2 independent samples. Operates by ranking the responses, independent of condition membership. Equal sample sizes are not required. Wilcoxon signed-rank: test for 2 related samples; ranks the absolute values of the differences within each pair. Equal sample sizes are required. Non-parametric tests of differences
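A minimal sketch of the two rank-based tests on this slide in Python with SciPy (assumed tooling; the ratings and latencies are made up).

```python
# Mann-Whitney U for independent samples, Wilcoxon signed-rank for related samples.
from scipy import stats

group_a = [3, 5, 4, 6, 2, 5, 7]            # independent groups -> Mann-Whitney U
group_b = [6, 7, 5, 8, 7, 6, 9]
print(stats.mannwhitneyu(group_a, group_b))

before = [320, 345, 300, 360, 310, 335]    # paired observations -> Wilcoxon signed-rank
after  = [305, 330, 295, 340, 320, 318]
print(stats.wilcoxon(before, after))
```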

55 Sign Test Alternative to either the 2-related or 2-independent (with random pairing) non-parametric tests. Requires equal sample sizes. Simply counts the number of positive vs. negative differences between conditions. If there is no difference, a 50/50 split is expected. Calculate P(n+), the probability of the observed number of positive differences. Non-parametric tests of differences
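A minimal sign-test sketch: counting positive vs. negative differences and testing the split against 50/50 with a binomial test in SciPy (assumed tooling, SciPy >= 1.7; the signs are made up).

```python
# Sign test via an exact binomial test on the number of positive differences.
from scipy import stats

signs = [+1, +1, -1, +1, +1, +1, -1, +1, +1, +1]       # signs of condition differences
n_pos = sum(s > 0 for s in signs)
result = stats.binomtest(n_pos, n=len(signs), p=0.5)    # H0: P(positive) = 0.5
print(n_pos, result.pvalue)
```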

56 Comparing more than two means Hour 2.85

57 ANOVA Analysis of Variance Similar to the t-test in that it also calculates a signal-to-noise ratio, F. Signal = variance between conditions; noise = variance within conditions. Can analyze more than 2 levels of a factor. It is not appropriate to simply conduct multiple t-tests because of the inflation of the Type I error rate. Can analyze more than 1 factor, and can reveal interactions between factors. Parametric tests for more than 2 conditions

58 1-way ANOVA Analogue of the independent-groups t-test for 3 or more levels of one factor. A 1-way ANOVA with 2 levels is equivalent to a t-test: the p-values are the same and F = t². (SPSS output table: sum of squares, df, mean square, F and significance for between groups, within groups, and total.) Parametric tests for more than 2 conditions
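A minimal one-way ANOVA sketch in Python with SciPy (assumed tooling; the scores are made up), including a check of the F = t² relationship mentioned on this slide.

```python
# One-way between-subjects ANOVA, plus the 2-group F = t^2 equivalence.
from scipy import stats

g1 = [23, 25, 28, 22, 26]
g2 = [30, 29, 33, 31, 28]
g3 = [35, 38, 34, 36, 39]
print(stats.f_oneway(g1, g2, g3))        # 3 levels of one between-subjects factor

f2 = stats.f_oneway(g1, g2)              # with only 2 levels...
t2 = stats.ttest_ind(g1, g2)
print(f2.statistic, t2.statistic ** 2)   # ...F equals t squared (same p-value)
```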

59 ANCOVA Analysis of Covariance If you have a continuous variable that was not manipulated but that might add variance, like word frequency, subject age, years of programming experience, sentence length, etc., you can factor out the variance attributed to this covariate. This removes error variance and makes a large F ratio more likely. Available in independent and repeated measures variants. Parametric tests for more than 2 conditions

60 Factorial univariate ANOVA When you have more than one IV but the analysis remains between subjects. This analysis allows you to test the main effect of each independent variable and also the interaction between the variables. If you have multiple dependent variables, you can use a multivariate ANOVA. Parametric tests for more than 2 conditions

61 Repeated Measures ANOVA (within subjects) This analysis is appropriate for designs with just 1 IV or multiple IVs, for mixed designs, and it can factor out covariates. Parametric tests for more than 2 conditions

62 Main effects and interactions Let's assume we have 2 factors with 2 levels each that we manipulated in an experiment. Factor one: lexical frequency of words (high frequency and low frequency). Factor two: word length (long words and short words). We measured reaction times for a naming task. In a repeated measures ANOVA we can potentially find 2 main effects and 1 interaction. Interactions

63 Main effects and interactions Main effect of word length (long words 350 ms, short words 250 ms). No main effect of frequency (high frequency 300 ms, low frequency 300 ms). Interaction between word length and frequency (i.e., frequency has a different influence on long words and short words). (Figure: mean RTs for high- and low-frequency words, plotted separately for short and long words.) Interactions
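A minimal sketch of this 2 x 2 repeated measures example in Python with statsmodels (an assumption; the tutorial itself uses SPSS). The reaction times are made up and chosen to roughly mimic the slide's pattern: a length effect, no overall frequency effect, and a length x frequency interaction.

```python
# 2 (length) x 2 (frequency) repeated measures ANOVA on hypothetical naming latencies.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One RT per subject (1-4) per cell; cell means are ~240, ~260, ~360, ~340 ms.
rts = {
    ("short", "high"): [235, 248, 232, 245],
    ("short", "low"):  [255, 266, 252, 267],
    ("long",  "high"): [352, 361, 366, 361],
    ("long",  "low"):  [335, 349, 331, 345],
}
rows = []
for (length, freq), values in rts.items():
    for subj, rt in enumerate(values, start=1):
        rows.append({"subject": subj, "length": length, "freq": freq, "rt": rt})
data = pd.DataFrame(rows)

res = AnovaRM(data, depvar="rt", subject="subject", within=["length", "freq"]).fit()
print(res)   # F tests for the two main effects and the length x freq interaction
```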

64 Interactions

65 Interactions Interactions indicate that independent variable X influences the dependent variable differently depending on the level of independent variable Y. Interpret your main effects in light of the interaction. You can have an interaction with no main effects. Interactions

66 Types of interactions Antagonistic interaction: the two independent variables reverse each other's effects. Synergistic interaction: a higher level of B enhances the effect of A. Ceiling-effect interaction: the higher level of B reduces the differential effect of A. Interactions

67 Antagonistic Interaction (Figure: DV plotted at A1 and A2 for levels B1 and B2, with the main effect of A.) The two independent variables reverse each other's effects. Interactions

68 Synergistic Interaction (Figure: DV plotted at A1 and A2 for levels B1 and B2, with the main effect of A.) A higher level of B enhances the effect of A. Interactions

69 Ceiling-effect Interaction (Figure: DV plotted at A1 and A2 for levels B1 and B2, with the main effect of A.) The higher level of B reduces the differential effect of A. Interactions

70 Interpreting Interactions If you want to say that two factors influence each other, you need to demonstrate an interaction If you want to say that a factor affects two DVs differently, you need to demonstrate an interaction. Interactions

71 Non-parametric alternatives If you have 1 independent variable with more than 2 levels (K levels): Kruskal-Wallis independent test for between designs; Friedman related-samples test for within designs. Non-parametric tests of more than 2 conditions

72 Pearson's Chi-squared Appropriate if you have: 1 categorical dependent variable (dichotomous data); 2 independent variables with at least 2 levels. Detects whether there is a significant association between two variables (no causality). Assumptions: each person or case contributes to only one cell of the contingency table, so it is not appropriate for repeated measures; it takes raw data, not proportions; values in all cells should be > 5 (otherwise use Fisher's exact test). Non-parametric tests of more than 2 conditions
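A minimal chi-squared sketch in Python with SciPy (assumed tooling) on a made-up 2 x 2 contingency table of raw counts.

```python
# Pearson's chi-squared test of association, with Fisher's exact test as the fallback.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[30, 10],    # e.g., group 1: outcome yes / no
                  [15, 25]])   # e.g., group 2: outcome yes / no
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
print(expected)                # if any expected count were < 5, prefer Fisher's exact test
print(fisher_exact(table))
```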

73 Hierarchical Loglinear Analysis If you have more than 3 independent variables and are interested in the higher-order interactions. Designed to analyze multi-way contingency tables; table entries are frequencies. Functionally similar to χ2. Non-parametric tests of more than 2 conditions

74 Summary for ANOVAs If you have 1 independent factor with K levels between subjects: 1-way ANOVA. If you have a covarying factor and one or more between-subjects independent factors, use univariate ANOVA. If you have a repeated measures design, with 1 or more manipulated factors, with or without a covariate or an additional between-subjects factor, use Repeated Measures ANOVA.

75 Correlations & Linear Regressions Hour 3.25

76 Question Are two conditions different? What relationship exists between two or more variables? Positively related: as x goes, so goes y. Negatively related: whatever x does, y does the opposite. No relationship. Correlations

77 Example of linear correlation (Scatterplot: advertising budget, in thousands of pounds, against record sales, in thousands.) Correlations

78 Covariance An association is indexed by covariance. Are changes in one variable met with a similar or opposite change in another variable? Recall: variance (s²) = SS / (N − 1), where SS = Σ(xᵢ − x̄)². We squared the error scores when looking for variance within one variable; if we are interested in the association between two variables, we multiply the error scores together. Correlations

79 Calculating Covariance If deviations from the mean go in the same direction for both variables, you'll get a positive number. If deviations from the mean go in opposite directions (one negative, one positive), you'll get a negative number. cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N − 1). Correlations
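A minimal sketch of the covariance formula on this slide, computed by hand and checked against NumPy (assumed tooling; the data are made up).

```python
# Sample covariance: multiply each pair of deviations and divide by N - 1.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_manual, np.cov(x, y)[0, 1])   # both give the sample covariance
```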

80 Interpreting linear relations Correlation coefficient [r] = the linear relationship between two variables; it measures the amount of spread around an imaginary line through the center. r² = the proportion of common variation in the two variables (strength or magnitude of the relationship): the proportion of the variance in one set of scores that can be accounted for by knowing X. Outliers? A single outlier can greatly influence the strength of a correlation. Correlations

81 Effect of outliers One approach to dealing with outliers is to see if they are non-representative (i.e., at the far end of the normal distribution). If so, they should be removed. Correlations

82 Types of correlations Bivariate correlation: between two variables; Pearson's correlation coefficient for parametric data (interval or ratio data). Partial correlation: relationship between two variables while controlling for the effect of one or more additional variables. Biserial correlation: when one variable is dichotomous (e.g., alive vs. dead, male vs. female). Correlations

83 Drawing conclusions Correlations only inform us about a relationship between two or more variables. We are not able to talk about directionality or causality: an increase in X does not CAUSE an increase in Y, or vice versa. The cause could be an unmeasured third variable. Correlations

84 R² By squaring our test statistic, we can tell how much of the total variance in the data for variable X is in common with variable Y. For example, R² = .056 means 5.6% of the variance is shared (about 94% of the variability is still unaccounted for!); another correlation might give R² = .58, i.e., 58%. Correlations

85 Non-parametric correlations Spearman's rho: for non-interval data; ranks the data and then applies Pearson's equation to the ranks. Kendall's tau: preferred for small data sets with many tied rankings. Correlations
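A minimal sketch of the parametric and rank-based correlation coefficients from slides 80-85 in Python with SciPy (assumed tooling; the data are made up).

```python
# Pearson's r (and r^2), Spearman's rho, and Kendall's tau on the same data.
from scipy import stats

x = [12, 15, 9, 20, 17, 11, 14, 18]
y = [30, 35, 22, 48, 40, 28, 33, 45]

r, p = stats.pearsonr(x, y)
print(r, r ** 2, p)              # r^2 = proportion of variance in common
print(stats.spearmanr(x, y))     # ranks the data, then applies Pearson's formula
print(stats.kendalltau(x, y))    # preferred for small samples with many ties
```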

86 Simple Linear Regressions Hour 2.75

87 Regressions Correlations detect associations between two variables. They say nothing about causal relationships or directionality, and they can't predict behavior on one variable given a value of another variable. With regression models we can predict variable Y based on variable X. Regressions

88 Simple Linear Regressions A line is fit to the data (similar to the correlation line). The best line is the one that produces the smallest sum of squared distances from the regression line to the data points. Evaluation is based on the improvement in prediction relative to using the mean. Regressions

89 Hypothetical Data (Scatterplot: outcome variable against predictor variable, with the mean shown as a horizontal line.) Regressions

90 Error from the Mean (Scatterplot as before, with error lines drawn from each data point to the mean.) Regressions

91 Error from the Regression Line (Scatterplot as before, with error lines drawn from each data point to the fitted regression line rather than to the mean.) Regressions

92 Regression Results The best regression line has the lowest sum of squared errors. Evaluation of the regression model is achieved via: R², which tells you the % of variance accounted for by the regression line (as with correlations); and F, which evaluates the improvement of the regression line compared to the mean as a model of the data. Regressions

93 Predicting New Values Equation for the line: Yᵢ = β₀ + β₁Xᵢ + εᵢ, where Y is the outcome value, X is the predictor value, β₀ is the intercept (constant: the value of Y without predictors), β₁ is the slope of the line (the value for the predictor), and ε is the residual (error). Regressions
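A minimal sketch of fitting this line and using it to predict a new value, in Python with SciPy (assumed tooling; the data are made up in the spirit of the advertising/sales example).

```python
# Least-squares fit of Y = b0 + b1*X, then prediction for a new X.
from scipy import stats

x = [10, 20, 30, 40, 50, 60]            # predictor (e.g., advertising budget)
y = [25, 44, 70, 83, 110, 128]          # outcome (e.g., sales)

fit = stats.linregress(x, y)
print(fit.intercept, fit.slope)         # b0 and b1
print(fit.rvalue ** 2)                  # R^2: variance accounted for by the line
print(fit.intercept + fit.slope * 45)   # predicted Y for a new X = 45
```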

94 Multiple Regression Extends the principles of simple linear regression to situations with multiple predictor variables. We seek the linear combination of predictors that correlates maximally with the outcome variable: Yᵢ = β₀ + β₁X₁ᵢ + … + βₙXₙᵢ + εᵢ. Regressions

95 Multiple Regression, cont'd R² gives the % of variance accounted for by the model consisting of the multiple predictors. T-tests tell you the independent contribution of each predictor in capturing the data. Logistic regressions are appropriate for dichotomous data! Regressions
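A minimal multiple regression sketch in Python with statsmodels (an assumption; the tutorial uses SPSS). The predictors and RTs are made up; the summary reports R², adjusted R², the overall F test, and a t test per predictor.

```python
# Multiple regression with two predictors via the statsmodels formula interface.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "rt":     [520, 480, 610, 450, 570, 530, 600, 490, 560, 510],
    "length": [6, 4, 9, 3, 8, 6, 9, 4, 7, 5],                       # predictor 1
    "freq":   [2.1, 3.5, 1.2, 4.0, 1.8, 2.6, 1.1, 3.2, 1.9, 2.8],   # predictor 2
})
model = smf.ols("rt ~ length + freq", data=data).fit()
print(model.summary())   # R^2, adjusted R^2, F, and per-predictor t tests

# For a dichotomous outcome, smf.logit("correct ~ length + freq", data=data).fit()
# would fit a logistic regression instead.
```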

96 Adjusted R² estimates the variance accounted for in the population; R² and adjusted R² should be similar. Is the model an improvement over the mean, or over a prior model? β = the change in the outcome resulting from a change in the predictor; its test checks that the line isn't horizontal. Regressions

97 Improvement over the mean; improvement over the 1st block. The degree to which each predictor affects the outcome if the effects of the other predictors are held constant. The t-test gives an impression of whether new predictors improve the model. Units are expressed in standard deviations for better comparison.

98 Summary You should know which of the common tests are applicable to what types of data, designs and questions. You should also have enough background knowledge to help you understand what you read on-line or in statistics books. You should have an idea which tests might be useful for the type of data you have.

99 Informative Websites for spss videos! stics/investigating.htm Or, just google the test you are interested in.
