Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09
Statistical Inference You have a sample and want to use the data collected in your sample to make inferences about the underlying truth(s) in the population. Target Population → Sample (Experiment?)
Statistics and Parameters Mean: the average; add everything up and divide by the sample size. Standard Deviation: a measure of the average distance from the mean; indicates how spread out the data are. Variance = (Standard Deviation)². Proportion: the proportion of the sample that falls in a certain category. Correlation (r): a measure of linear association between two numeric variables that runs between −1 and 1.
Sample Statistics and Population Parameters

             MEAN   SD   VARIANCE   PROPORTION   CORRELATION
SAMPLE       x̄      s    s²         p̂            r
POPULATION   μ      σ    σ²         p            ρ

Sample Statistic: a number calculated using your data. Population Parameter: a usually unknown population value. GOAL: Use the sample statistics to make inferences about the population parameters.
Sampling Variability A sample statistic is rarely exactly the same as the (unknown) population parameter. Each time we take a random sample from a population, we are likely to get a different set of individuals and calculate a different statistic. This is called sampling variability. Good news: if you have a random sample, as your sample size gets larger and larger, the sample statistic gets closer and closer to the population parameter.
Sampling Variability If we could take lots of random samples of the same size from a given population, the variation of an estimate from sample to sample (the sampling distribution) follows a predictable pattern. Luckily, the mean, standard deviation, and shape of the sampling distribution are all fully estimable from quantities we can observe in our data! Statistical inference is based on this knowledge. We only get to observe one random sample, but can take advantage of the predictable sampling distribution to make valid inferences about population parameters.
Sampling Distribution The sampling distribution gives us the distribution of our estimate (for example, a mean) if we were to take many samples from the population. For many common estimates, with a large enough sample size this will be a Normal Distribution. http://onlinestatbook.com/stat_sim/sampling_dist/index.html
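As a rough illustration of this idea, the sketch below simulates a hypothetical, made-up population of exam scores, draws many random samples, and looks at the distribution of the sample means (all names and numbers here are illustrative, not from the slides):

```python
import random
import statistics

random.seed(1)  # make the simulation reproducible

# Hypothetical population of 100,000 exam scores (made up for illustration)
population = [random.gauss(75, 10) for _ in range(100_000)]

# Draw many random samples of size n and record each sample mean
n = 50
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2000)]

# The sample means pile up around the population mean (75), and their
# spread, the standard error, is roughly sigma / sqrt(n) = 10/sqrt(50) ≈ 1.41
print(round(statistics.mean(sample_means), 1))
print(round(statistics.stdev(sample_means), 2))
```

The applet linked above animates the same phenomenon interactively.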
Standard Error The standard error (SE) of an estimate is the standard deviation of the sampling distribution. Most forms of statistical inference rely on knowing or being able to calculate the standard error (uncertainty) of an estimate. The sampling distribution and standard error depend on: what type of parameter you are estimating, the standard deviation of your data, and the sample size.
Statistical Inference Confidence Intervals: Create an interval for the true population parameter based on your estimate and the uncertainty surrounding your estimate Hypothesis Testing: Determine whether a difference or association is statistically significant Statistical Modeling: Explore relationships between variables, model trends, create predictions
Confidence Intervals A confidence interval typically takes the form estimate ± (margin of error). A 95% confidence interval will cover the true parameter in 95% of random samples. http://bcs.whfreeman.com/ips4e/cat_010/applets/confidenceinterval.html
Confidence Intervals In any given analysis, you will only observe one of these intervals. 95% of 95% CIs will cover the truth.
Confidence Intervals The formula estimate ± (margin of error) can be rewritten as estimate ± (multiplier) × (standard error), where the multiplier depends on the confidence level: the percent of the intervals that will cover the truth (typically 95%). For 95% intervals this multiplier is usually close to 2.
Confidence Interval Proportion Example You want to estimate the proportion of your graduates who go on to eventually pass the EIT (Engineer in Training) exam. You take a random sample of your graduates, contact them, and ask them whether or not they have passed the EIT. You sample 100 people and 55 have passed. Your estimated population proportion is .55, but you want an interval surrounding this estimate to reflect your uncertainty.
Confidence Interval Proportion Example The standard error for a sample proportion is √(p̂(1−p̂)/n). The 95% confidence level multiplier for a proportion is 1.96. The margin of error for a proportion is therefore 1.96 √(p̂(1−p̂)/n).
Confidence Interval Proportion Example Your 95% confidence interval is then
estimate ± (margin of error)
= estimate ± (multiplier) × (standard error)
= p̂ ± 1.96 √(p̂(1−p̂)/n)
= .55 ± 1.96 √(.55 × .45 / 100)
= .55 ± 1.96 × .050
= .55 ± .098
= (.452, .648)
Note that if we instead sampled 1000 graduates, and still 55% had passed, our interval for the true proportion would be (.52, .58): much narrower!
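The arithmetic on this slide can be checked with a short Python sketch (`prop_ci` is a hypothetical helper name for illustration, not a function from any statistics package):

```python
import math

# 95% CI for a proportion: p_hat ± 1.96 * sqrt(p_hat * (1 - p_hat) / n)
def prop_ci(successes, n, multiplier=1.96):
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - multiplier * se, p_hat + multiplier * se

lo, hi = prop_ci(55, 100)
print(round(lo, 3), round(hi, 3))    # 0.452 0.648, matching the slide

lo2, hi2 = prop_ci(550, 1000)
print(round(lo2, 2), round(hi2, 2))  # 0.52 0.58: larger n, narrower interval
```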
Confidence Intervals Confidence intervals are pretty easy to compute by hand if you know the standard error (SE) of your estimate. However, each type of estimate requires a different formula for computing the standard error, so it's easiest to just use a computer (any stat software will produce confidence intervals).
Confidence Intervals The width of the confidence interval, determined by the margin of error, depends on: The confidence level (usually 90%, 95%, or 99%): higher confidence → wider interval. The standard deviation of the original data: higher SD → wider interval. The sample size: higher sample size → narrower interval.
Confidence Intervals The easiest way to get more precise intervals for the truth is to sample a lot of people. If you have a desired margin of error for a specific situation, you can work backwards to determine how many people you will need to sample.
Sample Size When doing inference for a proportion, suppose you want no higher than a certain margin of error. Recall ME = 1.96 √(p̂(1−p̂)/n). We don't know the sample proportion, but p(1−p) is maximized when p = ½, so we can be conservative and use p = ½. If we use p = ½ and replace 1.96 with 2, then we have
ME ≈ 2 √((1/2)(1/2)/n) = 2 √(1/(4n)) = 1/√n, so n ≈ 1/ME².
If we want a margin of error of .02, we should sample 1/.02² = 2500 people.
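The conservative sample-size rule above is one line of code (`conservative_n` is an illustrative name, not from any package):

```python
import math

# Conservative sample size for a target margin of error on a proportion:
# with p = 1/2 and multiplier ≈ 2, ME ≈ 1/sqrt(n), so n ≈ 1/ME²
def conservative_n(margin_of_error):
    return math.ceil(1 / margin_of_error ** 2)

print(conservative_n(0.02))  # 2500, as on the slide
print(conservative_n(0.03))  # 1112
```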
Confidence Intervals 95% CI: estimate ± 2 × SE

Quantity of Interest       Parameter   Estimate    Standard Error of Estimate          Distribution for multiplier
One Mean                   μ           x̄           s/√n                                t(n−1)
Difference in Means        μ1 − μ2     x̄1 − x̄2     √(s1²/n1 + s2²/n2)                  t(n1+n2−2)
One Proportion             p           p̂           √(p̂(1−p̂)/n)                        N(0,1)
Difference in Proportions  p1 − p2     p̂1 − p̂2     √(p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2)      N(0,1)

*Get the multiplier for a (1−k)% interval by finding the value on the distribution with k% remaining in the tails.
Confidence Intervals Single Mean The 95% confidence interval for a mean is approximately x̄ ± 2 s/√n
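A minimal sketch of this interval on a small made-up set of test scores (`mean_ci` is a hypothetical helper; the multiplier 2 is the rough value from the slide, while exact t multipliers differ slightly):

```python
import math
import statistics

# Approximate 95% CI for a mean: x_bar ± 2 * s / sqrt(n)
def mean_ci(data):
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return x_bar - 2 * se, x_bar + 2 * se

scores = [82, 85, 90, 78, 88, 84, 91, 86, 79, 87]  # made-up test scores
lo, hi = mean_ci(scores)
print(round(lo, 1), round(hi, 1))  # 82.3 87.7
```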
Statistical Significance A phenomenon is statistically significant if it is unlikely to occur by random chance. For a difference in sample means between variables to be statistically significant, we need to see a difference much larger than we would see just by random chance if the true means really were the same.
Hypotheses Null Hypothesis (H0): What you are trying to disprove, the status quo. The null hypothesis can only be disproved, never proved. The null hypothesis is assumed to be true, and a lack of evidence against the null does NOT mean it is true! Alternative Hypothesis (Ha): The hypothesis you are trying to prove. The alternative is proved by collecting evidence (data) that contradicts the null hypothesis. Together, the null and alternative must comprise all possibilities. Rejecting the null is equivalent to proving the alternative.
Hypotheses Two-Sided Alternative: You merely want to prove that two things are not equal.
H0: μ = μ0 vs Ha: μ ≠ μ0        H0: μ1 = μ2 vs Ha: μ1 ≠ μ2
H0: p = p0 vs Ha: p ≠ p0        H0: p1 = p2 vs Ha: p1 ≠ p2
One-Sided Alternative: You want to prove something is greater than (or less than) something else. You are fairly certain the sample statistics would not show otherwise.
H0: μ ≤ μ0 vs Ha: μ > μ0        H0: μ1 ≤ μ2 vs Ha: μ1 > μ2
H0: p ≤ p0 vs Ha: p > p0        H0: p1 ≤ p2 vs Ha: p1 > p2
Hypotheses are always phrased in terms of population parameters, NOT sample statistics
Hypothesis Testing Example You devise an activity that you guess will improve conceptual understanding of a topic. Conceptual understanding is measured by a test worth 100 points. The existing average score for this test is 84. After completing your activity, your students score an average of 86. Did these students score significantly higher than the existing average?
H0: μ ≤ 84    Ha: μ > 84    x̄ = 86
Is 86 extreme enough to reject the null??? STATISTICS: quantifying how unlikely your data is, given that the null is true
p value The p value for a sample is the probability of getting data as extreme as (or more extreme than) the observed data, assuming the null hypothesis is true.
p value A small p value means that if the null hypothesis is true, you are very unlikely to observe a value that extreme just by random chance. Since you did observe a value that extreme, your null hypothesis probably is not true, so your alternative is probably true! A SMALL P VALUE PROVIDES EVIDENCE AGAINST THE NULL HYPOTHESIS
IMPORTANT! The smaller the p-value, the stronger the evidence against H0. How small is small enough?
Significance Level The significance level (α) of a test determines when a p value is small enough for the null hypothesis to be rejected. α = proportion of times you will incorrectly reject the null, if it is true. Most commonly in education research, α = 0.05. Decision rule: p value < α → Reject H0; p value ≥ α → Do not reject H0. α must always be specified before the data is analyzed!
Test Statistic The extremity of a sample statistic can often be determined by the number of standard errors it is from the null mean (sometimes called a z score):
t.s. = (estimate − null mean) / SE
In most cases, this test statistic follows a predictable distribution when the null hypothesis is true, so the p value is computed as the area in the tails of this distribution.
p value The p value is the probability that the test statistic is as extreme as that observed, given that the null hypothesis is true. [figure: distribution of the test statistic assuming the null hypothesis is true, with the p value shaded as the tail area beyond the observed test statistic]
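That tail-area calculation can be sketched with the standard library's `NormalDist`, which serves here as a normal approximation (for small samples the exact p value comes from a t distribution instead, which the standard library does not provide):

```python
from statistics import NormalDist

# One-sided tail area beyond a test statistic, via the standard normal
def one_sided_p(test_stat):
    return 1 - NormalDist().cdf(test_stat)

print(round(one_sided_p(2.0), 3))  # 0.023: small, so evidence against the null
```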
Hypothesis Testing 1. Determine the null and alternative hypotheses 2. Calculate the estimate, the standard error of the estimate, and your test statistic 3. Determine the distribution of the test statistic assuming the null hypothesis is true 4. Calculate the p value, the probability of observing a test statistic as extreme as yours, given that the null is true 5. If the p value is small enough (typically < .05), reject the null
Hypothesis Testing Example The existing average score for this test is 84. After completing your activity, your students' sample mean is 86. Did these students score significantly higher than the existing average? What other information do you need? Sample standard deviation: s = 5. Sample size: n = 25.
Hypothesis Testing One Mean 1. Determine the null and alternative hypotheses:
H0: μ ≤ μ0    Ha: μ > μ0    with μ0 = 84, i.e. H0: μ ≤ 84, Ha: μ > 84
2. Calculate the estimate, the standard error of the estimate, and your test statistic:
Estimate = x̄ = 86, s = 5, n = 25, SE = s/√n = 5/√25 = 1
t.s. = (estimate − null mean) / SE = (x̄ − μ0) / (s/√n) = (86 − 84) / 1 = 2
Hypothesis Testing One Mean 3. Determine the distribution of the test statistic assuming the null hypothesis is true: t(n−1), the t distribution with n − 1 degrees of freedom. 4. Calculate the p value, the probability of observing a test statistic as extreme as yours, given that the null is true: p value = .028. Since .028 < .05, we would reject the null hypothesis. There is evidence that the true mean test score of students completing the activity is higher than the existing mean of 84.
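The one-mean example on these slides can be reproduced in a few lines. Since the standard library has no t distribution, the p value below is a normal approximation (.023) rather than the exact t(24) value (.028) quoted on the slide:

```python
import math
from statistics import NormalDist

# Summary statistics from the slide: x_bar = 86, s = 5, n = 25, null mean 84
x_bar, s, n, mu0 = 86, 5, 25, 84

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
print(t_stat)  # 2.0

# Normal approximation to the one-sided p-value; the exact t(24)
# distribution gives the slightly larger .028
p_approx = 1 - NormalDist().cdf(t_stat)
print(round(p_approx, 3))  # 0.023
```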
Hypothesis Testing Example We found that students completing your activity scored higher than the existing average. Does this mean the activity increases conceptual understanding? Not necessarily! To answer a question about causality we need A RANDOMIZED EXPERIMENT!
Hypothesis Testing Example You select n = 100 students to participate in your study, and randomly assign half of them to participate in the activity and half of them to get a placebo. You give them each the conceptual understanding test. The placebo group has an average of 85 with a standard deviation of 5, and the activity group has an average of 87, also with a standard deviation of 5. Now is there evidence that the activity increases conceptual understanding (as measured by the test)?
Hypothesis Testing t.s. = (estimate − null value) / SE

Quantity of Interest       Hypotheses                    Estimate    SE of Estimate under the null                                       Distribution
One Mean                   H0: μ = μ0 vs Ha: μ ≠ μ0      x̄           s/√n                                                                t(n−1)
Difference in Means        H0: μ1 = μ2 vs Ha: μ1 ≠ μ2    x̄1 − x̄2     √(s1²/n1 + s2²/n2)                                                  t(n1+n2−2)
One Proportion             H0: p = p0 vs Ha: p ≠ p0      p̂           √(p0(1−p0)/n)                                                       N(0,1)
Difference in Proportions  H0: p1 = p2 vs Ha: p1 ≠ p2    p̂1 − p̂2     √(p̂(1−p̂)(1/n1 + 1/n2)), with pooled p̂ = (p̂1·n1 + p̂2·n2)/(n1 + n2)  N(0,1)
Hypothesis Testing Difference in Means
H0: μ1 ≤ μ2    Ha: μ1 > μ2
t.s. = (estimate − null mean) / SE = ((x̄1 − x̄2) − 0) / √(s1²/n1 + s2²/n2) = (87 − 85) / √(5²/50 + 5²/50) = 2/1 = 2
Null Distribution: t(n1+n2−2) = t(98)
p value = .024
The p value is less than .05, so we can reject the null hypothesis. Since this was a randomized experiment, we can conclude that the activity causes higher conceptual understanding.
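The same arithmetic for the two-group experiment, again with a normal approximation to the p value (the t(98) distribution on the slide gives .024):

```python
import math
from statistics import NormalDist

# Randomized experiment from the slide: activity group mean 87, placebo
# group mean 85, both with sd 5 and n = 50 per group
x1, s1, n1 = 87, 5, 50
x2, s2, n2 = 85, 5, 50

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (x1 - x2) / se
print(t_stat)  # 2.0

p_approx = 1 - NormalDist().cdf(t_stat)
print(round(p_approx, 3))  # 0.023 under the normal approximation
```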
Hypothesis Testing What makes results significant? Large effect size (estimate is far from the null value). Low variability (standard deviation) in the data (less random variation makes an effect easier to spot). Large sample size (as the sample size increases, estimates get closer and closer to the truth, so they can be trusted more).
Hypothesis Testing You will almost certainly use a computer to conduct hypothesis tests, so you just need to know the names of the appropriate tests: Test for one mean: t test. Test for a difference in means: t test. Test for a proportion or difference in proportions: z test for proportion(s). Test for association between categorical variables: chi-square test for independence. Test for association between quantitative variables: correlation test or test for the slope coefficient in linear regression.
Hypothesis Testing If conducting a hypothesis test, you will have to COLLECT YOUR DATA, STATE THE NULL AND ALTERNATIVE HYPOTHESES, KNOW WHICH TEST TO ASK A COMPUTER TO PERFORM (or just ask me), and INTERPRET THE P VALUE
Simple Linear Regression Predicts your outcome data based on one explanatory variable. [figure: scatterplot with a fitted regression line]
Simple Linear Regression Finds the best fit line. The coefficient of the explanatory variable gives the change in the outcome variable for every unit change in the explanatory (also called predictor) variable. Hypothesis tests on the coefficient for a variable determine if the variable is significantly correlated with the outcome. Confidence intervals (for the predicted value) and prediction intervals (for an individual) can be produced.
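A minimal illustration of the best-fit (least-squares) line on a small made-up data set (`fit_line` is an illustrative helper name; real statistical software also provides the tests and intervals described above):

```python
# Least-squares fit of y = a + b*x with one explanatory variable
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))  # slope
    a = my - b * mx                          # intercept
    return a, b

xs = [0, 1, 2, 3, 4, 5]                 # made-up data, roughly y = 1 + 2x
ys = [1.0, 3.2, 4.9, 7.1, 9.0, 11.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 1.04 2.0
```

The slope b ≈ 2 says the outcome rises by about 2 units per unit change in the explanatory variable, exactly the interpretation given above.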
Multiple Regression Uses multiple explanatory variables to predict the outcome. The coefficient for each variable represents the effect of that variable given the other variables in the model. If explanatory variables are correlated with each other, coefficients may change depending on what is in the model. p values for each explanatory variable can be assessed. Again, prediction intervals can be formed.
Outliers Outliers can very strongly influence your results (for regression, hypothesis tests, etc.). Always plot your data first to check for outliers. If you do have extreme outliers, check for errors. If the outliers are legitimate, you should run your analyses with and without the outliers to see how much the outliers influence the results.
Outliers Correlation (as well as mean, standard deviation, regression coefficients) can be highly affected by outliers. [figure: two scatterplots, r = .78 and r = .17, where a single extreme outlier accounts for the difference]
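The effect of a single outlier on r can be demonstrated in a few lines of Python with made-up data (not the data behind the plots above):

```python
import math

# Pearson correlation, to show how strongly one outlier can move r
def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]            # made-up, strongly linear data
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
print(round(corr(xs, ys), 2))   # 1.0: nearly perfect correlation

# Adding one extreme outlier drags the correlation way down
print(round(corr(xs + [6], ys + [0.0]), 2))  # 0.13
```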
Statistical Analysis This is only a brief introduction to some basic statistical analysis tools, meant to give you an idea of what's available. All of these are very famous and detailed information can be found on the web, or in any introductory statistics textbook. Most of these techniques need assumptions to be verified before applying them. Please read or ask me for more specifics about the technique you actually intend to use.
lock@stat.harvard.edu