Harvard University. Rigorous Research in Engineering Education


1 Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09

2 Statistical Inference You have a sample and want to use the data collected in your sample to make inferences about the underlying truth(s) in the population. [Diagram labels: Target Population, Sample, Experiment?]

3 Statistics and Parameters
Mean: the average; add everything up and divide by the sample size.
Standard Deviation: a measure of the average distance from the mean. Indicates how spread out the data are.
Variance = (Standard Deviation)²
Proportion: the proportion of the sample that falls in a certain category.
Correlation (r): a measure of linear association between two numeric variables that runs between −1 and 1.
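
As a small illustration of the definitions above, the following sketch computes each statistic with Python's standard library. The scores and the "85 or higher" category are invented for the example and do not come from the slides.

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [78, 85, 92, 88, 81, 95, 84, 90]

mean = statistics.mean(scores)          # add everything up, divide by the sample size
sd = statistics.stdev(scores)           # sample standard deviation (n - 1 denominator)
variance = statistics.variance(scores)  # equals the standard deviation squared

# Proportion of the sample that falls in a category (here: scores of 85 or higher).
proportion = sum(1 for s in scores if s >= 85) / len(scores)

print(mean, round(sd, 2), round(variance, 2), proportion)
```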

4 Sample Statistics and Population Parameters
               SAMPLE   POPULATION
MEAN           X̄        μ
SD             s        σ
VARIANCE       s²       σ²
PROPORTION     p̂        p
CORRELATION    r        ρ
Sample Statistic: a number calculated using your data.
Population Parameter: a usually unknown population value.
GOAL: Use the sample statistics to make inferences about the population parameters.

5 Sampling Variability A sample statistic is rarely exactly the same as the (unknown) population parameter. Each time we take a random sample from a population, we are likely to get a different set of individuals and calculate a different statistic. This is called sampling variability. Good news: if you have a random sample, as your sample size gets larger and larger, the sample statistic gets closer and closer to the population parameter.

6 Sampling Variability If we could take lots of random samples of the same size from a given population, the variation of an estimate from sample to sample (the sampling distribution) follows a predictable pattern. Luckily, the mean, standard deviation, and shape of the sampling distribution are all fully estimable from quantities we can observe in our data! Statistical inference is based on this knowledge. We only get to observe one random sample, but can take advantage of the predictable sampling distribution to make valid inferences about population parameters.

7 Sampling Distribution The sampling distribution gives us the distribution of our estimate (for example, a mean) if we were to take many samples from the population. For many common estimates, with a large enough sample size this will be a Normal Distribution.

8 Standard Error The standard error (SE) of an estimate is the standard deviation of the sampling distribution. Most forms of statistical inference rely on knowing or being able to calculate the standard error (uncertainty) of an estimate.
The sampling distribution and standard error depend on:
- What type of parameter you are estimating
- The standard deviation of your data
- The sample size
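
The idea that the standard error is the standard deviation of the sampling distribution can be checked by simulation. The sketch below assumes a Normal population with mean 84 and standard deviation 5 and samples of size 25; these values are chosen for illustration, and a real analysis never gets to draw thousands of samples like this.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 84, 5   # assumed population values (illustrative)
N = 25                     # size of each random sample
NUM_SAMPLES = 5000         # number of simulated samples

# Draw many random samples and record the mean of each one.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The spread of the simulated means is the standard error of the mean.
simulated_se = statistics.stdev(sample_means)
theoretical_se = POP_SD / N ** 0.5   # sigma / sqrt(n) = 1.0 here

print(round(simulated_se, 3), theoretical_se)
```

The simulated value should land close to 1.0, which is what the formula SD / √n gives for these settings.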

9 Statistical Inference
Confidence Intervals: Create an interval for the true population parameter based on your estimate and the uncertainty surrounding your estimate.
Hypothesis Testing: Determine whether a difference or association is statistically significant.
Statistical Modeling: Explore relationships between variables, model trends, create predictions.

10 Confidence Intervals A confidence interval typically takes the form estimate ± (margin of error). A 95% confidence interval will cover the true parameter in 95% of random samples. enceinterval.html

11 Confidence Intervals In any given analysis, you will only observe one of these intervals. 95% of 95% CIs will cover the truth
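
The claim that 95% of 95% intervals cover the truth can be illustrated by simulation. This sketch assumes a true proportion of 0.55 and samples of 100 people (values picked for the example) and uses the interval p̂ ± 1.96 × √(p̂(1 − p̂)/n); the "truth" is known here only because the data are simulated.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

TRUE_P = 0.55       # assumed true population proportion (illustrative)
N = 100             # people per sample
NUM_SAMPLES = 2000  # number of simulated samples, one interval each

covered = 0
for _ in range(NUM_SAMPLES):
    # Simulate one random sample and compute its 95% interval.
    passes = sum(1 for _ in range(N) if random.random() < TRUE_P)
    p_hat = passes / N
    margin = 1.96 * (p_hat * (1 - p_hat) / N) ** 0.5
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        covered += 1

coverage = covered / NUM_SAMPLES
print(coverage)  # should be close to 0.95
```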

12 Confidence Intervals The formula estimate ± (margin of error) can be rewritten as estimate ± (multiplier) × (standard error), where the multiplier depends on the confidence level: the percent of the intervals that will cover the truth (typically 95%). For 95% intervals this multiplier is usually close to 2.

13 Confidence Interval Proportion Example You want to estimate the proportion of your graduates who go on to eventually pass the EIT (Engineer in Training) exam. You take a random sample of your graduates, contact them, and ask them whether or not they have passed the EIT. You sample 100 people and 55 have passed. Your estimated population proportion is .55, but you want an interval surrounding this estimate to reflect your uncertainty.

14 Confidence Interval Proportion Example
The standard error for a sample proportion is SE = √( p̂(1 − p̂) / n ).
The 95% confidence level multiplier for a proportion is 1.96.
The margin of error for a proportion is 1.96 × √( p̂(1 − p̂) / n ).

15 Confidence Interval Proportion Example
Your 95% confidence interval is then
estimate ± (margin of error)
= estimate ± (multiplier) × (standard error)
= p̂ ± 1.96 × √( p̂(1 − p̂) / n )
= .55 ± 1.96 × √( (.55)(.45) / 100 )
= .55 ± .098
= (.452, .648)
Note that if we instead sampled 1000 graduates, and still 55% had passed, our interval for the true proportion would be (.52, .58): much narrower!
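
The arithmetic for the EIT example can be reproduced in a few lines. The function name `proportion_ci` is a hypothetical helper introduced here; the formula is the one on the slides, estimate ± 1.96 × √(p̂(1 − p̂)/n).

```python
import math

def proportion_ci(successes, n, multiplier=1.96):
    """Approximate confidence interval for a proportion (hypothetical helper)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    margin = multiplier * se                  # margin of error
    return p_hat - margin, p_hat + margin

# 55 of 100 sampled graduates passed.
low, high = proportion_ci(55, 100)
print(round(low, 3), round(high, 3))          # 0.452 0.648

# Same 55% pass rate, but with 1000 graduates sampled.
low_big, high_big = proportion_ci(550, 1000)
print(round(low_big, 2), round(high_big, 2))  # 0.52 0.58
```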

16 Confidence Intervals Confidence intervals are pretty easy to compute by hand if you know the standard error (SE) of your estimate. However, each type of estimate requires a different formula for computing the standard error, so it's easiest to just use a computer (any stat software will produce confidence intervals).

17 Confidence Intervals The width of the confidence interval, determined by the margin of error, depends on:
- The confidence level (usually 90%, 95%, or 99%). Higher confidence → wider interval.
- The standard deviation of the original data. Higher SD → wider interval.
- The sample size. Higher sample size → narrower interval.

18 Confidence Intervals The easiest way to get more precise intervals for the truth is to sample a lot of people. If you have a desired margin of error you want for a specific situation, you can work backwards to determine how many people you will need to sample.

19 Sample Size When doing inference for a proportion, suppose you want no higher than a certain margin of error. Recall ME = 1.96 × √( p̂(1 − p̂) / n ). We don't know the sample proportion, but p(1 − p) is maximized when p = ½, so we can be conservative and use p = ½. If we use p = ½ and replace 1.96 with 2, then we have ME ≈ 2 × √( (½)(½) / n ) = 1/√n, so n ≈ 1/ME². If we want a margin of error of .02, we should sample 1/.02² = 2500 people.
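
Working backwards from a desired margin of error can be sketched as follows. The helper `sample_size_for_margin` is a hypothetical name; it solves ME = multiplier × √(p(1 − p)/n) for n, defaulting to the conservative p = ½ and the rounded multiplier of 2 used on the slide.

```python
import math

def sample_size_for_margin(margin_of_error, multiplier=2.0, p=0.5):
    """Smallest n whose margin of error is at most the target (hypothetical helper)."""
    raw = multiplier ** 2 * p * (1 - p) / margin_of_error ** 2
    # Round first so floating-point noise does not push the ceiling up by one.
    return math.ceil(round(raw, 6))

print(sample_size_for_margin(0.02))                   # 2500
print(sample_size_for_margin(0.02, multiplier=1.96))  # 2401
print(sample_size_for_margin(0.05))                   # 400
```

Using 1.96 instead of 2 lowers the answer slightly, which is why the rounded version counts as conservative.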

20 Confidence Intervals 95% CI: estimate ± 2 × SE
Quantity of Interest | Parameter | Estimate | Standard Error of Estimate | Distribution for multiplier*
One Mean | μ | X̄ | s/√n | t with n − 1 df
Difference in Means | μ1 − μ2 | X̄1 − X̄2 | √( s1²/n1 + s2²/n2 ) | t with n1 + n2 − 2 df
One Proportion | p | p̂ | √( p̂(1 − p̂)/n ) | N(0,1)
Difference in Proportions | p1 − p2 | p̂1 − p̂2 | √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ) | N(0,1)
*Get the multiplier for a (100 − k)% interval by finding the value on the distribution with k% remaining in the tails.
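
For the rows that use the N(0,1) distribution, the multiplier can be looked up with Python's standard library. This is a sketch for the Normal case only; the t-based rows need a t distribution, which the standard library does not provide. The function name `normal_multiplier` is an assumption made for the example.

```python
from statistics import NormalDist

def normal_multiplier(confidence_percent):
    """Multiplier from N(0,1) leaving (100 - confidence)% in the two tails combined."""
    tail = (100 - confidence_percent) / 100       # k% expressed as a fraction
    return NormalDist().inv_cdf(1 - tail / 2)     # half of the tail area on each side

for level in (90, 95, 99):
    print(level, round(normal_multiplier(level), 3))
# 90 1.645
# 95 1.96
# 99 2.576
```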

21 Confidence Intervals Single Mean The 95% confidence interval for a mean is approximately X̄ ± 2 × s/√n.
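
A minimal sketch of the approximate interval for a mean, using illustrative values (sample mean 86, standard deviation 5, sample size 25). The helper name `approx_mean_ci` is hypothetical, and the multiplier of 2 is the rough value from the slide rather than an exact t quantile.

```python
import math

def approx_mean_ci(sample_mean, sample_sd, n, multiplier=2.0):
    """Approximate 95% interval: mean plus or minus 2 standard errors (hypothetical helper)."""
    se = sample_sd / math.sqrt(n)   # standard error of the mean
    return sample_mean - multiplier * se, sample_mean + multiplier * se

low, high = approx_mean_ci(86, 5, 25)
print(low, high)  # 84.0 88.0
```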

22 Statistical Significance A phenomenon is statistically significant if it is unlikely to occur by random chance. For a difference in sample means between variables to be statistically significant, we need to see a difference much larger than we would see just by random chance if the true means really were the same.

23 Hypotheses
Null Hypothesis (H0): What you are trying to disprove, the status quo. The null hypothesis can only be disproved, never proved. The null hypothesis is assumed to be true, and a lack of evidence against the null does NOT mean it is true!
Alternative Hypothesis (Ha): The hypothesis you are trying to prove. The alternative is proved by collecting evidence (data) that contradicts the null hypothesis.
Together, the null and alternative must comprise all possibilities. Rejecting the null is equivalent to proving the alternative.

24 Hypotheses
Two-Sided Alternative: You merely want to prove that two things are not equal.
H0: μ = μ0, Ha: μ ≠ μ0
H0: μ1 = μ2, Ha: μ1 ≠ μ2
H0: p = p0, Ha: p ≠ p0
H0: p1 = p2, Ha: p1 ≠ p2
One-Sided Alternative: You want to prove something is greater than (or less than) something else. You are fairly certain the sample statistics would not show otherwise.
H0: μ ≤ μ0, Ha: μ > μ0
H0: μ1 ≤ μ2, Ha: μ1 > μ2
H0: p ≤ p0, Ha: p > p0
H0: p1 ≤ p2, Ha: p1 > p2
Hypotheses are always phrased in terms of population parameters, NOT sample statistics.

25 Hypothesis Testing Example You devise an activity that you guess will improve conceptual understanding of a topic. Conceptual understanding is measured by a test worth 100 points. The existing average score for this test is 84. After completing your activity, your students score an average of 86. Did these students score significantly higher than the existing average?
H0: μ ≤ 84, Ha: μ > 84, X̄ = 86
Is 86 extreme enough to reject the null?
STATISTICS: quantifying how unlikely your data is, given that the null is true.

26 p-value The p-value for a sample is the probability of getting data as extreme as (or more extreme than) the observed data, assuming the null hypothesis is true.

27 p-value A small p-value means that if the null hypothesis is true, you are very unlikely to observe a value that extreme just by random chance. Since you did observe a value that extreme, your null hypothesis probably is not true, so your alternative is probably true! A SMALL P-VALUE PROVIDES EVIDENCE AGAINST THE NULL HYPOTHESIS.

28 IMPORTANT! The smaller the p-value, the stronger the evidence against H0. How small is small enough?

29 Significance Level The significance level (α) of a test determines when a p-value is small enough for the null hypothesis to be rejected. α = proportion of times you will incorrectly reject the null, if it is true. Most commonly in education research, α = 0.05.
Decision rule:
p-value < α → Reject H0
p-value > α → Do not reject H0
α must always be specified before the data are analyzed!

30 Test Statistic The extremity of a sample statistic can often be determined by the number of standard errors it is from the null mean, called a z-score:
t.s. = (estimate − null mean) / SE
In most cases, this test statistic follows a predictable distribution when the null hypothesis is true, so the p-value is computed as the area in the tails of this distribution.

31 p-value The p-value is the probability that the test statistic is as extreme as that observed, given that the null hypothesis is true. [Figure: distribution of the test statistic assuming the null hypothesis is true, with the p-value shown as the tail area beyond the observed test statistic]

32 Hypothesis Testing
1. Determine the null and alternative hypotheses.
2. Calculate the estimate, the standard error of the estimate, and your test statistic.
3. Determine the distribution of the test statistic assuming the null hypothesis is true.
4. Calculate the p-value, the probability of observing a test statistic as extreme as yours, given that the null is true.
5. If the p-value is small enough (typically below .05), reject the null.

33 Hypothesis Testing Example The existing average score for this test is 84. After completing your activity, your students' sample mean is 86. Did these students score significantly higher than the existing average? What other information do you need?
sample standard deviation: s = 5
sample size: n = 25

34 Hypothesis Testing One Mean
1. Determine the null and alternative hypotheses.
H0: μ ≤ μ0, Ha: μ > μ0, which here is H0: μ ≤ 84, Ha: μ > 84
2. Calculate the estimate, the standard error of the estimate, and your test statistic.
Estimate = X̄, SE = s/√n, with X̄ = 86, s = 5, n = 25
t.s. = (estimate − null mean) / SE = (X̄ − μ0) / (s/√n) = (86 − 84) / (5/√25) = 2/1 = 2

35 Hypothesis Testing One Mean
3. Determine the distribution of the test statistic assuming the null hypothesis is true.
t with n − 1 degrees of freedom (here, t with 24 degrees of freedom)
4. Calculate the p-value, the probability of observing a test statistic as extreme as yours, given that the null is true.
p-value = .028
.028 < .05, so we would reject the null hypothesis. There is evidence that the true mean test score of students completing the activity is higher than the existing mean of 84.
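
The one-mean test can be reproduced with the standard library alone. In practice a statistics package would supply the t distribution; as a stand-in, the hypothetical helper `t_upper_tail` below integrates the t density numerically with Simpson's rule. The inputs (mean 86, null mean 84, s = 5, n = 25) are the ones from the example.

```python
import math

def t_upper_tail(t_stat, df, steps=2000):
    """P(T >= t_stat) for a t distribution, by Simpson's rule on the density."""
    # Log of the normalizing constant of the t density.
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = density(0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area_0_to_t = total * h / 3            # area between 0 and |t_stat|
    tail = 0.5 - area_0_to_t               # the density is symmetric about 0
    return tail if t_stat >= 0 else 1 - tail

x_bar, mu_0, s, n = 86, 84, 5, 25
se = s / math.sqrt(n)                      # standard error = 1.0
t_stat = (x_bar - mu_0) / se               # test statistic = 2.0
p_value = t_upper_tail(t_stat, df=n - 1)   # one-sided p-value, about .028

print(t_stat, round(p_value, 3))
```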

36 Hypothesis Testing Example We found that students completing your activity scored higher than the existing average. Does this mean the activity increases conceptual understanding? Not necessarily! To answer a question about causality we need A RANDOMIZED EXPERIMENT!

37 Hypothesis Testing Example You select n = 100 students to participate in your study, and randomly assign half of them to participate in the activity and half of them to get a placebo. You give them each the conceptual understanding test. The placebo group has an average of 85 with a standard deviation of 5, and the activity group has an average of 87, also with a standard deviation of 5. Now is there evidence that the activity increases conceptual understanding (as measured by the test)?

38 Hypothesis Testing t.s. = (estimate − null value) / SE
Quantity of Interest | Hypotheses | Estimate | Standard Error of Estimate | Distribution under the null
One Mean | H0: μ = μ0, Ha: μ ≠ μ0 | X̄ | s/√n | t with n − 1 df
Difference in Means | H0: μ1 = μ2, Ha: μ1 ≠ μ2 | X̄1 − X̄2 | √( s1²/n1 + s2²/n2 ) | t with n1 + n2 − 2 df
One Proportion | H0: p = p0, Ha: p ≠ p0 | p̂ | √( p0(1 − p0)/n ) | N(0,1)
Difference in Proportions | H0: p1 = p2, Ha: p1 ≠ p2 | p̂1 − p̂2 | √( p̂(1 − p̂)(1/n1 + 1/n2) ), where p̂ = (p̂1 n1 + p̂2 n2)/(n1 + n2) | N(0,1)
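
As one worked instance of the table, the sketch below runs the test for a difference in proportions with the pooled standard error. The counts (55 of 100 versus 45 of 100) are invented for illustration, and `two_proportion_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # same as (p1_hat*n1 + p2_hat*n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se       # null value of the difference is 0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_two_sided

# Hypothetical counts: 55 of 100 in one group, 45 of 100 in the other.
z, p = two_proportion_z_test(55, 100, 45, 100)
print(round(z, 3), round(p, 3))
```

With these made-up counts the p-value is well above .05, so the null hypothesis of equal proportions would not be rejected.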

39 Hypothesis Testing Difference in Means
H0: μ1 ≤ μ2, Ha: μ1 > μ2
t.s. = (estimate − null mean) / SE = (X̄1 − X̄2 − 0) / √( s1²/n1 + s2²/n2 ) = (87 − 85) / √( 25/50 + 25/50 ) = 2/1 = 2
Null distribution: t with n1 + n2 − 2 = 98 degrees of freedom
p-value = .024
The p-value is less than .05, so we can reject the null hypothesis. Since this was a randomized experiment, we can conclude that the activity causes higher conceptual understanding.
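
The randomized-experiment calculation can be checked the same way. The sketch below uses the group summaries from the example (means 87 and 85, standard deviations 5, 50 students per group). The helper `t_upper_tail` is a hypothetical stand-in for a statistics package; it integrates the t density numerically with Simpson's rule.

```python
import math

def t_upper_tail(t_stat, df, steps=2000):
    """P(T >= t_stat) for a t distribution, by Simpson's rule on the density."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = density(0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    tail = 0.5 - total * h / 3             # upper-tail area beyond |t_stat|
    return tail if t_stat >= 0 else 1 - tail

x1, s1, n1 = 87, 5, 50   # activity group
x2, s2, n2 = 85, 5, 50   # placebo group

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # sqrt(0.5 + 0.5) = 1.0
t_stat = (x1 - x2 - 0) / se                   # null difference is 0
df = n1 + n2 - 2                              # 98 degrees of freedom
p_value = t_upper_tail(t_stat, df)            # one-sided p-value, about .024

print(t_stat, df, round(p_value, 3))
```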

40 Hypothesis Testing What makes results significant?
- Large effect size (estimate is far from the null value)
- Low variability (standard deviation) in data (less random variation, easier to spot an effect)
- Large sample size (as sample size increases, estimates get closer and closer to the truth, so can be trusted more)

41 Hypothesis Testing You will almost definitely use a computer to conduct hypothesis tests, so you just need to know the names of the appropriate tests:
- Test for one mean: t-test
- Test for a difference in means: t-test
- Test for a proportion or difference in proportions: z-test for proportion(s)
- Test for association between categorical variables: chi-square test for independence
- Test for association between quantitative variables: correlation test or test for the slope coefficient in linear regression

42 Hypothesis Testing If conducting a hypothesis test, you will have to:
- COLLECT YOUR DATA
- STATE THE NULL AND ALTERNATIVE HYPOTHESIS
- KNOW WHICH TEST TO ASK A COMPUTER TO PERFORM (or just ask me)
- INTERPRET THE P-VALUE

43 Simple Linear Regression Predicts your outcome data based on one explanatory variable

44 Simple Linear Regression
- Finds the best fit line.
- The coefficient of the explanatory variable gives the change in the outcome variable for every unit change in the explanatory (also called "predictor") variable.
- Hypothesis tests on the coefficient for a variable determine if the variable is significantly correlated with the outcome.
- Confidence intervals (for the predicted value) and prediction intervals (for an individual) can be produced.
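
A minimal sketch of finding the best fit line by least squares, using the usual closed-form slope and intercept. The data (hours of activity versus test score) are invented for the example, and `fit_line` is a hypothetical helper name; statistical software would also report the tests and intervals mentioned above.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                   # change in outcome per unit change in x
    intercept = y_bar - slope * x_bar   # the fitted line passes through (x_bar, y_bar)
    return intercept, slope

# Hypothetical data: hours spent on an activity and resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [72, 75, 79, 80, 84]

intercept, slope = fit_line(hours, scores)
print(round(intercept, 1), round(slope, 1))  # 69.3 2.9
```

Here the slope of 2.9 reads as "each additional hour is associated with about 2.9 more points" for this made-up data set.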

45 Multiple Regression
- Uses multiple explanatory variables to predict the outcome.
- The coefficient for each variable represents the effect of that variable given the other variables in the model.
- If explanatory variables are correlated with each other, coefficients may change depending on what is in the model.
- p-values for each explanatory variable can be assessed.
- Again, prediction intervals can be formed.

46 Outliers
- Outliers can very strongly influence your results (for regression, hypothesis tests, etc.).
- Always plot your data first to check for outliers.
- If you do have extreme outliers, check for errors.
- If the outliers are legitimate, you should run your analyses with and without the outliers to see how much the outliers influence the results.

47 Outliers Correlation (as well as mean, standard deviation, regression coefficients) can be highly affected by outliers. [Figure: two scatterplots of y against x, one with r = .78 and one with r = .17, with an outlier marked]
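
The effect of a single outlier on correlation can be seen by computing r with and without it. The data below are invented for illustration and are not the data behind the slide's plots; `pearson_r` is a hypothetical helper that applies the standard correlation formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data following a strong upward linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]

r_without = pearson_r(x, y)

# Add one point far below the trend and recompute.
r_with = pearson_r(x + [9], y + [0.0])

print(round(r_without, 2), round(r_with, 2))
```

Running the analysis both ways, as recommended above, shows how much one point can move the result.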

48 Statistical Analysis This is only a brief introduction to some basic statistical analysis tools, meant to give you an idea of what's available. All of these are very well known, and detailed information can be found on the web or in any introductory statistics textbook. Most of these techniques need assumptions to be verified before applying them. Please read or ask me for more specifics about the technique you actually intend to use.


Relating Graph to Matlab There are two related course documents on the web Probability and Statistics Review -should be read by people without statistics background and it is helpful as a review for those with prior statistics

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

What is a Hypothesis?

What is a Hypothesis? What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:

More information

Comparing Means from Two-Sample

Comparing Means from Two-Sample Comparing Means from Two-Sample Kwonsang Lee University of Pennsylvania kwonlee@wharton.upenn.edu April 3, 2015 Kwonsang Lee STAT111 April 3, 2015 1 / 22 Inference from One-Sample We have two options to

More information

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an

More information

11 Correlation and Regression

11 Correlation and Regression Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Sampling Distributions: Central Limit Theorem

Sampling Distributions: Central Limit Theorem Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)

More information

Chapter 8. Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis

Chapter 8. Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis Chapter 8 Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis Copyright 2018, 2014, and 2011 Pearson Education, Inc. Slide - 1 Content 1. Identifying the Target Parameter 2.

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests

Statistics for Managers Using Microsoft Excel/SPSS Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests Statistics for Managers Using Microsoft Excel/SPSS Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests 1999 Prentice-Hall, Inc. Chap. 8-1 Chapter Topics Hypothesis Testing Methodology Z Test

More information

Chapter 23. Inference About Means

Chapter 23. Inference About Means Chapter 23 Inference About Means 1 /57 Homework p554 2, 4, 9, 10, 13, 15, 17, 33, 34 2 /57 Objective Students test null and alternate hypotheses about a population mean. 3 /57 Here We Go Again Now that

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are

More information

Business Statistics. Lecture 5: Confidence Intervals

Business Statistics. Lecture 5: Confidence Intervals Business Statistics Lecture 5: Confidence Intervals Goals for this Lecture Confidence intervals The t distribution 2 Welcome to Interval Estimation! Moments Mean 815.0340 Std Dev 0.8923 Std Error Mean

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing & z-test Lecture Set 11 We have a coin and are trying to determine if it is biased or unbiased What should we assume? Why? Flip coin n = 100 times E(Heads) = 50 Why? Assume we count 53 Heads... What could

More information

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015 Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences 18.30 21.15h, February 12, 2015 Question 1 is on this page. Always motivate your answers. Write your answers in English. Only the

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis.

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis. Hypothesis Testing Today, we are going to begin talking about the idea of hypothesis testing how we can use statistics to show that our causal models are valid or invalid. We normally talk about two types

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2

Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2 STATISTICS 200 Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2 Objectives: Define standard error, relate it to both standard deviation and sampling distribution ideas. Describe

More information

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements:

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements: Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported in

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test 1 Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test Learning Objectives After this section, you should be able to DESCRIBE the relationship between the significance level of a test, P(Type

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

determine whether or not this relationship is.

determine whether or not this relationship is. Section 9-1 Correlation A correlation is a between two. The data can be represented by ordered pairs (x,y) where x is the (or ) variable and y is the (or ) variable. There are several types of correlations

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

Lecture 11 - Tests of Proportions

Lecture 11 - Tests of Proportions Lecture 11 - Tests of Proportions Statistics 102 Colin Rundel February 27, 2013 Research Project Research Project Proposal - Due Friday March 29th at 5 pm Introduction, Data Plan Data Project - Due Friday,

More information

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1 9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

More information

Multiple samples: Modeling and ANOVA

Multiple samples: Modeling and ANOVA Multiple samples: Modeling and Patrick Breheny April 29 Patrick Breheny Introduction to Biostatistics (171:161) 1/23 Multiple group studies In the latter half of this course, we have discussed the analysis

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015 AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression Recall, back some time ago, we used a descriptive statistic which allowed us to draw the best fit line through a scatter plot. We

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

Tables Table A Table B Table C Table D Table E 675

Tables Table A Table B Table C Table D Table E 675 BMTables.indd Page 675 11/15/11 4:25:16 PM user-s163 Tables Table A Standard Normal Probabilities Table B Random Digits Table C t Distribution Critical Values Table D Chi-square Distribution Critical Values

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Ch. 16: Correlation and Regression

Ch. 16: Correlation and Regression Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to

More information

Statistical inference provides methods for drawing conclusions about a population from sample data.

Statistical inference provides methods for drawing conclusions about a population from sample data. Introduction to inference Confidence Intervals Statistical inference provides methods for drawing conclusions about a population from sample data. 10.1 Estimating with confidence SAT σ = 100 n = 500 µ

More information

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan COSC 341 Human Computer Interaction Dr. Bowen Hui University of British Columbia Okanagan 1 Last Topic Distribution of means When it is needed How to build one (from scratch) Determining the characteristics

More information