Data Analysis and Statistical Methods Statistics 651

Suhasini Subba Rao

Comparing populations

Suppose I want to compare the heights of males and females at A&M. I can consider all boys at Texas A&M as one population and all girls at Texas A&M as another population.

Question 1: Is the mean girl height less than the mean boy height?
Question 2: What is the difference between the mean girl height and the mean boy height? How much taller are boys than girls?

As I do not have data from the entire student population, I can use the data from a class.

Suggestion: Compare the sample mean of the male heights with the sample mean of the female heights.

[Data: female heights and male heights]

Let $X$ be the height of a randomly selected female and $Y$ the height of a randomly selected male. There are $n = 37$ girls and $m = 27$ boys in the samples. The sample mean for girls is $\bar{X} = \frac{1}{37}\sum_{i=1}^{37} X_i = 5.45$ and the sample mean for boys is $\bar{Y} = \frac{1}{27}\sum_{i=1}^{27} Y_i$.

Let $\mu_X$ be the female population mean height and $\mu_Y$ be the male population mean height. We are interested in the quantity $\mu_X - \mu_Y$. It tells us how much larger or smaller the female mean is than the male mean, or whether the two are equal.

Of course, we do not know the difference $\mu_X - \mu_Y$, and need to infer something about it from the samples. Intuitively, to see whether $\mu_X$ and $\mu_Y$ are equal, we need to compare the sample averages $\bar{X}$ and $\bar{Y}$ and look at their difference $\bar{X} - \bar{Y}$. What can the difference in the sample means say about the difference in the true means, that is $\mu_X - \mu_Y$ (population mean of females minus population mean of males)?

We would expect that the population mean of females is less than the population mean of males, in other words that (population mean of females minus population mean of males) is less than zero. Hence we would be interested in testing $H_0: \mu_X - \mu_Y \geq 0$ against $H_A: \mu_X - \mu_Y < 0$. We also want to know the magnitude of the difference; this means constructing a CI for $\mu_X - \mu_Y$.

Clearly if $\bar{X} - \bar{Y} > 0$ we would be unable to reject the null (why? Remember that $\bar{X} - \bar{Y}$ has to point in the same direction as the alternative). But if $\bar{X} - \bar{Y} < 0$, then we can use a statistical test. The question is how to make the comparison: what is the distribution of $\bar{X} - \bar{Y}$? We look at this now.

Aims: Comparing male and female heights

- To build a confidence interval for the mean difference $\mu_X - \mu_Y$ (this will tell us where the mean difference lies and is very informative).
- To test the hypothesis $H_0: \mu_X - \mu_Y \geq 0$ (mean female height and mean male height are the same, or mean female height is greater than mean male height) against the alternative $H_A: \mu_X - \mu_Y < 0$.
- We can also test $H_0: \mu_X - \mu_Y \geq -0.3$ against $H_A: \mu_X - \mu_Y < -0.3$. This is essentially testing whether boys tend on average to be more than 0.3 feet taller than girls. This situation can also arise.

We will consider both constructing CIs for the difference between the population means and hypothesis testing. We show how to do this both by hand and by reading the output in JMP. It is important to understand both.

Below we will consider the assumptions that are required to make the test and also the details. The details may appear to be overwhelming, but do not be deterred by them.

In order to do any test, i.e. $H_0: \mu_X - \mu_Y \geq 0$ against $H_A: \mu_X - \mu_Y < 0$ or $H_0: \mu_X - \mu_Y \geq -0.3$ against $H_A: \mu_X - \mu_Y < -0.3$, or to construct a CI for $\mu_X - \mu_Y$, we need three magical ingredients:

- The difference of the sample averages: $\bar{X} - \bar{Y}$.
- The standard error of $\bar{X} - \bar{Y}$ (this will turn out to be $\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}$).
- The sample sizes $m$ and $n$ are relatively large.

Formal: comparing populations

We have two samples from two different populations. That is, $X_1, \ldots, X_n$ is a size $n$ sample (eg. heights of females in the 651 class) from population 1 (eg. heights of all females) and $Y_1, \ldots, Y_m$ is a size $m$ sample (eg. the heights of males in the 651 class) from population 2 (eg. heights of all males). The mean of population 1 is $\mu_X$ (eg. mean height of a female) and the mean of population 2 is $\mu_Y$ (eg. mean height of a male). Given the samples we want to make inference about the difference $\mu_X - \mu_Y$.

It is clear this is an important question. Other examples include:

- Does a new therapy work better than the old therapy?
- Is there a difference in the performance of one school over another?
- On average does eating healthy food mean you live longer?
- On average if one studies more do they get better grades?
- Is one housing material better than another?

In the above examples what are the different populations and samples?

All these questions are important and can lead to quite important decisions, therefore it is important that we do a careful analysis. To construct CIs and do a hypothesis test we use an independent two-sample t-test. To do this test we have to ensure the data satisfies the assumptions below.

Assumptions and how to check them

We have two samples from two different populations, $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ (sample sizes $n$ and $m$ respectively).

- Both samples are independent of each other and independent within the sample. For example, the values $X_1, \ldots, X_n$ should have no influence on $Y_1, \ldots, Y_m$, and $X_1$ should not have any influence on $X_2, \ldots, X_n$. Can you think of examples when this may not be true? For observations taken over time, those taken close together in time are likely to be similar. Checking for independence can be difficult, though there are methods available.
- If $n$ and $m$ are small, the observations $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ should be close to normal. If $n$ and $m$ are large, normality of the observations does not matter (this is the same as in the one-sample tests). When $n$ and $m$ are small make a QQ-plot. There may be good reasons why the original data is normal.
- The variances of both populations need to be about the same. That is, $\mathrm{var}(X_i) = \sigma_X^2$ and $\mathrm{var}(Y_i) = \sigma_Y^2$, and we must have $\sigma_X^2 = \sigma_Y^2$. In practice this may not be true, but the variances do not have to be strictly the same so long as the samples have similar sizes (see Ott and Longnecker, page 275). Make a boxplot of both samples and check if the variation is the same. We can also do a test to see if the variances of the two populations are the same (we do this later).

Compare means of populations

We do not have the populations available, only the samples, and we have to base our conclusions on the samples. To make inference about the population means we should look at the difference between the sample means: $\bar{X} - \bar{Y}$.

- We are interested in constructing a confidence interval for the difference in the population means. The CI will tell us how much larger one mean is than another or if they could be similar in value.
- Testing the hypothesis $H_0: \mu_X - \mu_Y = 0$ against $H_A: \mu_X - \mu_Y \neq 0$ (or the one-sided versions of this: $H_0: \mu_X - \mu_Y \geq 0$ against $H_A: \mu_X - \mu_Y < 0$, or $H_0: \mu_X - \mu_Y \leq 0$ against $H_A: \mu_X - \mu_Y > 0$).

But to do any of the above we require more than just $\bar{X} - \bar{Y}$. Remember both $\bar{X}$ and $\bar{Y}$ are sample means, hence are random variables, and their distributions are centered about the true means $\mu_X$ and $\mu_Y$. Therefore $\bar{X} - \bar{Y}$ is a random variable too, and its distribution is centered about $\mu_X - \mu_Y$.

Now to do anything we require the distribution of $\bar{X} - \bar{Y}$ and its standard error, which explains how much spread or error there is in $\bar{X} - \bar{Y}$. We formalise this below. Don't panic, we will go through some examples later on.

Distribution of the difference of the sample means $\bar{X} - \bar{Y}$

If $X_i$ and $Y_i$ have the same variance $\sigma^2$ and $\bar{X}$ and $\bar{Y}$ are close to normal, then the difference of the averages has the following distribution:

$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right).$$

Note that $\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right) = \frac{\sigma^2}{n} + \frac{\sigma^2}{m}$.

Important points:

- The distribution is centered about $\mu_X - \mu_Y$, hence the value of $\bar{X} - \bar{Y}$ that I draw is likely to be close to $\mu_X - \mu_Y$.
- How close depends on the standard error, which is $\sqrt{\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)}$. The larger the sample sizes $n$ and $m$ are, the smaller the standard error (just like in the one-sample case, where we deal with just one sample mean $\bar{X}$, which has standard error $\sqrt{\sigma^2/n}$).

Therefore we can make a Z-transform of the difference $\bar{X} - \bar{Y}$:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim N(0, 1).$$

Of course in practice $\sigma^2$ will not be known and has to be estimated from the data.
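As a rough numerical check of the standard error formula above, the following Python sketch compares $\sigma\sqrt{1/n + 1/m}$ with the spread of simulated differences $\bar{X} - \bar{Y}$. The population means, $\sigma$, sample sizes, seed and the function names `se_diff` and `simulate_diffs` are illustrative assumptions, not values or code from the lecture.

```python
import random
from math import sqrt
from statistics import mean, stdev


def se_diff(sigma, n, m):
    """Theoretical standard error of Xbar - Ybar when both populations have sd sigma."""
    return sigma * sqrt(1 / n + 1 / m)


def simulate_diffs(mu_x, mu_y, sigma, n, m, reps, seed=0):
    """Draw `reps` pairs of normal samples and return the simulated values of Xbar - Ybar."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = mean(rng.gauss(mu_x, sigma) for _ in range(n))
        ybar = mean(rng.gauss(mu_y, sigma) for _ in range(m))
        diffs.append(xbar - ybar)
    return diffs


# Hypothetical settings, chosen only for illustration.
diffs = simulate_diffs(mu_x=5.0, mu_y=6.0, sigma=2.0, n=20, m=30, reps=2000)
print("theoretical SE:", se_diff(2.0, 20, 30))
print("simulated SD of differences:", stdev(diffs))
print("simulated mean of differences:", mean(diffs))
```

The simulated differences should be centered near $\mu_X - \mu_Y$ (here $-1$), with a spread close to the theoretical standard error.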

The distribution cont.

When the variance is unknown and we use the pooled sample variance $s^2$, the distribution of the standardised transform using the sample variance is:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t(n + m - 2).$$

It has a t-distribution with $(n + m - 2)$ degrees of freedom (that is, a t-distribution where the number of degrees of freedom is the sum of the two sample sizes minus two).

Don't panic! If the sample sizes from both populations are greater than 30, then everything is wonderful and all we require is $\bar{X} - \bar{Y}$, which we can get from the data, $s$ (the pooled sample standard deviation), which is always given to you, and the sample sizes $n$ and $m$. With these ingredients you can construct CIs and do tests.

If you are really lucky, rather than evaluating $\sqrt{\frac{s^2}{n} + \frac{s^2}{m}}$ yourself, it will already be there in the JMP output! See how it is all put together in two sample independent t-test JMP.pdf.

Confidence intervals for the difference in the means

At the $100(1 - \alpha)\%$ level the confidence interval for the difference in the means is

$$\left[(\bar{X} - \bar{Y}) - t_{\alpha/2}(n + m - 2)\, s\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar{X} - \bar{Y}) + t_{\alpha/2}(n + m - 2)\, s\sqrt{\frac{1}{n} + \frac{1}{m}}\right].$$

Examples (for sample sizes with $n + m - 2 = 50$):

The 95% CI is

$$\left[(\bar{X} - \bar{Y}) - t_{0.025}(50)\, s\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar{X} - \bar{Y}) + t_{0.025}(50)\, s\sqrt{\frac{1}{n} + \frac{1}{m}}\right].$$

The 99% CI is

$$\left[(\bar{X} - \bar{Y}) - t_{0.005}(50)\, s\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar{X} - \bar{Y}) + t_{0.005}(50)\, s\sqrt{\frac{1}{n} + \frac{1}{m}}\right].$$

You will need to look up $t_{0.025}(50)$ and $t_{0.005}(50)$ in the t-tables.
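The interval formula above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it replaces the t quantile $t_{\alpha/2}(n + m - 2)$ with the standard normal quantile, which is only reasonable when $n + m - 2$ is large; for small samples the t-tables are still needed. The function name `pooled_ci` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pooled_ci(xbar, ybar, s, n, m, level=0.95):
    """Large-sample CI for mu_X - mu_Y given the pooled standard deviation s.

    Uses the normal quantile in place of t_{alpha/2}(n + m - 2),
    so it is an approximation intended for large n + m - 2 only.
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s * sqrt(1 / n + 1 / m)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Hypothetical summary statistics, for illustration only.
lo, hi = pooled_ci(xbar=10.0, ybar=9.0, s=2.0, n=50, m=50)
print("approximate 95% CI:", (round(lo, 3), round(hi, 3)))
```

If the resulting interval does not contain zero, that suggests the two population means differ at the chosen level.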

Choosing the sample sizes

Notice that the length of the interval is small when $\frac{1}{n} + \frac{1}{m}$ is small. Suppose $n + m = 100$, then:

- If $n = 50$ and $m = 50$, $\frac{1}{50} + \frac{1}{50} = 0.04$.
- If $n = 99$ and $m = 1$, $\frac{1}{99} + \frac{1}{1} \approx 1.01$.

We see that the variance will be small when $n$ and $m$ are close. Remember a smaller variance = a better estimator. Therefore having similar sample sizes (for a given total sample) is a good thing!

We can always assess the quality of the average difference $\bar{X} - \bar{Y}$ by looking at its variance: $\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)$. As always, the smaller $\left(\frac{1}{n} + \frac{1}{m}\right)$ the better.

Hypothesis testing

Testing $H_0: \mu_X - \mu_Y = 0$ against $H_A: \mu_X - \mu_Y \neq 0$. What we need to do:

Calculate the test statistic under the null,

$$\frac{(\bar{X} - \bar{Y}) - 0}{s\sqrt{\frac{1}{n} + \frac{1}{m}}},$$

and look this number up in the t-tables with $(n + m - 2)$ degrees of freedom. If $(n + m - 2)$ is large use the normal tables instead. This will give you the p-value. If the p-value is small (say less than 5%), then we reject the null in favour of the alternative.

Example: Heights of students

We know that there are $n = 37$ girls and $m = 27$ boys. The sample mean for girls is $\bar{X} = \frac{1}{37}\sum_{i=1}^{37} X_i = 5.45$ and the sample mean for boys is $\bar{Y} = \frac{1}{27}\sum_{i=1}^{27} Y_i$. The sample variance for girls is $s_x^2 = \frac{1}{37 - 1}\sum_{i=1}^{37} (X_i - \bar{X})^2$ and the sample variance for boys is $s_y^2 = \frac{1}{27 - 1}\sum_{i=1}^{27} (Y_i - \bar{Y})^2$.

The two populations are all male and female heights at A&M. Suppose the population mean female height is $\mu_X$ and the population mean male height is $\mu_Y$.

Objective:

- Build a 95% confidence interval for $\mu_X - \mu_Y$.
- Make a hypothesis test that the mean male and female heights are the same against the alternative that mean male height is greater than mean female height ($\alpha = 0.05$).
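Returning to the choice of sample sizes discussed above, the effect of splitting a fixed total $n + m = 100$ in different ways can be checked with a short Python sketch. The helper name `allocation_factor` and the particular splits other than 50/50 and 99/1 are illustrative assumptions.

```python
def allocation_factor(n, m):
    """The factor 1/n + 1/m that multiplies sigma^2 in the variance of Xbar - Ybar."""
    return 1 / n + 1 / m


total = 100
# Compare a balanced split with increasingly unbalanced ones.
for n in (50, 75, 90, 99):
    m = total - n
    print(f"n = {n:2d}, m = {m:2d}: 1/n + 1/m = {allocation_factor(n, m):.4f}")
```

The balanced split gives the smallest factor, and the factor grows quickly as one sample becomes very small.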

Checking the assumptions for the height data to do an independent two-sample t-test

1. Unless many of the students in the 651 class were related it is reasonable to assume that they are independent.
2. The sample standard deviations are $s_x = 0.22$ and $s_y = 0.275$, which are quite similar, and this is confirmed by the boxplots. The spreads of the interquartile ranges in the two plots look similar.
3. The data looks close to normal (in a handwavy sense). The sample sizes of 27 and 37 are quite large, so I think we can stick to the normal assumption. There does seem to be one huge outlier in the female plot and a few male outliers, which we need to keep in mind.

Below we make boxplots and QQ-plots.

[Figure: Male and female boxplots]

[Figure: Male and female QQ-plots. Normal Q-Q plots of sample quantiles against theoretical quantiles; female heights in the top plot, male heights in the lower plot]

We now do the test by hand, but compare it with the JMP output in two sample independent t-test JMP.pdf.

The ingredients we need are:

The pooled sample variance is

$$s^2 = \frac{(37 - 1)s_x^2 + (27 - 1)s_y^2}{37 + 27 - 2}.$$

Don't worry how this was obtained.
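To make the pooled variance concrete, here is a small Python sketch that plugs the rounded standard deviations quoted above ($s_x = 0.22$, $s_y = 0.275$, $n = 37$, $m = 27$) into the pooled formula. Because these inputs are rounded, the result may differ slightly from the value on the original slide or in the JMP output; the function name `pooled_variance` is an assumption for illustration.

```python
from math import sqrt


def pooled_variance(s_x, s_y, n, m):
    """Pooled sample variance computed from two sample standard deviations."""
    return ((n - 1) * s_x ** 2 + (m - 1) * s_y ** 2) / (n + m - 2)


# Rounded summary values quoted in the text: 37 girls, 27 boys.
s2 = pooled_variance(s_x=0.22, s_y=0.275, n=37, m=27)
s = sqrt(s2)
print("pooled variance:", round(s2, 4))
print("pooled standard deviation:", round(s, 4))
```

The pooled standard deviation always lies between the two sample standard deviations, weighted towards the larger sample.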

$t_{\alpha/2}(n + m - 2) = t_{0.025}(62)$. The t-distribution with 62 degrees of freedom is not in the tables. Either use $t_{0.025}(60)$ or, since 62 is quite large, use the normal approximation $z_{0.025} = 1.96$.

The confidence interval for the heights

The 95% CI is

$$\left[(\bar{X} - \bar{Y}) - t_{0.025}(62)\, s\sqrt{\frac{1}{37} + \frac{1}{27}},\ (\bar{X} - \bar{Y}) + t_{0.025}(62)\, s\sqrt{\frac{1}{37} + \frac{1}{27}}\right] = [-0.59, -0.34].$$

Zero is not contained in the above interval. So it seems like Texas A&M boys tend to be taller than Texas A&M girls. With 95% confidence the difference in mean heights lies in the interval $[-0.59, -0.34]$.

Hypothesis test for the heights

This is closely related to what we did above. We want to test $H_0: \mu_X - \mu_Y \geq 0$ against $H_A: \mu_X - \mu_Y < 0$.

Note that whether the test is a left-hand test or a right-hand test depends on how you choose to order $\mu_X$ and $\mu_Y$: either $\mu_X - \mu_Y$ or $\mu_Y - \mu_X$. This becomes even more important when you do the test in JMP. JMP automatically selects whether it is considering the difference $\mu_X - \mu_Y$ or $\mu_Y - \mu_X$, and this depends on how you code the levels (for example, for the male/female data it depends on how you code the male and female categories). But from the output you should see which way it takes the difference.

In Means for Oneway Anova, you will see that JMP gives the sample mean for each level (you should know what the levels correspond to). For example, in the height example 0 is male and 1 is female, and the mean for level 0 is 5.9. In the t-test, JMP gives the Difference; in the height example the difference is negative, hence you can see that JMP is evaluating level 1 minus level 0, i.e. it formulates the test in terms of $\mu_X - \mu_Y$, and you should state your hypothesis in terms of the difference $\mu_X - \mu_Y$. JMP also gives you a clue: just below t-test it states 1-0, which means that it formulates the test as level 1 minus level 0.

We assume for now the null and construct the test statistic. Under the null we have

$$\frac{\bar{X} - \bar{Y}}{s\sqrt{\frac{1}{37} + \frac{1}{27}}} \sim t(37 + 27 - 2).$$

We do the calculation:

$$\frac{\bar{X} - \bar{Y}}{s\sqrt{\frac{1}{37} + \frac{1}{27}}} = -8.3.$$

We don't have the $t(62)$ in the tables. So we approximate with a normal distribution. Suppose $Z \sim N(0, 1)$, then $P(Z \leq -8.3) \approx 0$. So the p-value is really small.

So pretty much for all values of $\alpha$ we reject the null in favour of the alternative. Texas A&M boys tend to be taller than Texas A&M girls.

Example: Diets

Two diets are being compared for effectiveness. 10 volunteers went on Diet I and 10 different volunteers went on Diet II. After one month their weight loss (in kilos) was recorded. The data is given below.

[Data: weight loss for Diet I and Diet II]

Let $\mu_I$ be the mean weight loss of Diet I and $\mu_{II}$ be the mean weight loss of Diet II. Test the hypothesis that the diets are different.

Aside: Estimating the variance $\sigma^2$: The pooled sample variance

This is the formula for estimating the variance $\sigma^2$:

- Evaluate the sample variance $s_x^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
- Evaluate the sample variance $s_y^2 = \frac{1}{m - 1}\sum_{i=1}^{m} (Y_i - \bar{Y})^2$.
- Evaluate the pooled sample variance:

$$s^2 = \frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}.$$

You do not have to know this; you just need to know that JMP will estimate the population variance $\sigma^2$ using the pooled sample variance above.
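The steps in the aside, together with the test statistic used in the height example, can be sketched in Python using only the standard library. The two small samples below are invented purely for illustration (they are not the height or diet data), and the function names `pooled_t_statistic` and `left_tail_p_normal` are hypothetical. The normal approximation to the p-value is only appropriate when $n + m - 2$ is large, as in the height example.

```python
from math import erfc, sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y, delta0=0.0):
    """Return (t, df) for H0: mu_X - mu_Y = delta0 using the pooled sample variance."""
    n, m = len(x), len(y)
    # statistics.variance divides by (len - 1), matching s_x^2 and s_y^2 above.
    s2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y) - delta0) / sqrt(s2 * (1 / n + 1 / m))
    return t, n + m - 2


def left_tail_p_normal(z):
    """P(Z <= z) for Z ~ N(0, 1); a normal approximation for large degrees of freedom."""
    return 0.5 * erfc(-z / sqrt(2))


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = pooled_t_statistic(x, y)
print("t =", round(t, 4), "with", df, "degrees of freedom")

# The test statistic quoted in the height example, with the normal approximation.
print("P(Z <= -8.3) =", left_tail_p_normal(-8.3))
```

With only five observations per group the statistic from the invented data would have to be compared with the $t(8)$ tables rather than the normal tables.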

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 65 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao Review In the previous lecture we considered the following tests: The independent

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao Motivations for the ANOVA We defined the F-distribution, this is mainly used in

More information

Chapter 7: Statistical Inference (Two Samples)

Chapter 7: Statistical Inference (Two Samples) Chapter 7: Statistical Inference (Two Samples) Shiwen Shen University of South Carolina 2016 Fall Section 003 1 / 41 Motivation of Inference on Two Samples Until now we have been mainly interested in a

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Boxplots and standard deviations Suhasini Subba Rao Review of previous lecture In the previous lecture

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc. Chapter 24 Comparing Means Copyright 2010 Pearson Education, Inc. Plot the Data The natural display for comparing two groups is boxplots of the data for the two groups, placed side-by-side. For example:

More information

Comparing Means from Two-Sample

Comparing Means from Two-Sample Comparing Means from Two-Sample Kwonsang Lee University of Pennsylvania kwonlee@wharton.upenn.edu April 3, 2015 Kwonsang Lee STAT111 April 3, 2015 1 / 22 Inference from One-Sample We have two options to

More information

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3) STAT 515 -- Chapter 9: Two-Sample Problems Paired Differences (Section 9.3) Examples of Paired Differences studies: Similar subjects are paired off and one of two treatments is given to each subject in

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES. Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups

Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES. Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups Sections 10-1 & 10-2 Independent Groups It is common to compare two groups, and do a hypothesis

More information

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01 An Analysis of College Algebra Exam s December, 000 James D Jones Math - Section 0 An Analysis of College Algebra Exam s Introduction Students often complain about a test being too difficult. Are there

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 9 (MWF) Calculations for the normal distribution Suhasini Subba Rao Evaluating probabilities

More information

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc. Chapter 23 Inferences About Means Sampling Distributions of Means Now that we know how to create confidence intervals and test hypotheses about proportions, we do the same for means. Just as we did before,

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 3: Inferences About Means Sample of Means: number of observations in one sample the population mean (theoretical mean) sample mean (observed mean) is the theoretical standard deviation of the population

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

This is particularly true if you see long tails in your data. What are you testing? That the two distributions are the same!

This is particularly true if you see long tails in your data. What are you testing? That the two distributions are the same! Two sample tests (part II): What to do if your data are not distributed normally: Option 1: if your sample size is large enough, don't worry - go ahead and use a t-test (the CLT will take care of non-normal

More information

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α Chapter 8 Notes Section 8-1 Independent and Dependent Samples Independent samples have no relation to each other. An example would be comparing the costs of vacationing in Florida to the cost of vacationing

More information

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015 AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking

More information

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). Sampling distributions and estimation. 1) A brief review of distributions: We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation,

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Review: Logic of Hypothesis Tests Usually, we test (attempt to falsify) a null hypothesis (H 0 ): includes all possibilities except prediction in hypothesis (H A ) If

More information

The Components of a Statistical Hypothesis Testing Problem

The Components of a Statistical Hypothesis Testing Problem Statistical Inference: Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 we studied one

More information

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests: One sided tests So far all of our tests have been two sided. While this may be a bit easier to understand, this is often not the best way to do a hypothesis test. One simple thing that we can do to get

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15 Chapter 22 Comparing Two Proportions Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 15 Introduction In Ch.19 and Ch.20, we studied confidence interval and test for proportions,

More information

STA2601. Tutorial letter 203/2/2017. Applied Statistics II. Semester 2. Department of Statistics STA2601/203/2/2017. Solutions to Assignment 03

STA2601. Tutorial letter 203/2/2017. Applied Statistics II. Semester 2. Department of Statistics STA2601/203/2/2017. Solutions to Assignment 03 STA60/03//07 Tutorial letter 03//07 Applied Statistics II STA60 Semester Department of Statistics Solutions to Assignment 03 Define tomorrow. university of south africa QUESTION (a) (i) The normal quantile

More information

CHAPTER 9: HYPOTHESIS TESTING

CHAPTER 9: HYPOTHESIS TESTING CHAPTER 9: HYPOTHESIS TESTING THE SECOND LAST EXAMPLE CLEARLY ILLUSTRATES THAT THERE IS ONE IMPORTANT ISSUE WE NEED TO EXPLORE: IS THERE (IN OUR TWO SAMPLES) SUFFICIENT STATISTICAL EVIDENCE TO CONCLUDE

More information

CHAPTER 10 Comparing Two Populations or Groups

CHAPTER 10 Comparing Two Populations or Groups CHAPTER 10 Comparing Two Populations or Groups 10. Comparing Two Means The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Comparing Two Means Learning

More information

CHAPTER 10 Comparing Two Populations or Groups

CHAPTER 10 Comparing Two Populations or Groups CHAPTER 10 Comparing Two Populations or Groups 10.2 Comparing Two Means The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Comparing Two Means Learning

More information

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 Data Analysis: The mean egg masses (g) of the two different types of eggs may be exactly the same, in which case you may be tempted to accept

More information

Hypothesis Testing with Z and T

Hypothesis Testing with Z and T Chapter Eight Hypothesis Testing with Z and T Introduction to Hypothesis Testing P Values Critical Values Within-Participants Designs Between-Participants Designs Hypothesis Testing An alternate hypothesis

More information

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses. 1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Standard normal distribution. t-distribution, (df=5) t-distribution, (df=2) PDF created with pdffactory Pro trial version

Standard normal distribution. t-distribution, (df=5) t-distribution, (df=2) PDF created with pdffactory Pro trial version t-ditribution In biological reearch the population variance i uually unknown and an unbiaed etimate,, obtained from the ample data, ha to be ued in place of σ. The propertie of t- ditribution are: -It

More information

Inferences Based on Two Samples

Inferences Based on Two Samples Chapter 6 Inferences Based on Two Samples Frequently we want to use statistical techniques to compare two populations. For example, one might wish to compare the proportions of families with incomes below

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests
