Inference for Distributions: Inference for the Mean of a Population (Section 7.1)


Statistical inference in practice
The emphasis turns from statistical reasoning to statistical practice:
The population standard deviation σ is unknown.
Inference focuses on µ and on comparisons of µ between populations.

Example: Cola sweetness
Does storage reduce the sweetness of cola? The loss in sweetness after storage is measured by a random sample of n = 10 professional tasters:
2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3
We want to test $H_0: \mu = 0$ versus $H_a: \mu > 0$.
The one-sample z test requires knowledge of σ. We have the estimate s = 1.196 of σ, but substituting s for σ introduces additional random variability, which cannot be ignored because n is small.

The t distributions
Assume an SRS of size n from a N(µ, σ) population. The t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a t distribution with n − 1 degrees of freedom.
The statistic $SE = s/\sqrt{n}$ is the standard error of $\bar{x}$; it estimates $\sigma/\sqrt{n}$, the standard deviation of $\bar{x}$.
In general, t(k) denotes a t distribution with k degrees of freedom.

Comparison of t(k) with N(0, 1)
A t(k) density curve resembles that of a standard Normal.
Similarities: both are centered at zero, symmetric, and mound-shaped.
Differences:
t(k) has an additional parameter, k = degrees of freedom, so the sampling distribution of the t statistic depends on the sample size.
t(k) has a larger spread, but closely matches N(0, 1) for large k: if T is t(k), then $\sigma_T > 1$, but $\sigma_T \approx 1$ if k is large.
The larger spread reflects the additional variability from using $SE = s/\sqrt{n}$ in place of $\sigma/\sqrt{n}$.
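The claim about the spread of t(k) can be checked numerically. The sketch below relies on the standard result that a t(k) variable has variance k/(k − 2) for k > 2, a fact that is not stated on the slide; the function name `t_sd` is only illustrative.

```python
import math

def t_sd(k):
    """Standard deviation of a t(k) variable; finite only for k > 2."""
    if k <= 2:
        raise ValueError("the variance of t(k) is finite only for k > 2")
    return math.sqrt(k / (k - 2))

# The standard deviation exceeds 1 and shrinks toward 1 as k grows.
for k in (3, 5, 10, 30, 100):
    print(f"k = {k:3d}: sd = {t_sd(k):.3f}")
```

The printed values decrease toward 1, which is the sense in which t(k) closely matches N(0, 1) for large k.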

Calculating t probabilities and critical values
Suppose T is t(k). In Excel:
For c > 0, tdist(c, k, 1) = $P(T \ge c)$.
For c > 0, tdist(c, k, 2) = $2P(T \ge c)$.
For 0 < α < 1, tinv(α, k) is the c for which $P(T \ge c) = \alpha/2$.
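For readers working outside Excel, the same quantities can be approximated with a short Python sketch that uses only the standard library. The function names `t_pdf`, `t_upper_tail`, and `t_critical` are illustrative and do not come from the slides. The tail area is obtained by Simpson's-rule integration of the t density, and the critical value by bisection, so the results are numerical approximations rather than library-grade values.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)

def t_upper_tail(c, k, steps=2000):
    """Approximate P(T >= c) for T ~ t(k); steps must be even (Simpson's rule)."""
    a = abs(c)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3          # area under the density between 0 and |c|
    return 0.5 - area if c >= 0 else 0.5 + area

def t_critical(alpha, k, upper=100.0):
    """Approximate c with P(T >= c) = alpha/2, assuming 0 < c < upper."""
    lo, hi = 0.0, upper
    for _ in range(60):           # bisection: the tail area decreases in c
        mid = (lo + hi) / 2
        if t_upper_tail(mid, k) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(t_upper_tail(2.70, 9))      # one-tailed area, like tdist(2.70, 9, 1)
print(2 * t_upper_tail(2.70, 9))  # two-tailed area, like tdist(2.70, 9, 2)
print(t_critical(0.05, 9))        # two-sided critical value, like tinv(0.05, 9)
```

The search bound `upper=100.0` is an assumption that covers the cases in these slides; very small α combined with very few degrees of freedom would need a larger bound.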

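One-sample t test
Assumptions: SRS of size n from a Normal population.
Hypotheses: $H_0: \mu = \mu_0$ versus a one- or two-sided $H_a$.
Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
P-value, where T is t(n − 1):
$P(T \ge -t)$ for $H_a: \mu < \mu_0$ (equivalently $P(T \le t)$, by symmetry)
$P(T \ge t)$ for $H_a: \mu > \mu_0$
$2P(T \ge |t|)$ for $H_a: \mu \ne \mu_0$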

Example: Cola sweetness (continued)
Data: SRS of size n = 10 from a Normal population of professional tasters.
Hypotheses: $H_0: \mu = 0$ versus $H_a: \mu > 0$.
Summary statistics: $\bar{x} = 1.02$ and s = 1.196.
Test statistic: $t = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$.
P-value: $P(T \ge 2.70) = 0.012$, with k = n − 1 = 9 d.f.
Decision: Reject $H_0$ at significance level α = 0.05, and conclude that there is a loss of sweetness.
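The calculation in this example can be reproduced from the ten sweetness losses listed earlier. The following is a minimal standard-library sketch, not the slides' own method (they use Excel): the helper names `t_pdf` and `t_upper_tail` are illustrative, and the P-value comes from Simpson's-rule integration of the t density, so it is an approximation.

```python
import math
import statistics

# Sweetness losses reported by the n = 10 tasters (from the example).
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
mu0 = 0.0                              # value of mu under H0

n = len(losses)
xbar = statistics.mean(losses)         # sample mean
s = statistics.stdev(losses)           # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)                  # standard error of the mean
t_stat = (xbar - mu0) / se             # one-sample t statistic
df = n - 1

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)

def t_upper_tail(c, k, steps=2000):
    """Approximate P(T >= c) for T ~ t(k) using Simpson's rule."""
    a = abs(c)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 - area if c >= 0 else 0.5 + area

# Ha: mu > 0, so the P-value is the upper-tail area beyond t.
p_value = t_upper_tail(t_stat, df)
print(f"xbar = {xbar:.2f}, s = {s:.3f}, t = {t_stat:.2f}, df = {df}, P = {p_value:.3f}")
```

For a two-sided alternative, the P-value would instead be `2 * t_upper_tail(abs(t_stat), df)`.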

Confidence intervals in testing
When $H_0$ is rejected, a natural follow-up question is: how large is the effect that has been detected?
Example: Cola sweetness (continued)
$H_0$ is rejected at α = 0.05, indicating evidence of a loss of sweetness. How much sweetness is lost? Answer with a confidence interval.

One-sample t confidence interval
Assumptions: SRS of size n from a Normal population.
Target parameter: µ.
CI formula: For confidence level C, the interval is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
where $t^*$ is such that $P(T \ge t^*) = (1 - C)/2$, with T being t(n − 1).

Example: Cola sweetness (continued)
How much sweetness is lost?
95% CI: To find $t^*$ when k = n − 1 = 9 d.f., we use Excel: $t^*$ = tinv(0.05, 9) = 2.26.
The interval is $\bar{x} \pm t^* s/\sqrt{n} = (0.16, 1.88)$.
Conclude a loss of sweetness between 0.16 and 1.88 units, on average.
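The interval can be recomputed from the data as a check. In this sketch the function name `one_sample_t_interval` is illustrative, and the critical value is typed in by hand as 2.262, the usual three-decimal table value for 9 d.f. (the slide rounds it to 2.26).

```python
import math
import statistics

def one_sample_t_interval(data, t_star):
    """Return (lower, upper) for xbar +/- t_star * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return xbar - margin, xbar + margin

# Sweetness losses from the example; t* for 95% confidence with 9 d.f.
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
lower, upper = one_sample_t_interval(losses, t_star=2.262)
print(f"95% CI for the mean loss: ({lower:.2f}, {upper:.2f})")
```

Passing the critical value as an argument keeps the function usable for any confidence level, provided the matching $t^*$ is looked up separately.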

Robustness
With larger samples, one-sample t procedures become robust against violations of the Normality assumption. Some guidelines:
If n < 15, the Normality assumption is critical.
If n ≥ 15, proceed only in the absence of outliers and strong skewness.
If n ≥ 40, the procedures are generally robust.

Example: Cola sweetness (continued)
Normality may be hard to verify when n is small. Often Normality is argued from one's understanding of the phenomenon under study.

Matched pairs experiments
The cola sweetness study is an example of a matched-pairs experiment.
The raw measurements came in pairs $(x_1, x_2)$:
$x_1$ = sweetness before storage
$x_2$ = sweetness after storage
But we analyzed the differences within pairs, $x = x_1 - x_2$.

Comments on matched pairs
Common matched pairs settings:
Response before and after exposure to a stimulus.
Pairs of very similar subjects (e.g., identical twins) who receive different treatments.
When the treatments are randomized, a matched pairs study is a randomized comparative experiment.

Inference for Distributions: Comparing Two Means (Section 7.2)

The two-sample setup
Objective: compare two distinct populations through random samples drawn from each of them (Sample 1 from Population 1, Sample 2 from Population 2).
The two populations may represent distinct treatments of a randomized comparative experiment.
The samples are assumed to be drawn independently of each other.

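Notation
                               Population 1    Population 2
Population mean                µ1              µ2
Population standard deviation  σ1              σ2
Sample size                    n1              n2
Sample mean                    x̄1              x̄2
Sample standard deviation      s1              s2
The two samples are independent.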

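Basic approach to inference
Objective: Calculate a confidence interval for $\mu_1 - \mu_2$, or test $H_0: \mu_1 = \mu_2$.
Starting point: Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$, which is unbiased for $\mu_1 - \mu_2$.
If both populations are Normal, then $\bar{x}_1 - \bar{x}_2$ is Normal with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
The z-score of $\bar{x}_1 - \bar{x}_2$ is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$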

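The two-sample t statistic
Two-sample t procedures are based on the two-sample t statistic, which is the z-score with σ1 and σ2 replaced by their estimates s1 and s2:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
If both populations are Normal, then t is approximately t(k), with two possible d.f. formulas:
Satterthwaite's formula:
$$k = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
k = smaller of $n_1 - 1$ and $n_2 - 1$: easier to compute, and yields conservative confidence and significance levels.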
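The two choices of degrees of freedom can be compared side by side. This sketch implements the standard Welch-Satterthwaite expression; the function names and the input values (n1 = 10, s1 = 2.0, n2 = 12, s2 = 3.0) are hypothetical and chosen only for illustration.

```python
def satterthwaite_df(n1, s1, n2, s2):
    """Approximate d.f. for the two-sample t statistic (Satterthwaite)."""
    a = s1 ** 2 / n1              # estimated variance of the first sample mean
    b = s2 ** 2 / n2              # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Simpler, conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

# Hypothetical sample sizes and standard deviations, for illustration only.
n1, s1, n2, s2 = 10, 2.0, 12, 3.0
k_satt = satterthwaite_df(n1, s1, n2, s2)
k_cons = conservative_df(n1, n2)
print(f"Satterthwaite d.f. = {k_satt:.2f}, conservative d.f. = {k_cons}")
```

Satterthwaite's value always lies between the conservative choice and n1 + n2 − 2, so the conservative rule never overstates the degrees of freedom.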

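Two-sample t test
Assumptions: independent SRSs drawn from distinct Normal populations.
Hypotheses: $H_0: \mu_1 = \mu_2$ versus a one- or two-sided $H_a$.
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
P-value, where T is t(k) with k as above:
$P(T \ge -t)$ for $H_a: \mu_1 < \mu_2$ (equivalently $P(T \le t)$, by symmetry)
$P(T \ge t)$ for $H_a: \mu_1 > \mu_2$
$2P(T \ge |t|)$ for $H_a: \mu_1 \ne \mu_2$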

Example: Directed reading
Do directed reading activities improve reading ability? Measure the degree of reading power (DRP) in two groups:
Treatment: $n_1 = 21$ third-graders under directed reading
Control: $n_2 = 23$ third-graders under conventional reading
We want to test $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$.

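Example: Directed reading (continued)
Data: Independent SRSs of sizes $n_1 = 21$ and $n_2 = 23$ from Normal populations of students' DRP measurements.
Hypotheses: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$.
Summary statistics: $\bar{x}_1 - \bar{x}_2 = 9.96$, with standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2} \approx 4.31$.
Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{9.96}{4.31} = 2.31$.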

Example: Directed reading (continued)
Test statistic: t = 2.31.
P-value*: $P(T \ge 2.31) = 0.016$, with k = smaller of $n_1 - 1$ and $n_2 - 1$ = 21 − 1 = 20 d.f.
Decision: Reject $H_0$ at significance level α = 0.05, and conclude that directed reading activities improve reading ability.
Next question: How much improvement?
* Satterthwaite's formula yields k = 37.9, hence the P-value $P(T \ge 2.31) = 0.013$.
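The test can be sketched end to end in Python. One important assumption: the per-group summary statistics below (means 51.48 and 41.52, standard deviations 11.01 and 17.15) do not appear in this transcription. They are the values commonly quoted for this textbook data set, and they are used here because they reproduce the reported t = 2.31 and k = 37.9. The helper names are illustrative, and the P-values come from Simpson's-rule integration of the t density.

```python
import math

# ASSUMED summary statistics (not given in the transcription).
n1, xbar1, s1 = 21, 51.48, 11.01       # treatment: directed reading
n2, xbar2, s2 = 23, 41.52, 17.15       # control: conventional reading

a, b = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(a + b)                  # standard error of xbar1 - xbar2
t_stat = (xbar1 - xbar2) / se          # two-sample t statistic

k_cons = min(n1, n2) - 1               # conservative d.f.
k_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coef * (1 + x * x / k) ** (-(k + 1) / 2)

def t_upper_tail(c, k, steps=2000):
    """Approximate P(T >= c) for T ~ t(k) using Simpson's rule."""
    h = abs(c) / steps
    total = t_pdf(0.0, k) + t_pdf(abs(c), k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 - area if c >= 0 else 0.5 + area

# Ha: mu1 > mu2, so each P-value is an upper-tail area.
p_cons = t_upper_tail(t_stat, k_cons)
p_satt = t_upper_tail(t_stat, k_satt)
print(f"t = {t_stat:.2f}")
print(f"conservative:  k = {k_cons}, P = {p_cons:.3f}")
print(f"Satterthwaite: k = {k_satt:.1f}, P = {p_satt:.3f}")
```

The conservative degrees of freedom give the larger P-value, which is what "conservative" means here: the test is slightly less likely to reject $H_0$.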

Two-sample t confidence interval
Assumptions: independent SRSs drawn from distinct Normal populations.
Target parameter: $\mu_1 - \mu_2$.
CI formula: For confidence level C, the interval is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t^*$ is such that $P(T \ge t^*) = (1 - C)/2$, with T being t(k) and k as above.

Example: Directed reading (continued)
How much improvement?
95% CI*: Since k = smaller of $n_1 - 1$ and $n_2 - 1$ = 21 − 1 = 20 d.f., $t^*$ = tinv(0.05, 20) = 2.09.
The interval is $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2} = (0.97, 18.95)$.
Conclude an improvement between 0.97 and 18.95 units of DRP, on average.
* Satterthwaite's formula yields the 95% CI (1.23, 18.69).
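The conservative interval can be recomputed in a few lines. As before, the per-group summary statistics are an assumption: they are not in this transcription, and are the values commonly quoted for this data set. The critical value 2.086 is the usual three-decimal table value for 20 d.f. (the slide rounds it to 2.09), and the function name is illustrative.

```python
import math

def two_sample_t_interval(n1, xbar1, s1, n2, xbar2, s2, t_star):
    """Return (lower, upper) for (xbar1 - xbar2) +/- t_star * SE."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # SE of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - t_star * se, diff + t_star * se

# ASSUMED summary statistics (not given in the transcription).
lower, upper = two_sample_t_interval(
    n1=21, xbar1=51.48, s1=11.01,      # treatment: directed reading
    n2=23, xbar2=41.52, s2=17.15,      # control: conventional reading
    t_star=2.086,                      # 95% confidence, k = 20 d.f.
)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Supplying a smaller $t^*$, as Satterthwaite's larger degrees of freedom would, gives a somewhat narrower interval, in line with the footnote on the slide.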

Robustness
The sum of the sample sizes provides robustness guidelines on the use of two-sample t procedures:
If $n_1 + n_2 < 15$, the Normality assumption is critical.
If $n_1 + n_2 \ge 15$, proceed only in the absence of outliers and strong skewness.
If $n_1 + n_2 \ge 40$, the procedures are generally robust.
Enhance robustness by planning $n_1 \approx n_2$.