Business Statistics. Lecture 5: Confidence Intervals


Goals for this Lecture
- Confidence intervals
- The t distribution

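Welcome to Interval Estimation!
JMP output (Moments) for the shaft diameters:
Mean             815.0340
Std Dev            0.8923
Std Error Mean     0.0892
Upper 95% Mean   815.2111
Lower 95% Mean   814.8569
N                100.0000
Sum Weights      100.0000
Slide annotations: Mean is the sample mean, Std Dev is the sample SD, and N is the sample size. The slide marks part of this output as LAST CLASS and part as THIS CLASS.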

Hmmm...
- In the motor shaft case, we built a model using $\bar{X}$ and $s$
- We treated them like population parameters
- What if they were way off? How would we know?
- E.g., what if $\bar{X} = 820$ for the 100 shafts and we decided the process was not capable?
- How sure are we that another sample would give the same result?

Inference: Making Educated Guesses
- We want to use a sample to make guesses about a larger population
- Samples are variable (we'd get different values if we took a different sample), so our guesses are uncertain
- We want to guess in such a way that:
  - There is a chance we guessed right
  - We know what that chance is

General Strategy for Guessing
- Pick a statistic similar to the parameter you want to guess
- Figure out what the sampling distribution of your statistic looks like
- Use the sampling distribution to assess the quality of your guess
- We'll start by guessing averages, because they're easy

Assumption
- Data are collected as a simple random sample:
  - Every unit in the population is equally likely to be chosen (don't just look at the 800 most highly paid CEOs)
  - Choosing one unit does not change the relative chances for another unit to be chosen
- Other sampling schemes require different techniques

Last Class: Distribution of Averages
[Figure: two distributions over ShaftDiam (812 to 818). "Individual": histogram of all future shafts, SD = s. "Mean of 5": histogram of all possible means of five future shafts, SE = s/sqrt(5).]
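The SE = s/sqrt(5) relationship in the figure can be checked with a small simulation. This is a minimal sketch, not part of the lecture: it assumes the shaft diameters are normally distributed and uses the lecture's sample mean and SD (815.034 and 0.8923) as stand-ins for the unknown population values. The names `MU`, `SIGMA`, `N_PER_MEAN`, and `N_REPEATS` are illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: normal "population" whose mean and SD equal the lecture's
# sample mean and sample SD (the true population values are unknown).
MU, SIGMA = 815.034, 0.8923
N_PER_MEAN = 5        # each average is taken over five shafts
N_REPEATS = 20_000    # number of simulated averages

# Draw many samples of five shafts and record each sample's mean.
means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N_PER_MEAN))
    for _ in range(N_REPEATS)
]

sd_of_means = statistics.stdev(means)       # spread of the simulated averages
theoretical_se = SIGMA / N_PER_MEAN ** 0.5  # sigma / sqrt(5)

print(f"SD of simulated means: {sd_of_means:.4f}")
print(f"sigma / sqrt(5):       {theoretical_se:.4f}")
```

The two printed values should be close, which is the sense in which the distribution of averages is narrower than the distribution of individual shafts.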

Guessing $\mu$, the Population Mean
- Best guess for $\mu$ is $\bar{X}$
- If you always guess $\bar{X}$ for $\mu$, you will never be right!
- You can guess with a confidence interval and be right some of the time
- Narrow intervals: higher chance of being wrong
- Wide intervals: less useful

Main Idea
- Because of the CLT, we know that $\bar{X}$ is within 2 SEs of $\mu$ 95% of the time
- Alternatively, $\mu$ is within 2 SEs of $\bar{X}$ 95% of the time
[Figure: over ShaftDiam (812 to 818), the (unobserved) distribution of the population ("Individual"), the (unobserved) distribution of the sample mean ("Mean of 5"), the unobserved population mean $\mu$, the sample mean $\bar{X}$, and the 95% confidence interval for the population mean.]

How to Guess
- Choose a probability of being wrong: $\alpha$
- Find $z$ on the normal table so that $\alpha/2$ of the probability is above $z$ and $\alpha/2$ is less than $-z$
- Example: if $\alpha = 5\%$ then $z = 1.96$
- Calculate $\bar{X}$ and $s/\sqrt{n}$
- Your interval is $\bar{X} \pm z\,s/\sqrt{n}$

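Example: Shaft Process
- Sample mean: $\bar{x} = 815.034$ (815.03 on the slide; the endpoints below use the unrounded value)
- Sample SD: $s = 0.8923$
- Number of shafts: $n = 100$
- SE: $s_{\bar{X}} = s/\sqrt{n} = 0.8923/\sqrt{100} = 0.08923$
- 95% confidence interval for $\mu$: $\bar{x} \pm 1.96\,s/\sqrt{n}$
- With numbers: $815.034 \pm 1.96(0.08923) = [814.859, 815.209]$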
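The shaft calculation can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `z_interval` is hypothetical, and the sample mean is taken as 815.034 (the unrounded value from the JMP output), which is what the reported endpoints correspond to.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, alpha=0.05):
    """Normal-based confidence interval: xbar +/- z * s / sqrt(n)."""
    # z leaves alpha/2 of the probability above it (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = s / sqrt(n)  # standard error of the mean
    return xbar - z * se, xbar + z * se


# Values from the lecture's shaft example.
lower, upper = z_interval(xbar=815.034, s=0.8923, n=100)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # 95% CI: [814.859, 815.209]
```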

JMP: Shaft Diameters
Mean             815.0340   (best guess at $\mu$)
Std Dev            0.8923   (best guess at $\sigma$)
Std Error Mean     0.0892
Upper 95% Mean   815.2111   (95% confidence interval for $\mu$)
Lower 95% Mean   814.8569   (95% confidence interval for $\mu$)
N                100.0000
Sum Weights      100.0000
- We are 95% confident that this interval contains $\mu$: intervals built this way contain $\mu$ 95% of the time
- 5% of the time they do not

So, What is a Confidence Interval?
- It's an interval around our sample statistic that shows how variable the sample statistic is
- Narrow: real (population) value unlikely to be far away
- Wide: little information about the population value
- Two CIs for our example:
[Figure: Confidence interval #1 and Confidence interval #2 drawn around the observed sample mean, on an axis from 814 to 816.]

Again, What is a Confidence Interval?
- A confidence interval is a random interval
- Random because it is a function of a random variable ($\bar{X}$)
- Confidence level is the long-run percentage of intervals that will cover the population parameter
- It is not the probability that the interval contains the true parameter!

A Simulation
[Figure: 95% confidence intervals for 50 samples, mean = 50, sd = 10, n = 5, plotted by sample number. Intervals not including the population mean: 2.]

Another Simulation
[Figure: 95% confidence intervals for 200 samples, mean = 50, sd = 10, n = 5, plotted by sample number. Intervals not including the population mean: 10.]
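A simulation like the two above can be sketched as follows. The slides do not say how the plotted intervals were built, so this sketch makes explicit assumptions: samples of n = 5 from a normal population with mean 50 and sd 10, and t-based intervals using the standard table value 2.776 for 95% confidence with 4 degrees of freedom. All names are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50.0, 10.0, 5
T_CRIT = 2.776        # assumption: 95% t critical value for df = N - 1 = 4
N_INTERVALS = 4000    # many more intervals than on the slides, for a stable rate

misses = 0
for _ in range(N_INTERVALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5  # s / sqrt(n)
    # Count the interval as a miss if it does not contain the true mean.
    if not (xbar - T_CRIT * se <= MU <= xbar + T_CRIT * se):
        misses += 1

coverage = 1 - misses / N_INTERVALS
print(f"Intervals not including the population mean: {misses}")
print(f"Long-run coverage: {coverage:.3f}")
```

The coverage should land near 0.95, in line with the 2 misses out of 50 and 10 misses out of 200 reported on the slides.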

Deriving a Confidence Interval (1)
- Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with unknown mean $\mu$ and known standard deviation $\sigma$
- Create a CI for $\mu$ based on the sampling distribution of the mean: $\bar{X} \sim N(\mu, \sigma^2/n)$
- To start, we know that (via standardizing): $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Deriving a Confidence Interval (2)
- Now for $Z \sim N(0,1)$ we know $\Pr(-1.96 \le Z \le 1.96) = 0.95$
- That is, there is a 95% probability that the random variable $Z$ lies in this fixed interval
- Thus $\Pr\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$
- And, after some algebra, $\Pr\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$
- Now we say we are 95% confident that this random interval covers the fixed (unknown) $\mu$
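The "some algebra" step can be written out as a chain of equivalent inequalities. This is a sketch of one way to do it, using the same symbols as the slide ($\bar{X}$ for the sample mean, $\mu$ for the population mean, $\sigma$ for the known standard deviation): multiply through by $\sigma/\sqrt{n}$, subtract $\bar{X}$, then multiply by $-1$, which reverses the inequalities.

```latex
\begin{align*}
-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96
&\iff -1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff -\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}
\end{align*}
```

Because the events are equivalent, they have the same probability, 0.95.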

Deriving a Confidence Interval (3)
- So, if $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ are observed values of a random sample from a $N(\mu, \sigma^2)$ population, then $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ is a 95% confidence interval for $\mu$
- We can be 95% confident that the interval covers the population mean
- Interpretation: in the long run, 19 times out of 20 the interval will cover the true mean and 1 time out of 20 it will not

But Something's Fishy...
- Why make all the fuss about the mean being random if we treat the SD as known?
- Since $\sigma$ is a population quantity, we have to estimate it with $s$
- Should that make the intervals wider or narrower?
- When $\sigma$ is unknown (almost always), use the t distribution rather than the normal

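The t Distribution
[Figure: density curves for the normal distribution and for t distributions with 3, 10, and 100 degrees of freedom (T3, T10, T100), plotted from -4 to 4. Horizontal axis: Z = number of SEs from the mean.]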

Degrees of Freedom (df)
- The more degrees of freedom we have, the better we can estimate $\sigma$
- The better we estimate $\sigma$, the closer we are to $\sigma$ being known
- Thus, the more df we have, the closer t values are to z values
- Calculating degrees of freedom:
  - Each observation adds one degree of freedom
  - One degree of freedom is used up when we calculate $\bar{X}$
  - There are $n - 1$ degrees of freedom left
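The claim that t values approach z values as df grows can be seen numerically. This is a small sketch: the t values in `T_975` are standard two-sided 95% table values typed in by hand (an assumption of this example, not read from the lecture's Table A3-5), and the z value comes from the standard library.

```python
from statistics import NormalDist

# z value leaving 2.5% in the upper tail (about 1.960).
z = NormalDist().inv_cdf(0.975)

# Assumption: standard t-table values for 95% confidence (2.5% in each tail).
T_975 = {3: 3.182, 10: 2.228, 30: 2.042, 100: 1.984}

# The gap between t and z shrinks as the degrees of freedom increase.
gaps = {df: t - z for df, t in T_975.items()}
for df, t in T_975.items():
    print(f"df = {df:>3}: t = {t:.3f}, t - z = {gaps[df]:.3f}")
```

With only 3 degrees of freedom the t value is more than 1.2 larger than z, while at 100 degrees of freedom the difference is in the second decimal place.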

How to (Really) Guess
- Choose a probability of being wrong: $\alpha$
- Calculate $\bar{X}$ and $s/\sqrt{n}$
- For df $= n - 1$, find $t$ from Table A3-5 (page 496) for $p = 1 - \alpha$
- Then, the confidence interval is $\bar{X} \pm t\,s/\sqrt{n}$

Table A3-5 Example
- For $\alpha = 0.05$ and df = 100, we have $t = 1.984$
- Notation: $t_{df,\,\alpha/2}$, so here $t_{n-1,\,\alpha/2} = t_{100,\,0.025} = 1.984$

Shaft Diameters Example Redux
Mean             815.0340   (best guess at $\mu$)
Std Dev            0.8923   (best guess at $\sigma$)
Std Error Mean     0.0892
Upper 95% Mean   815.2111   (95% confidence interval for $\mu$)
Lower 95% Mean   814.8569   (95% confidence interval for $\mu$)
N                100.0000
Sum Weights      100.0000
- $\bar{x} \pm 1.984\,s/\sqrt{n} = 815.034 \pm 1.984 \times 0.8923/\sqrt{100} = [814.857, 815.211]$
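The t-based interval can be checked the same way as the z-based one. This is a minimal sketch in which the critical value 1.984 is passed in by hand, as read from the lecture's table example, rather than computed; the function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """t-based confidence interval: xbar +/- t_crit * s / sqrt(n)."""
    se = s / sqrt(n)      # standard error of the mean
    margin = t_crit * se  # margin of error
    return xbar - margin, xbar + margin


# Shaft example: t = 1.984 is the table value quoted in the lecture.
lower, upper = t_interval(xbar=815.034, s=0.8923, n=100, t_crit=1.984)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # 95% CI: [814.857, 815.211]
```

The result agrees with the Lower 95% Mean and Upper 95% Mean rows of the JMP output, and it is slightly wider than the z-based interval [814.859, 815.209].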

How Confidence Intervals Behave
- Width of CIs: $w = 2\,t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}$
- Margin of error: $E = t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}$
- Bigger SD → bigger SE → wider intervals
- Bigger sample size → smaller SE → narrower intervals
- Bigger sample size → smaller t values → narrower intervals
- Higher confidence → bigger t values → wider intervals
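These relationships can be illustrated with the margin-of-error formula. This is a rough sketch with stated simplifications: the 95% critical value 1.984 is held fixed when n changes (in practice it would shrink slightly), and 2.626 is a standard table value for 99% confidence at 100 degrees of freedom, which is an assumption of this example and not a number from the lecture.

```python
from math import sqrt


def margin_of_error(t_crit, s, n):
    """E = t_crit * s / sqrt(n); the interval width is 2 * E."""
    return t_crit * s / sqrt(n)


S, N = 0.8923, 100  # shaft example values
T_95 = 1.984        # 95% critical value quoted in the lecture
T_99 = 2.626        # assumption: standard 99% table value for df = 100

base = margin_of_error(T_95, S, N)
bigger_sd = margin_of_error(T_95, 2 * S, N)      # doubling the SD doubles E
bigger_n = margin_of_error(T_95, S, 4 * N)       # quadrupling n halves E (t held fixed)
higher_confidence = margin_of_error(T_99, S, N)  # larger t gives larger E

print(f"baseline:          {base:.4f}")
print(f"SD doubled:        {bigger_sd:.4f}")
print(f"n quadrupled:      {bigger_n:.4f}")
print(f"99% instead of 95%: {higher_confidence:.4f}")
```

Because n enters through a square root, halving the margin of error takes four times as many observations.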

t vs. z
- Use t when you don't know $\sigma$
- The t distribution assumes the data are normally distributed
- Options if data are not normally distributed:
  - Transform the data (logarithms)
  - If transformations don't work and the sample size is big (> 30), ignore the problem
  - If transformations don't work and the sample size is small, read the book about nonparametric tests

Example (CompPur.jmp)
- Manufacturer of consumer electronics: how many households will purchase a computer in the next year?
- Use a survey to collect responses from 100 households
- To justify sales projections, management needs the proportion to be at least 25%
- Should management revise sales projections?

Example, continued
Survey results (Frequencies, 2 levels):
Level   Count   Prob
No         86   0.86000
Yes        14   0.14000
Total     100   1.00000
- Another sample would likely have given a different result
- What we want to know is, based on this result, where could the true proportion lie?

Example, continued
- When data are 0's and 1's, they are REALLY not normal
[Figure: 95% CI for the true proportion, drawn on an axis from 0 to 1.1.]
Mean             0.1400
Std Dev          0.3487
Std Error Mean   0.0349
Upper 95% Mean   0.2092
Lower 95% Mean   0.0708
N                100.0000
Sum Weights      100.0000
- Rule of thumb: at least 30 observations, 5 successes, and 5 failures lets the CLT kick in
- No difference between means and proportions!
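The JMP numbers for the survey can be reproduced by treating each response as a 0 or 1 and applying the same interval as for a mean. This is a sketch under stated assumptions: the sample SD uses the usual n - 1 divisor, and the critical value 1.984 is the table value quoted earlier in the lecture. The variable names are illustrative.

```python
from math import sqrt

yes, n = 14, 100   # survey results: 14 "Yes" out of 100 households
p_hat = yes / n    # sample proportion = mean of the 0/1 data

# Sample SD of 0/1 data with the n - 1 divisor:
# the sum of squared deviations equals n * p_hat * (1 - p_hat).
s = sqrt(n * p_hat * (1 - p_hat) / (n - 1))
se = s / sqrt(n)

T_CRIT = 1.984     # 95% table value quoted in the lecture
lower, upper = p_hat - T_CRIT * se, p_hat + T_CRIT * se

print(f"Std Dev = {s:.4f}, Std Error = {se:.4f}")
print(f"95% CI for the proportion: [{lower:.4f}, {upper:.4f}]")
print(f"Is 25% inside the interval? {lower <= 0.25 <= upper}")
```

The interval matches the Lower 95% Mean and Upper 95% Mean rows above, and the 25% that management needs lies above its upper limit.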

Other Confidence Intervals
- There are lots of other confidence intervals; we've concentrated on CIs for the mean
- See your textbook for:
  - CI for the variance (i.e., $\sigma^2$)
  - CI for the difference of two means
- Not enough time to learn about these
- Just skim those sections in the book
- And know that CIs exist for other parameters

What We Have Learned So Far
- Descriptive Statistics
- Probability
  - And, Or, Not
  - Normal distribution
  - Central limit theorem
  - Computing SE($\bar{X}$) from SD($X$)
- Inference
  - Confidence intervals for population means and proportions