Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

Agenda
Introduction to Estimation
  Point estimation
  Interval estimation
Introduction to Hypothesis Testing
  Concepts and terminology of hypothesis testing
  Null and alternative hypothesis
  Type I and Type II error

Concepts of Estimation
Objective: determine the approximate value of a population parameter on the basis of a sample statistic, e.g.
  the sample mean ($\bar{X}$) is used to estimate the population mean ($\mu$)
  the sample proportion ($\hat{P}$) is used to estimate the population proportion ($p$)
Estimator: formula (statistic) that provides the guess of the population parameter (denoted by uppercase letters: $\bar{X}$, $\hat{P}$)
Estimate: numerical outcome of the estimator once the sample has been drawn (denoted by lowercase letters: $\bar{x}$, $\hat{p}$)
Two types of estimators:
  Point estimator: provides a single value
  Interval estimator: provides an interval with a certain confidence
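The estimator/estimate distinction can be sketched in Python: an estimator is a rule (a function of the sample), while an estimate is the number that rule returns once a sample is in hand. The data values and the names `sample_mean`, `sample_proportion`, `heights_cm` and `owns_phone` are invented for this illustration and do not come from the lecture.

```python
from statistics import fmean

def sample_mean(sample):
    """Estimator X-bar: a rule that maps any sample to a guess for mu."""
    return fmean(sample)

def sample_proportion(indicators):
    """Estimator P-hat: the share of 1s (successes) in a 0/1 sample."""
    return sum(indicators) / len(indicators)

# Hypothetical data, invented for this sketch.
heights_cm = [178, 185, 180, 190, 172]
owns_phone = [1, 0, 1, 1, 0]

x_bar = sample_mean(heights_cm)        # estimate of mu (a single number)
p_hat = sample_proportion(owns_phone)  # estimate of p (a single number)
print(x_bar, p_hat)
```

Calling the same functions on a different sample would give different estimates; the estimators themselves do not change.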

Point Estimators
Point estimator: draws inferences about a population by estimating the value of an unknown parameter using a single value
Example:
  Consider adult, male inhabitants of the Netherlands; we are interested in their mean height ($\mu$) in cm
  The standard deviation ($\sigma$) is assumed to be 10 cm; the population is assumed to be normal
  Draw a random sample $X_1, X_2, \ldots, X_n$ from the population, with $E(X_i) = \mu$ for $i = 1, \ldots, n$
  Consider two point estimators:
    Sample mean: $\bar{X}$
    Sample median: $M$
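One way to compare the two point estimators is a small simulation: draw many samples from a normal population and look at how much the sample mean and the sample median vary from sample to sample. This is only a sketch. The value sigma = 10 cm is from the example, but mu = 180, n = 25, the number of repetitions and the function name `simulate_estimators` are assumptions made for illustration.

```python
import random
from statistics import fmean, median, pvariance

def simulate_estimators(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma) and return
    the list of sample means and the list of sample medians."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return means, medians

# sigma = 10 is from the example; mu, n and reps are ASSUMED values.
means, medians = simulate_estimators(mu=180.0, sigma=10.0, n=25, reps=2000)
print("variance of sample means:  ", pvariance(means))
print("variance of sample medians:", pvariance(medians))
```

Under these assumptions both estimators center near $\mu$, and the sample means should show the smaller spread, which is the usual argument for preferring $\bar{X}$ when the population is normal.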

Interval Estimators
Drawbacks of the point estimator:
  It is virtually certain that the estimate is wrong: $P(\bar{X} = \mu) \approx 0$
  We need to know how close the estimator is to the parameter of interest: $P(\bar{X} \approx \mu) = {?}$
Therefore use is made of an interval estimator: draws inferences about a population by estimating the value of an unknown parameter using an interval
The width of the interval is related to the confidence (probability) that the interval includes the true parameter

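Interval Estimator for $\mu$ ($\sigma$ known)
Derivation:
Suppose that $X \sim N(\mu, \sigma)$ or $n > 30$ (CLT). Then it follows (approximately, in the CLT case) that

\[ \bar{X} \sim N\left(\mu,\ \sigma/\sqrt{n}\right) \]

where the second argument is the standard deviation. Hence

\[ P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \]

\[ P\left(-z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha \]

\[ P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha \]

Example: $1 - \alpha = 0.95$; $z_{\alpha/2} = z_{.025} = 1.96$
The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is a random interval
It includes (covers) the parameter $\mu$ with probability $1 - \alpha$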
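The value $z_{\alpha/2}$ used in the derivation can be computed instead of read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_half_alpha` is made up for this illustration.

```python
from statistics import NormalDist

def z_half_alpha(confidence):
    """Return z_{alpha/2}: the standard normal value with area alpha/2
    to its right, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_half_alpha(level), 3))
```

For a confidence level of 0.95 this gives a value that rounds to 1.96, matching the slide.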

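Interval Estimator for $\mu$ ($\sigma$ known)
Interpretation: with repeated sampling from this population, the proportion of values of $\bar{X}$ for which the interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ includes the population mean $\mu$ is equal to $1 - \alpha$
The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is called the $(1 - \alpha) \times 100\%$ confidence interval (CI) estimator of $\mu$
$1 - \alpha$ is the confidence level (probability of correct estimate)
$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$: lower confidence limit (LCL)
$\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$: upper confidence limit (UCL)
If we replace $\bar{X}$ by the observed value $\bar{x}$, we get the $(1 - \alpha) \times 100\%$ confidence interval estimate for $\mu$
Alternative notation: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$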
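The repeated-sampling interpretation can be checked with a short simulation: build the interval from many independent samples and count how often it covers the true mean. This is a sketch under assumed values (mu = 180, sigma = 10, n = 30, 5000 repetitions); the function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, reps, seed=0):
    """Fraction of simulated samples whose z-interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)   # z_{alpha/2} * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / reps

# All numeric inputs below are ASSUMED for illustration.
coverage_rate = coverage(mu=180.0, sigma=10.0, n=30, confidence=0.95, reps=5000)
print(coverage_rate)
```

The printed proportion should land close to 0.95, though not exactly on it, because the simulation itself is subject to sampling variation.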

Interval Estimator for $\mu$ ($\sigma$ known)
Example (mean height of adult, male inhabitants of the Netherlands, cont.):
Suppose $n = 400$ and $\bar{x} = 182$ cm ($\sigma = 10$ cm)
Question: compute the 95.44% CI estimate for $\mu$
Solution:
  $(1 - \alpha) \times 100\%$ CI estimate: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
  $\bar{x} = 182$
  $1 - \alpha = 0.9544$, so $\alpha/2 = 0.0228$ and $z_{\alpha/2} = 2.0$ (Table 4)
  $\sigma/\sqrt{n} = 10/\sqrt{400} = 0.5$
  95.44% CI estimate for $\mu$: $182 \pm 2.0 \times 0.5 = 182 \pm 1$ cm
  LCL = 181 cm; UCL = 183 cm
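The worked example can be reproduced in a few lines of Python. The function name `z_confidence_interval` is made up for this sketch; the inputs (182, 10, 400, 0.9544) are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (LCL, UCL) of the CI estimate for mu when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lcl, ucl = z_confidence_interval(x_bar=182.0, sigma=10.0, n=400, confidence=0.9544)
print(round(lcl, 2), round(ucl, 2))
```

The computed z-value is very slightly below the table value 2.0, so the limits agree with 181 cm and 183 cm only up to rounding.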

The Error of Estimation
Sampling error, also called the error of estimation, is the difference between an estimator (e.g. $\bar{X}$) and a parameter (e.g. $\mu$)

Hypothesis & Hypothesis Testing
Hypothesis: answer to a research question, or an assumption made about a population parameter (not a sample estimate!)
  Population mean. Example: The mean monthly cell phone bill of this city is $\mu = \$42$
  Population proportion. Example: The proportion of adults in this city with cell phones is $p = 0.68$
Hypothesis testing: determine whether there is enough statistical evidence in favor of a certain belief or hypothesis about a parameter

Concepts of Hypothesis Testing
Overview of critical concepts in hypothesis testing:
1. There are two hypotheses: $H_0$ (null hypothesis) and $H_1$ (alternative hypothesis)
2. The testing procedure starts from the assumption that $H_0$ is true
3. The goal of the process is to determine whether there is enough evidence in favor of $H_1$
4. There are two possible decisions:
   there is enough evidence to reject $H_0$ in favor of $H_1$
   there is not enough evidence to reject $H_0$ in favor of $H_1$
5. There are two possible errors:
   Type I error: reject a true $H_0$; $P(\text{Type I error}) = \alpha$
   Type II error: do not reject a false $H_0$; $P(\text{Type II error}) = \beta$

The Null Hypothesis, $H_0$
States the (numerical) assumption to be tested
  Example: The average number of TV sets in U.S. homes is at least three ($H_0: \mu \geq 3$)
Is always about a population parameter, not about a sample statistic

The Null Hypothesis, $H_0$ (continued)
Begin with the assumption that the null hypothesis is true
Always contains the $=$, $\leq$, or $\geq$ sign
May or may not be rejected

The Alternative Hypothesis, $H_A$
Is the opposite of the null hypothesis (written $H_1$ in the overview of concepts)
  e.g.: The average number of TV sets in U.S. homes is less than 3 ($H_A: \mu < 3$)
Never contains the $=$, $\leq$, or $\geq$ sign
May or may not be accepted
$H_A$ is generally the hypothesis that is believed (or needs to be supported) by the researcher

Hypothesis Testing Process
Claim: the population mean age is 50 (null hypothesis $H_0: \mu = 50$)
Now select a random sample from the population
Suppose the sample mean age is 20: $\bar{x} = 20$
Is $\bar{x} = 20$ likely if $\mu = 50$?
If not likely, REJECT the null hypothesis

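Reason for Rejecting $H_0$
[Figure: sampling distribution of $\bar{x}$, centered at $\mu_{\bar{X}} = \mu = 50$ if $H_0$ is true, with the observed value $\bar{x} = 20$ marked far in the tail]
It is unlikely that we would get a sample mean of this value ($\bar{x} = 20$) if in fact this ($\mu = 50$) were the population mean, so we reject the null hypothesis that $\mu = 50$.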
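To make "unlikely" concrete, the probability of a sample mean of 20 or lower can be computed under $H_0: \mu = 50$. The lecture gives neither the population standard deviation nor the sample size, so the values sigma = 30 and n = 36 below are assumptions chosen purely for illustration; only 50 and 20 come from the slide.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 50.0    # claimed population mean (from the slide)
x_bar = 20.0   # observed sample mean (from the slide)
sigma = 30.0   # ASSUMED population standard deviation (not in the lecture)
n = 36         # ASSUMED sample size (not in the lecture)

standard_error = sigma / sqrt(n)       # sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error    # standardized distance from mu_0
prob_this_low = NormalDist().cdf(z)    # P(X-bar <= 20) if H0 is true

print(z, prob_this_low)
```

With these assumed inputs the sample mean lies six standard errors below 50, and the probability of a result that low is vanishingly small, which is the reasoning behind rejecting $H_0$.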

How far must the sample statistic be from the population value under $H_0$ before we call it unlikely? We choose a critical value (cut-off value) on the sampling distribution: a sample statistic beyond the critical value is very far from the value stated in the null hypothesis and is thus not likely if $H_0$ is true.

Level of Significance, $\alpha$
In statistics, a critical value is the value corresponding to a given significance level
The level of significance defines the unlikely values of the sample statistic if the null hypothesis is true
  Defines the rejection region of the sampling distribution
Is designated by $\alpha$ (level of significance)
  Typical values are .01, .05, or .10

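Level of Significance and the Rejection Region
Level of significance = $\alpha$

Test              Hypotheses                             Rejection region (shaded)
Lower-tail test   $H_0: \mu \geq 3$, $H_A: \mu < 3$      area $\alpha$ in the lower tail, below the critical value
Upper-tail test   $H_0: \mu \leq 3$, $H_A: \mu > 3$      area $\alpha$ in the upper tail, above the critical value
Two-tailed test   $H_0: \mu = 3$, $H_A: \mu \neq 3$      area $\alpha/2$ in each tail, beyond the two critical values

[Figure: three sampling distributions, each axis marked at 0, with the critical value(s) indicated and the rejection region shaded]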
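The critical values that bound the three rejection regions can be computed from $\alpha$ when the test statistic is standard normal. The sketch below makes that assumption (a z-test); the function name `critical_values` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z-value(s) for a test at significance level alpha.

    tail: "lower" (H_A: mu < mu_0), "upper" (H_A: mu > mu_0)
          or "two" (H_A: mu != mu_0).
    """
    std_normal = NormalDist()
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)        # reject if z is below this
    if tail == "upper":
        return (std_normal.inv_cdf(1.0 - alpha),)  # reject if z is above this
    if tail == "two":
        z = std_normal.inv_cdf(1.0 - alpha / 2.0)
        return (-z, z)                             # reject if z is outside (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

for tail in ("lower", "upper", "two"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
```

Note that the two-tailed test splits $\alpha$ across both tails, so its critical values sit further from 0 than the one-tailed value at the same $\alpha$.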

Errors in Making Decisions
Type I Error
  Reject a true null hypothesis
  The probability of a Type I Error is $\alpha$
    Called the level of significance of the test
    Set by the researcher in advance
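That $\alpha$ is the probability of a Type I error can be illustrated by simulation: generate data for which $H_0$ is true, run the test many times, and count how often it rejects. The sketch assumes a lower-tail z-test of $H_0: \mu \geq 3$ with known sigma; the values sigma = 1, n = 40 and the function name `type_one_error_rate` are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu_0, sigma, n, alpha, reps, seed=0):
    """Share of lower-tail z-tests that reject H0: mu >= mu_0 when the
    data are generated with mu = mu_0, i.e. when H0 is in fact true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu_0) / (sigma / sqrt(n))
        if z < z_crit:                     # falls in the rejection region
            rejections += 1
    return rejections / reps

# sigma, n and reps are ASSUMED values; mu_0 = 3 echoes the TV-set example.
rejection_rate = type_one_error_rate(mu_0=3.0, sigma=1.0, n=40, alpha=0.05, reps=5000)
print(rejection_rate)
```

The observed rejection rate should be close to the chosen $\alpha = 0.05$, up to simulation noise.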

Errors in Making Decisions (continued)
Type II Error
  Fail to reject a false null hypothesis
  The probability of a Type II Error is $\beta$

Outcomes and Probabilities
Possible Hypothesis Test Outcomes
Key: Outcome (Probability)

State of nature   Decision: do not reject $H_0$   Decision: reject $H_0$
$H_0$ true        No error ($1 - \alpha$)         Type I error ($\alpha$)
$H_0$ false       Type II error ($\beta$)         No error: power ($1 - \beta$)
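The bottom row of the table can be computed for a specific alternative. The sketch below assumes a lower-tail z-test of $H_0: \mu \geq 3$ with known sigma and evaluates the power at a hypothetical true mean of 2.5; sigma = 1, n = 40 and the function name `lower_tail_power` are likewise assumptions, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_power(mu_0, mu_true, sigma, n, alpha):
    """P(reject H0: mu >= mu_0) for a lower-tail z-test when the
    population mean is actually mu_true."""
    se = sigma / sqrt(n)
    cutoff = mu_0 + NormalDist().inv_cdf(alpha) * se  # reject when x-bar < cutoff
    return NormalDist(mu_true, se).cdf(cutoff)

# ASSUMED inputs: true mean 2.5, sigma 1, n 40.
power = lower_tail_power(mu_0=3.0, mu_true=2.5, sigma=1.0, n=40, alpha=0.05)
beta = 1.0 - power   # probability of a Type II error at mu_true
print(power, beta)
```

Setting `mu_true` equal to `mu_0` returns $\alpha$ itself, which ties the two rows of the table together: the same rejection rule produces a Type I error when $H_0$ is true and a correct rejection (power) when it is false.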