Probability and Statistics

Alexander Newton

Inference on Relationship Between Two Variables
Prof. Liping Fu, University of Waterloo

The Big Picture

Population, Sample, Data
1) Data Collection
2) Exploratory Data Analysis (EDA)
3) Probability
4) Inference

Outline

- Introduction: What and Why?
- Confidence Interval
  - Difference between two means
  - Difference between two proportions
- Hypothesis Testing
  - Difference between two means
  - Difference between two proportions
- Paired vs. Unpaired Comparison
- Comparison of More-than-Two Populations (ANOVA)

Statistical Inference (Questioning)?

- On Single Variable
  - Estimation
  - Hypothesis Testing
- On Relationship
  - Estimation
  - Hypothesis Testing

Relationship between...

- Explanatory Variable (X): independent variable
- Response Variable (Y): dependent variable

Bivariate Relationships

Relationship between Two Variables:
- Categorical Y (C) ~ Categorical X (C)
- Quantitative Y (Q) ~ Categorical X (C)
- Quantitative Y (Q) ~ Quantitative X (Q)

Two Types of Questions on Relationship?

- Estimation: What is the difference between two population means, variances, proportions?
- Hypothesis Testing: Is there a difference between two population means?

Confidence Interval for Difference Between Two Population Means

μ1 - μ2 = ?

The relationship between a quantitative variable and a categorical variable can be rephrased as the difference between two or more (sub)populations as defined by the categorical variable.

Relationship between student height and gender == Difference between male and female students

Population 1: (μ1), Sample 1: n1, x̄1, s1
Population 2: (μ2), Sample 2: n2, x̄2, s2

Confidence Interval for the Difference between Means (n>30)

With given samples (n1, x̄1, s1; n2, x̄2, s2), we are (1-α)100% confident that the following interval contains the difference between the two population means (μ1 - μ2):

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Confidence Interval for the Difference between Means (n<30)

With given samples (n1, x̄1, s1; n2, x̄2, s2), we are (1-α)100% confident that the following interval contains the difference between the two population means (μ1 - μ2):

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Degree of Freedom: df = [not given]

Hypothesis Testing on Claims Related to Difference Between Two Population Means

Claims to Be Tested (claim about μ1 - μ2) | Hypotheses
The difference (μ1 - μ2) is equal (not equal) to a given value (δ) | H0: μ1 - μ2 = δ, H1: μ1 - μ2 ≠ δ
The difference (μ1 - μ2) is greater than (less than or equal to) a given value (δ) | H0: μ1 - μ2 = δ, H1: μ1 - μ2 > δ
The difference (μ1 - μ2) is less than (greater than or equal to) a given value (δ) | H0: μ1 - μ2 = δ, H1: μ1 - μ2 < δ

Data: Two Independent Samples

μ1 - μ2 = ?

Population 1: (μ1), Sample 1: n1, x̄1, s1
Population 2: (μ2), Sample 2: n2, x̄2, s2

Critical Values and Regions

Test statistic:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Hypotheses | Reject H0 if | Test Type
H0: μ1 - μ2 = δ, H1: μ1 - μ2 ≠ δ | z < -z_{α/2} or z > z_{α/2} | Two-tailed test
H0: μ1 - μ2 = δ, H1: μ1 - μ2 < δ | z < -z_α | One-tailed test (left)
H0: μ1 - μ2 = δ, H1: μ1 - μ2 > δ | z > z_α | One-tailed test (right)

z_α is from the z-distribution table.
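The large-sample interval and z test described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's own code: the function names, the `tail` parameter, and the summary statistics are all hypothetical, and the critical values come from the standard library's `NormalDist` rather than a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_interval(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """(1 - alpha)100% confidence interval for mu1 - mu2 (large samples)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)            # standard error of x1 - x2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    diff = x1 - x2
    return diff - z_crit * se, diff + z_crit * se


def two_mean_z_test(x1, s1, n1, x2, s2, n2, delta=0.0, alpha=0.05, tail="two"):
    """Return (z, reject_H0) for H0: mu1 - mu2 = delta."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z = (x1 - x2 - delta) / se
    if tail == "two":      # H1: mu1 - mu2 != delta
        reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    elif tail == "left":   # H1: mu1 - mu2 < delta
        reject = z < -NormalDist().inv_cdf(1 - alpha)
    elif tail == "right":  # H1: mu1 - mu2 > delta
        reject = z > NormalDist().inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject


# Hypothetical summary statistics, chosen only for illustration.
low, high = two_mean_z_interval(x1=52.0, s1=8.0, n1=64, x2=50.0, s2=6.0, n2=36)
z, reject = two_mean_z_test(x1=52.0, s1=8.0, n1=64, x2=50.0, s2=6.0, n2=36)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
print(f"z = {z:.3f}, reject H0: {reject}")
```

With these made-up numbers the interval contains 0 and the two-tailed test does not reject H0, which shows the usual agreement between a (1-α)100% interval and a two-tailed test at level α.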

Test about Difference between Means (n<30)

Test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Decision (t_α is from the t-distribution table with df = [not given]):

Hypotheses | Reject H0 if | Test Type
H0: μ1 - μ2 = δ, H1: μ1 - μ2 ≠ δ | t < -t_{α/2} or t > t_{α/2} | Two-tailed test
H0: μ1 - μ2 = δ, H1: μ1 - μ2 < δ | t < -t_α | One-tailed test (left)
H0: μ1 - μ2 = δ, H1: μ1 - μ2 > δ | t > t_α | One-tailed test (right)

Unpaired vs. Pair-Wise Comparison

What we have talked about is unpaired comparison (independent samples):

Population 1: (μ1), Sample 1: n1, x̄1, s1
Population 2: (μ2), Sample 2: n2, x̄2, s2

Paired comparison uses paired (dependent) samples:

Population 1: (μ1), Sample 1: n1, x̄1, s1
Population 2: (μ2), Sample 2: n2, x̄2, s2

Paired or unpaired?

1. The temperature of 10 water samples is measured with two different thermometers. Each sample is measured with the first thermometer and the result is recorded, then each sample is measured with the second thermometer. Do the two thermometers give the same reading on average?
   Unpaired: samples can cool down between experiments.

2. The temperature of 10 water samples is measured with two different thermometers. The samples are well-mixed and each sample is measured with both thermometers simultaneously. Do the two thermometers give the same reading on average?
   Paired: identical tests carried out simultaneously.

3. The temperature of 10 water samples is measured with two different thermometers. The samples are well-mixed and each sample is measured with both thermometers simultaneously. The second thermometer takes longer to give a reading than the first. Do the two thermometers give the same reading on average?
   Possibly unpaired if the temperatures are varying over time.

4. Two types of tires are compared by driving a car around a circular track. Type A is on the left, type B is on the right.
   Possibly unpaired since the car always turns one way and forces may vary by side of car.
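To see why the paired/unpaired distinction matters, the sketch below computes both test statistics on the same data. The thermometer readings are invented for illustration (they are not from the lecture), and the function names are hypothetical. Because the ten water samples differ widely in temperature, the unpaired statistic is swamped by sample-to-sample variation, while the paired statistic looks only at the per-sample differences.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical readings (deg C) of the same 10 water samples,
# taken with two thermometers at the same time.
thermo_a = [20.1, 25.3, 30.2, 35.4, 40.0, 45.2, 50.1, 55.3, 60.2, 65.1]
thermo_b = [20.4, 25.5, 30.6, 35.6, 40.3, 45.6, 50.3, 55.7, 60.4, 65.5]


def unpaired_t(a, b, delta=0.0):
    """t statistic treating a and b as independent samples."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b) - delta) / se


def paired_t(a, b, delta=0.0):
    """t statistic computed on the per-sample differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    return (mean(d) - delta) / (stdev(d) / sqrt(len(d)))


t_unpaired = unpaired_t(thermo_a, thermo_b)
t_paired = paired_t(thermo_a, thermo_b)
print(f"unpaired t = {t_unpaired:.3f}")
print(f"paired t   = {t_paired:.3f}")
```

On these made-up readings the paired statistic is far larger in magnitude than the unpaired one, even though both use the same numbers. The pairing is only legitimate when each pair really is measured under the same conditions, which is the point of the four scenarios above.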

Example 1

Unconscious driving is one of the main causes of car accidents. Interviews with unconscious drivers who were involved in accidents and survived revealed that one of the main problems is that drivers do not realize that they are impaired, thinking "I only had 1-2 injections... I am OK to drive". Two types of studies were conducted:

Study I (unpaired): Two groups of drivers were chosen, each consisting of 20 drivers. A driving test was conducted to measure their reaction times in an obstacle course. Before the driving test, the first group of drivers were kept conscious while the other group of drivers were asked to have two injections.

Study II (paired): A sample of 20 drivers was chosen, and their reaction times in an obstacle course were measured before and after having two injections.

The purpose of this study was to check whether drivers are impaired after having two injections. Data for Study II are provided as follows.

1) What is the difference between these two studies? Which one do you think is more powerful?
2) Carry out a test for Study II and report the test statistic and p-value.

Data from Study I

Group 1: Conscious Drivers (Driver, Reaction Time) | Group 2: Drivers with Injections (Driver, Reaction Time)
Summary rows: Mean, Stdev

Data from Study II

Driver | Reaction Time Before Injection (sec) | Reaction Time After Injections (sec) | Difference
Summary row: Mean

Test Procedure for Paired Comparison

1. Consider the difference as the population; the parameter of interest is the mean of the difference (d).
2. Formulate hypotheses, e.g. H0: d = 0, H1: d > 0.
3. Specify the Type-I error α (the level of significance) and determine the critical test statistic: t_crit.
4. Use the sampled differences to calculate the value of the test statistic on which the decision is to be based:

$$t_{calc} = \frac{\bar{d} - \delta}{s_d / \sqrt{n}}$$

5. Decision: if t_calc > t_crit, then reject H0 in favor of H1; otherwise, fail to reject H0.
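The five-step paired procedure can be sketched as a short Python function. The Study II data are not reproduced in this copy, so the reaction times below are hypothetical and use only 5 drivers; `paired_t_test` and its parameters are illustrative names, and the critical value is assumed to be read from a t table by the caller.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_test(before, after, t_crit, delta=0.0):
    """Right-tailed paired test of H0: d = delta against H1: d > delta.

    d is the mean of (after - before); t_crit is the table value for the
    chosen alpha with n - 1 degrees of freedom.
    """
    diffs = [a - b for b, a in zip(before, after)]   # step 1: differences
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                               # sample std dev of differences
    t_calc = (d_bar - delta) / (s_d / sqrt(n))       # step 4: test statistic
    return t_calc, t_calc > t_crit                   # step 5: decision


# Hypothetical reaction times (sec) for 5 drivers, NOT the Study II data.
before = [6.2, 5.9, 6.5, 6.1, 6.3]
after = [6.8, 6.3, 6.9, 6.8, 6.6]

# One-tailed critical value for alpha = 0.05 and df = 4, from a t table.
t_calc, reject_h0 = paired_t_test(before, after, t_crit=2.132)
print(f"t_calc = {t_calc:.3f}, reject H0: {reject_h0}")
```

Passing `t_crit` in explicitly mirrors step 3 of the procedure, where the critical value is fixed before the statistic is computed.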

Difference Between Three or More Population Means

Are μ1, μ2, μ3 really different? Or μ1 = μ2 = μ3?

Population 1: (μ1), Sample 1: n1, x̄1, s1
Population 2: (μ2), Sample 2: n2, x̄2, s2
Population 3: (μ3), Sample 3: n3, x̄3, s3

Analysis of Variance Method (ANOVA)

Example 2

The following random samples are measurements of the heat-producing capacity (in millions of calories per ton) of specimens of coal from two mines:

Mine 1: 8,260 8,130 8,350 8,070 8,340
Mine 2: 7,950 7,890 7,900 8,140 7,920 7,840

Use the 0.01 level of significance to test whether the difference between the means of these two samples is significant.

Example 3

A study was conducted at a large state university in order to compare the sleeping habits of undergraduate students to those of graduate students. Random samples of 75 undergraduate students and 50 graduate students were chosen and each of the subjects was asked to report the number of hours he/she sleeps in a typical day. The thought was that since undergraduate students are generally younger and party more during their years in school, they sleep less, on average, than graduate students. Do the data support this hypothesis? The following table summarizes the sample data:

Let
μ1 = the mean number of hours undergraduate students sleep in a typical day
μ2 = the mean number of hours graduate students sleep in a typical day

1) State the null and alternative hypotheses.
2) State why we could use a t-test or z-test safely to test the hypothesis.
3) Carry out the test and report the test statistic and p-value.

Example 4

In a study conducted by the Department of Human Nutrition and Foods at the Virginia Polytechnic Institute and State University, the following data on the comparison of sorbic acid residuals in parts per million in ham immediately after dipping in a sorbate solution and after 60 days of storage were recorded:

Assuming the populations to be normally distributed, is there sufficient evidence, at the 0.05 level of significance, to say that the length of storage influences sorbic acid residual concentrations?
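As a worked illustration of the coal-mine example, the sketch below applies the unpaired t statistic from the earlier slides with δ = 0. Two things are assumptions rather than part of the source: the readings 8,260 and 7,920 are taken to be the values whose digits were lost in this copy, and the degrees of freedom use the Welch-Satterthwaite approximation, since the slides leave the df formula blank. The critical value is the tabulated t_{0.005} for 7 degrees of freedom (df rounded down, which is the conservative choice).

```python
from math import sqrt
from statistics import mean, variance

# Heat-producing capacity (millions of calories per ton).
# Assumption: 8260 and 7920 restore digits that appear to be missing.
mine_1 = [8260, 8130, 8350, 8070, 8340]
mine_2 = [7950, 7890, 7900, 8140, 7920, 7840]

n1, n2 = len(mine_1), len(mine_2)
v1, v2 = variance(mine_1), variance(mine_2)   # sample variances s1^2, s2^2

se_sq = v1 / n1 + v2 / n2
t = (mean(mine_1) - mean(mine_2) - 0.0) / sqrt(se_sq)   # delta = 0

# Welch-Satterthwaite degrees of freedom (an assumption; not from the slides).
df = se_sq**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

t_crit = 3.499   # t_{alpha/2} for alpha = 0.01 and 7 df, from a t table
reject_h0 = abs(t) > t_crit

print(f"x1 = {mean(mine_1)}, x2 = {mean(mine_2)}")
print(f"t = {t:.3f}, df = {df:.2f}, reject H0 at 0.01: {reject_h0}")
```

Under these assumptions the statistic exceeds the two-tailed critical value, so the difference between the two sample means would be judged significant at the 0.01 level. A pooled-variance t test, which some textbooks use for this example, gives a slightly different statistic and 9 degrees of freedom.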

- Please explain the formula on page 11.
- What is the table on page 10 showing?
- Explain the three pairing lines on page 15. They seem to be different; why?
- On page 9, why are we (1-α)100% confident? I could not work it out.
- What can we understand from μ1 - μ2?
- What is the test type in the chart on page 12?
- What is the difference between paired comparison and unpaired comparison?
- What can we say if z = 0 in the formula on page 11?
- Please explain the CI for the difference between two populations.
- Why did you tabulate the data on page 6?
- In the confidence interval, what is the mathematical explanation for the given formula?
- Why do we use α/2 in the confidence interval formula?
- What is the difference between independent and dependent variables in the study of errors?
- Why don't we have more than one response variable?
- What are μ1 and μ2?
- What is the difference between a paired and an unpaired sample?
- If the variable is categorical, then what is μ1?
- Why is μ1 - μ2 used for the claim?
- Please explain the formulas on page 9.
- What is a paired sample?
- How do the hypotheses determine different regions?
- What is the relation between the amount of alpha and the amount of confidence?
- How did you obtain the formula on page 11?
- What is the difference between the table on page 12 and the table on page 13?
- Explain more about errors and their types.
- What is δ?
