Nicole Dalzell. July 2, 2014


UNIT 1: INTRODUCTION TO DATA
LECTURE 3: EDA (CONT.) AND INTRODUCTION TO STATISTICAL INFERENCE VIA SIMULATION
STATISTICS 101
Nicole Dalzell
July 2, 2014

Teams and Announcements
Team1 = Houdan Sai Cui Huanqi
Team2 = Ludi Li Jackson Hannah
Team3 = Christian Christine Sasha
Office Hours: Today from 2-3 PM in Old Chem 211 A.
Statistics 101 (Nicole Dalzell), U1 - L3: EDA + Inference, July 2, 2014

1. Review and Recap
2. Distribution of one numerical variable: Standard Deviation
3. Distribution of one numerical variable: Robust statistics
4. Relationship between a numerical and a categorical variable
5. Case study: Gender discrimination
   Study description and data
   Competing claims
   Testing via simulation
   Checking for independence
6. Case study: Tapping on caffeine [Time permitting]

1. Review and Recap

Distribution Shape
How would you describe the shape of the distribution of the number of piercings college students have?
[Figure: histogram; x-axis: # of piercings, y-axis: frequency]

2. Distribution of one numerical variable: Standard Deviation


Standard deviation
Standard deviation, s: roughly the deviation around the mean, calculated as the square root of the variance. It has the same units as the data.
\( s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \)
The standard deviation of energy use per capita is calculated by applying this formula to the energy data.

Standard Deviation
The standard deviation gives a rough estimate of the typical distance of a data value from the mean. The larger the standard deviation, the more variability there is in the data and the more spread out the data are.
[Figure: two histograms of simulated data, "Standard Deviation of 2" from rnorm(100, 0, 2) and "Standard Deviation of 4" from rnorm(100, 0, 4); y-axis: Frequency]
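
To make the link between the formula and the spread of the data concrete, here is a minimal Python sketch. It assumes Python's `random.gauss` as a stand-in for the slide's `rnorm` calls; the seed and the helper name `sample_sd` are illustrative choices, not part of the lecture.

```python
import math
import random
import statistics

random.seed(101)  # arbitrary seed so the sketch is reproducible


def sample_sd(values):
    """Sample standard deviation: square root of the variance, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


# Stand-ins for rnorm(100, 0, 2) and rnorm(100, 0, 4):
# 100 draws from normal distributions with mean 0 and SD 2 or 4.
narrow = [random.gauss(0, 2) for _ in range(100)]
wide = [random.gauss(0, 4) for _ in range(100)]

sd_narrow = sample_sd(narrow)
sd_wide = sample_sd(wide)

print(f"sample SD of the SD-2 draws: {sd_narrow:.2f}")
print(f"sample SD of the SD-4 draws: {sd_wide:.2f}")
```

The hand-written `sample_sd` should agree with `statistics.stdev`, which also divides by n - 1, and the second sample should come out roughly twice as spread out as the first.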

Variability in Student Sleep
sleep: \( \bar{x} = 4.6 \), \( s_x = \) [...]
[...] out of 86 students (80%) are within 1 SD of the mean.
80 out of 86 students (93%) are within 2 SDs of the mean.
86 out of 86 students (100%) are within 3 SDs of the mean.

95% Rule
If a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean. For a population, about 95% of the data will be between \( \mu - 2\sigma \) and \( \mu + 2\sigma \).
[Figure: empirical rule diagram, rchsbowman.files.wordpress.com/2008/09/empirical-rule-3.jpg]
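
The 95% rule can be checked by simulation. The sketch below is only an illustration: the mean of 50, the SD of 10, the sample size, and the seed are all made-up values, and the data are drawn from a normal distribution so that the bell-shape condition holds by construction.

```python
import random
import statistics

random.seed(2014)  # arbitrary seed for reproducibility

# Hypothetical bell-shaped data: 10,000 draws from a normal distribution.
data = [random.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)
lower, upper = mean - 2 * sd, mean + 2 * sd

# Fraction of observations within two standard deviations of the mean.
share_within_2sd = sum(lower <= x <= upper for x in data) / len(data)
print(f"share within 2 SDs of the mean: {share_within_2sd:.3f}")
```

For skewed or heavy-tailed data the share can differ noticeably from 95%, which is why the rule is stated only for approximately symmetric, bell-shaped distributions.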


Notation Recap
             mean            variance          SD
sample       \( \bar{x} \)   \( s^2 \)         \( s \)
population   \( \mu \)       \( \sigma^2 \)    \( \sigma \)
Do you see a trend in what types of letters are used for sample statistics vs. population parameters?
Latin letters for sample statistics, Greek letters for population parameters.

Z-Scores
The z-score for a data value, \( x_i \), is \( z = \frac{x_i - \bar{x}}{s} \).
For a population, \( \bar{x} \) is replaced with \( \mu \) and \( s \) is replaced with \( \sigma \).
Values farther from 0 are more extreme.

Z-Scores: Why?
A z-score puts values on a common scale.
A z-score is the number of standard deviations a value falls from the mean.
For a bell-shaped distribution, about 95% of all z-scores fall between -2 and 2.
Z-scores beyond -2 or 2 can be considered extreme.


Z-Scores: Example
Which is better, (A) an ACT score of 28 or (B) a combined SAT score of 2100? Assume ACT and SAT scores have approximately bell-shaped distributions.
ACT: \( \bar{x} = 21 \), \( s = 5 \)
SAT: \( \bar{x} = 1500 \), \( s = 325 \)
ACT: \( z = \frac{28 - 21}{5} = \frac{7}{5} = 1.4 \)
SAT: \( z = \frac{2100 - 1500}{325} = \frac{600}{325} \approx 1.85 \)
[Figure: histogram of z-scores; x-axis: Z Score, y-axis: Frequency]
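
The same comparison can be written as a short Python sketch. The means and standard deviations are the ones given on the slide; the function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


act_z = z_score(28, mean=21, sd=5)
sat_z = z_score(2100, mean=1500, sd=325)

print(f"ACT z-score: {act_z:.2f}")
print(f"SAT z-score: {sat_z:.2f}")

# The score with the larger z-score is farther above its own mean.
better = "SAT" if sat_z > act_z else "ACT"
print(f"More impressive relative to its distribution: {better}")
```

Because both scores are expressed in standard deviations from their own mean, they can be compared directly even though the raw scales differ.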

Other Measures of Location
The 25th percentile is also called the first quartile, Q1.
The 50th percentile is also called the median.
The 75th percentile is also called the third quartile, Q3.
summary(energy$x2011) reports Min., 1st Qu., Median, Mean, 3rd Qu., and Max.
The middle 50% of the data lies between Q1 and Q3. The range these data span is called the interquartile range, or the IQR: IQR = Q3 - Q1.
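
As a rough illustration of quartiles and the IQR, the sketch below uses Python's `statistics.quantiles` on a small made-up data set. The values in `hours` are hypothetical and are not the energy data from the lecture; quartile conventions also differ between software packages, so other tools may report slightly different cut points.

```python
import statistics

# Hypothetical values, not the energy data from the lecture.
hours = [1, 2, 3, 3, 4, 5, 5, 6, 9]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is one common convention; others exist.
q1, median, q3 = statistics.quantiles(hours, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
print(f"IQR = Q3 - Q1 = {iqr}")
```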

Participation question
Which of the following is false about the distribution of the average number of hours students study daily?
[Summary table of average number of hours students study daily: Min., 1st Qu., Median, Mean, 3rd Qu., Max.]
(a) There are no students who don't study at all.
(b) 75% of the students study more than 5 hours daily, on average.
(c) 25% of the students study less than 3 hours, on average.
(d) IQR is 2 hours.


Box Plot
The box in a box plot represents the middle 50% of the data, and the thick line in the box is the median.
[Figure: box plot of # of study hours / week]

Anatomy of a Box Plot
[Figure: box plot of # of study hours / week with labels, from top to bottom: suspected outliers, max whisker reach, upper whisker, Q3 (third quartile), median, Q1 (first quartile), lower whisker]

Whiskers and Outliers
Whiskers of a box plot can extend up to 1.5 × IQR away from the quartiles.
max upper whisker reach: Q3 + 1.5 × IQR = 35
max lower whisker reach: Q1 - 1.5 × IQR = -5
An outlier is defined as an observation beyond the maximum reach of the whiskers. It is an observation that appears extreme relative to the rest of the data.
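
The 1.5 × IQR rule can be sketched in a few lines of Python. The study-hours values below are hypothetical, chosen only so that the arithmetic is easy to follow, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which is an assumption; other software may place the quartiles slightly differently.

```python
import statistics

# Hypothetical weekly study hours, not the class data.
hours = [0, 5, 8, 10, 10, 12, 15, 15, 18, 20, 20, 25, 60]

q1, _, q3 = statistics.quantiles(hours, n=4, method="inclusive")
iqr = q3 - q1

# Maximum reach of the whiskers.
upper_reach = q3 + 1.5 * iqr
lower_reach = q1 - 1.5 * iqr

# Whiskers stop at the most extreme observations still inside the reach.
in_reach = [x for x in hours if lower_reach <= x <= upper_reach]
lower_whisker, upper_whisker = min(in_reach), max(in_reach)

# Anything beyond the reach is flagged as a suspected outlier.
outliers = [x for x in hours if x < lower_reach or x > upper_reach]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"whisker reach: [{lower_reach}, {upper_reach}]")
print(f"whiskers drawn at: {lower_whisker} and {upper_whisker}")
print(f"suspected outliers: {outliers}")
```

Note that the whiskers are drawn at actual data values, so they usually stop short of the maximum reach.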


Outliers (cont.)
Why is it important to look for outliers?
Identify extreme skew in the distribution.
Identify data collection and entry errors.
Provide insight into interesting features of the data.

Example: Visualizing
What does our Energy Data look like?
[Figure: "Energy Use Data Boxplot"; axis: Energy Usage]

Who uses the most energy?
Country.Name (in the order listed): Iceland, Qatar, Trinidad and Tobago, Kuwait, Brunei Darussalam, Oman, Luxembourg, United Arab Emirates, Bahrain, Canada, North America, United States, Saudi Arabia, Singapore, Finland

3. Distribution of one numerical variable: Robust statistics


Range and IQR
Range: range of the entire data. range = max - min
IQR: range of the middle 50% of the data. IQR = Q3 - Q1
Is the range or the IQR more robust to outliers? The IQR.

Extreme observations
How would sample statistics such as the mean, median, SD, and IQR of household income be affected if the largest value were replaced with $10 million? What if the smallest value were replaced with $10 million?
[Figure: distribution of household income ($ thousands)]

Income Example
[Figure: distribution of household income ($ thousands)]
                                robust             not robust
scenario                        median    IQR      mean      SD
original data                   165K      150K     211K      180K
move largest to $10 million     165K      150K     398K      1,422K
move smallest to $10 million    190K      163K     4,186K    1,424K
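
The same experiment can be repeated on a small made-up data set. The incomes below are hypothetical (in $ thousands) and are not the household income data from the slide; `summarize` is an illustrative helper, and the quartiles again assume the "inclusive" convention of `statistics.quantiles`.

```python
import statistics


def summarize(values):
    """Return the four summary statistics compared on the slide."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": median,
        "iqr": q3 - q1,
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# Hypothetical household incomes in $ thousands, sorted ascending.
incomes = [40, 55, 60, 75, 90, 110, 130, 165, 200, 250, 400]

original = summarize(incomes)
largest_moved = summarize(incomes[:-1] + [10_000])   # largest -> $10 million
smallest_moved = summarize(incomes[1:] + [10_000])   # smallest -> $10 million

for label, stats in [("original data", original),
                     ("move largest to $10 million", largest_moved),
                     ("move smallest to $10 million", smallest_moved)]:
    print(f"{label:30s} median={stats['median']:.1f} IQR={stats['iqr']:.1f} "
          f"mean={stats['mean']:.1f} SD={stats['sd']:.1f}")
```

In this sketch the median and IQR do not move at all when the largest value is replaced and shift only modestly when the smallest is replaced, while the mean and SD jump in both cases.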


Robust statistics
Since the median and IQR are more robust to skewness and outliers than the mean and SD:
skewed: use the median and IQR
symmetric: use the mean and SD
If you were searching for a car and were price conscious, would you be more interested in the mean or the median vehicle price?

Mean vs. median
If the distribution is symmetric, the center is usually described by the mean.
  Symmetric: the mean is roughly equal to the median.
If the distribution is skewed or has outliers, the center is usually described by the median.
  Right-skewed: the mean is likely greater than the median.
  Left-skewed: the mean is likely less than the median.
[Figure: left-skewed, right-skewed, and symmetric distributions; red solid line: mean, black dashed line: median]
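
A tiny numerical illustration of how skew pulls the mean away from the median, using three made-up data sets (all values are hypothetical):

```python
import statistics

# Hypothetical data sets chosen to be clearly skewed or symmetric.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]       # long tail to the right
left_skewed = [21 - x for x in right_skewed]          # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right-skewed", right_skewed),
                     ("left-skewed", left_skewed),
                     ("symmetric", symmetric)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{name:13s} mean = {mean:.1f}, median = {median:.1f}")
```

The single large value in the right-skewed set drags the mean above the median, and its mirror image does the opposite; this is the usual pattern, although it is a tendency rather than a guarantee for every data set.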

4. Relationship between a numerical and a categorical variable

Side-by-side box plot
How does the average number of times students go out per week vary by involvement? Do the two variables appear to be associated or independent?
[Figure: side-by-side box plots for Greek, Independent, and SLG]

5. Case study: Gender discrimination

Study description and data

Gender discrimination
In 1972, as a part of a study on gender discrimination, 48 male bank supervisors were each given the same personnel file and asked to judge whether the person should be promoted to a branch manager job that was described as routine. The files were identical except that half of the supervisors had files showing the person was male while the other half had files showing the person was female. It was randomly determined which supervisors got male applications and which got female applications. Of the 48 files reviewed, 35 were promoted. The study is testing whether females are unfairly discriminated against.
Is this an observational study or an experiment?
B. Rosen and T. Jerdee (1974), Influence of sex role stereotypes on personnel decisions, J. Applied Psychology, 59:9-14.


Data
At first glance, does there appear to be a relationship between promotion and gender?
                 Promotion
Gender    Promoted   Not Promoted   Total
Male      21         3              24
Female    14         10             24
Total     35         13             48
% of males promoted: 21/24 = 87.5%
% of females promoted: 14/24 = 58.3%
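
The two promotion rates and their difference can be computed directly from the counts 21/24 and 14/24 given above. This is a minimal sketch; the variable names are illustrative.

```python
# Counts from the study: 24 files per group, 21 male and 14 female files promoted.
group_size = 24
promoted = {"male": 21, "female": 14}

p_male = promoted["male"] / group_size
p_female = promoted["female"] / group_size
observed_diff = p_male - p_female  # male - female

print(f"proportion of male files promoted:   {p_male:.3f}")
print(f"proportion of female files promoted: {p_female:.3f}")
print(f"difference (male - female):          {observed_diff:.3f}")
```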


Participation question
We saw a difference of almost 30% (29.2% to be exact) between the proportions of male and female files that are promoted. Based on this information, which of the below is true?
(a) If we were to repeat the experiment we will definitely see that more female files get promoted; this was a fluke.
(b) Promotion is dependent on gender, males are more likely to be promoted, and hence there is gender discrimination against women in promotion decisions. Maybe
(c) The difference in the proportions of promoted male and female files is due to chance; this is not evidence of gender discrimination against women in promotion decisions. Maybe
(d) Women are less qualified than men, and this is why fewer females get promoted.


Competing claims

Two competing claims
1. "There is nothing going on." Promotion and gender are independent, there is no gender discrimination, and the observed difference in proportions is simply due to chance. (Null hypothesis)
2. "There is something going on." Promotion and gender are dependent, there is gender discrimination, and the observed difference in proportions is not due to chance. (Alternative hypothesis)


A trial as a hypothesis test
Hypothesis testing is very much like a court trial.
H0: Defendant is innocent
HA: Defendant is guilty
Present the evidence: collect data.
Judge the evidence: "Could these data plausibly have happened by chance if the null hypothesis were true?"
Make a decision: "How unlikely is unlikely?"
If the evidence is not strong enough to reject the assumption of innocence, the verdict is "not guilty". The jury does not say that the defendant is innocent, just that there is not enough evidence to convict. The defendant may, in fact, be innocent, but the jury has no way of being sure.


Recap: hypothesis testing framework
We start with a null hypothesis (H0) that represents the status quo.
We also have an alternative hypothesis (HA) that represents our research question, i.e. what we're testing for.
We conduct a hypothesis test under the assumption that the null hypothesis is true, either via simulation (today) or theoretical methods (later in the course).
If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the alternative.
We never declare the null hypothesis to be true, because we simply do not know whether it's true or not. Therefore we never "accept the null hypothesis".

Testing via simulation

Checking for Independence using Simulation
Instead of repeating the experiment (difficult in this case), we simulate our results under the assumption of independence.
If results from the simulations based on the chance model look like our data, then the difference between men and women could simply be due to chance.
If results from the simulations do not look like the data, then we can say that the difference between men and women was not due to chance.
                 Promotion
Gender    Promoted   Not Promoted   Total
Male      21         3              24
Female    14         10             24
Total     35         13             48


Simulating the experiment...
Simulate the experiment in a way that satisfies the null hypothesis (in this case, in a way that there is no discrimination against females).
Determine if the observed outcome from the original experiment (a promotion rate roughly 30 percentage points higher for males) is a likely outcome when things are left up to chance.
If the results from the simulations based on the chance model do not look like the data, determine that the observed difference between males and females was due to an actual effect of gender (promotion and gender are dependent).


Simulation setup
1. We'll let a face card represent not promoted and a non-face card represent promoted.
   Consider aces as face cards.
   Set aside the jokers.
   Take out 3 aces, so that there are exactly 13 face cards left in the deck (face cards: A, K, Q, J): NOT PROMOTED.
   Take out a number card, so that there are exactly 35 number (non-face) cards left in the deck (number cards: 2-10): PROMOTED.
2. Shuffle the cards and deal them into two groups of size 24, representing males and females.
3. Count and record how many files in each group are promoted (number cards).
4. Calculate the proportion of promoted files in each group and take the difference (male - female).
5. Report the difference on the board (up to 2 decimal places, only 1 submission for each simulation).
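
The card procedure above can be mimicked in software. The following Python sketch is an assumption-laden stand-in for the physical deck: a list of 35 "promoted" and 13 "not promoted" labels plays the role of the cards, `simulate_difference` is a hypothetical helper name, and the seed and the choice of 100 repetitions (roughly one per submission on the board) are arbitrary.

```python
import random


def simulate_difference(rng):
    """One shuffle of the 48-card deck; returns promoted proportion, male minus female."""
    # Step 1: 35 number cards (promoted) and 13 face cards (not promoted).
    deck = ["promoted"] * 35 + ["not promoted"] * 13
    # Step 2: shuffle and deal into two groups of 24.
    rng.shuffle(deck)
    male, female = deck[:24], deck[24:]
    # Steps 3 and 4: count promoted files and take the difference in proportions.
    return male.count("promoted") / 24 - female.count("promoted") / 24


rng = random.Random(1972)  # arbitrary seed for reproducibility

# Step 5: record each simulated difference to 2 decimal places.
simulated_diffs = [round(simulate_difference(rng), 2) for _ in range(100)]

print(f"first ten simulated differences: {simulated_diffs[:10]}")
print(f"largest simulated difference:    {max(simulated_diffs)}")
```

Because the shuffle ignores the group labels, any difference between the two groups in a simulated deal is due to chance alone, which is exactly what the null hypothesis describes.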


Simulating with StatKey
lock5stat.com/statkey/randomization_2_cat/randomization_2_cat.html

Checking for independence

Participation question
Do the data provide convincing evidence of gender discrimination against women, i.e. dependence between gender and promotion decisions?
(a) No, the data do not provide convincing evidence for the alternative hypothesis, therefore we can't reject the null hypothesis of independence between gender and promotion decisions. The observed difference between the two proportions was due to chance.
(b) Yes, the data provide convincing evidence for the alternative hypothesis of gender discrimination against women in promotion decisions. The observed difference between the two proportions was due to a real effect of gender.



Making a decision
The probability of observing a difference at least as favorable to the alternative hypothesis as the one observed in the original data (a difference of 29.2%), if H0 is true, is called the p-value.
The significance level is the threshold against which we compare the p-value to determine if it's small enough to reject the null hypothesis (this is usually 5%).
[Figure: distribution of simulated differences; x-axis: Difference in promotion rates]
p-value = 1/100 = 0.01
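
A p-value of this kind can be estimated by counting how often shuffled data produce a difference at least as large as the observed one. The sketch below is one possible way to do that in Python, not the procedure used in class: the helper name, the seed, and the 10,000 repetitions are assumptions. As a cross-check it also computes the exact probability under random dealing, using the fact that the number of promoted files landing in the "male" group of 24 follows a hypergeometric distribution.

```python
import random
from math import comb

OBSERVED_DIFF = 21 / 24 - 14 / 24  # about 0.292


def shuffled_difference(rng):
    """Difference in promoted proportions (male - female) after one random deal."""
    deck = [1] * 35 + [0] * 13           # 1 = promoted, 0 = not promoted
    rng.shuffle(deck)
    male_promoted = sum(deck[:24])
    female_promoted = 35 - male_promoted
    return male_promoted / 24 - female_promoted / 24


rng = random.Random(101)  # arbitrary seed for reproducibility
n_sims = 10_000
diffs = [shuffled_difference(rng) for _ in range(n_sims)]

# Share of simulated differences at least as large as the observed one.
# The small tolerance guards against floating-point ties at the boundary.
p_value_sim = sum(d >= OBSERVED_DIFF - 1e-9 for d in diffs) / n_sims

# Exact version: probability that 21 or more of the 35 promoted files
# end up among the 24 files dealt to the "male" group.
p_value_exact = sum(comb(35, k) * comb(13, 24 - k) for k in range(21, 25)) / comb(48, 24)

print(f"simulated p-value: {p_value_sim:.4f}")
print(f"exact p-value:     {p_value_exact:.4f}")
print(f"below the 5% significance level: {p_value_exact < 0.05}")
```

A run of only 100 simulations, like the one summarized above, gives a coarser estimate that varies from run to run; both that estimate and the values computed here fall below the usual 5% threshold.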

76 Case study: Gender discrimination Checking for independence To Do:
Complete Lab 1, submit on Sakai by today at 5 PM.
Work on Problem Set (PS) 1, due at 11 AM in class tomorrow. Please staple your work!
Reminder: Office Hours today from 2-3 PM in Old Chem 211 A.
Reading: Begin reading Chapter 2 in preparation for tomorrow's lecture.

78 Case study: Tapping on caffeine [Time permitting]

79 Case study: Tapping on caffeine [Time permitting] Tapping on caffeine
In a double-blind experiment, a sample of male college students were asked to tap their fingers at a rapid rate. The sample was then divided at random into two groups of 10 students each. Each student drank the equivalent of about two cups of coffee, which included about 200 mg of caffeine for the students in one group but was decaffeinated coffee for the second group. After a two-hour period, each student was tested to measure finger tapping rate (taps per minute).

80 Case study: Tapping on caffeine [Time permitting] Data
[Table: columns Taps and Group, with Group taking the values Caffeine and NoCaffeine; tap counts not preserved]

81 Case study: Tapping on caffeine [Time permitting] Participation question
What type of plot would be useful to visualize the distributions of tapping rate in the caffeine and no caffeine groups?
(a) Bar plot
(b) Mosaic plot
(c) Pie chart
(d) Side-by-side box plots
(e) Single box plot

83 Case study: Tapping on caffeine [Time permitting] Exploratory data analysis
Compare the distributions of tapping rates in the caffeine and no caffeine groups.
[Table: mean, SD, median, and IQR for Caffeine, No Caffeine, and Difference; values not preserved]
[Figure: tapping rate by group (Caffeine, NoCaffeine)]

84 Case study: Tapping on caffeine [Time permitting] Participation question
We are interested in finding out if caffeine increases tapping rate. Which of the following is the correct set of hypotheses?
(a) $H_0: \mu_{caff} = \mu_{no\ caff}$; $H_A: \mu_{caff} < \mu_{no\ caff}$
(b) $H_0: \mu_{caff} = \mu_{no\ caff}$; $H_A: \mu_{caff} > \mu_{no\ caff}$
(c) $H_0: \bar{x}_{caff} = \bar{x}_{no\ caff}$; $H_A: \bar{x}_{caff} > \bar{x}_{no\ caff}$
(d) $H_0: \mu_{caff} > \mu_{no\ caff}$; $H_A: \mu_{caff} = \mu_{no\ caff}$
(e) $H_0: \mu_{caff} = \mu_{no\ caff}$; $H_A: \mu_{caff} \neq \mu_{no\ caff}$

86 Case study: Tapping on caffeine [Time permitting] Simulation scheme
1. On 20 index cards write the tapping rate of each subject in the study.
2. Shuffle the cards and divide them into two stacks of 10 cards each; label one stack "caffeine" and the other "no caffeine".
3. Calculate the average tapping rates in the two simulated groups, and record the difference on a dot plot.
4. Repeat steps (2) and (3) many times to build a randomization distribution.
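The card-shuffling steps above can be mimicked in a few lines of code. This is only a sketch: the function name, the seed, and the number of repetitions are hypothetical, and the tapping rates below are made-up placeholder values, not the study's data.

```python
import random
from statistics import fmean

def randomization_distribution(rates, group_size, n_sims=1000, seed=0):
    """Build a randomization distribution of differences in mean tapping rate."""
    rng = random.Random(seed)
    cards = list(rates)                      # step 1: one "card" per subject
    diffs = []
    for _ in range(n_sims):                  # step 4: repeat many times
        rng.shuffle(cards)                   # step 2: shuffle and split in two stacks
        caffeine = cards[:group_size]
        no_caffeine = cards[group_size:]
        diffs.append(fmean(caffeine) - fmean(no_caffeine))  # step 3: record difference
    return diffs

# Placeholder tapping rates (taps per minute), invented for illustration only.
caffeine_demo = [251, 247, 249, 253, 246, 250, 248, 252, 245, 249]
no_caffeine_demo = [244, 246, 243, 248, 245, 247, 242, 246, 244, 245]

observed_diff = fmean(caffeine_demo) - fmean(no_caffeine_demo)
sim_diffs = randomization_distribution(caffeine_demo + no_caffeine_demo,
                                       group_size=10, n_sims=1000, seed=42)
# One-sided p-value: share of shuffles with a difference at least as large
p_demo = sum(d >= observed_diff for d in sim_diffs) / len(sim_diffs)
print(f"observed difference in means = {observed_diff:.1f}, p-value = {p_demo:.3f}")
```

Plotting `sim_diffs` as a dot plot or histogram gives the randomization distribution; it should be centered near zero because shuffling breaks any link between group label and tapping rate.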

88 Case study: Tapping on caffeine [Time permitting] Making a decision
Calculate the p-value based on the randomization distribution below and determine the conclusion of the hypothesis test.
[Figure: randomization distribution (100 simulations)]
p-value = 1/100 = 0.01

90 Case study: Tapping on caffeine [Time permitting] Testing for the median
Describe how we could use the same approach to test whether the median tapping rate is higher for the caffeine group.
Use the same simulation scheme but record the difference between the medians instead of the means, and calculate the p-value as the proportion of simulations where the simulated difference in medians is at least 3.
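Swapping the statistic is a one-argument change in code. The sketch below takes the summary statistic as a parameter, so the same shuffling logic serves medians or means. The function name and defaults are hypothetical, and the two groups are made-up placeholder values rather than the study's data.

```python
import random
from statistics import median

def randomization_p_value(group_a, group_b, stat=median, n_sims=1000, seed=0):
    """One-sided randomization test: is stat(group_a) larger than stat(group_b)?

    Returns the observed difference and the proportion of shuffles whose
    difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = stat(group_a) - stat(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        simulated = stat(pooled[:n_a]) - stat(pooled[n_a:])
        if simulated >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_sims

# Placeholder tapping rates (taps per minute), invented for illustration only.
group_caff = [251, 247, 249, 253, 246, 250, 248, 252, 245, 249]
group_decaf = [244, 246, 243, 248, 245, 247, 242, 246, 244, 245]

obs_median_diff, p_median = randomization_p_value(group_caff, group_decaf,
                                                  stat=median, seed=7)
print(f"observed difference in medians = {obs_median_diff}, p-value = {p_median:.3f}")
```

Passing `stat=statistics.fmean` instead would reproduce the test for the mean with no other changes.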

91 Case study: Tapping on caffeine [Time permitting] Testing for the median (cont.)
Using the randomization distribution below of simulated differences in medians, determine whether the data provide convincing evidence that caffeine increases median tapping rate.
[Table: median tapping rate for Caffeine, No Caffeine, and Difference; values not preserved]
[Figure: randomization distribution]


More information

Chapter 1. Looking at Data

Chapter 1. Looking at Data Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

Comparing Measures of Central Tendency *

Comparing Measures of Central Tendency * OpenStax-CNX module: m11011 1 Comparing Measures of Central Tendency * David Lane This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 1 Comparing Measures

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences

More information

First we look at some terms to be used in this section.

First we look at some terms to be used in this section. 8 Hypothesis Testing 8.1 Introduction MATH1015 Biostatistics Week 8 In Chapter 7, we ve studied the estimation of parameters, point or interval estimates. The construction of CI relies on the sampling

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

The Empirical Rule, z-scores, and the Rare Event Approach

The Empirical Rule, z-scores, and the Rare Event Approach Overview The Empirical Rule, z-scores, and the Rare Event Approach Look at Chebyshev s Rule and the Empirical Rule Explore some applications of the Empirical Rule How to calculate and use z-scores Introducing

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different

More information

Preliminary Statistics Lecture 5: Hypothesis Testing (Outline)

Preliminary Statistics Lecture 5: Hypothesis Testing (Outline) 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 5: Hypothesis Testing (Outline) Gujarati D. Basic Econometrics, Appendix A.8 Barrow M. Statistics

More information

Clinical Research Module: Biostatistics

Clinical Research Module: Biostatistics Clinical Research Module: Biostatistics Lecture 1 Alberto Nettel-Aguirre, PhD, PStat These lecture notes based on others developed by Drs. Peter Faris, Sarah Rose Luz Palacios-Derflingher and myself Who

More information

Descriptive statistics

Descriptive statistics Patrick Breheny February 6 Patrick Breheny to Biostatistics (171:161) 1/25 Tables and figures Human beings are not good at sifting through large streams of data; we understand data much better when it

More information

Review: Central Measures

Review: Central Measures Review: Central Measures Mean, Median and Mode When do we use mean or median? If there is (are) outliers, use Median If there is no outlier, use Mean. Example: For a data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3,

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

Do students sleep the recommended 8 hours a night on average?

Do students sleep the recommended 8 hours a night on average? BIEB100. Professor Rifkin. Notes on Section 2.2, lecture of 27 January 2014. Do students sleep the recommended 8 hours a night on average? We first set up our null and alternative hypotheses: H0: μ= 8

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12 ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12 Winter 2012 Lecture 13 (Winter 2011) Estimation Lecture 13 1 / 33 Review of Main Concepts Sampling Distribution of Sample Mean

More information

Hypothesis testing. Data to decisions

Hypothesis testing. Data to decisions Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the

More information

Analysis of Variance. Contents. 1 Analysis of Variance. 1.1 Review. Anthony Tanbakuchi Department of Mathematics Pima Community College

Analysis of Variance. Contents. 1 Analysis of Variance. 1.1 Review. Anthony Tanbakuchi Department of Mathematics Pima Community College Introductory Statistics Lectures Analysis of Variance 1-Way ANOVA: Many sample test of means Department of Mathematics Pima Community College Redistribution of this material is prohibited without written

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information