READ YOUR SYLLABUS WHICH IS ALSO POSTED ON THE CLASS PAGE!!!!!!!

Size: px
Start display at page:

Download "READ YOUR SYLLABUS WHICH IS ALSO POSTED ON THE CLASS PAGE!!!!!!!"

Transcription

1 Text Book : Moore, McCabe and Craig, Introduction to the Practice of Statistics, 6 th ed. Class Web Page: Choose STT421. READ YOUR SYLLABUS WHICH IS ALSO POSTED ON THE CLASS PAGE!!!!!!!

2 Why are you taking this course? a) It is required. b) I am so bored that I enrolled the first course that I came across. c) I heard that statistics can lie and I just wanted to check if it was true. d) The name is funny and interesting. e) Is it the statistics class that I am in? I will definitely drop this course as soon as this boring class ends. f) I am aware of the fact that the science of statistics would really help me in my daily life let alone for my profession.

3 It is easy to lie with statistics. It is hard to tell the truth without it. Andrejs Dunkels

4 Looking at Data - Distributions Displaying Distributions with Graphs IPS Chapter W.H. Freeman and Company

5 Objectives (IPS Chapter 1.1) Displaying distributions with graphs Variables Types of variables Graphs for categorical variables Bar graphs Pie charts Graphs for quantitative variables Histograms Stemplots Stemplots versus histograms Interpreting histograms Time plots

6 What Is Statistics? 1. Collecting Data e.g., Survey 2. Presenting Data e.g., Charts & Tables 3. Characterizing Data e.g., Average Data Analysis T/Maker Co. Why? Decision- Making T/Maker Co.

7 What Is Statistics? Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

8 Application Areas Economics Forecasting Demographics Engineering Construction Materials Sports Individual & Team Performance Business Consumer Preferences Financial Trends

9 Statistical Methods Statistical Methods Descriptive Statistics Inferential Statistics

10 Descriptive Statistics 1. Involves Collecting Data Presenting Data Characterizing Data 2. Purpose Describe Data $ Q1 Q2 Q3 Q4 X = 30.5 S 2 = 113

11 Inferential Statistics 1. Involves Estimation Hypothesis Testing Population? 2. Purpose Make decisions about population characteristics

12 What is a Statistical Inference? It is an estimate or prediction or some generalization about a population based on information contained in a sample.

13 Fundamental Elements Experimental unit(cases or individuals) Object upon which we collect data Example: people, animals, plants, or any object of interest. Population All items of interest Variable Characteristic of an individual experimental unit A variable varies among individuals. Example: age, height, blood pressure, ethnicity, leaf length, first language The distribution of a variable tells us what values the variable takes and how often it takes these values. Sample Subset of the units of a population

14 Two types of variables Variables can be either quantitative Something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense. Example: How tall you are, your age, your blood cholesterol level, the number of credit cards you own. or categorical (qualitative). Something that falls into one of several categories. What can be counted is the count or proportion of individuals in each category. Example: Your blood type (A, B, AB, O), your hair color, your ethnicity, whether you paid income tax last tax year or not.

15 How do you know if a variable is categorical or quantitative? Ask: What are the n individuals/units in the sample (of size n )? What is being recorded about those n individuals/units? Is that a number ( quantitative) or a statement ( categorical)? Categorical Each individual is assigned to one of several categories. Quantitative Each individual is attributed a numerical value. Individuals in sample DIAGNOSIS AGE AT DEATH Patient A Heart disease 56 Patient B Stroke 70 Patient C Stroke 75 Patient D Lung cancer 60 Patient E Heart disease 80 Patient F Accident 73 Patient G Diabetes 69

16 Ways to chart categorical data Summary Table Bar graphs Pie chart

17 Summary table Lists categories & number of elements in category Obtained by tallying responses in category May show the class frequencies and/or relative frequencies

18 Example: Top 10 causes of death in the United States 2001 Rank Causes of death Counts % of top 10s % of total deaths 1 Heart disease 700,142 37% 29% 2 Cancer 553,768 29% 23% 3 Cerebrovascular 163,538 9% 7% 4 Chronic respiratory 123,013 6% 5% 5 Accidents 101,537 5% 4% 6 Diabetes mellitus 71,372 4% 3% 7 Flu and pneumonia 62,034 3% 3% 8 Alzheimer s disease 53,852 3% 2% 9 Kidney disorders 39,480 2% 2% 10 Septicemia 32,238 2% 1% All other causes 629,967 21% For each individual who died in the United States in 2001, we record what was the cause of death. The table above is a summary of that information.

19 Bar graphs Each category is represented by one bar. The bar s height shows the count (or sometimes the percentage) for that particular category Top 10 causes of deaths in the United States 2001 The number of individuals who died of an accident in 2001 is approximately 100,000. Heart diseases Cancers Cerebrovascular Chronic respiratory Accidents Diabetes mellitus Flu & pneumonia Alzheimer's disease Kidney disorders Septicemia Counts (x1000)

20 Septicemia Cerebrovascular Chronic respiratory Accidents Diabetes mellitus Flu & pneumonia Alzheimer's disease Kidney disorders Septicemia Cancers Heart diseases Top 10 causes of deaths in the United States 2001 Bar graph sorted by rank Easy to analyze Sorted alphabetically Much less useful Accidents Alzheimer's disease Cancers Cerebrovascular Chronic respiratory Diabetes mellitus Flu & pneumonia Heart diseases Kidney disorders Counts (x1000) Counts (x1000)

21 Pie charts Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents. Angle size:(360 ) (9%) = 32.4 Make sure all percents add up to 100. Percent of people dying from top 10 causes of death in the United States in 2000

22 Bar Chart vs. Pie Chart Bar chart is used more often to represent the actual values while pie chart is used to represent relative proportions (in %). When comparison of relative proportion is important, pie chart is more appropriate. When the absolute counts or values are more important, a bar chart should be used.

23 Ways to chart quantitative data Histograms and stem-and-leaf plots (or stemplots) These are summary graphs for a single variable. They are very useful to understand the pattern of variability in the data. Line graphs: time plots Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.

24 Histograms The range of values that a variable can take is divided into equal size intervals. The histogram shows the number of individual data points that fall in each interval. The first column represents all states with a Hispanic percent in their population between 0% and 4.99%. The height of the column shows how many states (27) have a percent in this range. The last column represents all states with a Hispanic percent in their population between 40% and 44.99%. There is only one such state: New Mexico, at 42.1% Hispanics.

25 Stem plots How to make a stemplot: STEM LEAVES 1) Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed, but each leaf contains only a single digit. 2) Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column. 3) Write each leaf in the row to the right of its stem, in increasing order out from the stem.

26 Example Suppose we have the following data on weights (in lb) of 17 school-kids:

27 How do they work? Sorted data: Stem Leaf key: 6 3 = 63 leaf unit: 1.0 stem unit: 10.0

28 S tate Percent A labam a 1.5 A laska 4.1 A rizona 25.3 A rkansas 2.8 C alifornia 32.4 C olorado 17.1 C onnecticut 9.4 D elaw are 4.8 Florida 16.8 G eorgia 5.3 H aw aii 7.2 Idaho 7.9 Illinois 10.7 Indiana 3.5 Iow a 2.8 K ansas 7 K entucky 1.5 Louis iana 2.4 M aine 0.7 M aryland 4.3 M ass achusett 6.8 M ic higan 3.3 M innesota 2.9 M ississippi 1.3 M is souri 2.1 M ontana 2 N ebras ka 5.5 N evada 19.7 N ew H am pshir 1.7 N ew Jersey 13.3 N ew M exico 42.1 N ew York 15.1 N orthc arolina 4.7 N orthd akota 1.2 O hio 1.9 O klahom a 5.2 O regon 8 P enns ylvania 3.2 R hodeis land 8.7 S outhc arolina 2.4 S outhd akota 1.4 Tenness ee 2 Texas 32 U tah 9 V erm ont 0.9 V irginia 4.7 W ashington 7.2 W estv irginia 0.7 W isconsin 3.6 W yom ing 6.4 Step 1: Sort the data S ta te P e rc ent M a in e 0.7 W estv irg in ia 0.7 V erm ont 0.9 N orth D akota 1.2 M is s is sip pi 1.3 S outh D akota 1.4 A la bam a 1.5 K entu c k y 1.5 N ew H a m pshir 1.7 O h io 1.9 M o nta na 2 Te nne s se e 2 M is s ou ri 2.1 Lo uis ia na 2.4 S outh C arolina 2.4 A rk an s as 2.8 Iow a 2.8 M inn esota 2.9 P enn s ylvania 3.2 M ic higa n 3.3 Indian a 3.5 W is c o nsin 3.6 A la s ka 4.1 M a ryland 4.3 N orth C arolina 4.7 V irginia 4.7 D elaw are 4.8 O k lah om a 5.2 G e orgia 5.3 N eb ra s k a 5.5 W yom in g 6.4 M a s s achu se tt 6.8 K ansas 7 H aw aii 7.2 W ashing ton 7.2 Idah o 7.9 O reg on 8 R ho deis lan d 8.7 U ta h 9 C on nectic u t 9.4 Illinois 10.7 N ew J ers ey 13.3 N ew Yo rk 15.1 Flo rida 16.8 C olorad o 17.1 N eva da 19.7 A rizo na 25.3 Te xas 32 C alifornia 32.4 N ew M exic o 42.1 Step 2: Assign the values to stems and leaves Percent of Hispanic residents in each of the 50 states

29 Stem Plot To compare two related distributions, a back-to-back stem plot with common stems is useful. Stem plots do not work well for large datasets. When the observed values have too many digits, trim the numbers before making a stem plot. When plotting a moderate number of observations, you can split each stem.

30 Stemplots versus histograms Unlike histograms, stem-and-leaf displays retain the original data. Stemplots are quick and dirty histograms that can easily be done by hand. However, they are rarely found in scientific or laymen publications.

31 Interpreting histograms When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread. Histogram with a line connecting each column too detailed Histogram with a smoothed curve highlighting the overall pattern of the distribution

32 Most common distribution shapes Symmetric distribution A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other. A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side. Skewed distribution

33 Outliers An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. The overall pattern is fairly symmetrical except for 2 states that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population. Alaska Florida A large gap in the distribution is typically a sign of an outlier.

34 A Few Questions How to choose the bin size? Let the computer decide it for you.

35 Line graphs: time plots In a time plot, time always goes on the horizontal, x axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series: A trend is a rise or fall that persists over time, despite small irregularities. A pattern that repeats itself at regular intervals of time is called seasonal variation.

36 Retail price of fresh oranges over time Time is on the horizontal, x axis. The variable of interest here retail price of fresh oranges goes on the vertical, y axis. This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.

37 A time plot can be used to compare two or more data sets covering the same time period influenza epidemic Date # Cases # Deaths week week week week week week week week week week week week week week week week week # cases diagnosed week 1 week 3 week 5 week 7 week 9 week 11 week 13 # Cases # Deaths week 15 week # deaths reported The pattern over time for the number of flu diagnoses closely resembles that for the number of deaths from the flu, indicating that about 8% to 10% of the people diagnosed that year died shortly afterward, from complications of the flu.

38 Scales matter How you stretch the axes and choose your scales can give a different impression. Death rate (per thousand) Death rates f rom cancer (US, ) Years Death rate (per thousand) Death rates from cancer (US, ) Years Death rates from cancer (US, ) Death rate (per thousand) Death rate (per thousand) Death rates from cancer (US, ) A picture is worth a thousand words, BUT There is nothing like hard numbers. Look at the scales. Years Years

39 Looking at Data - Distributions Describing distributions with numbers IPS Chapter W.H. Freeman and Company

40 Objectives (IPS Chapter 1.2) Describing distributions with numbers Measures of center: mean, median, mode Mean versus median Measures of spread: quartiles, standard deviation Five-number summary and boxplot Choosing among summary statistics Changing the unit of measurement

41 Two Characteristics The central tendency of the set of measurements that is, the tendency of the data to cluster, or center, about certain numerical values. Central Tendency (Location) Center

42 Two Characteristics The variability (spread) of the set of measurements that is, the spread of the data. Variation (Dispersion) Sample B Sample A Variation of Sample B Variation of Sample A

43 Measure of center: the mean The mean or arithmetic average To calculate the average, or mean, add all values, then divide by the number of individuals. It is the center of mass. Sum of heights is divided by 25 women = 63.9 inches

44 woman (i) height (x) woman (i) height (x) i = 1 x 1 = 58.2 i = 14 x 14 = 64.0 i = 2 x 2 = 59.5 i = 15 x 15 = 64.5 i = 3 x 3 = 60.7 i = 16 x 16 = 64.1 i = 4 x 4 = 60.9 i = 17 x 17 = 64.8 i = 5 x 5 = 61.9 i = 18 x 18 = 65.2 i = 6 x 6 = 61.9 i = 19 x 19 = 65.7 i = 7 x 7 = 62.2 i = 20 x 20 = 66.2 i = 8 x 8 = 62.2 i = 21 x 21 = 66.7 i = 9 x 9 = 62.4 i = 22 x 22 = 67.1 i = 10 x 10 = 62.9 i = 23 x 23 = 67.8 i = 11 x 11 = 63.9 i = 24 x 24 = 68.9 i = 12 x 12 = 63.1 i = 25 x 25 = 69.6 Mathematical notation: x x = = x 1 n 1 + n i= 1 x 2 xi n x = x n = 63.9 i = 13 x 13 = 63.9 n= 25 Σ= Learn right away how to get the mean using your calculators.

45 Your numerical summary must be meaningful. The distribution of women s heights appears coherent and symmetrical. The mean is a good numerical summary. x = 63.9

46 Measure of center: the median The median is the midpoint of a distribution the number such that half of the observations are smaller and half are larger Sort observations by size. n = number of observations 2.a. If n is odd, the median is observation (n+1)/2 down the list n = 25 (n+1)/2 = 26/2 = 13 Median = b. If n is even, the median is the mean of the two middle observations. n = 24 n/2 = 12 Median = ( ) /2 =

47 Mode 1. Measure of central tendency 2. Value that occurs most often 3. Not affected by extreme values 4. May be no mode or several modes 5. May be used for quantitative or qualitative data

48 Mode Example No Mode Raw Data: One Mode Raw Data: More Than 1 Mode Raw Data:

49 Modes The peaks of a histogram are called modes. A distribution is unimodal if it has one mode, bimodal if it has two modes, multimodal if it has three or more modes.

50 Unimodal, Bimodal or Multimodal? Unimodal Bimodal Multimodal

51 Uniform Histogram A histogram that doesn t appear to have any mode. All the bars are approximately the same.

52 Left-Skewed Mean Median Mode Symmetric Mean = Median= Mode Right-Skewed Mode Median Mean

53 Mean and median of a distribution with outliers x = 3.4 x = 4.2 Percent of people dying Without the outliers With the outliers The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2). The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).

54 Impact of skewed data Symmetric distribution Disease X: x = 3.4 M = 3.4 Mean and median are the same. and a right-skewed distribution Multiple myeloma: x = 3.4 M = 2.5 The mean is pulled toward the skew.

55 Spread of a Distribution Are the values concentrated around the center of the distribution or they are spread out? Range, Interquartile Range, Variance, Standard Deviation. Note: Variance and standard deviation are more appropriate when the distribution is symmetric.

56 Range Range of the data is defined as the difference between the maximum and the minimum values. Data: 23, 21, 67, 44, 51, 12, 35. Range = maximum minimum = = 55. Disadvantage: A single extreme value can make it very large, giving a value that does not really represent the data overall. On the other hand, it is not affected at all if some observation changes in the middle.

57 Interquartile Range (IQR) What is IQR? IQR = Third Quartile (Q 3 ) First Quartile (Q 1 ). What are quartiles? Recall: Median divides the data into 2 equal halves. The first quartile, median and the third quartile divide the data into 4 roughly equal parts.

58 The quartiles The first quartile, Q 1, is the value in the sample that has 25% of the data at or below it ( it is the median of the lower half of the sorted data, excluding M). M = median = 3.4 The third quartile, Q 3, is the value in the sample that has 75% of the data at or below it ( it is the median of the upper half of the sorted data, excluding M) Q 1 = first quartile = 2.2 Q 3 = third quartile = 4.35

59 Outlier-sensitivity Data: 10, 13, 17, 21, 28, 32 Without the outlier IQR = 15 Range = 22 Data: 10, 13, 17, 21, 28, 32, 59 With the outlier IQR = 19 Range = 49 Conclusion: IQR is less outlier-sensitive than range.

60 Five-number summary and boxplot Upper whisker=largest = max. = 6.1 Q 3 = third quartile = 4.35 M = median = 3.4 Q 1 = first quartile = 2.2 Years until death Lower whisker = Smallest = min = BOXPLOT Disease X Five-number summary: min Q 1 M Q 3 max

61 Data must be ordered from lowest value to highest value before finding the 5 number summary. Upper whisker of the box-plot extends to the largest value within the upper inner fence (Q3+1.5*IQR) Lower whisker of the box-plot extends to the smallest value within the lower inner fence (Q1-1.5*IQR)

62 Boxplots for skewed data Years until death Comparing box plots for a normal and a right-skewed distribution Disease X Multiple Myeloma Boxplots remain true to the data and depict clearly symmetry or skew.

63 Suspected outliers Outliers are troublesome data points, and it is important to be able to identify them. One way to raise the flag for a suspected outlier is to compare the distance from the suspicious data point to the nearest quartile (Q 1 or Q 3 ). We then compare this distance to the interquartile range (distance between Q 1 and Q 3 ). We call an observation a suspected outlier if it falls more than 1.5 times the size of the interquartile range (IQR) above the first quartile or below the third quartile. This is called the 1.5 * IQR rule for outliers.

64 Q 3 = 4.35 Q 1 = 2.2 Years until death Disease X Distance to Q = 3.55 Interquartile range Q 3 Q = 2.15 Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than years, 1.5 * IQR. Thus, individual #25 is a suspected outlier.

65 Histogram vs. Box plot Both histogram and box plot capture the symmetry or skewness of distributions. Box plot cannot indicate the modality of the data. Box plot is much better in finding outliers. The shape of histogram depends to some extent on the choice of bins.

66 Which type of car has the largest median Time to accelerate? A. upscale B. sports C. small D. large E. family

67 Which type of car always take less than 3.6 seconds to accelerate? A. upscale B. sports C. small D. Large E. Luxury

68 Which type of car has the smallest IQR for Time to accelerate? A. upscale B. sports C. small D. Large E. Luxury

69 What is the shape of the distribution of acceleration times for luxury cars? A. Left skewed B. Right skewed C. Roughly symmetric D. Cannot be determined from the information given.

70 What percent of luxury cars accelerate to 30 mph in less than 3.5 seconds? A. Roughly 25% B. Exactly 37.5% C. Roughly 50% D. Roughly 75% E. Cannot be determined from the information given

71 What percent of family cars accelerate to 30 mph in less than 3.5 seconds? A. Less than 25% B. More than 50% C. Less than 50% D. Exactly 75% E. None of the above

72 Measure of spread: the standard deviation The standard deviation s is used to describe the variation around the mean. Like the mean, it is not resistant to skew or outliers. 1. First calculate the variance s 2. s 2 n 1 = ( x n 1 1 i x) 2 x 2. Then take the square root to get the standard deviation s. Mean ± 1 s.d. s n 1 = ( x n 1 1 i x) 2

73 Calculations Women height (inches) i x i x (x i -x) (x i -x) s n 1 = ( x df 1 i x) Mean = 63.4 Sum of squared deviations from mean = 85.2 Degrees freedom (df) = (n 1) = 13 s 2 = variance = 85.2/13 = 6.55 inches squared s = standard deviation = 6.55 = 2.56 inches Mean 63.4 Sum 0.0 Sum 85.2 We ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator or software.

74 Properties of Standard Deviation s measures spread about the mean and should be used only when the mean is the measure of center. s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0. s is not resistant to outliers. s has the same units of measurement as the original observations.

75 Choosing among summary statistics Because the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don t have outliers. Plot the mean and use the standard deviation for error bars. Otherwise use the median in the five number summary which can be plotted as a boxplot. H eig h t in In ch es Height of 30 Women Boxplot Plot Mean ± +/- SD

76 Changing the unit of measurement Variables can be recorded in different units of measurement. Most often, one measurement unit is a linear transformation of another measurement unit: x new = a + bx. Temperatures can be expressed in degrees Fahrenheit or degrees Celsius. Temperature Fahrenheit = 32 + (9/5)* Temperature Celsius a + bx. Linear transformations do not change the basic shape of a distribution (skew, symmetry, multimodal). But they do change the measures of center and spread: Multiplying each observation by a positive number b multiplies both measures of center (mean, median) and spread (IQR, s) by b. Adding the same number a (positive or negative) to each observation adds a to measures of center and to quartiles but it does not change measures of spread (IQR, s).

77 Temperature data (in F): 10, 13, 18, 22, 29 Range = 19 F, IQR =14 F, s = 7.5 F, s 2 = F 2. Suppose transformed data = (-3)*original data. So transformed data (in F): -30, -39, -54, -66, -87 Range = -3 *19 = 57 F, IQR = -3 *14 = 42 F, s = -3 * 7.5 = F, s 2 = (-3) 2 *56.25 = F 2.

78 Temperature data (in F): 10, 13, 18, 22, 29 Range = 19 F, IQR =14 F, s = 7.5 F, s 2 = F 2. Suppose transformed data = original data Hence transformed data (in F): 12.5, 15.5, 20.5, 24.5, 31.5 Range = 19 F, IQR =14 F, s = 7.5 F, s 2 = F 2.

79 Software output for summary statistics: Excel - From Menu: Tools Data Analysis Descriptive Statistics Give common statistics of your sample data. Minitab

80 MINITAB For graphs From the Menu Graphs Histogram Graphs Stem-and-Leaf Graphs Box-Plot For measures of central tendency and spread Stat Basic Statistics Display Descriptive Statistics

81 Comparing Distributions Use of Stem-and-Leaf Display and 5 number summary

82 Comparing Distributions: Use of Stem-and-Leaf Plot East Lansing Gas Prices Ann Arbor Gas Prices

83 Ann Arbor Gas Prices 24 9 = $2.49 Gas prices in Ann Arbor are unimodal with right skew. They are between $2.63 and $2.96 per gallon, centered at about $2.75 per gallon East Lansing Gas Prices Gas prices tend to be higher in Ann Arbor and have more variability. The range of prices in Ann Arbor is 33 and in East Lansing is only 27 Gas prices in East Lansing are roughly uniform between $2.49 and $2.76 per gallon, centered at $2.60 per gallon

84 Ann Arbor Gas Prices 24 9 = $2.49 East Lansing Gas Prices 5-Number Summary Min Q1 Median Q3 Max Ann Arbor Gas Prices $2.63 $2.69 $2.72 $2.93 $2.96 East Lansing Gas Prices $2.49 $2.53 $2.60 $2.69 $2.76

85 Comparing Distributions Use of Histograms

86 Which data have more variability? A FREQUENCY B SCORE A. Graph A B. Graph B SCORE C. Both have the same variability

87 Which data have more variability? A 6 6 B SCORE SCORE A. Graph A B. Graph B C. Both have the same variability

88 Which data have a higher median? A 6 6 B SCORE SCORE A. Graph A B. Graph B C. Both have the same median 88

89 Which data have more variability? A FREQUENCY B SCORE A. Graph A B. Graph B SCORE C. Roughly, both have the same variability 89

90 Looking at Data - Distributions Density Curves and Normal Distributions IPS Chapter W.H. Freeman and Company

91 Objectives (IPS Chapter 1.3) Density curves and Normal distributions Density curves Measuring center and spread for density curves Normal distributions The rule Standardizing observations Using the standard Normal Table Inverse Normal calculations Normal quantile plots

92 Density curves A density curve is a mathematical model of a distribution. The total area under the curve, by definition, is equal to 1, or 100%. The area under the curve for a range of values is the proportion of all observations for that range. Histogram of a sample with the smoothed, density curve describing theoretically the population.

93 Density curves come in any imaginable shape. Some are well known mathematically and others aren t.

94 Median and mean of a density curve The median of a density curve is the equal-areas point: the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if it were made of solid material. The median and mean are the same for a symmetric density curve. The mean of a skewed curve is pulled in the direction of the long tail.

95 Normal distributions Normal or Gaussian distributions are a family of symmetrical, bellshaped density curves defined by a mean µ (mu) and a standard deviation σ (sigma) : N(µ,σ). f ( x) = σ 1 (2 π ) e 1 2 x µ σ 2 x x e = The base of the natural logarithm π = pi =

96 A family of density curves Here, means are the same (µ = 15) while standard deviations are different (σ = 2, 4, and 6) Here, means are different (µ = 10, 15, and 20) while standard deviations are the same (σ = 3)

97 The % Rule for Normal Distributions About 68% of all observations are within 1 standard deviation (σ) of the mean (µ). Inflection point About 95% of all observations are within 2 σ of the mean µ. Almost all (99.7%) observations are within 3 σ of the mean. mean µ = 64.5 standard deviation σ = 2.5 N(µ, σ) = N(64.5, 2.5) Reminder: µ (mu) is the mean of the idealized curve, while is the mean of a sample. σ (sigma) is the standard deviation of the idealized curve, while s is the s.d. of a sample. x

98 The standard Normal distribution Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(µ,σ) into the standard Normal curve N(0,1). N(64.5, 2.5) N(0,1) => x Standardized height (no units) z For each x we calculate a new value, z (called a z-score).

99 How to compare apples with oranges? A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better? How do we compare things when they are measured on different scales? We need to standardize the values so that they will be free of units.

100 Standardizing: calculating z-scores A z-score measures the number of standard deviations that a data value x is from the mean µ. z = (x µ ) σ When x is 1 standard deviation larger than the mean, then z = 1. µ + σ µ σ for x = µ + σ, z = = =1 σ σ When x is 2 standard deviations larger than the mean, then z = 2. µ + 2σ µ 2σ for x = µ + 2 σ, z = = = σ σ 2 When x is larger than the mean, z is positive. When x is smaller than the mean, z is negative.

101 Interpretation of z-scores The z-scores measure the distance of the data values from the mean in the standard deviation scale. A z-score of 1 means that data value is 1 standard deviation above the mean. A z-score of -1.2 means that data value is 1.2 standard deviations below the mean. Regardless of the direction, the further a data value is from the mean, the more unusual it is. A z-score of -1.3 is more unusual than a z-score of 1.2.

102 z-scores: An Example Data: 4, 3, 10, 12, 8, 9, 3 (n=7 in this case) Mean = ( )/7 = 49/7 =7. Standard Deviation = Original Value z-score (4 7)/3.65 = (3 7)/3.65 = (10 7)/3.65 = (12 7)/3.65 = (8 7)/3.65 = (9 7)/3.65 = (3 7)/3.65 =

103 How to use z-scores? A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better? SAT score mean = 1600, std dev = 500. ACT score mean = 23, std dev = 6. SAT score 1500 has z-score = ( )/500 = ACT score 22 has z-score = (22-23)/6 = ACT score 22 is better than SAT score 1500.

104 Which is more unusual? A. A 58 in tall woman z-score = ( )/2.5 = B. A 64 in tall man z-score = (64-69)/2.8 = C. They are the same. Heights of adult women have mean of 63.6 in. std. dev. of 2.5 in. Heights of adult men have mean of 69.0 in. std. dev. of 2.8 in.

105 Using z-scores to solve problems An example using height data and U.S. Marine and Army height requirements Question: Are the height restrictions set up by the U.S. Army and U.S. Marine more restrictive for men or women or are they roughly the same?

106 Data from a National Health Survey Heights of adult women have mean of 63.6 in. standard deviation of 2.5 in. Heights of adult men have mean of 69.0 in. standard deviation of 2.8 in. Height Restrictions Men Minimum Women Minimum U.S. Army 60 in 58 in U.S. Marine Corps 64 in 58 in

107 Heights of adult men have mean of 69.0 in. standard deviation of 2.8 in. Men Minimum 60 in Heights of adult women have mean of 63.6 in. standard deviation of 2.5 in. Women minimum 58 in U.S. Army U.S. Marine z-score = Less restrictive 64 in z-score = More restrictive z-score = More restrictive 58 in z-score = Less restrictive

108 Effect of Standardization Standardization into z-scores does not change the shape of the histogram. Standardization into z-scores changes the center of the distribution by making the mean 0. Standardization into z-scores changes the spread of the distribution by making the standard deviation 1.

109 Ex. Women heights Women s heights follow the N(64.5,2.5 ) distribution. What percent of women are shorter than 67 inches tall (that s 5 6 )? Area=??? N(µ, σ) = N(64.5, 2.5) Area =??? mean µ = 64.5" standard deviation σ = 2.5" x (height) = 67" µ = 64.5 x = 67 z = 0 z = 1 We calculate z, the standardized value of x: z ( x µ ) =, z σ = ( ) 2.5 = = 1 => 1 stand. dev. from mean Because of the rule, we can conclude that the percent of women shorter than 67 should be, approximately,.68 + half of (1 -.68) =.84 or 84%.

110 Using the standard Normal table Table A gives the area under the standard Normal curve to the left of any z value is the area under N(0,1) left of z = is the area under N(0,1) left of z = ( ) is the area under N(0,1) left of z = -2.46

111 Percent of women shorter than 67 For z = 1.00, the area under the standard Normal curve to the left of z is N(µ, σ) = N(64.5, 2.5 ) Area 0.84 Conclusion: 84.13% of women are shorter than 67. Area 0.16 By subtraction, , or 15.87% of women are taller than 67". µ = 64.5 x = 67 z = 1

112 Tips on using Table A Because the Normal distribution is symmetrical, there are 2 ways that you can calculate the area under the standard Normal curve to the right of a z value. Area = Area = z = area right of z = area left of -z area right of z = 1 - area left of z

113 Tips on using Table A To calculate the area between 2 z- values, first get the area under N(0,1) to the left for each z-value from Table A. Then subtract the smaller area from the larger area. A common mistake made by students is to subtract both z values. But the Normal curve is not uniform. area between z 1 and z 2 = area left of z 1 area left of z 2 The area under N(0,1) for a single value of z is zero. (Try calculating the area to the left of z minus that same area!)

114 Calculating the area for normal distribution by TI calculators Use TI 83/84 Plus. Press [2nd] & [VARS] (i.e. [DISTR]) Select 2: normalcdf Format of command: normalcdf(lower bound, upper bound, mean, std.dev.) For percentage between a and b normalcdf(a, b, mean, sigma) For area (percentage) below a specific value b normalcdf(-1000, b, mean, sigma) For area (percentage) above any specific value a normalcdf( a, 1000, mean, sigma)

115 Approximately what percent of U.S. women do you expect to be less than 64 in tall? Heights of adult women are normally distributed with mean of 63.6 in, standard deviation of 2.5 in. Note that here upper bound is 64, but there is no mention of lower bound. So take a very small value for lower bound, say For this problem normalcdf(-1000, 64, 63.6, 2.5) = i.e. about 56.4% of adult U.S. women have heights less than 64 in.

116 The National Collegiate Athletic Association (NCAA) requires Division I athletes to score at least 820 on the combined math and verbal SAT exam to compete in their first college year. The SAT scores of 2003 were approximately normal with mean 1026 and standard deviation 209. What proportion of all students would be NCAA qualifiers (SAT 820)? x = 820 µ = 1026 σ = 209 ( x µ ) z = σ ( ) z = z = Table A: area under N(0,1) to the left of z = is or approx. 16%. area right of 820 = total area - area left of 820 = % normalcdf (820,1000,1026,209)

117 The NCAA defines a partial qualifier eligible to practice and receive an athletic scholarship, but not to compete, with a combined SAT score of at least 720. What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820? x = 720 µ = 1026 σ = 209 ( x µ ) z = σ ( ) z = z = Table A: area under N(0,1) to the left of z = -.99 is or approx. 7%. area between = area left of area left of and 820 = % About 9% of all students who take the SAT have scores between 720 and 820. normalcdf (720,820,1026,209)

118 The cool thing about working with normally distributed data is that we can manipulate it, and then find answers to questions that involve comparing seemingly noncomparable distributions. We do this by standardizing the data. All this involves is changing the scale so that the mean now = 0 and the standard deviation =1. If you do this to different distributions it makes them comparable. z = (x µ ) σ N(0,1)

119 Ex. Gestation time in malnourished mothers What is the effect of better maternal care on gestation time and preemies? The goal is to obtain pregnancies 240 days (8 months) or longer. What improvement did we get by adding better food? µ 250 σ 20 µ 266 σ Gestation time (days) Vitamins only Vitamins and better food

120 Under each treatment, what percent of mothers failed to carry their babies at least 240 days? x = 240 µ = 250 σ = 20 Vitamins Only ( x µ ) z = σ ( ) z = z = = (half a standard deviation) Table A: area under N(0,1) to the left of z = -0.5 is normalcdf(-1000,240,250,20) Gestation time (days) Vitamins only: 30.85% of women would be expected to have gestation times shorter than 240 days. µ=250, σ=20, x=240

121 Vitamins and better food x = 240 µ = 266 σ = 15 ( x µ ) z = σ ( ) z = z = = (almost 2 sd from mean) Table A: area under N(0,1) to the left of z = is normalcdf(-1000,240,266,15) Gestation time (days) Vitamins and better food: 4.18% of women would be expected to have gestation times shorter than 240 days. µ=266, σ=15, x=240 Compared to vitamin supplements alone, vitamins and better food resulted in a much smaller percentage of women with pregnancy terms below 8 months (4% vs. 31%).

122 Inverse normal calculations We may also want to find the observed range of values that correspond to a given proportion/ area under the curve. For that, we use Table A backward: we first find the desired area/ proportion in the body of the table, we then read the corresponding z-value from the left column and top row. For an area to the left of 1.25 % (0.0125), the z-value is -2.24

123 Use TI 83/84 Plus. Press [2nd] & [VARS] (i.e. [DISTR]) Select 3: invnorm Format of command: invnorm(fraction, mean, std.dev.) invnorm only considers percentage or fraction in the lower tail of normal distribution.

124 Vitamins and better food How long are the longest 75% of pregnancies when mothers with malnutrition are given vitamins and better food? µ = 266 σ = 15 upper area = 75% lower area = 25% x =? Table A :z valuefor the lower area 25% under N(0,1)is about ( x µ ) z = x = µ + ( z * σ ) σ x = 266+ ( 0.67*15) x = invnorm(0.25, 266,15)=256 upper 75%? Gestation time (days) µ=266, σ=15, upper area 75% Remember that Table A gives the area to the left of z. Thus, we need to search for the lower 25% in Table A in order to get z. The 75% longest pregnancies in this group are about 256 days or longer.

125 Normal quantile plots One way to assess if a distribution is indeed approximately normal is to plot the data on a normal quantile plot. The data points are ranked and the percentile ranks are converted to z- scores with Table A. The z-scores are then used for the x axis against which the data are plotted on the y axis of the normal quantile plot. If the distribution is indeed normal the plot will show a straight line, indicating a good match between the data and a normal distribution. Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

126 Good fit to a straight line: the distribution of rainwater ph values is close to normal. Curved pattern: the data are not normally distributed. Instead, it shows a right skew: a few individuals have particularly long survival times. Normal quantile plots are complex to do by hand, but they are standard features in most statistical software.

127 By Minitab Graph-Probability plot-single (For one sample)

Looking at data: distributions - Density curves and Normal distributions. Copyright Brigitte Baldi 2005 Modified by R. Gordon 2009.

Looking at data: distributions - Density curves and Normal distributions. Copyright Brigitte Baldi 2005 Modified by R. Gordon 2009. Looking at data: distributions - Density curves and Normal distributions Copyright Brigitte Baldi 2005 Modified by R. Gordon 2009. Objectives Density curves and Normal distributions!! Density curves!!

More information

STT 315 This lecture is based on Chapter 2 of the textbook.

STT 315 This lecture is based on Chapter 2 of the textbook. STT 315 This lecture is based on Chapter 2 of the textbook. Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data Finding Quartiles. Use the median to divide the ordered data set into two halves.. If n is odd, do not include the median in either half. If n is even, split this data set exactly in half.. Q1 is the median

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Chapter 1. Looking at Data

Chapter 1. Looking at Data Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Chapter 5: Exploring Data: Distributions Lesson Plan

Chapter 5: Exploring Data: Distributions Lesson Plan Lesson Plan Exploring Data Displaying Distributions: Histograms Interpreting Histograms Displaying Distributions: Stemplots Describing Center: Mean and Median Describing Variability: The Quartiles The

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency The word average: is very ambiguous and can actually refer to the mean, median, mode or midrange. Notation:

More information

Chapter 3: The Normal Distributions

Chapter 3: The Normal Distributions Chapter 3: The Normal Distributions http://www.yorku.ca/nuri/econ2500/econ2500-online-course-materials.pdf graphs-normal.doc / histogram-density.txt / normal dist table / ch3-image Ch3 exercises: 3.2,

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

Introduction to Statistics

Introduction to Statistics Introduction to Statistics Data and Statistics Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing,

More information

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

11. The Normal distributions

11. The Normal distributions 11. The Normal distributions The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 11) The Normal distributions Normal distributions The

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511 Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore Chapter 3 continued Describing distributions with numbers Measuring spread of data: Quartiles Definition 1: The interquartile

More information

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 The breakfast cereal data Study collected data on nutritional

More information

Chapter 5: Exploring Data: Distributions Lesson Plan

Chapter 5: Exploring Data: Distributions Lesson Plan Lesson Plan Exploring Data Displaying Distributions: Histograms For All Practical Purposes Mathematical Literacy in Today s World, 7th ed. Interpreting Histograms Displaying Distributions: Stemplots Describing

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?! Topic 3: Introduction to Statistics Collecting Data We collect data through observation, surveys and experiments. We can collect two different types of data: Categorical Quantitative Algebra 1 Table of

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Descriptive Statistics-I. Dr Mahmoud Alhussami

Descriptive Statistics-I. Dr Mahmoud Alhussami Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.

More information

Performance of fourth-grade students on an agility test

Performance of fourth-grade students on an agility test Starter Ch. 5 2005 #1a CW Ch. 4: Regression L1 L2 87 88 84 86 83 73 81 67 78 83 65 80 50 78 78? 93? 86? Create a scatterplot Find the equation of the regression line Predict the scores Chapter 5: Understanding

More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

Units. Exploratory Data Analysis. Variables. Student Data

Units. Exploratory Data Analysis. Variables. Student Data Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Continuous random variables

Continuous random variables Continuous random variables A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The total area under a density

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67 Chapter 6 The Standard Deviation as a Ruler and the Normal Model 1 /67 Homework Read Chpt 6 Complete Reading Notes Do P129 1, 3, 5, 7, 15, 17, 23, 27, 29, 31, 37, 39, 43 2 /67 Objective Students calculate

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview 3-2 Measures

More information

Chapter 2 Solutions Page 15 of 28

Chapter 2 Solutions Page 15 of 28 Chapter Solutions Page 15 of 8.50 a. The median is 55. The mean is about 105. b. The median is a more representative average" than the median here. Notice in the stem-and-leaf plot on p.3 of the text that

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

Chapter 5. Understanding and Comparing. Distributions

Chapter 5. Understanding and Comparing. Distributions STAT 141 Introduction to Statistics Chapter 5 Understanding and Comparing Distributions Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 27 Boxplots How to create a boxplot? Assume

More information

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami Unit Two Descriptive Biostatistics Dr Mahmoud Alhussami Descriptive Biostatistics The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are

More information

Lecture 1: Description of Data. Readings: Sections 1.2,

Lecture 1: Description of Data. Readings: Sections 1.2, Lecture 1: Description of Data Readings: Sections 1.,.1-.3 1 Variable Example 1 a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then

More information

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

6 THE NORMAL DISTRIBUTION

6 THE NORMAL DISTRIBUTION CHAPTER 6 THE NORMAL DISTRIBUTION 341 6 THE NORMAL DISTRIBUTION Figure 6.1 If you ask enough people about their shoe size, you will find that your graphed data is shaped like a bell curve and can be described

More information

1.3.1 Measuring Center: The Mean

1.3.1 Measuring Center: The Mean 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar) of a set of observations, add their values and divide by the number of observations. If the n observations

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:

More information

Lecture 2 and Lecture 3

Lecture 2 and Lecture 3 Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value

More information

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution.

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution. MATH 382 Normal Distributions Dr. Neal, WKU Measurements that are normally distributed can be described in terms of their mean µ and standard deviation σ. These measurements should have the following properties:

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.

More information

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz Measures of Central Tendency and their dispersion and applications Acknowledgement: Dr Muslima Ejaz LEARNING OBJECTIVES: Compute and distinguish between the uses of measures of central tendency: mean,

More information

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution.

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution. MATH 183 Normal Distributions Dr. Neal, WKU Measurements that are normally distributed can be described in terms of their mean µ and standard deviation!. These measurements should have the following properties:

More information

1 Measures of the Center of a Distribution

1 Measures of the Center of a Distribution 1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

20 Hypothesis Testing, Part I

20 Hypothesis Testing, Part I 20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she

More information

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Mean 26.86667 Standard Error 2.816392 Median 25 Mode 20 Standard Deviation 10.90784 Sample Variance 118.981 Kurtosis -0.61717 Skewness

More information

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2.

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2. Chapter 3 Solutions 3.1 3.2 3.3 87% of the girls her daughter s age weigh the same or less than she does and 67% of girls her daughter s age are her height or shorter. According to the Los Angeles Times,

More information

Sampling, Frequency Distributions, and Graphs (12.1)

Sampling, Frequency Distributions, and Graphs (12.1) 1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which

More information

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008 DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008 Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe

More information

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved Chapter 3 Numerically Summarizing Data Section 3.1 Measures of Central Tendency Objectives 1. Determine the arithmetic mean of a variable from raw data 2. Determine the median of a variable from raw data

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Chapter 6 The Standard Deviation as a Ruler and the Normal Model Chapter 6 The Standard Deviation as a Ruler and the Normal Model Overview Key Concepts Understand how adding (subtracting) a constant or multiplying (dividing) by a constant changes the center and/or spread

More information

Resistant Measure - A statistic that is not affected very much by extreme observations.

Resistant Measure - A statistic that is not affected very much by extreme observations. Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

Section 5.4. Ken Ueda

Section 5.4. Ken Ueda Section 5.4 Ken Ueda Students seem to think that being graded on a curve is a positive thing. I took lasers 101 at Cornell and got a 92 on the exam. The average was a 93. I ended up with a C on the test.

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

CHAPTER 2: Describing Distributions with Numbers

CHAPTER 2: Describing Distributions with Numbers CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

Unit 2: Numerical Descriptive Measures

Unit 2: Numerical Descriptive Measures Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48

More information

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago Stat 22000 Lecture Slides Exploring Numerical Data Yibi Huang Department of Statistics University of Chicago Outline In this slide, we cover mostly Section 1.2 & 1.6 in the text. Data and Types of Variables

More information

Statistics Lecture 3

Statistics Lecture 3 Statistics 111 - Lecture 3 Continuous Random Variables The probable is what usually happens. (Aristotle ) Moore, McCabe and Craig: Section 4.3,4.5 Continuous Random Variables Continuous random variables

More information

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables) 3. Descriptive Statistics Describing data with tables and graphs (quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables) Bivariate descriptions

More information

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points 1-13 13 14 3 15 8 16 4 17 10 18 9 19 7 20 3 21 16 22 2 Total 75 1 Multiple choice questions (1 point each) 1. Look at

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Exercises from Chapter 3, Section 1

Exercises from Chapter 3, Section 1 Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median

More information

Ch. 7: Estimates and Sample Sizes

Ch. 7: Estimates and Sample Sizes Ch. 7: Estimates and Sample Sizes Section Title Notes Pages Introduction to the Chapter 2 2 Estimating p in the Binomial Distribution 2 5 3 Estimating a Population Mean: Sigma Known 6 9 4 Estimating a

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Week 1: Intro to R and EDA

Week 1: Intro to R and EDA Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Stat 20 Midterm 1 Review

Stat 20 Midterm 1 Review Stat 20 Midterm Review February 7, 2007 This handout is intended to be a comprehensive study guide for the first Stat 20 midterm exam. I have tried to cover all the course material in a way that targets

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Let's Do It! What Type of Variable?

Let's Do It! What Type of Variable? 1 2.1-2.3: Organizing Data DEFINITIONS: Qualitative Data are those which classify the units into categories. The categories may or may not have a natural ordering to them. Qualitative variables are also

More information