Chapter 1 Introduction & 1.1: Analyzing Categorical Data


Introduction
Data Analysis: Making Sense of Data

[Figure: Population and Sample. Collect data from a representative sample, perform data analysis keeping probability in mind, then make an inference about the population.]

After this section, you should be able to
DEFINE individuals and variables
DISTINGUISH between categorical and quantitative variables
DEFINE distribution
DESCRIBE the idea behind inference

What Is the Study of Statistics?!
Statistics is the science of data. In this course we study four different aspects of statistics:
Data Analysis (Chapters 1 to 3): the process of organizing, displaying, summarizing, and asking questions about data.
Data Collection (Chapter 4): the process of conducting and interpreting surveys and experiments.
Anticipating Patterns/Probability (Chapters 5 to 7): the process of using probability and chance to explain natural phenomena.
Inference (Chapters 8 to 12): the process of making predictions and evaluations about a population from a sample.

Variable: any characteristic of an individual or object.

Categorical Variable
Usually an adjective; rarely a number.
Examples: gender, race, grade in school (Sophomore, Junior, Senior), zip code.

Quantitative Variable
Always a number, and it must make sense to find the mean of the numbers.
Examples: weight, height, GPA, number of AP classes taken, square footage.

Distribution
Distribution: describes what values a variable takes and how often it takes those values.
Essentially, "distribution" replaces the words "data" or "graph": "The median of the distribution is 28." "The distribution is skewed left."
[Figure: Dotplot of MPG distribution]

Organizing a Statistical Problem: The Four-Step Process
State: What's the question that you're trying to answer?
Plan: How will you go about answering the question? What statistical techniques does this problem call for?
Do: Make graphs and carry out needed calculations.
Conclude: Give your practical conclusion in the setting of the real-world problem.
***Using this method is NOT required; however, all complete answers MUST include the Do and Conclude steps***

Section 1.1: Analyzing Categorical Data
After this section, you should be able to
CONSTRUCT and INTERPRET bar graphs and pie charts
RECOGNIZE good and bad graphs
CONSTRUCT and INTERPRET two-way tables
DESCRIBE relationships between two categorical variables
ORGANIZE statistical problems

Distribution & Categorical Variables
The distribution of a categorical variable lists the count or percent of individuals who fall into each category.

Favorite Course      Count   Percentage
English                  8          16%
Foreign Language         4           8%
History                 11          22%
Math                    15          30%
Science                 12          24%

Displaying Categorical Data
Frequency tables can be difficult to read. Sometimes it is easier to analyze a distribution by displaying it with a bar graph or pie chart.
[Figure: 2014 AP Exam Scores]
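The percent column of the Favorite Course table is each count divided by the total number of students (50), times 100. A minimal Python sketch of that conversion; the function name `percent_distribution` is hypothetical, not something the notes define:

```python
def percent_distribution(counts):
    """Convert {category: count} into {category: percent of the total}."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


# Counts from the Favorite Course table
favorite_course = {
    "English": 8,
    "Foreign Language": 4,
    "History": 11,
    "Math": 15,
    "Science": 12,
}

for course, pct in percent_distribution(favorite_course).items():
    print(f"{course:<18}{pct:5.0f}%")
```

The same percents could then be drawn as a bar graph or pie chart; the percents always add to 100%.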

Graphs: Good and Bad
Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. Our eyes react to the area of the bars as well as to their height, so be sure to make your bars equally wide. Avoid the temptation to replace the bars with pictures for greater appeal; this can be misleading!

This ad for DIRECTV has multiple problems. How many can you point out?
[Figure: DIRECTV advertisement]

[Figure: young adults' chances of being rich, by gender]
What proportion of males have a good chance at being rich?
What proportion of females have a good chance at being rich?
What proportion of young adults who have an almost certain chance of being rich are male?

Two-Way Tables
Two-way tables describe two categorical variables, organizing counts according to a row variable and a column variable. When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables.
[Table: club membership (Member of No Clubs, Member of One Club, Member of 2 or More Clubs, Total) by transportation (Rides the School Bus, Does Not Ride Bus, Total); counts not preserved]
What proportion of students who ride the school bus are members of two or more clubs?
What proportion of students who are members of no clubs do not ride the school bus?
What proportion of students who do not ride the school bus are members of at least one club?

Comparing Categorical Distributions
[Figure: segmented bar graph of AP exam scores (One through Five) for Sophomores, Juniors, and Seniors; vertical axis 0% to 100%]
[Figure: segmented bar graph of club membership for students who ride and do not ride the school bus; vertical axis 0% to 100%]
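Questions like "What proportion of students who ride the school bus are members of two or more clubs?" use only one row of the two-way table: divide each count in that row by the row total. A minimal Python sketch; `conditional_percents` is a hypothetical helper, and because the slide's counts were not preserved, the counts below are made up purely for illustration:

```python
def conditional_percents(two_way):
    """For each row of a two-way table of counts, give each column's percent of that row's total."""
    result = {}
    for row, counts in two_way.items():
        row_total = sum(counts.values())
        result[row] = {col: 100 * count / row_total for col, count in counts.items()}
    return result


# Hypothetical counts for illustration only (not the slide's data)
clubs_by_bus = {
    "Rides the School Bus": {"No Clubs": 5, "One Club": 3, "2 or More Clubs": 2},
    "Does Not Ride Bus": {"No Clubs": 1, "One Club": 3, "2 or More Clubs": 6},
}

by_row = conditional_percents(clubs_by_bus)
riders = by_row["Rides the School Bus"]
non_riders = by_row["Does Not Ride Bus"]
print(f"Bus riders in 2 or more clubs: {riders['2 or More Clubs']:.0f}%")
print(f"Non-riders in at least one club: {100 - non_riders['No Clubs']:.0f}%")
```

The second question on the slide conditions on the column variable (members of no clubs) instead, so it would divide by a column total rather than a row total.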

Comparing Categorical Distributions
Is there an association between after-school club participation and whether or not the student rides the school bus? Support your answer with a discussion of the provided graphs.
[Figure: segmented bar graphs of club membership (Member of No Clubs, Member of One Club, Member of 2 or More Clubs) for students who ride the school bus and students who do not; vertical axis 0% to 100%]

Sample Answer: Yes, there is a clear association between after-school club participation and transportation. Only 11% of students who don't ride the bus do not participate in after-school clubs, whereas 51% of students who do ride the bus do not participate. Similarly, 58% of students who do not ride the bus are involved in 2 or more clubs, while only 19% of students riding the bus are involved in 2 or more clubs. However, the proportion of students who participate in one club is about the same for students who ride and students who don't ride the bus.

Writing to Compare Categorical Distributions
Cite specific numerical values/proportions.
Use comparison words: greater, smaller, less, while only, more, wider, narrower, etc.
Use transition words: however, whereas, similarly, additionally, etc.
Discuss at least two points of comparison.

Section 1.2: Displaying Quantitative Data with Graphs
After this section, you should be able to
CONSTRUCT and INTERPRET dotplots, stemplots, and histograms
DESCRIBE the shape of a distribution
COMPARE distributions
USE histograms wisely

Dotplots
Each data value is shown as a dot above its location on a number line.
[Figure: Number of Goals Scored Per Game by the 2004 US Women's Soccer Team]

How to Make a Dotplot
1. Draw a horizontal axis (a number line) and label it with the variable name.
2. Scale the axis from the minimum to the maximum value.
3. Mark a dot above the location on the horizontal axis corresponding to each data value.

How to Describe Quantitative Data
In any graph, look for the overall pattern and for striking departures from that pattern. Describe the overall pattern of a distribution by its:
Shape
Outliers
Center
Spread
Don't forget your SOCS!

Describing Shape
When you describe a distribution's shape, concentrate on the main features. Look for rough symmetry or clear skewness.
Symmetric: the right and left sides of the graph are approximately mirror images of each other.
Skewed to the right (right-skewed): the right side of the graph is much longer than the left side.
Skewed to the left (left-skewed): the left side of the graph is much longer than the right side.
[Figure: examples of symmetric, skewed-left, and skewed-right distributions]
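The three dotplot steps can be mimicked in plain text: a number line from the minimum to the maximum, with one mark stacked above each value for every observation. A minimal Python sketch, assuming whole-number data; `text_dotplot` is a hypothetical helper, and the sample values are made up rather than taken from the soccer team's data:

```python
from collections import Counter


def text_dotplot(values, label):
    """Return the rows of a text dotplot for whole-number data, top row first."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))   # step 2: scale from min to max
    width = max(len(str(v)) for v in axis)              # column width so labels line up
    rows = []
    for level in range(max(counts.values()), 0, -1):    # step 3: stack one 'o' per observation
        cells = ["o" if counts[v] >= level else " " for v in axis]
        rows.append(" ".join(f"{c:>{width}}" for c in cells).rstrip())
    rows.append(" ".join(f"{v:>{width}}" for v in axis))  # step 1: the number line...
    rows.append(label)                                     # ...labeled with the variable name
    return rows


# Hypothetical goals-per-game values, for illustration only
print("\n".join(text_dotplot([1, 2, 2, 3, 5, 3, 2], "Goals scored per game")))
```

Values that never occur (such as 4 above) still get a spot on the axis, which is what makes gaps visible when describing shape.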

Center
We can describe the center by finding a value that divides the observations so that about half take larger values and about half take smaller values.
Ways to describe center:
Calculate the median (best when the distribution is skewed)
Calculate the mean (best when the distribution is symmetric)

Other Ways to Describe Shape: unimodal, bimodal, multimodal.

Spread
The spread of a distribution tells us how much variability there is in the data.
Ways to describe spread:
Calculate the range
IQR (coming later)
Standard deviation (coming later)

Outliers
Definition: values that differ from the overall pattern are outliers. We will learn specific ways to find outliers in a later section. For now, we can only identify potential outliers.

Describe the shape, center, and spread of the distribution. Are there any potential outliers? Remember to include CONTEXT!!!

Sample Answer:
Shape: The distribution of MPG is roughly unimodal and skewed left.
Center: The mean is 25.9 mpg and the median is 28 mpg. (You only need one measure.)
Spread: The range is 19 mpg.
Outliers: There are two potential outliers/influential values: 14 mpg and 18 mpg.

Stemplots (Stem-and-Leaf Plots)
Stemplots give us a quick picture of the distribution while including the actual numerical values.
These data represent the responses of 20 female AP Statistics students to the question, "How many pairs of shoes do you have?"
[Figure: stemplot of the shoe data]

How to Make a Stemplot
1) Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2) Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3) Write each leaf in the row to the right of its stem.
4) Arrange the leaves in increasing order out from the stem.
5) Provide a key that explains in context what the stems and leaves represent.

Two Special Types of Stemplots
Split stemplots: best when data values are bunched up. Split each stem into two rows, one for leaves 0-4 and one for leaves 5-9.
Back-to-back stemplots: compare two distributions of the same quantitative variable (here, females and males) using one shared column of stems.
[Figure: back-to-back split stemplot of pairs of shoes, Females vs. Males. Key: 4 | 9 represents a student who reported having 49 pairs of shoes.]

Histograms
Quantitative variables often take many values. A graph of the distribution may be clearer if nearby values are grouped together. The most common graph of the distribution of one quantitative variable is a histogram.
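Returning to the five stemplot steps listed above, they translate almost line for line into code. A minimal Python sketch, assuming non-negative whole-number data (so "all but the final digit" is the value integer-divided by 10); `stemplot` is a hypothetical helper and the sample values are made up, not the shoe data:

```python
from collections import defaultdict


def stemplot(values, key_text):
    """Build stemplot rows for non-negative whole numbers, plus a key row."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)           # step 1: e.g. 49 -> stem 4, leaf 9
        leaves[stem].append(leaf)
    low, high = min(leaves), max(leaves)
    width = len(str(high))
    rows = []
    for stem in range(low, high + 1):        # step 2: every stem, even ones with no leaves
        row_leaves = "".join(str(leaf) for leaf in sorted(leaves[stem]))  # steps 3 and 4
        rows.append(f"{stem:>{width}} | {row_leaves}".rstrip())
    rows.append(f"Key: {key_text}")          # step 5
    return rows


# Hypothetical responses, for illustration only
example = [5, 12, 17, 13, 31, 49, 22]
print("\n".join(stemplot(example, "4 | 9 represents a student who reported having 49 pairs of shoes")))
```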

How to Make a Histogram
1) Divide the range of data into classes of equal width.
2) Find the count (frequency) or percent (relative frequency) of individuals in each class.
3) Label and scale your axes and draw the histogram. The height of each bar equals its frequency. Adjacent bars should touch, unless a class contains no individuals.

Making a Histogram
Frequency Table: percent of foreign-born residents in the 50 states
Class        Count
0 to <5
5 to <10
10 to <15
15 to <20
20 to <25
25 to <30    1
Total        50
[Counts for the first five classes not preserved]
[Figure: histogram with horizontal axis "Percent of foreign-born residents" and vertical axis "Number of States"]

Caution: Using Histograms Wisely
1) Don't confuse histograms and bar graphs.
2) Don't use counts (in a frequency table) or percents (in a relative frequency table) as data.
3) Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
4) Just because a graph looks nice, it's not necessarily a meaningful display of data.

Section 1.3: Describing Quantitative Data with Numbers
After this section, you should be able to
MEASURE center with the mean and median
MEASURE spread with standard deviation and interquartile range
IDENTIFY outliers
CONSTRUCT a boxplot using the five-number summary
CALCULATE numerical summaries with technology

Measuring Center: The Mean
To find the mean x̄ (pronounced "x-bar") of a set of observations, add their values and divide by the number of observations. If the n observations are x_1, x_2, x_3, ..., x_n, their mean is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Compact notation:
$$\bar{x} = \frac{\sum x_i}{n}$$
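Two calculations from this page, counting observations into equal-width histogram classes and computing x̄, fit in a few lines each. A minimal Python sketch; the helpers `frequency_table` and `mean` are hypothetical names, and the sample values are made up rather than taken from the 50-state data:

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in equal-width classes 'start + k*width to < start + (k+1)*width'.

    Returns (lower, upper, count, percent) for each class.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)          # which class v falls in
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    n = len(values)
    return [
        (start + k * width, start + (k + 1) * width, count, 100 * count / n)
        for k, count in enumerate(counts)
    ]


def mean(values):
    """x-bar: add the values and divide by the number of observations."""
    return sum(values) / len(values)


# Hypothetical percents of foreign-born residents, for illustration only
data = [1.2, 4.8, 6.0, 12.5, 27.3]
for low, high, count, pct in frequency_table(data, start=0, width=5, n_classes=6):
    print(f"{low} to <{high}: count {count}, percent {pct:.0f}%")
print("mean:", mean(data))
```

The table's counts are what the bars' heights show; the mean is computed from the raw observations, never from the counts (caution 2 above).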

Measuring Center: The Median
The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution:
1) Arrange all observations from smallest to largest.
2) If the number of observations n is odd, the median M is the center observation in the ordered list.
3) If the number of observations n is even, the median M is the average of the two center observations in the ordered list.

Comparing the Mean and the Median
The mean and median measure center in different ways, and both are useful.
Mean: average value
Median: typical value
Why is the mean more affected by the presence of outliers than the median?

Relationship between Mean & Median: The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.

Standard Deviation
Standard deviation is a number that tells how far the measurements for a group are spread out from the mean. A relatively low standard deviation indicates that the data points tend to be very close to the mean. A relatively high standard deviation indicates that the data points are spread out over a large range of values.
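The three median steps, and the question of why outliers pull on the mean more than on the median, can both be checked directly. A minimal Python sketch with hypothetical helpers `median` and `mean`; the second list is a made-up variation in which one value is replaced by an extreme one:

```python
def median(values):
    """Median by the three steps: sort, then take the middle value or average the two middle values."""
    ordered = sorted(values)                          # step 1
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                    # step 2: odd n
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2      # step 3: even n


def mean(values):
    return sum(values) / len(values)


data = [1, 3, 4, 4, 4, 5, 7, 8, 9]
with_outlier = [1, 3, 4, 4, 4, 5, 7, 8, 90]   # hypothetical: 9 replaced by 90

print("original:     mean", mean(data), "median", median(data))
print("with outlier: mean", mean(with_outlier), "median", median(with_outlier))
```

Here the mean jumps from 5 to 14 while the median stays at 4: the mean uses the size of every observation, but the median only depends on which value sits in the middle.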

Standard Deviation Formula
The standard deviation s_x measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.
$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
FYI: Why n - 1?! Applet: sticsapplets/n 1.html; Proof

How to Calculate Standard Deviation by Hand
1. Calculate the mean.
2. Calculate each deviation: subtract the mean from every actual (observed) value.
3. Square each deviation.
4. Find the average squared deviation by dividing the sum of the squared deviations by (n - 1). This is the variance.
5. Finally, calculate the square root of the number calculated in step 4.

Calculate the Standard Deviation
Calculate the standard deviation of the nine observations in the table.
1) Calculate the mean. Step 1: mean = 5.
2) Calculate each deviation: deviation = observation - mean. For example, 1 - 5 = -4 and 8 - 5 = 3.
3) Square each deviation. Step 3: see the table.
4) Find the average squared deviation by dividing the sum of the squared deviations by (n - 1). Step 4: average squared deviation = 52/(9 - 1) = 6.5, so the variance = 6.5.

5) Calculate the square root of the variance; this is the standard deviation. Step 5: standard deviation = √6.5 ≈ 2.55.

x_i    x_i - x̄    (x_i - x̄)^2
1        -4           16
3        -2            4
4        -1            1
4        -1            1
4        -1            1
5         0            0
7         2            4
8         3            9
9         4           16
Sum       0           52

TI-Nspire: Calculate standard deviation and mean.
1. Select Lists & Spreadsheet (blue/green button at bottom of home screen).
2. Type the values into list1.
3. With your cursor on the values, press menu.
4. Select 4: Statistics, then 1: Stat Calculations, and press enter.
5. Select 1: One-Variable Stats.
6. Set the screen as shown [screenshot not preserved] and then press enter.
[Screenshot: output with the mean and standard deviation labeled]

Two Extreme Examples:
In dataset #1, five people report eating 4 pieces of cake and five people report eating 6 pieces of cake, for a mean of 5 pieces of cake ((5(4) + 5(6))/10 = 5). Mean = 5; variance = 1.
In dataset #2, five people report eating 0 pieces of cake and five people report eating 10 pieces of cake, for a mean of 5 pieces of cake ((5(0) + 5(10))/10 = 5). Mean = 5; variance = 25 (standard deviation = 5).
(These variances divide by n = 10; dividing by n - 1 = 9 gives about 1.11 and 27.78.)

Below are dotplots of three different distributions, A, B, and C. Which one has the largest standard deviation? Justify your answer.
[Figure: dotplots of distributions A, B, and C]
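The five hand-calculation steps can be checked with a short Python sketch that mirrors them one per line. The function name `sd_by_hand` is hypothetical; the data are the nine observations from the worked example (mean 5, sum of squared deviations 52):

```python
import math


def sd_by_hand(values):
    """Follow the by-hand steps; returns (mean, variance, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                         # 1. mean
    deviations = [x - mean for x in values]        # 2. deviation = observation - mean
    squared = [d * d for d in deviations]          # 3. square each deviation
    variance = sum(squared) / (n - 1)              # 4. "average" squared deviation, dividing by n - 1
    return mean, variance, math.sqrt(variance)     # 5. square root of the variance


mean, variance, sd = sd_by_hand([1, 3, 4, 4, 4, 5, 7, 8, 9])
print(f"mean = {mean}, variance = {variance}, s_x = {sd:.2f}")
# prints: mean = 5.0, variance = 6.5, s_x = 2.55
```

Python's standard `statistics.stdev` divides by n - 1 like this sketch, while `statistics.pstdev` divides by n, which is the convention behind the cake examples' variances of 1 and 25.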

Interquartile Range (IQR)
To calculate:
1) Arrange the observations in increasing order and locate the median M.
2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 - Q1

Find and Interpret the IQR
Travel times to work for 20 randomly selected New Yorkers: Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 - Q1 = 42.5 - 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

Identifying Outliers
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
1.5 x IQR Rule for Outliers: Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.

In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes. Calculate the outlier cutoffs using the IQR rule.
For these data, 1.5 x IQR = 1.5(27.5) = 41.25
Q1 - 1.5 x IQR = 15 - 41.25 = -26.25
Q3 + 1.5 x IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier.
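The quartile method and the 1.5 x IQR rule above can be sketched in Python. This is a minimal sketch assuming the notes' "median of each half" method, with the overall median left out of both halves when n is odd; software often uses other quartile conventions (Python's `statistics.quantiles`, for instance, offers different methods), so results can differ slightly. The helper names are hypothetical:

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """(Q1, M, Q3): Q1 and Q3 are medians of the observations left and right of the median."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # observations to the left of the median
    upper = ordered[(n + 1) // 2 :]      # observations to the right of the median
    return median(lower), median(ordered), median(upper)


def outlier_fences(q1, q3):
    """1.5 x IQR rule: values below the first cutoff or above the second are outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


low, high = outlier_fences(15, 42.5)     # NY travel-time quartiles from the notes
print(f"Outliers: shorter than {low} or longer than {high} minutes")
```

Applied to the nine observations from the standard deviation example, `quartiles` gives Q1 = 3.5, M = 4, and Q3 = 7.5.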

The Five-Number Summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest:
Minimum   Q1   M   Q3   Maximum

TI-Nspire: Five-Number Summary
1. Select Lists & Spreadsheet (bottom of home screen).
2. Type the values into list1.
3. With your cursor on the values, press menu.
4. Select 4: Statistics, then 1: Stat Calculations, and press enter.
5. Select 1: One-Variable Stats.
6. Set the screen as shown [screenshot not preserved] and then press enter.
7. Scroll down to see the five-number summary.

Boxplots (Box-and-Whisker Plots)
Draw and label a number line that includes the range of the distribution.
Draw a central box from Q1 to Q3.
Note the median M inside the box.
Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
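The boxplot recipe needs the five-number summary plus one extra check: the whiskers stop at the most extreme values that are not outliers. A minimal Python sketch under the same "median of each half" quartile assumption as before; `five_number_summary` and `boxplot_parts` are hypothetical helpers, and the example list is made up:

```python
def _median(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(Minimum, Q1, M, Q3, Maximum) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = _median(ordered[: n // 2])
    q3 = _median(ordered[(n + 1) // 2 :])
    return ordered[0], q1, _median(ordered), q3, ordered[-1]


def boxplot_parts(values):
    """Box (Q1, M, Q3), whisker ends at the most extreme non-outliers, and outliers (1.5 x IQR rule)."""
    _, q1, m, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"box": (q1, m, q3), "whiskers": (min(inside), max(inside)), "outliers": outliers}


example = [1, 3, 4, 4, 4, 5, 7, 8, 30]   # hypothetical data with one large value
print(five_number_summary(example))
print(boxplot_parts(example))
```

Here 30 lies above Q3 + 1.5 x IQR = 13.5, so the right whisker ends at 8 and 30 is plotted as a separate point.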

Construct a Boxplot
Using our NY travel times data, construct a boxplot.
Min = 5, Q1 = 15, M = 22.5, Q3 = 42.5, Max = 85
Recall that the maximum, 85 minutes, is an outlier by the 1.5 x IQR rule (it is above Q3 + 1.5 x IQR = 42.5 + 41.25 = 83.75 minutes), so the right whisker ends at the largest travel time that is not an outlier and 85 is marked separately.
[Figure: boxplot of NY travel times]

Choosing Best Measures of Center & Spread
                           Symmetric Distribution    Skewed Distribution
Best Measure of Center     Mean                      Median
Best Measure of Spread     Standard deviation        IQR

Chapter 2

Section 2.1: Describing Location in a Distribution
After this section, you should be able to
MEASURE position using percentiles
INTERPRET cumulative relative frequency graphs
TRANSFORM data
DEFINE and DESCRIBE density curves

Jenny earned a score of 86 on her test. How did she perform relative to the rest of the class? What percentile is she ranked in?
Her score was greater than 21 of the 25 observations. Since 21 of the 25, or 84%, of the scores are below hers, Jenny is at the 84th percentile in the class's test score distribution.

Measuring Position: Percentiles
One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. The pth percentile of a distribution is the value with p percent of the observations less than it.

Cumulative Relative Frequency Graphs
A cumulative relative frequency graph displays the cumulative relative frequency of each class of a frequency distribution.
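Under the definition above (percent of observations strictly less than the value), a percentile is a count and a division. A minimal Python sketch; `percentile_rank` is a hypothetical helper, and the class's 25 scores were not given, so the list below is invented so that exactly 21 scores fall below 86, as in Jenny's case:

```python
def percentile_rank(value, data):
    """Percent of observations strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical class of 25 scores, built so that 21 of them are below 86
scores = list(range(60, 81)) + [86, 88, 90, 95]
print(f"Jenny is at the {percentile_rank(86, scores):.0f}th percentile")
```

Other textbooks and software define percentiles slightly differently (for example, counting values less than or equal to the score), so answers can differ by a percent or two.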

Age of First 44 Presidents When They Were Inaugurated
Age class*   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
1st           2           2/44 = 4.5%           2                      2/44 = 4.5%
2nd           7           7/44 = 15.9%          9                      9/44 = 20.5%
3rd          13          13/44 = 29.5%         22                     22/44 = 50.0%
4th          12          12/44 = 27.3%         34                     34/44 = 77.3%
5th           7           7/44 = 15.9%         41                     41/44 = 93.2%
6th           3           3/44 = 6.8%          44                     44/44 = 100%
*Age-class labels not preserved; rows run from the youngest class to the oldest.
[Figure: cumulative relative frequency graph, "Age of Presidents When Inaugurated"]
1. Was Barack Obama, who was inaugurated at age 47, unusually young?
2. Estimate and interpret the 65th percentile of the distribution.

Transforming Data
Transforming converts the original observations from the original units of measurement to another scale. Some transformations can affect the shape, center, and spread of a distribution.

Transforming Data: Adding/Subtracting a Constant
Adding the same number a (either positive, zero, or negative) to each observation:
adds a to measures of center and location (mean, median, quartiles, percentiles), but
does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).
[Table: n, Mean, s_x, Min, Q1, M, Q3, Max, IQR, and Range for "Guess (m)" and "Error (m - 13)"; values not preserved]
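The cumulative columns of the presidents table are running totals of the frequency column, each divided by 44. A minimal Python sketch using the table's frequencies; `cumulative_table` is a hypothetical helper, and `itertools.accumulate` (standard library) produces the running totals:

```python
from itertools import accumulate


def cumulative_table(frequencies):
    """Rows of (frequency, relative %, cumulative frequency, cumulative relative %), rounded to 0.1%."""
    total = sum(frequencies)
    rows = []
    for freq, cum in zip(frequencies, accumulate(frequencies)):
        rows.append((freq, round(100 * freq / total, 1), cum, round(100 * cum / total, 1)))
    return rows


# Frequencies from the presidents table, youngest age class first
for row in cumulative_table([2, 7, 13, 12, 7, 3]):
    print(row)
```

Plotting the last column against the upper end of each age class gives the cumulative relative frequency graph.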

Transforming Data: Multiplying/Dividing by a Constant
Multiplying (or dividing) each observation by the same number b (positive, negative, or zero):
multiplies (divides) measures of center and location by b,
multiplies (divides) measures of spread by |b|, but
does not change the shape of the distribution.
[Table: change the data from feet to meters; n, Mean, s_x, Min, Q1, M, Q3, Max, IQR, and Range for "Error (feet)" and "Error (meters)"; values not preserved]

Density Curve
A density curve:
is always on or above the horizontal axis, and
has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.
[Figure: histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS). The overall pattern can be described by a smooth curve drawn through the tops of the bars.]

Describing Density Curves
The median and the mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.

Section 2.2: Normal Distributions
After this section, you should be able to
DESCRIBE and APPLY the 68-95-99.7 Rule
DESCRIBE the standard Normal distribution
PERFORM Normal distribution calculations
ASSESS Normality
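The two transformation rules (adding a shifts center but not spread; multiplying by b scales center by b and spread by |b|) are easy to confirm numerically. A minimal Python sketch using the standard `statistics` module; the `summary` helper and the guess values are hypothetical, and 3.28084 is the feet-per-meter conversion factor:

```python
import statistics


def summary(values):
    """Center (mean, median) and spread (standard deviation, range) of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }


guesses = [10, 12, 13, 15, 20]              # hypothetical guesses, in meters
errors = [g - 13 for g in guesses]          # add a = -13: center shifts, spread unchanged
in_feet = [g * 3.28084 for g in guesses]    # multiply by b = 3.28084: center and spread both scale by b
flipped = [-2 * g for g in guesses]         # b = -2: center scales by -2, spread by |b| = 2

for name, data in [("guesses", guesses), ("errors", errors), ("feet", in_feet), ("flipped", flipped)]:
    print(name, summary(data))
```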

Normal Distributions
All Normal curves are symmetric, single-peaked, and bell-shaped. A specific Normal curve is described by giving its mean µ and standard deviation σ.
[Figure: two Normal curves, showing the mean µ and standard deviation σ]
We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. The mean of a Normal distribution is the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side.

The 68-95-99.7 Rule
Although there are many different Normal curves, they all have properties in common.
The 68-95-99.7 Rule ("The Empirical Rule"): In the Normal distribution with mean µ and standard deviation σ:
Approximately 68% of the observations fall within σ of µ.
Approximately 95% of the observations fall within 2σ of µ.
Approximately 99.7% of the observations fall within 3σ of µ.

Normal Distributions Are Useful
Normal distributions are good descriptions for some distributions of real data.
Normal distributions are good approximations of the results of many kinds of chance outcomes.
Many statistical inference procedures are based on Normal distributions.

The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th-grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55) and the range is between 0 and 12.
a) Sketch the Normal density curve for this distribution.

b) Using the Empirical Rule, what percent of ITBS vocabulary scores are less than 3.74?
c) Using the Empirical Rule, what percent of the scores are between 5.29 and 9.94?

Importance of Standardizing
There are infinitely many different Normal distributions, each with its own mean and standard deviation. To compare different Normal distributions more effectively, we standardize. Standardizing allows us to compare apples to apples; for example, we can compare SAT and ACT scores by standardizing.
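Empirical Rule questions come down to two steps: count how many standard deviations each boundary is from the mean, then add up the rule's percentages. A minimal Python sketch, assuming every boundary lies a whole number of standard deviations (from -3 to 3) from µ; the names `RULE_PERCENT_BELOW`, `whole_sd_count`, and `empirical_percent` are hypothetical. The "percent below" table is derived from 68-95-99.7 by symmetry (for example, 50% + 68%/2 = 84% lies below µ + σ):

```python
import math

# Percent of a Normal distribution below mu + k*sigma, according to the 68-95-99.7 rule
RULE_PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def whole_sd_count(x, mu, sigma):
    """How many standard deviations x is from mu; must be a whole number from -3 to 3."""
    z = (x - mu) / sigma
    k = round(z)
    if not math.isclose(z, k, abs_tol=1e-9) or k not in RULE_PERCENT_BELOW:
        raise ValueError(f"{x} is not a whole number of SDs (-3 to 3) from the mean")
    return k


def empirical_percent(mu, sigma, low=None, high=None):
    """Approximate percent between low and high (None means unbounded) by the 68-95-99.7 rule."""
    below_high = 100.0 if high is None else RULE_PERCENT_BELOW[whole_sd_count(high, mu, sigma)]
    below_low = 0.0 if low is None else RULE_PERCENT_BELOW[whole_sd_count(low, mu, sigma)]
    return below_high - below_low


# ITBS vocabulary scores, N(6.84, 1.55)
print(empirical_percent(6.84, 1.55, high=3.74))             # question b)
print(empirical_percent(6.84, 1.55, low=5.29, high=9.94))   # question c)
```

Since 3.74 is 2 standard deviations below the mean, and 5.29 to 9.94 runs from 1 below to 2 above, the sketch returns 2.5 and 81.5 (percent).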

The Standard Normal Distribution
All Normal distributions are the same if we measure in units of size σ from the mean µ as center. The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1.

How to Standardize a Variable:
1. Draw and label a Normal curve with the mean and standard deviation.
2. Calculate the z-score, where x = variable, µ = mean, σ = standard deviation:
$$z = \frac{x - \mu}{\sigma}$$
3. Determine the p value by looking up the z-score in the Standard Normal table.
4. Conclude in context.

The Standard Normal Table
Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table. The Standard Normal table is a table of the areas under the standard Normal curve. The table entry for each value z is the area under the curve to the LEFT of z. The area to the left is called the p value (a probability or percent).

Using the Standard Normal Table
Row: ones and tenths digits of z
Column: hundredths digit of z
Practice: What is the p value for a z-score of 2.33?
Using the Standard Normal Table, find the following:
[Table: practice z-scores and their p values; entries not preserved]
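In Python 3.8 and later, the standard library's `statistics.NormalDist` can stand in for the Standard Normal table: its `cdf` method returns the area to the left of a value. A minimal sketch; `z_score` is a hypothetical helper implementing the standardizing formula above:

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


standard_normal = NormalDist(mu=0, sigma=1)

# Area to the LEFT of z = 2.33, the same quantity a table entry gives
print(round(standard_normal.cdf(2.33), 4))
```

Rounded to four places this matches the table entry for z = 2.33 (row 2.3, column 0.03).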

Let's Practice
In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour (mph) on his first serves. Assume that the distribution of his first-serve speeds is Normal with a mean of 115 mph and a standard deviation of 6.2 mph. About what proportion of his first serves would you expect to be less than 120 mph? Greater than 120 mph?
1. Draw and label a Normal curve with the mean and standard deviation.
2. Calculate the z-score: z = (120 - 115)/6.2 ≈ 0.81.
3. Determine the p value by looking up the z-score in the Standard Normal table: P(z < 0.81) = 0.7910.
4. Conclude in context. We expect that 79.1% of Nadal's first serves will be less than 120 mph. We expect that 20.9% of Nadal's first serves will be greater than 120 mph.

Let's Practice
When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger's drives travel between 305 and 325 yards?
Step 1: Draw the distribution.
Step 2: Z-scores: z = (305 - 304)/8 = 0.125 ≈ 0.13 and z = (325 - 304)/8 = 2.625 ≈ 2.63.
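The Nadal calculation can be reproduced with `statistics.NormalDist` (Python 3.8+). A minimal sketch that does it two ways: once "table style," rounding z to two decimals first, and once without rounding. The variable names are illustrative only:

```python
from statistics import NormalDist

serves = NormalDist(mu=115, sigma=6.2)      # Nadal's first-serve speeds, in mph

z = (120 - serves.mean) / serves.stdev
p_table = NormalDist().cdf(round(z, 2))     # mimics the table: z rounded to 0.81
p_exact = serves.cdf(120)                   # no rounding of z

print(f"z = {z:.2f}")
print(f"table-style: less than 120 mph {p_table:.4f}, greater than 120 mph {1 - p_table:.4f}")
print(f"unrounded:   less than 120 mph {p_exact:.4f}")
```

The unrounded answer is about 0.790 instead of 0.7910; the small gap comes only from rounding z to 0.81 before using the table.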

Step 3: P values
Using Table A, we can find the area to the left of z = 2.63 (0.9957) and the area to the left of z = 0.13 (0.5517). The area between them is 0.9957 - 0.5517 = 0.4440.
Step 4: Conclude in Context
About 44% of Tiger's drives travel between 305 and 325 yards.

Normal Calculations on the Calculator (TI-Nspire)
normalCdf: calculates the area under the curve between two values, that is, the probability of obtaining a value BETWEEN them.
Examples: What percent of students scored between 70 and 95 on the test? Tommy scored a 92 on the test; what proportion of students did he score better than? (Use a very small lower bound.)
normalPdf: calculates the height of the Normal density curve at one specific x value. This height is not a probability: for a Normal distribution, the probability of obtaining PRECISELY or EXACTLY one specific x value (for example, Suzy scoring exactly 75 on the test) is 0.
invNorm: calculates the x value for a given area to the left (probability or percentile), that is, the exact x value at which a percentile occurs.

TI-Nspire: normalPdf (height of the density curve at x)
1. Select Calculator (on home screen), press center button.
2. Press menu, press enter.
3. Select 6: Statistics, press enter.
4. Select 5: Distributions, press enter.
5. Select 1: Normal Pdf, press enter.
6. Enter the following information:
   1. X value (not a percent)
   2. µ: (mean)
   3. σ: (standard deviation)
7. Press enter; the number that appears is the height of the density curve at x.

TI-Nspire: invNorm (x value for a given area to the left)
1. Select Calculator (on home screen), press center button.
2. Press menu, press enter.
3. Select 6: Statistics, press enter.
4. Select 5: Distributions, press enter.
5. Select 3: Inverse Norm, press enter.
6. Enter the following information:
   1. Area (enter as a decimal)
   2. µ: (mean)
   3. σ: (standard deviation)
7. Press enter; the number that appears is the x value with that area to its left.

TI-Nspire: normalCdf (area under the curve between two points)
1. Select Calculator (on home screen), press center button.
2. Press menu, press enter.
3. Select 6: Statistics, press enter.
4. Select 5: Distributions, press enter.
5. Select 2: Normal Cdf, press enter.
6. Enter the following information:
   1. Lower: the lower bound of the region (or -1E99 if there is no lower bound)
   2. Upper: the upper bound of the region (or a very large number such as 1,000,000 if there is no upper bound)
   3. µ: (mean)
   4. σ: (standard deviation)
7. Press enter; the number that appears is the p value (the area between the bounds).

When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger's drives travel between 305 and 325 yards?
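Python's `statistics.NormalDist` (Python 3.8+) has methods that play the same roles as the three calculator commands: `cdf` differences for normalCdf, `pdf` for normalPdf, and `inv_cdf` for invNorm. A minimal sketch; this is an analogue of the calculator, not its implementation, and the second example is the 25th percentile of N(170, 30):

```python
from statistics import NormalDist

drives = NormalDist(mu=304, sigma=8)        # Tiger Woods' drive distances, in yards

# like normalCdf(305, 325, 304, 8): area under the curve between two values
between = drives.cdf(325) - drives.cdf(305)

# like normalPdf(304, 304, 8): height of the density curve (a density, not a probability)
height_at_mean = drives.pdf(304)

# like invNorm(0.25, 170, 30): x value with area 0.25 to its left
exam = NormalDist(mu=170, sigma=30)
score_25th = exam.inv_cdf(0.25)

print(f"P(305 < X < 325) = {between:.4f}")
print(f"density at 304 yards = {height_at_mean:.4f}")
print(f"25th percentile of N(170, 30) = {score_25th:.1f}")
```

Without rounding z, the Tiger Woods area is about 0.446 rather than the table's 0.4440. The 25th percentile comes out near 149.8; using Table A with z ≈ -0.67 gives 170 + (-0.67)(30) = 149.9.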

When Can I Use Normal Calculations?!
Whenever the distribution is Normal (or approximately Normal).
Ways to Assess Normality:
Plot the data. Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped.
Check whether the data follow the 68-95-99.7 rule.
Construct a Normal probability plot.

Suzy bombed her recent AP Stats exam; she scored at the 25th percentile. The class average was 170 with a standard deviation of 30. Assuming the scores are Normally distributed, what score did Suzy earn on the exam?

Normal Probability Plot
These plots are constructed by plotting each observation in a data set against the z-score of its corresponding percentile.

Interpreting Normal Probability Plots
If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are approximately Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
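The points of a Normal probability plot pair each sorted observation with the z-score of its percentile. A minimal Python sketch using `statistics.NormalDist().inv_cdf`; `normal_probability_points` is a hypothetical helper, the data are made up, and taking the i-th smallest of n observations to sit at percentile (i - 0.5)/n is an assumed convention (textbooks and software use several slightly different ones):

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Pair each observation with the z-score of its percentile, smallest first.

    Assumption: the i-th smallest of n observations is placed at percentile (i - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    return [(standard_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


for z, x in normal_probability_points([12, 15, 9, 14, 11, 13, 10]):   # hypothetical data
    print(f"z = {z:6.2f}   x = {x}")
```

Plotting these (z, x) pairs and judging whether they fall close to a straight line is the check described above.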

Summary: Normal Distributions
The Normal distributions are described by a special family of bell-shaped, symmetric density curves called Normal curves. The mean µ and standard deviation σ completely specify a Normal distribution N(µ,σ). The mean is the center of the curve, and σ is the distance from µ to the change-of-curvature points on either side.
All Normal distributions obey the 68-95-99.7 rule, which describes what percent of observations lie within one, two, and three standard deviations of the mean.
All Normal distributions are the same when measurements are standardized. The standard Normal distribution has mean µ = 0 and standard deviation σ = 1.
Table A gives percentiles for the standard Normal curve. By standardizing, we can use Table A to determine the percentile for a given z-score, or the z-score corresponding to a given percentile, in any Normal distribution.
To assess Normality for a given set of data, we first observe its shape. We then check how well the data fit the 68-95-99.7 rule.

Additional Help: Finding Areas Under the Standard Normal Curve
Find the proportion of observations from the standard Normal distribution that are between -1.25 and 0.81.
Step 1: Look up the area to the left of 0.81 using Table A: 0.7910.
Step 2: Find the area to the left of -1.25: 0.1056.
Step 3: Subtract: 0.7910 - 0.1056 = 0.6854.
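The three-step "area between" recipe maps directly onto two left-area lookups and a subtraction. A minimal Python sketch with `statistics.NormalDist` (Python 3.8+), assuming the interval runs from z = -1.25 to z = 0.81:

```python
from statistics import NormalDist

z = NormalDist()                             # standard Normal: mean 0, standard deviation 1

left_of_0_81 = z.cdf(0.81)                   # Step 1: area to the left of 0.81
left_of_neg_1_25 = z.cdf(-1.25)              # Step 2: area to the left of -1.25
between = left_of_0_81 - left_of_neg_1_25    # Step 3: subtract

print(f"{left_of_0_81:.4f} - {left_of_neg_1_25:.4f} = {between:.4f}")
```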

Chapter 3
3.1: Scatterplots & Correlation

Section 3.1 Scatterplots and Correlation
After this section, you should be able to
IDENTIFY explanatory and response variables
CONSTRUCT scatterplots to display relationships
INTERPRET scatterplots
MEASURE linear association using correlation
INTERPRET correlation

Scatterplots
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.

Explanatory & Response Variables
Explanatory Variables (Independent Variables) | Response Variables (Dependent Variables)
Car weight | Accident death rate
Number of cigarettes smoked | Life expectancy
Number of hours studied | SAT scores

Scatterplots
1. Decide which variable should go on each axis. Remember, the explanatory variable goes on the x-axis!
2. Label and scale your axes.
3. Plot individual data values.

Make a scatterplot of the relationship between body weight and pack weight. Body weight is our explanatory variable.
[Data table: Body weight (lb), Backpack weight (lb)]

Constructing a Scatterplot: TI-Nspire
1. Enter x values into list 1 and enter y values into list 2.
2. Label each column. Label column x "weight" and column y "bpack".
3. Press HOME/On, click Add Data & Statistics.
4. Move the cursor to the bottom of the screen and click to add variable. Select weight.
5. Move the cursor to the left of the screen and click to add variable. Select bpack.

Describing Scatterplots
As in any graph of data, look for the overall pattern and for striking departures from that pattern. You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship. An important kind of departure is an outlier, an individual value that falls outside the overall pattern of the relationship. Also look for clusters.

Words That Describe
Direction (slope): positive or negative
Form: linear, quadratic, cubic, exponential, curved, nonlinear, etc.
Strength: strong, weak, somewhat strong, very weak, moderately strong, etc.

More on Strength
Strength refers to how tightly grouped the points are in a particular pattern. Later on, we use correlation to describe strength.

Interpreting a Scatterplot
Interpret: tell what the data suggest in real-world terms.
Example: The data suggest that the more hours a student studied for Mrs. Daniel's AP Stats test, the higher the grade the student earned. There is a positive relationship between hours studied and grade earned.

Describe this Scatterplot
Describe and interpret the scatterplot below. The y-axis refers to backpack weight in pounds and the x-axis refers to body weight in pounds.
Sample Answer: There is a moderately strong, positive, linear relationship between body weight and pack weight. There is one possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members. It appears that lighter students are carrying lighter backpacks.

Describe and interpret the scatterplot below. The y-axis refers to a school's mean SAT math score. The x-axis refers to the percentage of students at a school taking the SAT.
Sample Answer: There is a moderately strong, negative, curved relationship between the percent of students in a state who take the SAT and the mean SAT math score. Further, there are two distinct clusters of states and at least one possible outlier that falls outside the overall pattern.

Scatterplots and Correlation
What is Correlation?
A mathematical value that describes the direction and strength of a linear relationship between two quantitative variables. Correlation values are between -1 and 1. Correlation is abbreviated r. The strength of the linear relationship increases as r moves away from 0 toward -1 or 1.

What does r mean?
r value | Strength
-1 | Perfectly linear; negative
-0.75 | Strong negative relationship
-0.50 | Moderately strong negative relationship
-0.25 | Weak negative relationship
0 | No linear relationship
0.25 | Weak positive relationship
0.50 | Moderately strong positive relationship
0.75 | Strong positive relationship
1 | Perfectly linear; positive

What does r tell us?!
Correlation describes the direction and strength of the linear relationship between x and y. (It is r², not r, that describes what percent of the variation in y is explained by the linear relationship with x.) Notice that the formula averages the products of the z-scores of x and the z-scores of y: the products are summed and then divided by n - 1.

How strong is the correlation? Is it positive or negative?

Describe and interpret the scatterplot below. Be sure to estimate the correlation.
Sample Answer: As the number of predicted storms increases, so does the number of observed storms, but the relationship is weak. The scatterplot shows a fairly weak, positive, linear relationship. The estimated correlation is approximately r = ___. **Answers between 0.15 and 0.45 would be acceptable.

Describe and interpret the scatterplot below. Be sure to estimate the correlation.
Sample Answer: As the number of boats registered in Florida increases, so does the number of manatees killed by boats. The scatterplot shows a strong, positive, linear relationship. The estimated correlation is approximately r = 0.85.

Estimate the Correlation Coefficient
**Answers between ___ would be acceptable.

Estimate the Correlation Coefficient

Facts about Correlation
1. Correlation requires that both variables be quantitative.
2. Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
3. Correlation is not resistant: r is strongly affected by a few outlying observations.
4. Correlation makes no distinction between explanatory and response variables.
5. r does not change when we change the units of measurement of x, y, or both.
6. r does not change when we add a constant to, or subtract a constant from, x, y, or both.
7. The correlation r itself has no unit of measurement.

Calculate Correlation: TI-Nspire
1. Enter x values in list 1 and y values in list 2.
2. Press MENU, then 4: Statistics.
3. Option 1: Stat Calculations.
4. Option 3: Linear Regression (mx + b).
5. X: a[], Y: b[], ENTER.
6. Correlation = r.

r: Ignores Distinctions Between X & Y
Correlation should be 0.79.
[Data table: Height in feet, Weight in pounds]

Find the Correlation
r: Highly Affected by Outliers
r = ___

Why?!
Since r is calculated using standardized values (z-scores), the correlation will not change if the units of measure are changed (feet to inches, etc.). Adding a constant to x, y, or both will not change the correlation because neither the standard deviation nor the distances from the mean are affected.

Correlation Formula
Suppose that we have data on variables x and y for n individuals. The values for the first individual are x₁ and y₁, the values for the second individual are x₂ and y₂, and so on. The means and standard deviations of the two variables are x̄ and s_x for the x-values and ȳ and s_y for the y-values. The correlation r between x and y is:
$$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) $$

3.2: Least-Squares Regression

Section 3.2 Least-Squares Regression
After this section, you should be able to
INTERPRET a regression line
CALCULATE the equation of the least-squares regression line
CALCULATE residuals
CONSTRUCT and INTERPRET residual plots
DETERMINE how well a line fits observed data
INTERPRET computer regression output

Regression Lines
A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Regression Lines
Regression lines are used to make predictions in many settings:
Colleges use students' SAT scores and GPAs to predict college success.
Professional sports teams use players' vital stats (40-yard dash, height, weight) to predict success.
The Federal Reserve uses economic data (GDP, unemployment, etc.) to predict future economic trends.
Macy's uses shipping, sales, and inventory data to predict future sales.
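The correlation formula can be translated term by term into code. Each pair is standardized with the sample mean and sample standard deviation, the products of the z-scores are summed, and the sum is divided by n - 1. The small data set below is hypothetical, and `correlation_from_z_scores` is an illustrative name.

```python
from statistics import mean, stdev

def correlation_from_z_scores(x, y):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, s_x = mean(x), stdev(x)
    y_bar, s_y = mean(y), stdev(y)
    # Multiply each individual's z-score for x by its z-score for y.
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation_from_z_scores(x, y), 4))  # 0.7746
```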

Regression Line Equation
Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form:
ŷ = a + bx
In this equation, ŷ (read "y hat") is the predicted value of the response variable y for a given value of the explanatory variable x. b is the slope, the amount by which y is predicted to change when x increases by one unit. a is the y-intercept, the predicted value of y when x = 0.

Format of Regression Lines
Format 1: ŷ = 16.3 + ___ x, where ŷ = predicted backpack weight and x = student's weight
Format 2: Predicted backpack weight = 16.3 + ___ (student's weight)

Interpreting Linear Regression
Y-intercept: A student weighing zero pounds is predicted to have a backpack weight of 16.3 pounds (no practical interpretation).
Slope: For each additional pound that the student weighs, the backpack is predicted to weigh an additional ___ pounds, on average.

Interpreting Linear Regression
Interpret the y-intercept and slope values in context. Is there any practical interpretation?
ŷ = 270 + 37x, where x = hours studied for the SAT and ŷ = predicted SAT Math score
Y-intercept: If a student studies for zero hours, then the student's predicted SAT score is 270 points. This makes sense.
Slope: For each additional hour the student studies, his/her score is predicted to increase by 37 points, on average. This makes sense.

Predicted Value
What is the predicted SAT Math score for a student who studies 12 hours?
ŷ = 270 + 37x, where x = hours studied for the SAT and ŷ = predicted SAT Math score
ŷ = 270 + 37(12) = 714
Predicted score: 714 points

Self-Check Quiz: Calculate the Regression Equation
A crazy professor believes that a child with IQ 100 should have a reading test score of 50, and that reading score should increase by 1 point for every additional point of IQ. What is the equation of the professor's regression line for predicting reading score from IQ? Be sure to identify all variables used.
Answer: ŷ = 50 + x, where ŷ = predicted reading score and x = number of IQ points above 100

Self-Check Quiz: Interpreting Regression Lines & Predicted Value
Data on the IQ test scores and reading test scores for a group of fifth-grade children resulted in the following regression line:
predicted reading score = ___ + ___ (IQ score)
(a) What's the slope of this line? Interpret this value in context.
(b) What's the y-intercept? Explain why the value of the intercept is not statistically meaningful.
(c) Find the predicted reading scores for two children with IQ scores of 90 and 130, respectively.
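A prediction is just a substitution into ŷ = a + bx. The sketch below applies this to the two lines stated in these notes: the SAT line ŷ = 270 + 37x and the professor's line ŷ = 50 + x, where x counts IQ points above 100. The helper name `predict` is illustrative.

```python
def predict(intercept, slope, x):
    """Predicted response: y-hat = a + b * x."""
    return intercept + slope * x

# SAT example from the notes: y-hat = 270 + 37x, x = hours studied.
print(predict(270, 37, 12))  # 714

# Crazy professor: y-hat = 50 + x, where x = IQ points above 100.
for iq in (100, 130):
    print(iq, predict(50, 1, iq - 100))
```

Converting IQ to "points above 100" before substituting is what makes the professor's intercept equal 50.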

predicted reading score = ___ + ___ (IQ score)
(a) Slope = ___. For each 1-point increase in IQ score, the reading score is predicted to increase by ___ points, on average.
(b) Y-intercept = ___. If the student has an IQ of zero, which is essentially impossible (the student would not be able to hold a pencil to take the exam), the predicted score would be ___. This has no practical interpretation.
(c) Predicted values: IQ 90: ŷ = ___ points; IQ 130: ŷ = ___ points.

Least-Squares Regression Line
Different regression lines produce different residuals. The regression line we use in AP Stats is the least-squares regression line (LSRL). The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

TI-Nspire: LSRL to View Graph
1. Enter x data into list 1 and y data into list 2. Be sure to name the lists.
2. Press HOME/ON, Add Data & Statistics.
3. Enter variables on the x- and y-axes.
4. Click MENU, 4: Analyze.
5. Option 6: Regression.
6. Option 2: Show Linear (a + bx), ENTER.

TI-Nspire: LSRL
1. Enter x data into list 1 and y data into list 2.
2. Press MENU, 4: Statistics, 1: Stat Calculations.
3. Select Option 4: Linear Regression.
4. Insert either the name of the list or a[] for x, and the name of the list or b[] for y. Press ENTER.
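The "least squares" property can be demonstrated directly. The sketch fits the line with the usual closed-form slope Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², equivalent to b = r·s_y/s_x, and then checks that nudging the intercept or slope increases the sum of squared residuals. The data are hypothetical, and the helper names are illustrative.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sse(x, y, a, b):
    """Sum of squared residuals for the line y-hat = a + b * x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sse(x, y, a, b)

# Every nearby line has a larger sum of squared residuals.
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2)]:
    assert sse(x, y, a + da, b + db) > best

print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {best:.2f}")
```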

Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,
residual = observed y - predicted y
residual = y - ŷ
Positive residuals lie above the line; negative residuals lie below the line.

How to Calculate the Residual
1. Calculate the predicted value by plugging x into the LSRL equation.
2. Determine the observed/actual value.
3. Subtract.

Calculate the Residual
1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual?
Predicted: ŷ = ___ + ___ (170) = ___
Observed: 35
Residual: 35 - ___ = ___ pounds
The student's backpack weighs ___ pounds more than predicted.
2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?
Predicted: ŷ = ___ + ___ (105) = ___
Observed: 24
Residual: 24 - ___ = ___ pounds
The student's backpack weighs ___ pounds less than predicted.

Residual Plots
A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.
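The three residual steps (predict, observe, subtract) fit in one small function. The backpack line's slope is not reproduced in these notes, so the sketch instead uses the SAT line ŷ = 270 + 37x with two hypothetical students.

```python
def residual(observed_y, intercept, slope, x):
    """residual = observed y - predicted y."""
    predicted = intercept + slope * x   # Step 1: plug x into the line
    return observed_y - predicted       # Steps 2-3: observed minus predicted

# Hypothetical students, using the SAT line y-hat = 270 + 37x from the notes.
print(residual(600, 270, 37, 10))  # -40: the line over-predicts by 40 points
print(residual(750, 270, 37, 12))  # 36: the line under-predicts by 36 points
```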

TI-Nspire: Residual Plots
1. Press MENU, 4: Analyze.
2. Option 6: Residuals, Option 2: Show Residual Plot.

Interpreting Computer Regression Output
Be sure you can locate the slope and the y-intercept and determine the equation of the LSRL.
ŷ = ___ + ___ x, where ŷ = predicted ... and x = explanatory variable

Interpreting Residual Plots
A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.
1) The residual plot should show no obvious patterns.
2) The residuals should be relatively small in size.
A valid residual plot should look like the night sky, with approximately equal amounts of positive and negative residuals.
Pattern in residuals: linear model not appropriate.

r²: Coefficient of Determination
r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset. In this example, r² equals 60.6%: 60.6% of the variation in pack weight is explained by the linear relationship with body weight.
Template: (Insert r²)% of the variation in y is explained by the linear relationship with x.

Should You Use LSRL?

Interpret r²
Interpret in a sentence (how much variation is accounted for?).
1. r² = 0.875, x = hours studied, y = SAT score
2. r² = 0.523, x = hours slept, y = alertness score
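The description of r² as "how much better the LSRL does than guessing the mean y" corresponds to r² = 1 - SSE/SST. Here SSE is the squared error around the LSRL, and SST is the squared error around ȳ. The sketch computes r² both that way and by squaring r, using hypothetical data, so you can confirm the two agree.

```python
from statistics import mean

def r_squared_two_ways(x, y):
    """Return (1 - SSE/SST, r**2) for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # around the LSRL
    sst = syy                                                  # around mean y
    r = sxy / (sxx * syy) ** 0.5
    return 1 - sse / sst, r ** 2

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
from_sse, from_r = r_squared_two_ways(x, y)
print(f"1 - SSE/SST = {from_sse:.3f}, r^2 = {from_r:.3f}")
```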

Interpret r² Answers:
1. 87.5% of the variation in SAT score is explained by the linear relationship with the number of hours studied.
2. 52.3% of the variation in alertness score is explained by the linear relationship with the number of hours slept.

s: Standard Deviation of the Residuals
If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by
$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$
s represents the typical (average) size of a prediction error (residual). For an individual residual: positive = the line UNDER-predicts; negative = the line OVER-predicts.

s: Standard Deviation of the Residuals
1. Identify and interpret the standard deviation of the residuals.
Answer: s = ___
Interpretation: When we use the least-squares regression line to predict fat gain, our predictions are typically off by about ___ kilograms.

Self-Check Quiz!
The data are a random sample of 10 trains comparing the number of cars on the train and fuel consumption in pounds of coal.
1. What is the regression equation? Be sure to define all variables.
2. What is r² telling you?
3. Define and interpret the slope in context. Does it have a practical interpretation?
4. Define and interpret the y-intercept in context.
5. What is s telling you?

Answers:
1. ŷ = ___ + ___ x, where ŷ = predicted fuel consumption in pounds of coal and x = number of rail cars
2. ___% of the variation in fuel consumption is explained by the linear relationship with the number of rail cars.
3. Slope = ___. With each additional car, the fuel consumption is predicted to increase by ___ pounds of coal, on average. This makes practical sense.
4. Y-intercept = ___. When there are no cars attached to the train, the predicted fuel consumption is ___ pounds of coal. This has no practical interpretation because there is always at least one car, the engine.
5. s = ___. When we use the least-squares regression line, our predictions of fuel consumption are typically off by about ___ pounds of coal.
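The formula for s divides the sum of squared residuals by n - 2 and then takes the square root. The sketch below uses hypothetical data whose least-squares line, worked out by hand, is ŷ = 2.2 + 0.6x. Because the residuals are squared, s is never negative: it measures the size of a typical error, not its direction.

```python
from math import sqrt

def residual_sd(x, y, intercept, slope):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(residual_sd(x, y, 2.2, 0.6), 3))  # 0.894
```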

Extrapolation
We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. Exercise caution in making predictions outside the observed values of x.

Correlation and Regression Limitations
The distinction between explanatory and response variables is important in regression.
Correlation and regression lines describe only linear relationships.
Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
Correlation and least-squares regression lines are not resistant.

Outliers and Influential Points
An outlier is an observation that lies outside the overall pattern of the other observations. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
Note: Not all influential points are outliers, nor are all outliers influential points.

Outliers and Influential Points
The left graph is perfectly linear. In the right graph, the last value was changed from (5, 5) to (5, 8). This point is clearly influential because it changed the regression line substantially. However, its residual is relatively small.
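The influential-point example can be reproduced numerically. The notes state only that the last point moved from (5, 5) to (5, 8). The sketch assumes the other points are (1, 1) through (4, 4), which is one perfectly linear data set consistent with the left graph.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

x = [1, 2, 3, 4, 5]
before = [1, 2, 3, 4, 5]   # assumed perfectly linear data
after = [1, 2, 3, 4, 8]    # last point changed from (5, 5) to (5, 8)

a0, b0 = least_squares_line(x, before)
a1, b1 = least_squares_line(x, after)
resid_last = after[-1] - (a1 + b1 * x[-1])

print(f"before: y-hat = {a0:.1f} + {b0:.1f}x")
print(f"after:  y-hat = {a1:.1f} + {b1:.1f}x, residual at (5, 8) = {resid_last:.1f}")
```

The slope jumps from 1.0 to 1.6, yet the moved point's residual is only 1.2 even though the point rose by 3 units. The line tilts toward an influential point, which hides it in the residuals.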

Correlation and Regression Wisdom
Association Does Not Imply Causation
An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y.
A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why?

Calculate the Least-Squares Regression Line
Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year. We expect a positive correlation between x and y because the change during January contributes to the full year's change. Calculation from data for an 18-year period gives
Mean x = 1.75%, s_x = 5.36%, Mean y = 9.07%, s_y = 15.35%, r = ___
Find the equation of the least-squares line for predicting full-year change from January change. Show your work.

The Role of r² in Regression
The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors.
The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula:
$$ r^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum (y_i - \hat{y}_i)^2, \qquad SST = \sum (y_i - \bar{y})^2 $$
In practice, just square the correlation r.

Additional Calculations & Proofs

Least-Squares Regression Line
We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation.
Equation of the least-squares regression line: We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and standard deviations of the two variables and their correlation. The least-squares regression line is the line ŷ = a + bx with slope
$$ b = r\,\frac{s_y}{s_x} $$
and y-intercept
$$ a = \bar{y} - b\,\bar{x} $$

Accounted for Error
If we use the LSRL to make our predictions, the sum of the squared residuals is SSE = 30.97.
SSE/SST = 30.97/83.87
r² = ___
___% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.
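The summary-statistics formulas b = r·s_y/s_x and a = ȳ - b·x̄ need only five numbers. The sketch below plugs in the stock-market means and standard deviations given above. The exercise's r value is not reproduced in these notes, so the code uses a clearly hypothetical placeholder, `r = 0.5`, that you would replace with the actual value.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b * x, with b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Summary statistics from the stock market exercise (percent changes).
X_BAR, S_X = 1.75, 5.36
Y_BAR, S_Y = 9.07, 15.35

r = 0.5  # HYPOTHETICAL placeholder: substitute the exercise's actual r
a, b = lsrl_from_summary(X_BAR, S_X, Y_BAR, S_Y, r)
print(f"predicted yearly change = {a:.2f} + {b:.3f}(January change)")
```

Whatever r is, the resulting line always passes through the point (x̄, ȳ).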

Unaccounted for Error
If we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87.
SSE/SST = 30.97/83.87
Therefore, 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.

Interpreting a Regression Line
Consider the regression line from the example "Does Fidgeting Keep You Slim?" (pg. 164). Identify the slope and y-intercept and interpret each value in context.
The slope b = ___ tells us that the amount of fat gained is predicted to go down by ___ kg for each added calorie of NEA.
The y-intercept a = ___ kg is the fat gain estimated by this model if NEA does not change when a person overeats.

Chapter 4
4.1: Samples & Surveys

How do we gather data?
Surveys: opinion polls, interviews
Studies: observational, retrospective (past)
Experiments

Section 4.1 Samples and Surveys
After this section, you should be able to
IDENTIFY the population and sample in a sample survey
IDENTIFY voluntary response samples and convenience samples
DESCRIBE how to use a table of random digits to select a simple random sample (SRS)
DESCRIBE simple random samples, stratified random samples, and cluster samples
EXPLAIN how undercoverage, nonresponse, and question wording can lead to bias in a sample survey

The Idea of a Sample Survey
A sample survey is a study that uses an organized plan to choose a sample that represents some specific population.
Step 1: Define the population we want to describe.
Step 2: Say exactly what we want to measure.
Step 3: Decide how to choose a sample from the population.

Populations and Samples
The population in a statistical study is the entire group of individuals about which we want information. A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.
Population → Sample: Collect data from a representative sample... Make an inference about the population.

Sampling Design
Sampling design: the method used to choose the sample from the population.
Types of samples: simple random sample, stratified random sample, systematic random sample, cluster sample, multistage sample.

Simple Random Sample (SRS)
Consists of n individuals from the population chosen in such a way that
every individual has an equal chance of being selected, and
every set of n individuals has an equal chance of being selected.
Advantages: unbiased; easy.
Disadvantages: large variance/high variability; may not be representative; must be able to identify the entire population.

Methods of Selecting an SRS
Draw names from a hat.
Assign each person in the group a number and randomly generate the chosen numbers.
Ways to randomly generate numbers: computer, table of random digits, calculator.

Table of Random Digits
A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties:
Each entry in the table is equally likely to be any of the 10 digits 0-9.
The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.

How to Choose an SRS Using Table D
Step 1: Label. Give each member of the population a numerical label of the same length.
Step 2: Table. Read consecutive groups of digits of the appropriate length from Table D. Your sample contains the individuals whose labels you find.

SRS
Use Table D at line 130 to choose an SRS of 4 hotels.
01 Aloha Kai      08 Captiva        15 Palm Tree      22 Sea Shell
02 Anchor Down    09 Casa del Mar   16 Radisson       23 Silver Beach
03 Banana Bay     10 Coconuts       17 Ramada         24 Sunset Beach
04 Banyan Tree    11 Diplomat       18 Sandpiper      25 Tradewinds
05 Beach Castle   12 Holiday Inn    19 Sea Castle     26 Tropical Breeze
06 Best Western   13 Lime Tree      20 Sea Club       27 Tropical Shores
07 Cabana         14 Outrigger      21 Sea Grape      28 Veranda
Our SRS of 4 hotels for the editors to contact is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club.

A university's financial aid office wants to know how much it can expect students to earn from summer employment. This information will be used to set the level of financial aid. The population contains 478 students who have completed at least one year of study but have not yet graduated. A questionnaire will be sent to an SRS of 100 of these students, drawn from an alphabetized list. Starting at line 135, select the first three students in the sample.
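The Label/Table procedure can be mimicked in code. First read fixed-width groups of digits. Then keep each label that falls in range. Labels that are out of range, all zeros, or repeats are skipped, which is the usual Table D convention. The digit string below is HYPOTHETICAL and is not line 130 of Table D. The function name `srs_from_digits` is illustrative. The last lines show the "computer" method using `random.sample`.

```python
import random

def srs_from_digits(digits, population_size, sample_size):
    """Select an SRS by reading consecutive groups of digits, table-style.

    Labels run 1..population_size and all have the same number of digits
    (e.g. 01-28 or 001-478). Out-of-range groups and repeats are skipped.
    """
    width = len(str(population_size))
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore spacing
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

# Hypothetical digits (NOT Table D): 28 hotels labeled 01-28, SRS of 4.
print(srs_from_digits("05 99 16 17 16 20 44", 28, 4))  # [5, 16, 17, 20]

# Computer method: SRS of 100 of the 478 students (labels 001-478).
computer_srs = random.sample(range(1, 479), 100)
```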

Stratified Random Sample
The population is divided into homogeneous (alike) groups called strata.
Stratum 1: Seniors
Stratum 2: Juniors
SRSs are pulled from each stratum. This helps control for lurking variables.

Common Strata
What are some common strata in the following areas? Politics; School.

Stratified Random Sample
Advantages: a more precise unbiased estimator than an SRS; less variability; cost is reduced if the strata already exist.
Disadvantages: difficult to do if you must divide the population into strata yourself; formulas for SD & confidence intervals are more complicated.

Systematic Random Sample
Pick a method of identifying subjects randomly before starting. Requires strict adherence.
Example: Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.

Cluster Sample
Based upon location. Randomly pick a location & sample everyone there.
Examples: all houses on a certain block; all houses in a specific zip code; all students at specific schools in MDCPS; all students in specific homeroom classes.
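Stratified and systematic sampling differ in where the randomness enters. A stratified sample takes a separate SRS inside each stratum. A systematic sample makes one random choice of starting point among the first k individuals and then takes every kth individual. The rosters and the helper names below are hypothetical. The optional `rng` argument accepts either the `random` module or a seeded `random.Random` instance.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=random):
    """Take an SRS of size n_per_stratum from each stratum (dict: name -> list)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k individuals, then take every kth one."""
    start = rng.randrange(k)     # random start: index 0 .. k-1
    return population[start::k]

# Hypothetical roster split into two strata.
strata = {
    "Seniors": [f"S{i}" for i in range(1, 101)],
    "Juniors": [f"J{i}" for i in range(1, 121)],
}
print(stratified_sample(strata, 10))

# Hypothetical customers in order of arrival; survey every 10th one.
customers = list(range(1, 201))
print(systematic_sample(customers, 10))
```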

Cluster Samples
Advantages: unbiased; cost is reduced.
Disadvantages: clusters may not be representative of the population; formulas are complicated.

Multistage Sample
At least two separate levels/stages of SRS.
Example: Stage 1: Juniors vs. Seniors. Stage 2: Divide the above groups (Juniors and Seniors) by AP, Regular, and Honors; select 10 from each of the groups for a total of 60.

Identify the Sampling Design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.

Sampling at a School Assembly
Describe how you would use the following sampling methods to select 80 students to complete a survey.
(a) Simple random sample
(b) Stratified random sample
(c) Cluster sample
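A cluster sample randomizes which groups are chosen rather than which individuals. Once a cluster is picked, every member of that cluster is included. The homerooms below are hypothetical, and `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly pick n_clusters clusters and include EVERY member of each."""
    picked = rng.sample(list(clusters), n_clusters)
    return {name: list(clusters[name]) for name in picked}

# Hypothetical: 20 homeroom classes of 25 students each, treated as clusters.
homerooms = {f"Room {r}": [f"Room {r} student {i}" for i in range(1, 26)]
             for r in range(101, 121)}

chosen = cluster_sample(homerooms, 3)
print(sorted(chosen))
print(sum(len(v) for v in chosen.values()), "students surveyed")
```

Compare this with the stratified design: strata are all sampled from, while only some clusters are sampled, but those clusters are sampled completely.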

How would you do it?
Ms. Garcia is determining what classes to offer next school year at ATM. She wants to conduct a survey of students to help determine course offerings (electives, Dual Enrollment, AP, regular, honors, etc.). Design a sampling method to help Ms. Garcia accurately and fairly survey a representative sample of the entire school population.

Sources of Error in Sample Surveys
Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.
Nonresponse occurs when an individual chosen for the sample can't be contacted or refuses to participate.
A systematic pattern of incorrect responses in a sample survey leads to response bias (wanting to look cool, not wanting to be a prude, etc.).
The wording of questions is the most important influence on the answers given to a sample survey.
Voluntary response bias occurs when people choose for themselves whether to participate. Usually only people with strong opinions respond.

Errors?! Errors in Surveys
How much do you weigh?
Will you not vote for President Obama's reelection?
Why should guns be outlawed?
How often do you exercise?
How many cigarettes do you smoke each week?
How often should Mrs. Daniel give quizzes?

Inference for Sampling
The purpose of a sample is to give us information about a larger population. The process of drawing conclusions about a population on the basis of sample data is called inference.
Why should we rely on random sampling?
1) To eliminate bias in selecting samples from the list of available individuals.
2) The laws of probability allow trustworthy inference about the population.
Results from random samples come with a margin of error that sets bounds on the size of the likely error.
Larger random samples give better information about the population than smaller samples.

4.2: Experiments
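The claim that larger random samples give better information can be illustrated by simulation. The sketch repeatedly draws random samples from a hypothetical population in which 60% would answer "yes". It treats each response as an independent draw, which approximates a very large population, and reports how much the sample proportion varies for each sample size.

```python
import random
from statistics import stdev

def spread_of_sample_proportions(p, n, repetitions=2000, seed=0):
    """Standard deviation of sample proportions across many random samples of size n."""
    rng = random.Random(seed)   # seeded so the simulation is reproducible
    proportions = []
    for _ in range(repetitions):
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return stdev(proportions)

# Hypothetical population: 60% answer "yes".
for n in (25, 100, 400):
    print(n, round(spread_of_sample_proportions(0.6, n), 3))
```

Each fourfold increase in sample size should roughly halve the spread, which is why margins of error shrink as samples grow.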

Section 4.2 Experiments
After this section, you should be able to
DISTINGUISH observational studies from experiments
DESCRIBE the language of experiments
APPLY the three principles of experimental design
DESIGN comparative experiments utilizing completely randomized designs and randomized block designs, including matched pairs design

Observational Study vs. Experiment
An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.
An experiment deliberately imposes some treatment on individuals to measure their responses.
***When our goal is to understand cause and effect, experiments are the only source of fully convincing data.***

SAT Survey vs. SAT Experiment
Describe a survey and an experiment that can be used to determine the relationship between SAT scores and hours studied.
Survey: Ask students how many hours they studied for the SAT and what their resulting scores were.
Experiment: Select a group of students with the same IQ and randomly assign each student a different number of hours to study for the SAT. Each student is ONLY allowed to study the mandated number of hours. Then compare their resulting scores.

Lurking & Confounding Variables
A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. Lurking = not included.
A confounding variable is one whose effects on the response variable cannot be distinguished from those of one or more of the explanatory variables in the study. Confounding = included.

Confounding Variables
Confounding refers to a problem that can arise in an experiment when there is another variable that may affect the response and is in some way tied together with the factor under investigation, leaving us unable to tell which of the two variables (or perhaps some interaction) caused the observed response.

Confounding Variables
For example, we plant tomatoes in a garden that's half shaded. We test a fertilizer by putting it on the plants in the sun and applying none to the shaded plants. Months later the fertilized plants bear more and better tomatoes. Why? Maybe it's the fertilizer, maybe it's the sun, maybe we need both. We're unable to conclude that the fertilizer works because any effect of fertilizer is confounded with any effect of the extra sunshine.
Examples: What's Lurking?!
1. As shoe size increases, so does reading ability.
2. An increase in ice cream consumption is associated with an increase in the number of drowning deaths for a given period.
The Randomized Comparative Experiment
The remedy for confounding is to perform a comparative experiment in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments.
Comparison alone isn't enough: if the treatments are given to groups that differ greatly, bias will result. The solution to the problem of bias is random assignment.
In an experiment, random assignment means that experimental units are assigned to treatments at random, that is, using some sort of chance process.
In a completely randomized design, the treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment.
[Diagram: Experimental Units → Random Assignment → Group 1 (Treatment 1) and Group 2 (Treatment 2) → Compare Results]
A high school regularly offers a review course to prepare students for the SAT. This year, budget cuts will allow the school to offer only an online version of the course. Over the past 10 years, the average SAT score of students in the classroom course was ____. The online group gets an average score of ____. That's roughly 10% higher than the long-time average for those who took the classroom review course.
Is the online course more effective? Is there a lurking variable? Is there a confounding variable?
The Language of Experiments
Experimental units: the smallest collection of individuals to which treatments are applied. When the units are human beings, they often are called subjects.
Factors: the general name for the explanatory variables in an experiment (for example, a multivitamin regimen).
Treatment: a specific condition (vitamin A vs. vitamin B; the time frame in which the vitamin is taken) applied to the individuals in an experiment.
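Software can carry out random assignment for a completely randomized design in place of slips of paper. The sketch below is minimal. It assumes 20 hypothetical subjects labeled 1 to 20 and two treatments whose names are illustrative only. It shuffles the labels with a chance process and then deals them out evenly to the treatments.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # impersonal chance decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # deal the shuffled units out like cards, one treatment at a time
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = range(1, 21)  # hypothetical subject labels
assignment = completely_randomized(subjects, ["online course", "classroom course"], seed=42)
for treatment, group in assignment.items():
    print(treatment, sorted(group))
```

Because every possible split is equally likely, lurking variables such as motivation tend to be balanced between the two groups. This is the point of random assignment.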

Factor vs. Treatment
A factor is a specific type or category of treatments, whereas the specific different treatments constitute the levels of a factor. For example, three different groups of runners are subjected to different training methods.
Experimental units: runners
Factor: training methods
Treatments: specific type of workout (speed, strength training, and distance workouts)
Factor = general group; Treatment = specific implementation
Three Principles of Experimental Design
1. Control for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered.
2. Random assignment: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren't controlled on the treatment groups.
3. Replication: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.
A cookie manufacturer is trying to determine how long cookies stay fresh on store shelves, and the extent to which the type of packaging and the store's temperature influence how long the cookies stay fresh. He designs a completely randomized experiment involving low (64°F) and high (75°F) temperatures and two types of packaging: plastic and waxed cardboard. List the experimental units, factors, and treatments in this experiment.
Experimental units: packages of cookies.
Factors: temperature and packaging.
Treatments: low temp and plastic, high temp and plastic, low temp and waxed cardboard, high temp and waxed cardboard.
Specific Types of Experimental Design
Double Blind, Single Blind, Matched Pairs, Block Design
Double Blind
In a double blind experiment, neither the subjects nor the experimenters know which treatment a subject received.
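When an experiment has more than one factor, each treatment combines one level of each factor. The short sketch below lists the cookie example's treatments with `itertools.product`. The level names come from the text; the code is only an illustration.

```python
from itertools import product

# Factor levels from the cookie example
temperature = ["low (64°F)", "high (75°F)"]
packaging = ["plastic", "waxed cardboard"]

# Each treatment is one temperature level combined with one packaging level
treatments = list(product(temperature, packaging))
for temp, pack in treatments:
    print(f"{temp} temperature, {pack} packaging")
print(len(treatments), "treatments")  # 2 levels x 2 levels
```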

Matched Pair Design
In a matched pair design, subjects are paired by matching common important attributes. Sometimes the results are a pre-test and a post-test, with each unit matched to itself.
Example: tire wear and tear. Put one set of tires on the left side of the car and a different set on the right side of the car. This would help control the lurking variables of different driving styles (teenage boys vs. teachers) and mileage driven.
Block Design
A block is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a block design, the random assignment of units to treatments is carried out separately within each block. Blocking helps control for lurking variables.
Experiments are often blocked by:
Age
Gender
Race
Achievement level (regular, honors, AP, IQ level, etc.)
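In a randomized block design, the chance process runs separately inside each block. A matched pairs design is the special case in which each block holds two units, or one unit measured twice. The minimal sketch below assumes hypothetical students blocked by achievement level; the labels and block sizes are invented for illustration.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Within each block, shuffle the units and split them evenly among the treatments."""
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # random assignment happens inside this block only
        plan[block_name] = {
            t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)
        }
    return plan

# Hypothetical blocks: students grouped by achievement level before the experiment
blocks = {
    "Regular": ["R1", "R2", "R3", "R4"],
    "Honors": ["H1", "H2", "H3", "H4"],
    "AP": ["A1", "A2", "A3", "A4"],
}
plan = randomized_block(blocks, ["treatment 1", "treatment 2"], seed=7)
for block_name, groups in plan.items():
    print(block_name, groups)
```

Each treatment gets the same number of units from every block. Differences between achievement levels therefore cannot pile up in one treatment group.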

Inference for Experiments
An observed effect so large that it would rarely occur by chance is called statistically significant.
A statistically significant association in data from a well-designed experiment does imply causation.

Chapter 5
5.1: Randomness, Probability and Simulation
Section 5.1 Randomness, Probability and Simulation
After this section, you should be able to
DESCRIBE the idea of probability
DESCRIBE myths about randomness
DESIGN and PERFORM simulations
The Idea of Probability
Chance behavior is unpredictable in the short run, but has a regular and predictable pattern in the long run.
The probability of any outcome of a chance process is a number between 0 (never occurs) and 1 (always occurs) that describes the proportion of times the outcome would occur in a very long series of repetitions.
The Law of Large Numbers
The law of large numbers says that if we observe more and more repetitions of any chance process, the proportion of times that a specific outcome occurs approaches a single value.
Myths about Randomness
The myth of short-run regularity: The idea of probability is that randomness is predictable only in the long run (a very large number of repetitions). Probability does not allow us to make short-run predictions.
The myth of the "law of averages": Probability tells us random behavior evens out in the long run, but future outcomes are not affected by past behavior. A woman has a 50% chance of having a boy with each pregnancy; the genders of any previous children do not matter!
Performing a Simulation
The imitation of chance behavior, based on a model that accurately reflects the situation, is called a simulation. Simulations are usually done with a table of random digits, a calculator's random number generator (randInt), or computer software.
State: Identify the probability calculation of interest.
Plan: Describe how to use a chance device/tool to implement one repetition of the process. Explain clearly how to identify the outcomes of the chance process.
Do: Perform many (at least 20) repetitions of the simulation.
Conclude: Use the results of your simulation to answer the question of interest, in context.
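A simulation makes the law of large numbers easy to watch. The minimal sketch below flips a simulated fair coin, using Python's `random` module as the chance device. It prints the running proportion of heads at several checkpoints.

```python
import random

rng = random.Random(2024)  # fixed seed so the run can be repeated
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for flip in range(1, 100_001):
    heads += rng.random() < 0.5  # True counts as 1 head
    if flip in checkpoints:
        print(f"{flip:>7} flips: proportion of heads = {heads / flip:.4f}")
proportion = heads / 100_000
```

The short-run proportions bounce around. Only the long-run proportion settles near 0.5, which is why probability does not support short-run predictions.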

Performing a Simulation
For example: What is the probability that a student earns an 80% on a true/false quiz written in Chinese? (Assume the exam taker does not know any Chinese.) Should the instructor be concerned about cheating? How can we simulate the probability of guessing 80% correct on a true/false quiz?
Required Elements:
State must include: the variable identified; a statement of the probability in symbols or words.
Plan must include: What tool? What values are you assigning? How many values are you picking each time? How many times are you conducting the simulation? What about repeat digits or ignored digits? What are you recording?
Do must include: the simulation data, if the number of trials is 20 or fewer; a summary of the data for larger numbers of trials.
Conclude must include: a statement of the probability; an answer to the question (usually about whether the result is surprising, reasonable, expected, etc.).
The Golden Ticket
At a local high school, 95 students have permission to park on campus. Each month, the student council holds a "golden ticket" parking lottery at a school assembly. The two lucky winners are given reserved parking spots next to the school's main entrance. Last month, the winning tickets were drawn by a student council member from the AP Statistics class. When both golden tickets went to members of that same class, some people thought the lottery had been rigged. There are 28 students in the AP Statistics class, all of whom are eligible to park on campus. Design and carry out a simulation to decide whether it's plausible that the lottery was carried out fairly. **See 5.1 WS
STATE: What is the probability that the lottery would result in two winners from the AP Stats class? P(X = 2), where X is the number of winners from AP Stats.
PLAN: Using the table of random digits, we will randomly assign each student a two-digit number from 01 to 95. We'll label the students in the AP Statistics class from 01 to 28, and the remaining students from 29 to 95. (Numbers from 96 to 00 will be skipped.) Starting at the randomly selected row 139 and moving left to right across the row, we'll look at pairs of digits until we come across two different values from 01 to 95. The students with these two labels win the prime parking spaces. We will record whether both winners are members of the AP Statistics class (yes or no). We will conduct the simulation 18 times.
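Computer software can carry out the Golden Ticket plan in place of Table D. The minimal sketch below makes three assumptions. Python's `random.sample` is the chance device; it draws 2 different labels from 1 to 95, so repeats are avoided automatically. Labels 1 to 28 are the AP Statistics class, as in the PLAN. The run uses many more repetitions than the 18 done by hand. As a check, it also computes the exact probability (28/95)(27/94) with the multiplication rule.

```python
import random
from fractions import Fraction

def both_winners_ap_stats(rng):
    """One repetition: draw 2 different labels from 1-95; labels 1-28 are AP Stats students."""
    winners = rng.sample(range(1, 96), 2)
    return all(label <= 28 for label in winners)

def estimate(reps=100_000, seed=139):
    rng = random.Random(seed)
    yes = sum(both_winners_ap_stats(rng) for _ in range(reps))
    return yes / reps

exact = Fraction(28, 95) * Fraction(27, 94)  # first winner AP Stats, then second winner AP Stats
simulated = estimate()
print(f"Simulated P(X = 2) = {simulated:.4f}")
print(f"Exact P(X = 2) = {exact} = {float(exact):.4f}")
```

With many repetitions the estimate settles near the exact value of about 0.085. The hand simulation's 3/18 is a rougher estimate based on only 18 repetitions.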

Required Elements (Golden Ticket): Plan must include:
What tool? Table of random digits, calculator random number generator (randInt), etc.
What values are you assigning? 01 to 95
How many values are you picking each time? 2 values
How many times are you conducting the simulation? 18 times
What about repeat digits or ignored digits? Ignore a label that repeats within a single draw; skip 96 to 00.
What are you recording? "Yes" if both winners are AP Stats students.
DO:
Students: AP Statistics class, labels 01 to 28; Other, labels 29 to 95.
Skip numbers from 96 to 00.
Reading across row 139 in Table D, look at pairs of digits until you see two different labels from 01 to 95. Record whether or not both winners are members of the AP Statistics class.
[Table of the digit pairs read from row 139, with AP Stats labels marked X and skipped numbers marked Sk. Results of the 18 repetitions: 3 Yes, 15 No.]
CONCLUDE: Based on 18 repetitions of our simulation, both winners came from the AP Statistics class 3 times, so the probability is estimated as 3/18 ≈ 16.67%. Therefore, it is plausible that two AP Stats students were selected in a fair drawing.
NASCAR
In an attempt to increase sales, a breakfast cereal company decides to offer a NASCAR promotion. Each box of cereal will contain a collectible card featuring one of these NASCAR drivers: Jeff Gordon, Dale Earnhardt, Jr., Tony Stewart, Danica Patrick, or Jimmie Johnson. The company says that each of the 5 cards is equally likely to appear in any box of cereal. A NASCAR fan decides to keep buying boxes of the cereal until she has all 5 drivers' cards. She is surprised when it takes her 23 boxes to get the full set of cards. Should she be surprised? Design and carry out a simulation to help answer this question.
STATE: What is the probability of needing to buy 23 or more cereal boxes to obtain one card from each driver?
PLAN: Using the calculator's random number generator (randInt), we are going to simulate 50 trials. We will assign each driver a unique number from 1 through 5. In each trial, we will generate random integers from 1 to 5 until all five values (drivers) have appeared, and record the total number of random numbers (boxes) required.
Driver | Label
Jeff Gordon | 1
Dale Earnhardt, Jr. | 2
Tony Stewart | 3
Danica Patrick | 4
Jimmie Johnson | 5
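The NASCAR plan translates directly into code. The minimal sketch below assumes Python's `random.randint(1, 5)` plays the role of the calculator's randInt(1, 5), and it runs far more than 50 trials. As a check, it also computes the exact probability that 22 boxes are not enough, using inclusion-exclusion. That technique goes beyond this section and appears here only to confirm the simulation.

```python
import random
from math import comb

def boxes_until_full_set(rng, drivers=5):
    """One trial: buy boxes (random labels 1-5) until every driver label has appeared."""
    seen = set()
    boxes = 0
    while len(seen) < drivers:
        seen.add(rng.randint(1, drivers))  # randint includes both endpoints
        boxes += 1
    return boxes

rng = random.Random(23)
trials = [boxes_until_full_set(rng) for _ in range(20_000)]
simulated = sum(t >= 23 for t in trials) / len(trials)

# Exact check: P(at least one driver still missing after 22 boxes), by inclusion-exclusion
n, d = 22, 5
exact = sum((-1) ** (k + 1) * comb(d, k) * ((d - k) / d) ** n for k in range(1, d))

print(f"Simulated P(23 or more boxes) = {simulated:.4f}")
print(f"Exact P(23 or more boxes)     = {exact:.4f}")
```

With many trials the probability is small but not exactly 0 (roughly 0.037). A 50-trial simulation can easily show no cases at all.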

DO: [Dotplot of the number of boxes needed in each of 50 trials]
CONCLUDE: We never had to buy more than 22 boxes to get the full set of cards in 50 repetitions of our simulation. Our estimate of the probability that it takes 23 or more boxes to get a full set of drivers is roughly 0. Therefore, she should be surprised that it took 23 cereal box purchases to find all 5 driver cards.
Section 5.2 Probability Rules
After this section, you should be able to
DESCRIBE chance behavior with a probability model
DEFINE and APPLY basic rules of probability
DETERMINE probabilities from two-way tables
CONSTRUCT Venn diagrams and DETERMINE probabilities
Probability Models
The sample space S of a chance process is the set of all possible outcomes.
A probability model is a description of some chance process that consists of two parts: a sample space S and a probability for each outcome.
Example of Coin Toss: Sample space: heads or tails. Probability: heads (0.5) and tails (0.5).
Basic Rules of Probability
The probability of any event is a number between 0 and 1.
The probabilities of all possible outcomes sum to 1.
If all outcomes in a sample space (ex: rolling a single die) are equally likely, the probability that event A occurs can be found using the formula:
P(A) = (number of outcomes corresponding to event A) / (total number of outcomes in sample space)
The probability that an event does not occur is 1 minus the probability that the event does occur.

Probability Models
Probability models allow us to find the probability of any collection of outcomes. An event is any collection of outcomes from some chance process. That is, an event is a subset of the sample space. Events are usually designated by capital letters, like A, B, C, and so on.
Specific event examples: flipping 3 heads in a row; rolling two dice that sum to 5.
Probability models can be shown with a Venn diagram, tree diagram, list, chart, etc.
Sample Space: Rolling Two Dice
The probability model for the chance process of rolling two fair, six-sided dice, one that's red and one that's green: the sample space has 36 outcomes. Since the dice are fair, each outcome is equally likely, so each outcome has probability 1/36.
Event A: rolling a sum of 5 with 2 dice.
Event space: there are 4 different combinations of dice rolls that sum to 5: (1, 4), (2, 3), (3, 2), (4, 1).
Solution: Since each outcome has probability 1/36, P(A) = 4/36 = 1/9.
What type of pizza do you like? Meat; Veggies; Veggies and meat; Neither (cheese)
***There are NO other choices at Mrs. Daniel's pizzeria***
[Venn diagram with regions: Meat, Meat & Veggies, Veggies, Neither (Cheese)]
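A short program can list a sample space directly. The minimal sketch below builds all 36 equally likely (red, green) outcomes. It then applies P(A) = (outcomes in A)/(total outcomes) and the complement rule, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (red, green) pair for two fair six-sided dice
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 5
event_a = [(red, green) for red, green in sample_space if red + green == 5]

p_a = Fraction(len(event_a), len(sample_space))  # equally likely outcomes
print(len(sample_space), "outcomes")
print("Outcomes in A:", event_a)
print("P(A) =", p_a)
print("P(not A) =", 1 - p_a)  # complement rule
```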

What is the probability that a randomly selected student:
Likes meat
Likes veggies
Likes veggies and meat
Likes neither (cheese)
Likes either veggies or meat
Mutually Exclusive: two events that cannot occur at the same time; they have no outcomes in common. Example: a student is EITHER a junior or a senior.
Intersection: the probability of both events occurring. For example, if A = likes salad and B = likes meat, then P(A and B) = P(likes both salad and meat).
Complement: the event that A does not occur ("not A"), written A^c. Example: A = airplane takes off on time; A^c = airplane does not take off on time.
Calculating Probabilities
Complement of A: P(A^c) = 1 − P(A)
Union (A or B), general addition rule: P(A or B) = P(A) + P(B) − P(A and B)
Intersection (A and B): P(A and B)

AP Statistics Exam Scores
Score: 1 | 2 | 3 | 4 | 5
Probability: ____
(a) Is this a legitimate probability model? Justify.
Each probability is between 0 and 1 and the sum of the probabilities is ____ = 1.
(b) Find the probability that the chosen student scored 3 or better.
The probability of scoring a 3 or better: ____
Online Learning
Online (distance) learning courses are rapidly gaining popularity among college students. Randomly select an undergraduate student who is taking online learning courses for credit and record the student's age. Here is the probability model:
Age group (yr): 18 to 23 | ____ to ____ | ____ to ____ | ____ or over
Probability: ____
(a) Is this a legitimate probability model? Justify.
Each probability is between 0 and 1 and the probabilities sum to 1.
(b) Find the probability that the chosen student is not in the traditional college age group (18 to 23 years).
P(not 18 to 23 years) = 1 − P(18 to 23 years) = ____
Educational Achievement and Home Ownership
What is the relationship between educational achievement and home ownership? A random sample of 500 people was selected, and each member of the sample was identified as a high school graduate (or not) and as a homeowner (or not). The two-way table displays the data.
                  | High School Graduate | Not a High School Graduate | Total
Homeowner         | 221                  | 119                        | 340
Not a Homeowner   | 89                   | 71                         | 160
Total             | 310                  | 190                        | 500
What is the probability that a randomly selected person
(a) is a high school graduate? 310/500 = 0.62
(b) is a high school graduate and owns a home? 221/500 = 0.442
(c) is a high school graduate or owns a home? 310/500 + 340/500 − 221/500 = 429/500 = 0.858
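The three answers can be reproduced from the four cell counts of the two-way table (221, 119, 89 and 71), which match the totals used in the answers. The minimal sketch below stores the table as a dictionary keyed by (education, home ownership). It checks part (c) with the general addition rule.

```python
from fractions import Fraction

# Cell counts of the two-way table (500 people)
counts = {
    ("grad", "owner"): 221, ("not grad", "owner"): 119,
    ("grad", "not owner"): 89, ("not grad", "not owner"): 71,
}
total = sum(counts.values())

p_grad = Fraction(counts[("grad", "owner")] + counts[("grad", "not owner")], total)
p_owner = Fraction(counts[("grad", "owner")] + counts[("not grad", "owner")], total)
p_grad_and_owner = Fraction(counts[("grad", "owner")], total)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_grad_or_owner = p_grad + p_owner - p_grad_and_owner

print("(a)", p_grad, "(b)", p_grad_and_owner, "(c)", p_grad_or_owner)
```

Subtracting P(A and B) keeps the 221 graduates who own homes from being counted twice.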

5.3: Conditional Probability and Independence
After this section, you should be able to
DEFINE conditional probability
COMPUTE conditional probabilities
DESCRIBE chance behavior with a tree diagram
DEFINE independent events
DETERMINE whether two events are independent
APPLY the general multiplication rule to solve probability questions
Basic Probability
Assume a spinner has 8 equal-sized sections; each section is numbered with a unique number from 1 to 8.
A. What is the probability of getting an even number? 4/8 or 1/2
B. What is the probability of getting a prime number? 4/8 or 1/2 (the primes are 2, 3, 5, and 7)
C. What is the probability of getting a multiple of 3? 2/8 or 1/4
Mixed Probability
Assume a spinner has 8 equal-sized sections; each section is numbered with a unique number from 1 to 8.
A. What is the probability of getting 2 even spins in a row? 1/4
B. What is the probability of getting a prime number or an odd number? 5/8
C. What is the probability of getting a multiple of 3 or an even spin? 5/8
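Each spinner answer can be checked by listing the outcomes as sets. The minimal sketch below uses Python sets for the events on sections 1 to 8, all equally likely. The union `|` plays the role of "or" in the general addition rule. Consecutive spins are treated as independent.

```python
from fractions import Fraction

spinner = set(range(1, 9))  # 8 equal-sized sections
even = {n for n in spinner if n % 2 == 0}
odd = spinner - even
prime = {2, 3, 5, 7}  # 1 is not prime
mult_3 = {n for n in spinner if n % 3 == 0}

def p(event):
    """Probability of an event when all 8 sections are equally likely."""
    return Fraction(len(event), len(spinner))

print("P(even) =", p(even))
print("P(prime) =", p(prime))
print("P(multiple of 3) =", p(mult_3))
print("P(2 evens in a row) =", p(even) * p(even))  # independent spins
print("P(prime or odd) =", p(prime | odd))
print("P(multiple of 3 or even) =", p(mult_3 | even))
```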

What is Conditional Probability?
When we are trying to find the probability that one event will happen under the condition that some other event is already known to have occurred, we are trying to determine a conditional probability. The probability that one event happens given that another event is already known to have happened is called a conditional probability.
Suppose we know that event A has happened. Then the probability that event B happens given that event A has happened is denoted by P(B | A). The "|" is read as "given that" or "under the condition that." It can be computed as P(B | A) = P(A and B) / P(A).
Calculate the following conditional probabilities:
1. P(____) = 19/90
2. P(____) = 4/88
3. P(____) = 84/103
Example: Grade Distributions
E: the grade comes from an EPS course, and L: the grade is lower than a B.
[Two-way table of grades by course and grade level, with totals]
Find P(L), P(E | L), and P(L | E).
P(L) = 3656 / ____ = ____
P(E | L) = 800 / 3656 ≈ 0.2188
P(L | E) = 800 / 1600 = 0.5
Who Reads the Newspaper?
Residents of a large apartment complex can be classified based on the events A: reads USA Today and B: reads the New York Times. What is the probability that a randomly selected resident who reads USA Today also reads the New York Times?
There is a 12.5% chance that a randomly selected resident who reads USA Today also reads the New York Times.
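With counts from a two-way table, P(B | A) = P(A and B) / P(A) becomes (count in both) / (count in the condition). The minimal sketch below uses only the counts quoted in the grade example: 800 grades that are both EPS and lower than a B, 3656 grades lower than a B, and 1600 EPS grades.

```python
from fractions import Fraction

both_e_and_l = 800  # grades from EPS courses that are lower than a B
l_total = 3656      # all grades lower than a B
e_total = 1600      # all grades from EPS courses

def conditional(count_both, count_given):
    """P(B | A) = count(A and B) / count(A), from a two-way table of counts."""
    return Fraction(count_both, count_given)

p_e_given_l = conditional(both_e_and_l, l_total)
p_l_given_e = conditional(both_e_and_l, e_total)
print(f"P(E | L) = {float(p_e_given_l):.4f}")
print(f"P(L | E) = {float(p_l_given_e):.4f}")
```

The two answers differ because the conditions differ. P(E | L) divides by the 3656 grades lower than a B, while P(L | E) divides by the 1600 EPS grades.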

Conditional Probability and Independence
When knowledge that one event has happened does not change the likelihood that another event will happen, we say the two events are independent. Two events A and B are independent if the occurrence of one event has no effect on the chance that the other event will happen. In other words, events A and B are independent if:
P(A | B) = P(A) OR P(B | A) = P(B).
Are these events independent?
         | Earns A in AP Stats | Earns A in AP Calc | Total
Junior   | 5                   | 7                  | 12
Senior   | 12                  | 9                  | 21
Total    | 17                  | 16                 | 33
1. Junior and AP Calc? P(Junior | AP Calc) = 7/16 ≈ 0.44; P(Junior) = 12/33 ≈ 0.36. Since the values are not equal, the events are not independent.
2. Senior and AP Stats? P(Senior | AP Stats) = 12/17 ≈ 0.71; P(Senior) = 21/33 ≈ 0.64. Since the values are not equal, the events are not independent.
Are the events male and left-handed independent? A: left-handed, B: male.
P(left-handed | male) = 3/23 ≈ 0.13; P(left-handed) = 7/50 = 0.14. Since the values are not equal, the events are not independent.
General Multiplication Rule
The probability that events A and B both occur can be found using the general multiplication rule
P(A and B) = P(A) · P(B | A)
where P(B | A) is the conditional probability that event B occurs given that event A has already occurred.
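Checking independence from a two-way table means comparing a conditional probability with an unconditional one. The minimal sketch below uses the counts behind the answers above: Juniors earned 5 A's in AP Stats and 7 in AP Calc, and Seniors earned 12 and 9.

```python
from fractions import Fraction

# Counts of A grades: (class year, course) -> count
table = {("Junior", "Stats"): 5, ("Junior", "Calc"): 7,
         ("Senior", "Stats"): 12, ("Senior", "Calc"): 9}
total = sum(table.values())

def p_year(year):
    """Unconditional probability of a class year."""
    return Fraction(sum(c for (y, _), c in table.items() if y == year), total)

def p_year_given_course(year, course):
    """P(year | course): restrict attention to the A's earned in that course."""
    course_total = sum(c for (_, crs), c in table.items() if crs == course)
    return Fraction(table[(year, course)], course_total)

for year, course in [("Junior", "Calc"), ("Senior", "Stats")]:
    cond, uncond = p_year_given_course(year, course), p_year(year)
    verdict = "independent" if cond == uncond else "not independent"
    print(f"P({year} | {course}) = {cond}, P({year}) = {uncond}: {verdict}")
```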

Tree Diagrams
Tree diagrams are best for events that follow each other, events that happen multiple times, or events that are logically related (example: graduating high school first, then attending college; OR having cancer, then testing positive).
Consider flipping a coin twice. What is the probability of getting two heads? Sample space: HH, HT, TH, TT. So P(two heads) = P(HH) = 1/4.
Example: Teens with Online Profiles
The Pew Internet and American Life Project finds that 93% of teenagers (ages 12 to 17) use the Internet, and that 55% of online teens have a Facebook profile. What percent of teens are online and have a Facebook profile?
P(online and profile) = P(online) · P(profile | online) = (0.93)(0.55) = 0.5115
51.15% of teens are online and have posted a profile.
Consecutive Probability
Assume a spinner has 8 equal-sized sections; each section is numbered with a unique number from 1 to 8. You spin the spinner three times.
A. What is the probability of getting at least two even spins?
B. What is the probability of getting a prime number exactly twice?
C. What is the probability of getting a multiple of 3 or an even spin only once?
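Listing every branch is one way to check a tree-diagram question with several stages. The minimal sketch below enumerates all 8 × 8 × 8 = 512 equally likely outcomes of three spins. Part C is read here as "exactly one of the three spins is a multiple of 3 or even," which is one interpretation of the wording.

```python
from fractions import Fraction
from itertools import product

spins = list(product(range(1, 9), repeat=3))  # all 512 equally likely outcomes

def prob(condition):
    """Fraction of the 512 outcomes for which condition(outcome) is True."""
    return Fraction(sum(1 for s in spins if condition(s)), len(spins))

def is_even(n):
    return n % 2 == 0

def is_prime(n):
    return n in {2, 3, 5, 7}

def is_mult3_or_even(n):
    return n % 3 == 0 or n % 2 == 0

p_a = prob(lambda s: sum(map(is_even, s)) >= 2)           # A: at least two even spins
p_b = prob(lambda s: sum(map(is_prime, s)) == 2)          # B: prime exactly twice
p_c = prob(lambda s: sum(map(is_mult3_or_even, s)) == 1)  # C: exactly one such spin
print("A:", p_a, " B:", p_b, " C:", p_c)
```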

Internet & YouTube Usage
About 27% of adult Internet users are 18 to 29 years old, another 45% are 30 to 49 years old, and the remaining 28% are 50 and over. The Pew Internet and American Life Project finds that 70% of Internet users aged 18 to 29 have visited a video-sharing site, along with 51% of those aged 30 to 49 and 26% of those 50 or older.
A. Make a tree diagram of the probabilities.
B. What proportion of adult Internet users are 18 to 29 years old and visit video-sharing sites? 0.27 × 0.70 = 0.189
C. What proportion of adult Internet users are 30 to 49 years old and visit video-sharing sites? 0.45 × 0.51 = 0.2295
D. What proportion of adult Internet users are 50 and over and visit video-sharing sites? 0.28 × 0.26 = 0.0728
E. What proportion of all adult Internet users visit video-sharing sites? Do most Internet users visit YouTube and/or similar sites? Justify your answer.
P(video yes) = P(18 to 29 and video yes) + P(30 to 49 and video yes) + P(50+ and video yes) = 0.189 + 0.2295 + 0.0728 = 0.4913
49.13% of all adult Americans who use the Internet visit video-sharing sites. While 49.13% represents a large proportion of the population, it is not a majority, so it is not fair to say most adult American Internet users visit video-sharing sites.
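Part E multiplies along each branch of the tree and then adds the "video yes" branches. The minimal sketch below does that calculation. It stores the Pew percentages as (P(age group), P(video | age group)) pairs.

```python
# Tree branches: age group -> (P(age group), P(visits video site | age group))
branches = {
    "18 to 29": (0.27, 0.70),
    "30 to 49": (0.45, 0.51),
    "50 and over": (0.28, 0.26),
}

# Multiply along each branch, then add the "video yes" branches
joint = {age: p_age * p_video for age, (p_age, p_video) in branches.items()}
p_video = sum(joint.values())

for age, p in joint.items():
    print(f"P({age} and video yes) = {p:.4f}")
print(f"P(video yes) = {p_video:.4f}")
```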

Special Probability Rules
Independence: A Special Multiplication Rule
When events A and B are independent, we can simplify the general multiplication rule since P(B | A) = P(B).
Multiplication rule for independent events: If A and B are independent events, then the probability that A and B both occur is
P(A and B) = P(A) · P(B)
Example: Following the Space Shuttle Challenger disaster, it was determined that the failure of O-ring joints in the shuttle's booster rockets was to blame. Under cold conditions, it was estimated that the probability that an individual O-ring joint would function properly was 0.977. Assuming O-ring joints succeed or fail independently, what is the probability that all six would function properly?
P(joint 1 OK and joint 2 OK and joint 3 OK and joint 4 OK and joint 5 OK and joint 6 OK)
= P(joint 1 OK) · P(joint 2 OK) ⋯ P(joint 6 OK)
= (0.977)(0.977)(0.977)(0.977)(0.977)(0.977) = (0.977)^6 ≈ 0.8697
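For independent events, the multiplication rule extends to any number of events: multiply all the probabilities. The minimal sketch below does the O-ring calculation for six independent joints, each working with probability 0.977. It also uses the complement rule to find the chance that at least one joint fails.

```python
from math import prod

p_joint_ok = 0.977
n_joints = 6

p_all_ok = prod([p_joint_ok] * n_joints)  # same as p_joint_ok ** n_joints
p_at_least_one_fails = 1 - p_all_ok      # complement rule

print(f"P(all six OK) = {p_all_ok:.4f}")
print(f"P(at least one fails) = {p_at_least_one_fails:.4f}")
```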

Chapter 6
6.1: Discrete and Continuous Random Variables
Section 6.1 Discrete & Continuous Random Variables
After this section, you should be able to
APPLY the concept of discrete random variables to a variety of statistical settings
CALCULATE and INTERPRET the mean (expected value) of a discrete random variable
CALCULATE and INTERPRET the standard deviation (and variance) of a discrete random variable
DESCRIBE continuous random variables
Random Variables
A random variable, usually written as X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables: discrete random variables and continuous random variables.
Discrete Random Variables
A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. Examples:
the number of children in a family
the Friday night attendance at a cinema
the number of patients a doctor sees in one day
the number of defective light bulbs in a box of ten
the number of heads flipped in 3 trials
Probability Distribution
The probability distribution of a discrete random variable is a list of the probabilities associated with each of its possible values.
Consider tossing a fair coin 3 times. Define X = the number of heads obtained.
X = 0: TTT
X = 1: HTT, THT, TTH
X = 2: HHT, HTH, THH
X = 3: HHH
Value: 0 | 1 | 2 | 3
Probability: 1/8 | 3/8 | 3/8 | 1/8
Rolling Dice: Probability Distribution
Roll your pair of dice 20 times and record the sum for each trial.
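The coin-toss distribution above comes from listing the sample space and counting heads. The minimal sketch below does the same with `itertools.product` and `collections.Counter`.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X = number of heads
distribution = {x: Fraction(heads_counts[x], len(outcomes)) for x in sorted(heads_counts)}
for x, p in distribution.items():
    print(f"X = {x}: {p}")
print("Sum of probabilities:", sum(distribution.values()))
```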

Discrete Random Variables
A discrete random variable X takes a fixed set of possible values with gaps between them. The probability distribution of a discrete random variable X lists the values x_i and their probabilities p_i:
Value: x_1 x_2 x_3 ...
Probability: p_1 p_2 p_3 ...
The probabilities p_i must satisfy two requirements:
1. Every probability p_i is a number between 0 and 1.
2. The sum of the probabilities is 1.
To find the probability of any event, add the probabilities p_i of the particular values x_i that make up the event.
Describing the (Probability) Distribution
When analyzing discrete random variables, we'll follow the same strategy we used with quantitative data: describe the shape, center (mean), and spread (standard deviation), and identify any outliers.
Example: Babies' Health at Birth
Background details are on page 343.
Value: 0 to 10
Probability: ____
(a) Show that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe the distribution.
(c) Apgar scores of 7 or higher indicate a healthy baby. What is P(X ≥ 7)?
(a) All probabilities are between 0 and 1 and the probabilities sum to 1. This is a legitimate probability distribution.
(b) The left-skewed shape of the distribution suggests a randomly selected newborn will have an Apgar score at the high end of the scale. While the range is from 0 to 10, there is a VERY small chance of getting a baby with a score of 5 or lower. There are no obvious outliers. The center of the distribution is approximately 8.
(c) P(X ≥ 7) = 0.908. We'd have about a 91% chance of randomly choosing a healthy baby.

Mean of a Discrete Random Variable
The mean of any discrete random variable is an average of the possible outcomes, with each outcome weighted by its probability. Suppose that X is a discrete random variable whose probability distribution is
Value: x_1 x_2 x_3 ...
Probability: p_1 p_2 p_3 ...
To find the mean (expected value) of X, multiply each possible value by its probability, then add all the products:
μ_X = E(X) = x_1p_1 + x_2p_2 + x_3p_3 + ... = Σ x_i p_i
Example: Apgar Scores: What's Typical?
Consider the random variable X = Apgar score. Compute the mean of the random variable X and interpret it in context.
Value: 0 to 10
Probability: ____
Calculate the Mean (Expected Value)
The mean Apgar score of a randomly selected newborn is ____. This is the long-term average Apgar score of many, many randomly chosen babies.
Note: The expected value does not need to be a possible value of X or an integer! It is a long-term average over many repetitions.
Analyzing Discrete Random Variables on the Calculator
1. Use one-variable statistics (1-Var Stats) to calculate the mean and standard deviation.
2. Enter the Apgar scores as the X1 list and the probabilities as the frequency (Freq) list.
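The "multiply, then add" recipe for the mean is a single weighted sum. The minimal sketch below applies it to the coin-toss distribution from Section 6.1 (X = number of heads in 3 tosses), whose probabilities appear in these notes.

```python
from fractions import Fraction

# X = number of heads in 3 tosses of a fair coin
values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

assert sum(probs) == 1  # legitimate probability distribution

mean = sum(x * p for x, p in zip(values, probs))  # mu_X = sum of x_i * p_i
print("E(X) =", mean, "=", float(mean))
```

The result, 1.5 heads, is not a possible value of X. It is the long-run average number of heads over many sets of 3 tosses.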

Standard Deviation of a Discrete Random Variable
The definition of the variance of a random variable is similar to the definition of the variance for a set of quantitative data. To get the standard deviation of a random variable, take the square root of the variance.
Suppose that X is a discrete random variable whose probability distribution is
Value: x_1 x_2 x_3 ...
Probability: p_1 p_2 p_3 ...
and that μ_X is the mean of X. The variance of X is
Var(X) = σ_X² = (x_1 − μ_X)²p_1 + (x_2 − μ_X)²p_2 + (x_3 − μ_X)²p_3 + ... = Σ (x_i − μ_X)² p_i
The standard deviation of X is σ_X = √Var(X).
Example: Apgar Scores: How Variable Are They?
Consider the random variable X = Apgar score. Compute the standard deviation of the random variable X and interpret it in context.
Value: 0 to 10
Probability: ____
On average, a randomly selected baby's Apgar score will differ from the mean by about 1.4 units.
Continuous Random Variables
A continuous random variable X takes on all values in an interval of numbers. Continuous random variables are usually measurements. Examples: height, weight, the amount of sugar in an orange, the time required to run a mile.
The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.
A continuous random variable does not assign positive probability to individual values; instead, you calculate the probability of a range of values. These calculations are very similar to z-score and Normal distribution calculations.
Example: Young Women's Heights
The height of young women can be defined as a continuous random variable Y with probability distribution N(64, 2.7).
A. What is the probability that a randomly chosen young woman has a height between 68 and 70 inches? P(68 ≤ Y ≤ 70) = ???
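The variance formula is another weighted sum, this time of squared deviations from the mean. The minimal sketch below uses the coin-toss distribution (X = number of heads in 3 tosses) as a stand-in example.

```python
from fractions import Fraction
from math import sqrt

values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mean = sum(x * p for x, p in zip(values, probs))
# Var(X) = sum of (x_i - mu_X)^2 * p_i
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = sqrt(variance)  # standard deviation = square root of the variance

print("Var(X) =", variance)
print(f"SD(X) = {sd:.4f}")
```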

Example: Young Women's Heights
The height of young women can be defined as a continuous random variable Y with probability distribution N(64, 2.7), in inches.
A. What is the probability that a randomly chosen young woman has height between 68 and 70 inches?
\(z = \frac{68 - 64}{2.7} \approx 1.48\) and \(z = \frac{70 - 64}{2.7} \approx 2.22\)
\(P(68 \le Y \le 70) = P(1.48 \le Z \le 2.22) = P(Z \le 2.22) - P(Z \le 1.48) = 0.9868 - 0.9306 = 0.0562\)
There is about a 5.6% chance that a randomly chosen young woman has a height between 68 and 70 inches.
B. At 70 inches tall, is Mrs. Daniel unusually tall?
\(P(Y \ge 70) = P(Z \ge 2.22) = 1 - 0.9868 = 0.0132\)
Yes, Mrs. Daniel is unusually tall because about 98.68% of young women are shorter than she is.
6.2: Transforming and Combining Random Variables
After this section, you should be able to
DESCRIBE the effect of performing a linear transformation on a random variable
COMBINE random variables and CALCULATE the resulting mean and standard deviation
CALCULATE and INTERPRET probabilities involving combinations of Normal random variables
Linear Transformations on Random Variables
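Parts A and B can be checked with Python's standard-library `statistics.NormalDist`, which here plays the role of the calculator's normalcdf. This is a sketch: because it does not round z to two decimals, its answers may differ slightly in the fourth decimal from table-based answers.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=2.7)  # Y ~ N(64, 2.7), in inches

# A. P(68 <= Y <= 70): area under the density curve between 68 and 70
p_between = heights.cdf(70) - heights.cdf(68)

# B. P(Y >= 70): area to the right of 70
p_tall = 1 - heights.cdf(70)

print(f"A: {p_between:.4f}  B: {p_tall:.4f}")
```

Both values land near 0.056 and 0.013, so the conclusions (about a 5.6% chance, and Mrs. Daniel being unusually tall) are unchanged.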

Review: Linear Transformations
In Chapter 2, we studied the effects of linear transformations on the shape, center, and spread of a distribution of data. Remember:
1. Adding (or subtracting) a constant, a, to each observation: Adds a to measures of center and location. Does not change the shape or measures of spread.
2. Multiplying (or dividing) each observation by a constant, b: Multiplies (divides) measures of center and location by b. Multiplies (divides) measures of spread by |b|. Does not change the shape of the distribution.
Linear Transformations
Pete's Jeep Tours offers a popular half-day trip in a tourist area. There must be at least 2 passengers for the trip to run, and the vehicle will hold up to 6 passengers. Define X as the number of passengers on a randomly selected day.
[Table: passengers \(x_i\) and probability \(p_i\)]
The mean of X is 3.75 and the standard deviation is … .
Pete charges $150 per passenger. The random variable C describes the amount Pete collects on a randomly selected day, so C = 150X.
[Table: amount collected \(c_i\) and probability \(p_i\)]
The mean of C is $562.50 (150 × 3.75) and the standard deviation is $… .
Compare the shape, center, and spread of each distribution.
Note: Multiplying a random variable by a constant b multiplies the variance by \(b^2\).
Linear Transformations on Random Variables
Multiplying (or dividing) each value of a random variable by a number b:
Multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b.
Multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|.
Does not change the shape of the distribution.
Adding the same number a (which could be negative) to each value of a random variable:
Adds a to measures of center and location (mean, median, quartiles, percentiles).
Does not change measures of spread (range, IQR, standard deviation).
Does not change the shape of the distribution.
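The rules above say that for Y = a + bX, the mean becomes a + b·µ_X and the standard deviation becomes |b|·σ_X. A minimal Python sketch can check this shortcut against a direct calculation. Pete's actual table is missing from these notes, so the passenger probabilities below are an ASSUMED distribution, chosen only so that its mean matches the stated 3.75; the helper names are ours.

```python
import math


def mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))
    return mu, sd


def linear_transform(mu, sd, a=0.0, b=1.0):
    """Mean and SD of Y = a + bX: the mean is shifted and scaled, the SD is only scaled by |b|."""
    return a + b * mu, abs(b) * sd


# ASSUMED passenger distribution for 2 to 6 riders, chosen only so that the
# mean is 3.75 as stated; the actual table is missing from these notes.
x_vals = [2, 3, 4, 5, 6]
x_probs = [0.15, 0.25, 0.35, 0.20, 0.05]

mu_x, sd_x = mean_sd(x_vals, x_probs)
mu_c, sd_c = linear_transform(mu_x, sd_x, b=150)    # C = 150X (amount collected)
mu_v, sd_v = linear_transform(mu_c, sd_c, a=-100)   # V = C - 100 (profit)

# Check the shortcut against working directly with the transformed values
direct_mu_v, direct_sd_v = mean_sd([150 * x - 100 for x in x_vals], x_probs)
print(round(mu_c, 2), round(mu_v, 2), math.isclose(sd_v, direct_sd_v))
```

Whatever the true probabilities are, the means come out to $562.50 for C and $462.50 for V, and subtracting the $100 cost leaves the standard deviation unchanged.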

Linear Transformations
Consider Pete's Jeep Tours again. We defined C as the amount of money Pete collects on a randomly selected day.
[Table: amount collected \(c_i\) and probability \(p_i\)]
The mean of C is $562.50 and the standard deviation is $… .
It costs Pete $100 per trip to buy permits, gas, and a ferry pass. The random variable V = C − 100 describes the profit Pete makes on a randomly selected day.
[Table: profit \(v_i\) and probability \(p_i\)]
The mean of V is $462.50 and the standard deviation is $… (the same as the standard deviation of C).
Compare the shape, center, and spread of the two probability distributions.
Bottom Line: Whether we are dealing with data or random variables, the effects of a linear transformation are the same!
Combining Random Variables
Before we can combine random variables, we must decide whether the variables are independent of each other. Probability models often assume independence when the random variables describe outcomes that appear unrelated to each other. You should always ask yourself whether the assumption of independence seems reasonable.
Let D = the number of passengers on a randomly selected Delta flight to Atlanta.
Let A = the number of passengers on a randomly selected American Airlines flight to Atlanta.
Define T = D + A. Calculate the mean and standard deviation of T.
[Tables: Delta passengers \(d_i\) with probability \(p_i\); American passengers \(a_i\) with probability \(p_i\)]
Delta: mean \(\mu_D = \ldots\), standard deviation \(\sigma_D = \ldots\)
American: mean \(\mu_A = 76.1\), standard deviation \(\sigma_A = \ldots\)

Combining Random Variables: Mean
How many total passengers fly to Atlanta on a randomly selected day?
Delta: \(\mu_D = \ldots\)   American: \(\mu_A = 76.1\)   Total: \(\mu_T = \mu_D + \mu_A = \ldots\) passengers to Atlanta daily
Mean of the Sum of Random Variables
For any two random variables X and Y, if T = X + Y, then the expected value of T is
\(E(T) = \mu_T = \mu_X + \mu_Y\)
In general, the mean of the sum of several random variables is the sum of their means.
Combining Random Variables: Variance
How much variability is there in the total number of passengers who fly to Atlanta on a randomly selected day? (Hint: find the combined variance.)
Delta: mean \(\mu_D = \ldots\), standard deviation \(\sigma_D = 1.090\)   American: mean \(\mu_A = 76.1\), standard deviation \(\sigma_A = 0.943\)
Variances: Delta \(= (1.090)^2\), American \(= (0.943)^2\)
Total variance \(= (1.090)^2 + (0.943)^2 \approx 2.077\), so \(\sigma_T = \sqrt{2.077} \approx 1.441\) passengers.
REMEMBER: Standard deviations do not add!
Variance of the Sum of Random Variables
For any two independent random variables X and Y, if T = X + Y, then the variance of T is
\(\sigma_T^2 = \sigma_X^2 + \sigma_Y^2\)
In general, the variance of the sum of several independent random variables is the sum of their variances.
Subtracting Random Variables: Mean
Mean of the Difference of Random Variables
For any two random variables X and Y, if D = X − Y, then the expected value of D is
\(E(D) = \mu_D = \mu_X - \mu_Y\)
In general, the mean of the difference of two random variables is the difference of their means. The order of subtraction is important!
Subtracting Random Variables: Variance
Variance of the Difference of Random Variables
For any two independent random variables X and Y, if D = X − Y, then the variance of D is
\(\sigma_D^2 = \sigma_X^2 + \sigma_Y^2\)
In general, the variance of the difference of two independent random variables is the sum of their variances. **This was an FRQ on the 2013 exam**
Combining Normal Random Variables: Calculating Probabilities
If a random variable is Normally distributed, we can use its mean and standard deviation to compute probabilities.
Important Fact: Any sum or difference of independent Normal random variables is also Normally distributed.
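Here is a minimal Python sketch of "variances add, standard deviations do not", using the flight standard deviations σ_D = 1.090 and σ_A = 0.943 and assuming, as the rule requires, that the Delta and American passenger counts are independent. The helper name is ours; the same function also covers a difference X − Y, because its variance is a sum too.

```python
import math


def sd_of_sum_or_difference(*sds):
    """SD of a sum OR difference of independent random variables.

    Variances add (even when subtracting); then take the square root.
    Standard deviations themselves do not add."""
    return math.sqrt(sum(s ** 2 for s in sds))


# Atlanta flights: sigma_D = 1.090 (Delta) and sigma_A = 0.943 (American)
sigma_T = sd_of_sum_or_difference(1.090, 0.943)
naive = 1.090 + 0.943  # the WRONG approach of adding the SDs

print(round(sigma_T ** 2, 3), round(sigma_T, 3), round(naive, 3))
```

The correct standard deviation (about 1.441) is noticeably smaller than the naive sum of the SDs (2.033).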

Combining Normal Random Variables: Calculating Probabilities
Mrs. Daniel likes between 8.5 and 9 grams of sugar in her iced coffee. Suppose the amount of sugar in a randomly selected packet follows a Normal distribution with mean 2.17 g and standard deviation 0.08 g. If Mrs. Daniel selects 4 packets at random, what is the probability her iced coffee will taste right?
STATE & PLAN: Let X = the amount of sugar in a randomly selected packet. Then \(T = X_1 + X_2 + X_3 + X_4\). We want to find \(P(8.5 \le T \le 9)\).
DO:
1. Calculate the combined mean: \(\mu_T = \mu_{X_1} + \mu_{X_2} + \mu_{X_3} + \mu_{X_4} = 2.17 + 2.17 + 2.17 + 2.17 = 8.68\) g
2. Calculate the combined variance (the packets are selected at random, so we treat them as independent): \(\sigma_T^2 = (0.08)^2 + (0.08)^2 + (0.08)^2 + (0.08)^2 = 0.0256\)
3. Calculate the combined standard deviation: \(\sigma_T = \sqrt{0.0256} = 0.16\) g
4. Calculate z-scores: \(z = \frac{8.5 - 8.68}{0.16} \approx -1.13\) and \(z = \frac{9 - 8.68}{0.16} = 2\)
5. Find p-values: \(P(Z \le -1.13) = 0.1292\) and \(P(Z \le 2) = 0.9772\)
Final calculation: \(0.9772 - 0.1292 = 0.8480\)
CONCLUDE: There is an 84.8% chance that Mrs. Daniel's iced coffee will taste right.
YES, you may use your calculator! Just remember to calculate the combined mean and standard deviation before using the calculator!
Combining Normal Random Variables: Calculating Probabilities
The diameter C of a randomly selected large drink cup at a fast-food restaurant follows a Normal distribution with a mean of 3.96 inches and a standard deviation of 0.01 inches. The diameter L of a randomly selected large lid at this restaurant follows a Normal distribution with mean 3.98 inches and standard deviation 0.02 inches. For a lid to fit on a cup, the value of L has to be bigger than the value of C, but not by more than 0.06 inches. What's the probability that a randomly selected large lid will fit on a randomly chosen large drink cup?
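The sugar-packet steps translate directly into a short Python sketch, assuming (as the variance step does) that the 4 packets are independent. `statistics.NormalDist` stands in for the calculator.

```python
import math
from statistics import NormalDist

n_packets = 4
mu_packet, sd_packet = 2.17, 0.08  # grams, from the problem

# Means add; for independent packets the variances add too
mu_T = n_packets * mu_packet
sd_T = math.sqrt(n_packets * sd_packet ** 2)

T = NormalDist(mu_T, sd_T)
p_right = T.cdf(9) - T.cdf(8.5)  # P(8.5 <= T <= 9)
print(round(mu_T, 2), round(sd_T, 2), round(p_right, 4))
```

Without rounding z to two decimals the probability is about 0.847, a hair below the 84.8% found with rounded z-scores.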

Combining Normal Random Variables: Calculating Probabilities
STATE & PLAN: We'll define the random variable D = L − C to represent the difference between the lid's diameter and the cup's diameter. Our goal is to find \(P(0.00 < D \le 0.06)\).
DO:
1. Calculate the combined mean: \(\mu_D = \mu_L - \mu_C = 3.98 - 3.96 = 0.02\) inches
2. Calculate the combined variance (the lid and cup are selected independently): \(\sigma_D^2 = (0.02)^2 + (0.01)^2 = 0.0005\)
3. Calculate the combined standard deviation: \(\sigma_D = \sqrt{0.0005} \approx 0.0224\) inches
4. Calculate z-scores: \(z = \frac{0 - 0.02}{0.0224} \approx -0.89\) and \(z = \frac{0.06 - 0.02}{0.0224} \approx 1.79\)
5. Find p-values: \(P(Z \le -0.89) = 0.1867\) and \(P(Z \le 1.79) = 0.9633\)
Final calculation: \(0.9633 - 0.1867 = 0.7766\)
CONCLUDE: We predict that the lids will fit properly 77.66% of the time. This means the lids will not fit properly more than 22% of the time. That is annoying!
Combining Normal Random Variables: Calculating Probabilities
Mrs. Daniel and Mrs. Cooper bowl every Tuesday night. Over the past few years, Mrs. Daniel's scores have been approximately Normally distributed with a mean of 212 and a standard deviation of 31. During the same period, Mrs. Cooper's scores have also been approximately Normally distributed with a mean of 230 and a standard deviation of 40. Assuming their scores are independent, what is the probability that Mrs. Daniel scores higher than Mrs. Cooper on a randomly selected Tuesday night?
CONCLUDE: There is a 35.94% chance that Mrs. Daniel will score higher than Mrs. Cooper on any given night.
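Both problems on this page are differences of independent Normal random variables, so one helper covers them. This is a minimal Python sketch; `diff_dist` is our own name. For the bowling problem, Mrs. Daniel wins when (Daniel − Cooper) > 0.

```python
import math
from statistics import NormalDist


def diff_dist(mu_x, sd_x, mu_y, sd_y):
    """Distribution of X - Y for independent Normal X and Y (also Normal)."""
    return NormalDist(mu_x - mu_y, math.sqrt(sd_x ** 2 + sd_y ** 2))


# Lids: D = L - C; the lid fits when 0 < D <= 0.06
D = diff_dist(3.98, 0.02, 3.96, 0.01)
p_fit = D.cdf(0.06) - D.cdf(0.0)

# Bowling: Daniel - Cooper has mean -18 and SD sqrt(31^2 + 40^2)
W = diff_dist(212, 31, 230, 40)
p_daniel = 1 - W.cdf(0)

print(round(p_fit, 4), round(p_daniel, 4))
```

The unrounded results are about 0.778 and 0.361; the 77.66% and 35.94% in the conclusions come from z-scores rounded to two decimals.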

Mixed Practice: ACT Scores
Leona and Fred are friendly competitors in high school. Both are about to take the ACT college entrance examination. They agree that if one of them scores 5 or more points better than the other, the loser will buy the winner a pizza. Suppose that in fact Fred and Leona have equal ability, so that each score varies Normally with mean 24 and standard deviation 2. (The variation is due to luck in guessing and the accident of the specific questions being familiar to the student.) The two scores are independent. What is the probability that the scores differ by 5 or more points in either direction?
Mixed Practice: Toothpaste
Mr. Daniel is traveling for his business. He has a new 0.85-ounce tube of toothpaste that's supposed to last him the whole trip. The amount of toothpaste Mr. Daniel squeezes out of the tube each time he brushes varies according to a Normal distribution with mean 0.13 ounces and standard deviation 0.02 ounces. If Mr. Daniel brushes his teeth six times during the trip, what's the probability that he'll be cranky because he ran out of toothpaste?
Answers
Toothpaste: New mean: \(6(0.13) = 0.78\) ounces. New standard deviation: \(\sqrt{6(0.02)^2} = \sqrt{0.0024} \approx 0.049\) ounces. normalcdf(0.85, ∞, 0.78, 0.049) ≈ 0.0766
ACT Scores: New mean: \(24 - 24 = 0\). New standard deviation: \(\sqrt{2^2 + 2^2} = \sqrt{8} \approx 2.828\). normalcdf(−∞, −5, 0, 2.828) + normalcdf(5, ∞, 0, 2.828) ≈ 0.0385 + 0.0385 = 0.0771
6.3: Binomial and Geometric Random Variables
After this section, you should be able to
DETERMINE whether the conditions for a binomial setting are met
COMPUTE and INTERPRET probabilities involving binomial random variables
CALCULATE the mean and standard deviation of a binomial random variable and INTERPRET these values in context
CALCULATE probabilities involving geometric random variables
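Both mixed-practice problems can be sketched in Python with `statistics.NormalDist`, assuming independent brushings and independent test scores as the problems state. The key detail for the ACT question is that "either direction" means adding the two tail areas.

```python
import math
from statistics import NormalDist

# Toothpaste: total used in 6 independent brushings, each N(0.13, 0.02) ounces
total = NormalDist(6 * 0.13, math.sqrt(6 * 0.02 ** 2))
p_runs_out = 1 - total.cdf(0.85)  # P(total > 0.85)

# ACT: Fred - Leona, a difference of two independent N(24, 2) scores
diff = NormalDist(0, math.sqrt(2 ** 2 + 2 ** 2))
# "5 or more points in either direction": lower tail plus upper tail
p_pizza = diff.cdf(-5) + (1 - diff.cdf(5))

print(round(p_runs_out, 4), round(p_pizza, 4))
```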

Binomial Settings
A binomial setting arises when we perform several independent trials of the same chance process and record the number of times that a particular outcome occurs. The four conditions for a binomial setting are B I N S:
Binary? The possible outcomes of each trial can be classified as "success" or "failure."
Independent? Trials must be independent; that is, knowing the result of one trial must not have any effect on the result of any other trial.
Number? The number of trials n of the chance process must be fixed in advance.
Success? On each trial, the probability p of success must be the same.
Count the number of successes in a predetermined number of trials!
Binomial Random Variable
Consider tossing a coin n times. Each toss gives either heads or tails. Knowing the outcome of one toss does not change the probability of an outcome on any other toss. If we define heads as a success, then p is the probability of a head and is 0.5 on any toss. The number of heads in n tosses is a binomial random variable X. The probability distribution of X is called a binomial distribution.
Binomial Distribution: Mean and Standard Deviation
If a count X has the binomial distribution with number of trials n and probability of success p, the mean and standard deviation of X are
\(\mu_X = np\)   \(\sigma_X = \sqrt{np(1-p)}\)
Note: These formulas work ONLY for binomial distributions. They can't be used for other distributions!
Find the mean and standard deviation of X, where X is a binomial random variable with parameters n = 21 and p = 1/3.
Binomial Distribution: Describe
We describe the probability distribution of a binomial random variable just like any other distribution: shape, center, and spread. Consider the probability distribution of X = number of children with type O blood in a family with 5 children.
[Table: \(x_i\) and \(p_i\)]
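As a sketch of the binomial mean and standard deviation formulas, here is the n = 21, p = 1/3 exercise worked in Python. The function name is ours, and, like the formulas, it is only valid for binomial counts.

```python
import math


def binomial_mean_sd(n, p):
    """Mean np and SD sqrt(np(1 - p)); valid ONLY for binomial counts."""
    return n * p, math.sqrt(n * p * (1 - p))


mu, sigma = binomial_mean_sd(21, 1 / 3)
print(round(mu, 2), round(sigma, 2))  # 7.0 2.16
```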

Binomial Distribution: Describe
[Table: \(x_i\) and \(p_i\)]
Shape: The probability distribution of X is skewed to the right. It is more likely to have 0, 1, or 2 children with type O blood than a larger value.
Center: The median number of children with type O blood is 1. The mean is \(\mu_X = np = 5(0.25) = 1.25\).
Spread: The variance of X is \(\sigma_X^2 = np(1-p) = 5(0.25)(0.75) = 0.9375\) and the standard deviation is \(\sigma_X = \sqrt{0.9375} \approx 0.968\).
Binomial Probabilities
Each child of a particular pair of parents has probability 0.25 of having type O blood. Genetics says that children receive genes from each of their parents independently. If these parents have 5 children, the count X of children with type O blood is a binomial random variable with n = 5 trials and probability p = 0.25 of a success on each trial. In this setting, a child with type O blood is a "success" (S) and a child with another blood type is a "failure" (F). What's P(X = 2)?
CHECK CONDITIONS:
Binary: Yes. Type O blood = success and not type O blood = failure. There are only two options.
Independent: Stated.
Number: Yes. The number of trials is stated as 5.
Success: Yes. The probability of success is the same on each trial, p = 0.25.
Calculator: Binomial Probability
MENU, 6: Statistics, 5: Distributions, D: binompdf, E: binomcdf
binompdf calculates the probability that X is equal to a single value. Use it for PRECISE (exact) values.
Binomial Probabilities
Using your calculator: binompdf, enter the following information: Trials: 5, P: 0.25, X value: 2.
Answer: 0.2637. We are using binompdf in this example because we want the probability of exactly 2.
CONCLUDE: There is a 26.37% chance that the family will have two children with type O blood.
Inheriting Blood Type
Each child of a particular pair of parents has probability 0.25 of having blood type O. Suppose the parents have 5 children.
(a) Find the probability that exactly 3 of the children have type O blood.
(b) Should the parents be surprised if more than 3 of their children have type O blood?
We have already checked the conditions, so just do the calculations.
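For readers without the calculator, here is a minimal Python sketch of the same quantity binompdf reports, P(X = k) = C(n, k)·p^k·(1 − p)^(n−k). The function name `binom_pdf` is ours; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_pdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p); the same quantity binompdf(n, p, k) reports."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Full distribution of X = number of type O children among 5, with p = 0.25
dist = {k: binom_pdf(5, 0.25, k) for k in range(6)}
print(round(dist[2], 4))  # 0.2637
```

Printing the whole `dist` dictionary also reproduces the right-skewed shape described above: the probabilities drop off quickly after X = 2.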

Inheriting Blood Type
Each child of a particular pair of parents has probability 0.25 of having blood type O. Suppose the parents have 5 children.
(a) Find the probability that exactly 3 of the children have type O blood.
binompdf(5, 0.25, 3) = 0.0879
There is an 8.79% chance that the family will have three children with type O blood.
(b) Should the parents be surprised if more than 3 of their children have type O blood?
binomcdf(5, 0.25, 4, 5) = 0.0156
There is about a 1.56% chance that more than 3 of the children (that is, at least 4 children) will have type O blood. This is surprising!
Binomial Distributions: Statistical Sampling
The binomial distributions are important in statistics when we want to make inferences about the proportion p of successes in a population.
Sampling Without Replacement Condition
When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as \(n \le \frac{1}{10}N\).
Example: CDs
Suppose 10% of CDs have defective copy-protection schemes that can harm computers. A music distributor inspects an SRS of 10 CDs from a shipment of 10,000. Let X = number of defective CDs. What is P(X = 0)?
CHECK CONDITIONS:
Binary: Yes. Defective or not defective; only two options.
Independent: We can safely assume independence in this case because we are sampling less than 10% of the population (10 CDs out of 10,000).
Number: Yes. The number of trials is stated as 10.
Success: Yes. The probability of success is the same on each trial, p = 0.1.
DO & CONCLUDE: binompdf(10, 0.1, 0) = 0.3487. There is a 34.87% chance that there will be no defective CDs in the sample.
Binomial Distributions: Normal Approximation
As n gets larger, something interesting happens to the shape of a binomial distribution.
Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean and standard deviation
\(\mu_X = np\)   \(\sigma_X = \sqrt{np(1-p)}\)
As a rule of thumb, we will use the Normal approximation when n is so large that \(np \ge 10\) and \(n(1-p) \ge 10\). That is, the expected numbers of successes and failures are both at least 10. We use the Normal approximation more in Chapters …
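A minimal Python sketch of binomcdf as these notes use it, with arguments (n, p, lower, upper), built by adding binompdf values. The function names are ours. It also encodes the rule-of-thumb check for the Normal approximation.

```python
from math import comb


def binom_pdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(n, p, lower, upper):
    """P(lower <= X <= upper), matching the notes' binomcdf(n, p, lower, upper)."""
    return sum(binom_pdf(n, p, k) for k in range(lower, upper + 1))


def normal_approx_ok(n, p):
    """Rule of thumb: expected successes and failures are both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


p_exactly_3 = binom_pdf(5, 0.25, 3)         # (a) about 0.0879
p_more_than_3 = binom_cdf(5, 0.25, 4, 5)    # (b) P(X > 3) = P(X = 4) + P(X = 5)
p_no_defects = binom_pdf(10, 0.1, 0)        # CDs: P(X = 0), about 0.3487
print(round(p_exactly_3, 4), round(p_more_than_3, 4), round(p_no_defects, 4))
```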

Example: Attitudes Toward Shopping
Sample surveys show that fewer people enjoy shopping than in the past. A survey asked a nationwide random sample of 2500 adults if they agreed or disagreed that "I like buying new clothes, but shopping is often frustrating and time-consuming." Suppose that exactly 60% of all adult US residents would say "Agree" if asked the same question. Let X = the number in the sample who agree. Estimate the probability that 1520 or more of the sample agree. Consider the Normal approximation for this setting.
CHECK CONDITIONS:
Binomial:
Binary: There are only 2 options. Success = agree, Failure = don't agree.
Independent: Because the population of U.S. adults is greater than 25,000, it is reasonable to assume the sampling without replacement condition is met; we are sampling less than 10% of the population.
Number of Trials: n = 2500 trials of the chance process.
Success: The probability of selecting an adult who agrees is p = 0.60.
Normal: Since np = 2500(0.60) = 1500 and n(1 − p) = 2500(0.40) = 1000 are both at least 10, we may use the Normal approximation.
DO:
1. Calculate the mean: \(\mu_X = np = 2500(0.60) = 1500\)
2. Calculate the standard deviation: \(\sigma_X = \sqrt{np(1-p)} = \sqrt{2500(0.60)(0.40)} = \sqrt{600} \approx 24.49\)
3. Use the calculator: normalcdf(1520, 2500, 1500, 24.49) ≈ 0.2071
CONCLUDE: There is about a 20.7% chance that 1520 or more of the people in the sample agree. (Rounding to z = 0.82 and using a standard Normal table gives 0.2061.)
Geometric Settings
A geometric setting arises when we perform independent trials of the same chance process and record the number of trials until a particular outcome occurs. The four conditions for a geometric setting are B I T S:
Binary? The possible outcomes of each trial can be classified as "success" or "failure."
Independent? Trials must be independent; that is, knowing the result of one trial must not have any effect on the result of any other trial.
Trials? The goal is to count the number of trials until the first success occurs.
Success? On each trial, the probability p of success must be the same.
Geometric Random Variable
Geometric random variable: the number of trials needed to get the first success. Examples:
How many M&Ms are drawn until a blue one is selected?
How many students will I draw from a hat until I pick a senior?
How many households can a surveyor call until someone answers?
Calculator: Geometric Probability
MENU, 6: Statistics, 5: Distributions, F: geometpdf, G: geometcdf
geometpdf calculates the probability that the first success occurs on exactly one specific trial. Use it for PRECISE (exact) values. Same idea as binompdf and binomcdf.
geometcdf calculates the probability that the first success occurs within a specific range of trials.
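The shopping example can be sketched in Python, comparing the Normal approximation with the exact binomial probability. The exact sum works in log space with `math.lgamma`, because coefficients such as C(2500, 1520) are far too large to convert to floating point; the helper name `log_binom_pmf` is ours.

```python
import math
from statistics import NormalDist

n, p = 2500, 0.60

# Normal condition: expected successes and failures are both at least 10
assert n * p >= 10 and n * (1 - p) >= 10

mu = n * p                           # 1500
sigma = math.sqrt(n * p * (1 - p))   # sqrt(600), about 24.49
approx = NormalDist(mu, sigma)

# Same area as normalcdf(1520, 2500, 1500, 24.49)
p_normal = approx.cdf(2500) - approx.cdf(1520)


def log_binom_pmf(k):
    """log P(X = k), computed with lgamma so huge binomial coefficients do not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


# Exact binomial P(X >= 1520), for comparison with the approximation
p_exact = sum(math.exp(log_binom_pmf(k)) for k in range(1520, n + 1))
print(f"Normal approximation: {p_normal:.4f}   exact binomial: {p_exact:.4f}")
```

The two answers agree to within about a percentage point, which is the point of the "at least 10" rule of thumb.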

Example: The Birthday Game
I am going to think of the day of the week of one of my friends' birthdays. If the first guesser gets it right, you will all receive 1 homework question. If the second guesser gets the day right, you will receive 2 homework questions, etc. Before playing the game, my plan was to give you all 10 homework questions.
The random variable of interest in this game is Y = the number of guesses it takes to correctly identify the day of the week of the birthday of one of your teacher's friends.
What is the probability the first student guesses correctly? The second? The third? What is the probability one of the first three students will be correct?
CHECK CONDITIONS:
Binary: There are only 2 options: Success = correct guess, Failure = incorrect guess.
Independent: The result of one student's guess has no effect on the result of any other guess.
Trials: We're counting the number of guesses up to and including the first correct guess.
Success: On each trial, the probability of a correct guess is 1/7, which is the same.
DO:
Probability first student: 1/7 ≈ 0.1429
Probability second student: geometpdf(1/7, 2) ≈ 0.1224
Probability third student: geometpdf(1/7, 3) ≈ 0.1050
Probability one of the first three students will be correct: geometcdf(1/7, 1, 3) ≈ 0.3703
CONCLUDE: There is a 37.03% chance that one of the first three students will guess correctly.
Geometric Distribution: Mean
If Y is a geometric random variable with probability p of success on each trial, then its mean (expected value) is \(E(Y) = \mu_Y = 1/p\).
Meaning: the expected number of trials needed to achieve the first success (on average).
Example: Suppose you are an 80% free-throw shooter. You are going to shoot until you make one. For p = 0.8, the mean is 1/0.8 = 1.25. This means we expect the shooter to take 1.25 shots, on average, to make the first one.
Binomial vs. Geometric
The Binomial Setting
1. Each observation falls into one of two categories.
2. The probability of success is the same for each observation.
3. The observations are all independent.
4. There is a fixed number n of observations.
The Geometric Setting
1. Each observation falls into one of two categories.
2. The probability of success is the same for each observation.
3. The observations are all independent.
4. The variable of interest is the number of trials required to obtain the 1st success.
Binomial or Geometric??
First defective tire
Baskets made until first miss
Questions guessed correctly on SAT Math
Light bulbs purchased until third failure
Jurors selected for trial until first disqualification
Number of students that interrupt class until Mrs. Daniel gets mad/mean
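The geometric formula P(Y = k) = (1 − p)^(k−1)·p can be checked against the Birthday Game with a minimal Python sketch. Exact fractions show that "one of the first three is correct" equals 1 − (6/7)^3; the function names are ours, mirroring geometpdf and geometcdf.

```python
from fractions import Fraction


def geom_pdf(p, k):
    """P(Y = k): first success on trial k, (1 - p)^(k - 1) * p, like geometpdf(p, k)."""
    return (1 - p) ** (k - 1) * p


def geom_cdf(p, lower, upper):
    """P(lower <= Y <= upper), like geometcdf(p, lower, upper)."""
    return sum(geom_pdf(p, k) for k in range(lower, upper + 1))


p = Fraction(1, 7)  # chance of guessing the right day of the week
first, second, third = (geom_pdf(p, k) for k in (1, 2, 3))
within_three = geom_cdf(p, 1, 3)
print(within_three, round(float(within_three), 4))  # 127/343 0.3703

# Mean of a geometric random variable: 1/p (the 80% free-throw shooter)
expected_shots = 1 / 0.8
```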

FRQ Answers Must Include:
1. Name of distribution: Geometric, Binomial
2. Parameters: Binomial: X (define variable), n and p. Geometric: X (define variable), p.
3. Probability statement. Ex: P(X = 7) or P(X ≤ 3)
4. Calculation and p-value. Calculator notation is okay, but it needs to be labeled.
5. Solution interpreted in context.
Binomial Probability
If X has the binomial distribution with n trials and probability p of success on each trial, the possible values of X are 0, 1, 2, ..., n. If k is any one of these values,
\(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\)
where \(\binom{n}{k}\) is the number of arrangements of k successes, \(p^k\) is the probability of k successes, and \((1-p)^{n-k}\) is the probability of n − k failures.
The binomial coefficient counts the number of different ways in which k successes can be arranged among n trials. The binomial probability P(X = k) is this count multiplied by the probability of any one specific arrangement of the k successes.
Binomial Coefficient
How to Calculate the Number of Arrangements: The number of ways of arranging k successes among n observations is given by the binomial coefficient
\(\binom{n}{k} = \frac{n!}{k!\,(n-k)!}\) for k = 0, 1, 2, ..., n.
Binomial Probabilities (Alternative Solution)
Each child of a particular pair of parents has probability 0.25 of having type O blood. Genetics says that children receive genes from each of their parents independently. If these parents have 5 children, the count X of children with type O blood is a binomial random variable with n = 5 trials and probability p = 0.25 of a success on each trial. In this setting, a child with type O blood is a "success" (S) and a child with another blood type is a "failure" (F). What's P(X = 2)?
Calculating Binomial & Geometric Distributions by Hand
\(P(SSFFF) = (0.25)(0.25)(0.75)(0.75)(0.75) = (0.25)^2(0.75)^3 \approx 0.02637\)
However, there are a number of different arrangements in which 2 out of the 5 children have type O blood:
SSFFF SFSFF SFFSF SFFFS FSSFF FSFSF FSFFS FFSSF FFSFS FFFSS
Verify that each arrangement has probability \((0.25)^2(0.75)^3 \approx 0.02637\). Therefore,
\(P(X = 2) = 10(0.25)^2(0.75)^3 \approx 0.2637\)
Inheriting Blood Type (Alternative Solution)
Each child of a particular pair of parents has probability 0.25 of having blood type O. Suppose the parents have 5 children.
Let X = the number of children with type O blood. We know X has a binomial distribution with n = 5 and p = 0.25.
(a) Find the probability that exactly 3 of the children have type O blood.
\(P(X = 3) = \binom{5}{3}(0.25)^3(0.75)^2 = 10(0.25)^3(0.75)^2 \approx 0.0879\)
(b) Should the parents be surprised if more than 3 of their children have type O blood?
To answer this, we need to find \(P(X > 3) = P(X = 4) + P(X = 5) = \binom{5}{4}(0.25)^4(0.75) + \binom{5}{5}(0.25)^5 \approx 0.0146 + 0.0010 = 0.0156\).
Since there is only about a 1.56% chance that more than 3 children out of 5 would have type O blood, the parents should be surprised!
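The "by hand" reasoning (list the arrangements, find the probability of one, multiply by the count) can be reproduced in a short Python sketch. `itertools.combinations` picks which positions hold the successes, and the count it produces equals the binomial coefficient.

```python
from itertools import combinations
from math import comb

n, k, p = 5, 2, 0.25

# Every arrangement of k successes (S) among n children, e.g. "SSFFF"
arrangements = ["".join("S" if i in positions else "F" for i in range(n))
                for positions in combinations(range(n), k)]

p_one_arrangement = p ** k * (1 - p) ** (n - k)   # the same for every arrangement
p_x_equals_2 = len(arrangements) * p_one_arrangement

print(arrangements)
# ['SSFFF', 'SFSFF', 'SFFSF', 'SFFFS', 'FSSFF', 'FSFSF', 'FSFFS', 'FFSSF', 'FFSFS', 'FFFSS']
print(len(arrangements) == comb(n, k), round(p_x_equals_2, 4))  # True 0.2637
```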

Chapter 7
7.1: What Is a Sampling Distribution?
After this section, you should be able to
distinguish between a parameter and a statistic
define sampling distribution
distinguish between population distribution, sampling distribution, and the distribution of sample data
determine whether a statistic is an unbiased estimator of a population parameter
describe the relationship between sample size and the variability of an estimator
The process of statistical inference involves using information from a sample to draw conclusions about a wider population. Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference. We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process.
[Diagram: Population → collect data from a representative sample → make an inference about the population]
Parameters and Statistics
A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is usually not known because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We use a statistic to estimate an unknown parameter.
Symbols: Parameters and Statistics
                      Statistic   Parameter
Proportions           p̂           p
Means                 x̄           µ
Standard Deviation    s           σ

Parameter v. Statistic
Identify the population, the parameter (of interest), the sample, and the statistic in each of the following settings.
A pediatrician wants to know the 75th percentile for the distribution of heights of 10-year-old boys, so she takes a sample of 50 patients and calculates Q3 = 56 inches.
Population: all 10-year-old boys
Parameter: the 75th percentile (Q3) of their heights
Sample: the 50 10-year-old boys included in the sample
Statistic: Q3 = 56 inches
Parameter v. Statistic
A Pew Research Center poll asked … to 17-year-olds in the United States if they have a cell phone. Of the respondents, 71% said yes.
Population: All … to 17-year-olds in the US
Parameter: the proportion p of all … to 17-year-olds in the US who have cell phones
Sample: the … to 17-year-olds who responded to the poll
Statistic: \(\hat{p} = 0.71\)
Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in ALL possible samples of the same size from the same population.
In practice, it's difficult (usually impossible) to take all possible samples of size n to obtain the actual sampling distribution of a statistic. Instead, we can use simulation to imitate the process of taking many, many samples.
One of the uses of probability theory in statistics is to obtain sampling distributions without simulation. We'll get to the theory later.
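To make "use simulation to imitate the process of taking many, many samples" concrete, here is a minimal Python sketch that approximates the sampling distribution of p̂. It ASSUMES a hypothetical population in which 71% of individuals have the trait (borrowing the poll's figure purely for illustration) and draws each individual independently, which models a very large population. The function name, sample size, and number of repetitions are our own choices.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=1):
    """Approximate the sampling distribution of p-hat by simulation.

    Each repetition draws n independent individuals from a hypothetical
    population in which a proportion p have the trait, and records p-hat."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Hypothetical population: 71% have a cell phone; 2000 samples of size 100
p_hats = simulate_sample_proportions(p=0.71, n=100, reps=2000)
print(round(statistics.mean(p_hats), 3), round(statistics.stdev(p_hats), 3))
```

The simulated values center near 0.71 and have a standard deviation near 0.045; a histogram of `p_hats` is an approximate picture of the sampling distribution.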

Population Distributions vs. Sampling Distributions
There are actually three distinct distributions involved when we sample repeatedly and measure a variable of interest.
1) The population distribution gives the values of the variable for all the individuals in the population.
2) The distribution of sample data shows the values of the variable for all the individuals in the sample.
3) The sampling distribution shows the statistic values from all the possible samples of the same size from the population.
Hours of Sleep Activity
1. Write your name and the number of hours of sleep (e.g., 7 hours, 8.5 hours) on the paper provided.
2. Select an SRS of 5 cards. Each person will do this. (Ignore sampling independence concerns.)
3. Using your values, calculate the sample IQR of sleep hours and the sample maximum of sleep hours. Then, plot your values on the board.
4. Based on these values and the approximate sampling distributions, do either of these statistics appear to be unbiased estimators?
Bias & Variability
Bias means that our aim is off and we consistently miss the bull's-eye in the same direction. Our sample values do not center on the population value.
High variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.
Describing Sampling Distributions: Center
A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Describing Sampling Distributions: Spread
The variability of a statistic is described by the spread of its sampling distribution. This spread is determined primarily by the size of the random sample. Larger samples give smaller spread. The spread of the sampling distribution does not depend on the size of the population, as long as the population is at least 10 times larger than the sample.
[Figure: sampling distributions for n = 100 and n = 1000]
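The claim that larger samples give smaller spread can be checked with a small Python simulation. This sketch ASSUMES a hypothetical population proportion p = 0.5 and independent draws (a very large population), and compares simulated sample proportions for n = 100 and n = 1000, the two sizes shown in the figure. The helper name and number of repetitions are our own.

```python
import random
import statistics


def simulated_sd_of_p_hat(p, n, reps=1000, seed=2):
    """Simulate `reps` sample proportions from samples of size n and return their SD."""
    rng = random.Random(seed)
    p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    return statistics.stdev(p_hats)


# Hypothetical population proportion p = 0.5
sd_small = simulated_sd_of_p_hat(0.5, n=100)
sd_large = simulated_sd_of_p_hat(0.5, n=1000)
print(round(sd_small, 4), round(sd_large, 4))
```

Both estimates have the same center, but the spread for n = 1000 is roughly a third of the spread for n = 100 (about 0.016 versus 0.05), because it shrinks like 1/√n.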

Describing Sampling Distributions: Shape
Sampling distributions can take on many shapes. The same statistic can have sampling distributions with different shapes depending on the population distribution and the sample size.
[Figure: sampling distributions for different statistics used to estimate the number of tanks in Germany during World War II; the blue line represents the true number of tanks.]
A. Which of these statistics appear to be biased estimators?
B. Of the unbiased estimators, which is the best? Explain.
7.2: Sample Proportions
After this section, you should be able to
find the mean and standard deviation of the sampling distribution of a sample proportion
determine whether or not it is appropriate to use the Normal approximation to calculate probabilities involving the sample proportion
calculate probabilities involving the sample proportion
evaluate a claim about a population proportion using the sampling distribution of the sample proportion
pplets/reeses/reesespieces.html
The Sampling Distribution of \(\hat{p}\)
What do you notice about the shape, center, and spread of each?
[Figure: sampling distributions of \(\hat{p}\) for n = 100 and n = 400]

Sample Proportion Formulas
$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
For the standard deviation formula to apply, the sample size MUST be less than 10% of the total population.
Normal Approximation & Sample Proportions
As the sample size increases, the sampling distribution of the sample proportion approaches a Normal distribution, so we can use Normal calculations. Before using Normal calculations, check the Normal conditions:
(sample size)(proportion), $np$, must be at least 10.
(sample size)(1 − proportion), $n(1-p)$, must be at least 10.
Both must be at least 10.
Normal Approximation & Sample Proportions
In the game of Scrabble, each player starts by drawing 7 tiles from a bag of 100 tiles. There are 42 vowels, 56 consonants, and 2 blank tiles. Cait chooses an SRS of 7 tiles. Let $\hat{p}$ be the proportion of vowels in her sample.
a) Is the 10% condition met? Justify your answer.
b) Is the Normal condition met? Justify your answer.
(a) Yes. Seven tiles is less than 10% of the population of 100 tiles.
(b) No. Here $p = 42/100 = 0.42$, so $np = 7(0.42) = 2.94$ and $n(1-p) = 7(0.58) = 4.06$. Both are less than 10, so the Normal condition is not satisfied.
Normal Approximation & Sample Proportions
A polling organization asks an SRS of 1500 first-year college students how far away their home is. Suppose that 35% of all first-year students actually attend college within 50 miles of home. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of this true value?
We have an SRS of size n = 1500 drawn from a population in which the proportion p = 0.35 attend college within 50 miles of home.
$\mu_{\hat{p}} = 0.35 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(0.35)(0.65)}{1500}} \approx 0.0123$
Conditions:
Independence: It is reasonable to assume that there are more than 15,000 college freshmen, so the sample represents less than 10% of the population.
Normality: $np = 1500(0.35) = 525$ and $n(1-p) = 1500(0.65) = 975$ are both greater than 10, so it is reasonable to assume normality.
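The two checks used in these examples are simple arithmetic, so they can be wrapped in a small helper. This is a minimal sketch; the function name `check_conditions` is hypothetical, and for the polling example the population size of 15,000 is only the lower bound assumed in the text, not a known count.

```python
def check_conditions(n, p, population_size):
    """Return (ten_percent_ok, normal_ok, np, n(1-p)) for a sample proportion.

    ten_percent_ok: the sample is at most 10% of the population.
    normal_ok: both expected counts np and n(1 - p) are at least 10.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    # Integer comparison avoids floating-point trouble with 0.10 * population_size.
    ten_percent_ok = 10 * n <= population_size
    normal_ok = expected_successes >= 10 and expected_failures >= 10
    return ten_percent_ok, normal_ok, expected_successes, expected_failures


# Scrabble: 7 tiles from a bag of 100, with 42 vowels.
scrabble = check_conditions(n=7, p=42 / 100, population_size=100)
# Polling: 1500 students, p = 0.35, at least 15,000 first-year students assumed.
polling = check_conditions(n=1500, p=0.35, population_size=15_000)

print("Scrabble:", scrabble)
print("Polling: ", polling)
```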

DO (polling example, continued): normalcdf(0.33, 0.37, 0.35, 0.0123) ≈ 0.8961
CONCLUDE: There is an 89.61% chance that the sample will yield a result within 2 percentage points of the true value.
The Harvard College Alcohol Study finds that 67% of college students support efforts to crack down on underage drinking. The study took a random sample of almost 15,000 students, so the population proportion who support a crackdown is close to p = 0.67. The administration of a local college surveys an SRS of 100 students and finds that 62 support a crackdown on underage drinking. Suppose that the proportion of all students attending this college who support a crackdown is 67%, the same as the national proportion. What is the probability that the proportion in an SRS of 100 students is as small as or smaller than the result of the administration's sample?
$\mu_{\hat{p}} = 0.67 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(0.67)(0.33)}{100}} \approx 0.04702$
Conditions:
Independence: It is reasonable to assume that there are more than 1000 students at this college, so the sample of 100 represents less than 10% of the population.
Normality: $np = 100(0.67) = 67$ and $n(1-p) = 100(0.33) = 33$ are both greater than 10, so it is reasonable to assume normality.
normalcdf(0, 0.62, 0.67, 0.04702) ≈ 0.1438 (Be sure to include labels!)
CONCLUDE: There is a 14.38% chance that the sample will yield a result at or below 62%, given that the true population proportion is 67%.
FYI: Derivation of Formulas
In Chapter 6, we learned that the mean and standard deviation of a binomial random variable X are
$\mu_X = np \qquad \sigma_X = \sqrt{np(1-p)}$
Since $\hat{p} = X/n = (1/n)X$, we are just multiplying the random variable X by a constant (1/n) to get the random variable $\hat{p}$. Therefore,
$\mu_{\hat{p}} = \frac{1}{n}(np) = p$, so $\hat{p}$ is an unbiased estimator of p.
$\sigma_{\hat{p}} = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}}$
As the sample size increases, the spread decreases.
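The calculator steps in these two examples can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` method plays the role of `normalcdf`. This sketch uses the unrounded σ from the formula, so the polling answer comes out near 0.8956 rather than the 0.8961 obtained with σ rounded to 0.0123. The loop at the end illustrates the derivation's last line: σ shrinks as n grows.

```python
from math import sqrt
from statistics import NormalDist


def phat_distribution(p, n):
    """Normal approximation to the sampling distribution of p-hat."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Polling example: P(0.33 <= p-hat <= 0.37) when p = 0.35 and n = 1500.
poll = phat_distribution(0.35, 1500)
poll_prob = poll.cdf(0.37) - poll.cdf(0.33)

# Harvard example: P(p-hat <= 0.62) when p = 0.67 and n = 100.
harvard = phat_distribution(0.67, 100)
harvard_prob = harvard.cdf(0.62)

print(f"polling: sigma = {poll.stdev:.4f}, probability = {poll_prob:.4f}")
print(f"Harvard: sigma = {harvard.stdev:.4f}, probability = {harvard_prob:.4f}")

# Multiplying n by 4 divides the spread by 2.
for n in (100, 400, 1600):
    print(n, round(phat_distribution(0.67, n).stdev, 4))
```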

Section 7.3 Sample Means
After this section, you should be able to
✓ find the mean and standard deviation of the sampling distribution of a sample mean
✓ calculate probabilities involving a sample mean when the population distribution is Normal
✓ explain how the shape of the sampling distribution of sample means is related to the shape of the population distribution
✓ apply the central limit theorem to help find probabilities involving a sample mean
Sample Means
Consider the mean household earnings for samples of size 100. Compare the population distribution on the left with the sampling distribution on the right. What do you notice about the shape, center, and spread of each?
Theory: Sample Means
Sample Means Formulas
$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Notes: The sample size must be less than 10% of the population to satisfy the independence condition. The mean and standard deviation of the sample mean are true no matter the shape of the population distribution.
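The two formulas can be verified exactly on a tiny made-up population by listing every possible ordered sample drawn with replacement, the situation in which observations are truly independent (the 10% condition is what lets us treat sampling without replacement as nearly independent). In this sketch the population values are hypothetical and deliberately skewed, and variances are compared instead of standard deviations so the check is exact with Fractions: $\sigma^2_{\bar{x}} = \sigma^2/n$ is the same statement as $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, deliberately skewed population of four values.
population = [Fraction(v) for v in (1, 2, 2, 9)]
n = 3


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Every ordered sample of size n drawn with replacement (4**3 = 64 samples).
sample_means = [mean(s) for s in product(population, repeat=n)]

mu, sigma_sq = mean(population), variance(population)
print("mu =", mu, "| mean of x-bar =", mean(sample_means))
print("sigma^2 / n =", sigma_sq / n, "| variance of x-bar =", variance(sample_means))
```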

REVIEW: Young Women's Heights
The height of young women follows a Normal distribution with mean µ = 64.5 inches and standard deviation σ = 2.5 inches. Find the probability that a randomly selected young woman is taller than 66.5 inches.
STATE: Let X = the height of a randomly selected young woman. X is N(64.5, 2.5).
PLAN: Since the sample in this case is only one person, the sample size is clearly smaller than 10% of the population.
DO: $z = \frac{66.5 - 64.5}{2.5} = 0.80$
P(X > 66.5) = P(Z > 0.80) = 1 − 0.7881 = 0.2119, OR normalcdf(66.5, 10000, 64.5, 2.5) ≈ 0.2119
CONCLUDE: The probability of choosing a young woman at random whose height exceeds 66.5 inches is about 0.21.
Example: Young Women's Heights
The height of young women follows a Normal distribution with mean µ = 64.5 inches and standard deviation σ = 2.5 inches. Find the probability that the mean height of an SRS of 10 young women exceeds 66.5 inches.
$\mu_{\bar{x}} = 64.5 \qquad \sigma_{\bar{x}} = \frac{2.5}{\sqrt{10}} \approx 0.7906$
$z = \frac{66.5 - 64.5}{0.7906} \approx 2.53$
P($\bar{x}$ > 66.5) = P(Z > 2.53) = 1 − 0.9943 = 0.0057, OR normalcdf(66.5, 10000, 64.5, 0.7906) ≈ 0.0057
CONCLUDE: There is a 0.57% chance of getting a sample of 10 women with a mean height above 66.5 inches. It is very unlikely (less than a 1% chance) that we would choose an SRS of 10 young women whose average height exceeds 66.5 inches.
Sample Distributions & Normality
If the population is Normal, then the sampling distribution of $\bar{x}$ is Normal. No further checks are needed!
Sample Distributions & Normality: If the population is NOT Normal
If the sample is large enough, the distribution of sample means is approximately Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation. This is the central limit theorem (CLT).
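Both height calculations can be reproduced with the standard-library `statistics.NormalDist` (Python 3.8+). In this sketch the only change from one question to the other is the standard deviation: σ for one woman versus σ/√n for the mean of n = 10.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 64.5, 2.5  # inches

one_woman = NormalDist(mu, sigma)
p_one = 1 - one_woman.cdf(66.5)  # P(X > 66.5)

n = 10
sample_mean = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of x-bar
p_mean = 1 - sample_mean.cdf(66.5)  # P(x-bar > 66.5)

print(f"P(X > 66.5)     = {p_one:.4f}")
print(f"P(x-bar > 66.5) = {p_mean:.4f}")
```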

Sample Distributions & Normality: HOW LARGE IS LARGE ENOUGH?
Population shape      Minimum sample size to assume Normal
Normal                0 (no minimum)
Slightly skewed       15
Heavily skewed        30
Unknown               30
Example: Servicing Air Conditioners
Based on many service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed and whose most likely outcomes are close to 0. The mean time is µ = 1 hour and the standard deviation is σ = 1 hour. Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough time? What is the chance that the technicians will not finish within the allotted time (an average of 1.1 hours per unit)?
Conditions:
Independence: It is reasonable to assume that the company has serviced more than 700 units, so the 70 units in the sample represent less than 10% of the population.
Normal: Even though the population has a strong right skew, a sample size of 70 is large enough (at least 30) to assume normality.
$\mu_{\bar{x}} = \mu = 1 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx 0.12$
DO: $z = \frac{1.1 - 1}{0.12} \approx 0.83$
P($\bar{x}$ > 1.1) = P(Z > 0.83) = 1 − 0.7967 = 0.2033, OR normalcdf(1.1, 10000, 1, 1/√70) ≈ 0.2014
CONCLUDE: If you budget 1.1 hours per unit, there is about a 20% chance that the technicians will not complete the work within the budgeted time.
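A quick simulation shows why the Normal approximation is reasonable here despite the strong skew. This sketch assumes the service times follow an exponential distribution with mean 1 hour: an exponential distribution with mean 1 also has standard deviation 1 and is strongly right-skewed with most likely values near 0, matching the description, but the real service-time distribution is not given. It draws many SRSs of 70 units, estimates how often x̄ exceeds 1.1 hours, and compares that with the Normal calculation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(2014)  # fixed seed so results are reproducible
n, reps, budget = 70, 20_000, 1.1

# Simulated sampling distribution of x-bar under the ASSUMED exponential model.
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
simulated = sum(m > budget for m in sample_means) / reps

# Normal approximation from the CLT: x-bar is approximately N(1, 1/sqrt(70)).
normal_approx = 1 - NormalDist(1, 1 / sqrt(n)).cdf(budget)

print(f"simulated P(x-bar > 1.1) = {simulated:.4f}")
print(f"Normal approximation     = {normal_approx:.4f}")
print(f"mean of simulated x-bars = {fmean(sample_means):.3f}")
```

Under this assumed model the simulated and Normal probabilities should land within about a percentage point of each other, which is what the CLT promises for a sample of 70 even from a heavily skewed population.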


AP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam.

AP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam. AP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam. Name: Directions: The questions or incomplete statements below are each followed by

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

STATISTICS 1 REVISION NOTES

STATISTICS 1 REVISION NOTES STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Lecture 1: Description of Data. Readings: Sections 1.2,

Lecture 1: Description of Data. Readings: Sections 1.2, Lecture 1: Description of Data Readings: Sections 1.,.1-.3 1 Variable Example 1 a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency The word average: is very ambiguous and can actually refer to the mean, median, mode or midrange. Notation:

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2.

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2. Chapter 3 Solutions 3.1 3.2 3.3 87% of the girls her daughter s age weigh the same or less than she does and 67% of girls her daughter s age are her height or shorter. According to the Los Angeles Times,

More information

Sem. 1 Review Ch. 1-3

Sem. 1 Review Ch. 1-3 AP Stats Sem. 1 Review Ch. 1-3 Name 1. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of variables you have measured is a. 1463; all quantitative. b.

More information

Chapter 2 Solutions Page 15 of 28

Chapter 2 Solutions Page 15 of 28 Chapter Solutions Page 15 of 8.50 a. The median is 55. The mean is about 105. b. The median is a more representative average" than the median here. Notice in the stem-and-leaf plot on p.3 of the text that

More information

Scatterplots and Correlation

Scatterplots and Correlation Bivariate Data Page 1 Scatterplots and Correlation Essential Question: What is the correlation coefficient and what does it tell you? Most statistical studies examine data on more than one variable. Fortunately,

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

Describing Bivariate Relationships

Describing Bivariate Relationships Describing Bivariate Relationships Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables Plot the data

More information

Multiple Choice Circle the letter corresponding to the best answer for each of the problems below (4 pts each)

Multiple Choice Circle the letter corresponding to the best answer for each of the problems below (4 pts each) Math 221 Hypothetical Exam 1, Wi2008, (Chapter 1-5 in Moore, 4th) April 3, 2063 S. K. Hyde, S. Barton, P. Hurst, K. Yan Name: Show all your work to receive credit. All answers must be justified to get

More information

Sociology 6Z03 Review I

Sociology 6Z03 Review I Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Graphical Summaries Consider the following data x: 78, 24, 57, 39, 28, 30, 29, 18, 102, 34, 52, 54, 57, 82, 90, 94, 38, 59, 27, 68, 61, 39, 81, 43, 90, 40, 39, 33, 42, 15, 88, 94, 50, 66, 75, 79, 83, 34,31,36,

More information

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The

More information

BIVARIATE DATA data for two variables

BIVARIATE DATA data for two variables (Chapter 3) BIVARIATE DATA data for two variables INVESTIGATING RELATIONSHIPS We have compared the distributions of the same variable for several groups, using double boxplots and back-to-back stemplots.

More information

Which boxplot represents the same information as the histogram? Test Scores Test Scores

Which boxplot represents the same information as the histogram? Test Scores Test Scores 01 013 SEMESTER EXAMS SEMESTER 1. Mrs. Johnson created this histogram of her 3 rd period students test scores. 8 Frequency of Test Scores 6 4 50 60 70 80 90 100 Test Scores Which boxplot represents the

More information

CHAPTER 3 Describing Relationships

CHAPTER 3 Describing Relationships CHAPTER 3 Describing Relationships 3.1 Scatterplots and Correlation The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Scatterplots and Correlation Learning

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed

More information