Honors Statistics. Daily Agenda



Standard deviation work >>> 97, 98
Boxplot outlier work >>> 99c and 109

x      x - mean            (x - mean)^2
4.6    4.6 - 5.4 = -0.8    0.64
4.9    4.9 - 5.4 = -0.5    0.25
5.2    5.2 - 5.4 = -0.2    0.04
5.6    5.6 - 5.4 =  0.2    0.04
5.7    5.7 - 5.4 =  0.3    0.09
6.4    6.4 - 5.4 =  1.0    1.00

Mean: 32.4 / 6 = 5.4
Sum of squared deviations: 2.06
2.06 / (6 - 1) = 2.06 / 5 = 0.412
sqrt(0.412) = 0.64 mg

0.64 mg is the standard deviation of the data points from the mean. The phosphate levels in this patient's blood typically vary from the group mean of 5.4 mg by approximately 0.64 mg.
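The hand calculation above can be checked with a short script. This is only a sketch: the function name `sample_std` is made up for illustration, and it divides by n - 1 exactly as the worked example does.

```python
import math

# Phosphate readings (mg) from the worked example above.
levels = [4.6, 4.9, 5.2, 5.6, 5.7, 6.4]

def sample_std(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / (n - 1))

mean = sum(levels) / len(levels)
s = sample_std(levels)
print(f"mean = {mean:.1f} mg, s = {s:.2f} mg")
```

The same function applied to the sleep data 7, 7, 9, 9 gives about 1.15 hours.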

x    x - mean      (x - mean)^2
7    7 - 8 = -1    1
7    7 - 8 = -1    1
9    9 - 8 =  1    1
9    9 - 8 =  1    1

Mean: 32 / 4 = 8
Sum of squared deviations: 4
4 / (4 - 1) = 4 / 3 = 1.3333
sqrt(1.3333) = 1.15 hours

b) 1.15 hours is the standard deviation of the data points from the mean. Hours of sleep for the first four students in this class typically vary from the mean of 8 hours by 1.15 hours.

c) I would not use this data for all 30 members of the class. It was not a random sample, and the first 4 students perhaps got the most sleep, which is why they are early.

IMPORTANT SKILL

a) I would guess that the shape is right skewed. The distance between Q3 and the maximum point is $47.62, whereas the distance between the minimum and Q1 is much smaller at $15.95. See the graph below for the other quartile distances.

b) $21.70 is the standard deviation. The dollar amount spent on groceries by 50 shoppers typically varies from the mean of $34.70 by $21.70. (Not exactly true in a skewed distribution.)

c) Q1 - 1.5(IQR) = 19.06 - 1.5(45.72 - 19.06) = 19.06 - 1.5(26.66) = 19.06 - 39.99 = -20.93
No bottom outliers (minimum is 3.11).
Q3 + 1.5(IQR) = 45.72 + 1.5(45.72 - 19.06) = 45.72 + 1.5(26.66) = 45.72 + 39.99 = 85.71
YES, top outliers (maximum is 93.34).
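The 1.5 x IQR outlier check in part c) can be written as a small helper. This is a sketch, not part of the original slides; the name `outlier_fences` is hypothetical, and the four summary values are the ones quoted in the grocery answer.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Summary values quoted in the grocery answer above (dollars).
minimum, q1, q3, maximum = 3.11, 19.06, 45.72, 93.34

lower, upper = outlier_fences(q1, q3)
print(f"fences: {lower:.2f} to {upper:.2f}")
print("bottom outliers:", minimum < lower)
print("top outliers:", maximum > upper)
```

Any value below the lower fence or above the upper fence is flagged, so only the minimum and maximum need checking to know whether outliers exist on each side.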

a) I would guess that the female doctors' distribution has a more symmetrical shape because the mean (x-bar = 19.1) is very close in value to the median (18.5). The male doctors' distribution has a much bigger difference between the mean and median.

b) The IQRs can be similar (23 vs 19) because the IQR only represents the middle 50% of the data set. The standard deviation is strongly influenced by extreme observations, so if there is a skew in the data set, which there appears to be for the male doctors' distribution, the standard deviations of the two distributions could be very different (20.607 vs 10.126).

c) It does seem that the male doctors perform more C-sections. This is evident from the means and maximum values of both distributions. The male mean is x-bar = 41.33 per year vs the smaller female mean of x-bar = 19.1 per year (approximately half). The maximum for the males was 86 C-sections per year, and the maximum for the females is much smaller at 33 C-sections per year.

I believe that variable A will have the biggest standard deviation because it has more data located "away" from its center.

D

B

E

First I need to find the first and third quartile numbers. 30 data points puts the median between positions 15 and 16, so Q1 will be position 8 and Q3 will be position 23.

Q1 = 117
Q3 = 173
IQR = 173 - 117 = 56
1.5(IQR) = 1.5(56) = 84
117 - 84 = 33 (no bottom outliers)
173 + 84 = 257 (no top outliers)

A
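The quartile positions used above follow the "median of each half" convention. A minimal sketch of that bookkeeping, assuming the median is left out of both halves when n is odd (the function name `quartile_positions` is hypothetical):

```python
def quartile_positions(n):
    """1-based positions of Q1, the median, and Q3 among n sorted values.

    A position such as 15.5 means "average the 15th and 16th values".
    """
    half = n // 2                # size of each half (median excluded if n is odd)
    median = (n + 1) / 2
    q1 = (half + 1) / 2          # median of the lower half
    q3 = (n - half) + q1         # same offset, counted within the upper half
    return q1, median, q3

q1_pos, median_pos, q3_pos = quartile_positions(30)
print(q1_pos, median_pos, q3_pos)

# Fences from the quartile values read off the data at those positions.
q1, q3 = 117, 173
step = 1.5 * (q3 - q1)
print(q1 - step, q3 + step)
```

Other software may use a different quartile convention, so positions can differ slightly from the ones found by hand.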

Women appear to be more likely to engage in behaviors that are indicative of good habits of mind. They are especially more likely to revise papers to improve their writing (about 55% of females report this, as opposed to about 37% of males). The difference is a little smaller for seeking feedback on their work: about 49% of the females did this, as opposed to about 38% of the males.

12 or 16
23

Groups - Block 1
Jack, Corenna, Dezarae, Grace, Amanda, Libby, Mackensie, Brendan, Will, Partick, Kaitlyn, Samantha, Maddie, Caroline, Zoe, Maria, Charlie, Spencer, Jesse

Groups - Block 2
Luke, Chloe, Sarah, Hunter, Jenna, Jake, Carl, John, Will, Kassidy, Grant, Sophia, Mackenzie, Michael, Wendel, Bridget, Olivia, George, Allison

Groups - Block 4
Libby, Gabe, Justin, Tommy, Nico, Danielle, Nate, Kim, Ashley, Elliott, Rachel, Eva, Cole L., Taylor, Erica, Cole M., Jackie


Please place the following in your folder:
1. your homework in the pocket folder

This distribution of file sizes is NOT symmetric. It is skewed to the right. The center is approximately at the median, 2.45 MB. The spread is from 1.1 to 7.5 MB, which is a range of 6.4 MB. There is a gap from 6.22 to 7.5 MB and a potential outlier at 7.5 MB. The data set should be analyzed using the five-number summary.

1.9 MB is the standard deviation of the data points from the mean. A file size on Tim's mp3 player typically varies from the mean size of 3.19 MB by 1.9 MB.


Optional: 1-20 MC from the textbook website.
List is in L GUINE: part a) histogram, b) boxplot.
R1.9 & R1.10: write a lot of words...

R1.1. Hit movies
According to the Internet Movie Database, Avatar is tops based on box office sales worldwide. The following table displays data on several popular movies.

(a) What individuals does this data set describe?
The data set describes a list of top hit movies based on box office income.

(b) Clearly identify each of the variables. Which are quantitative?

Variable              Type
Name of movie         Categorical
Year of release       Categorical (or quantitative?)
Rating                Categorical
Time                  Quantitative (minutes)
Genre                 Categorical
Box office dollars    Quantitative (dollars)

(c) Describe the individual in the highlighted row.
The movie Avatar is an action film that was released in the year 2009 and rated PG-13. It lasted for 162 minutes and has recorded the most income, $2,781,505,847.

R1.2. Movie ratings
The movie rating system we use today was first established on November 1, 1968. Back then, the possible ratings were G, PG, R, and X. In 1984, the PG-13 rating was created. And in 1990, NC-17 replaced the X rating. Here is a summary of the ratings assigned to movies between 1968 and 2000: 8% rated G, 24% rated PG, 10% rated PG-13, 55% rated R, and 3% rated NC-17. Make an appropriate graph for displaying these data.

[Graph: Movie ratings from the years 1968 to 2000]
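Because the five rating percentages add up to 100%, a bar chart (or a pie chart) is appropriate here. A rough text-only sketch of such a bar chart follows; the helper name `text_bar_chart` is made up for illustration, and a real assignment would use a calculator or plotting software instead.

```python
# Percent of movies given each rating, 1968 to 2000 (from the exercise above).
ratings = {"G": 8, "PG": 24, "PG-13": 10, "R": 55, "NC-17": 3}

def text_bar_chart(percents, title):
    """Build a horizontal bar chart with one '#' per percentage point."""
    lines = [title]
    for label, pct in percents.items():
        lines.append(f"{label:<6}{'#' * pct} {pct}%")
    return "\n".join(lines)

chart = text_bar_chart(ratings, "Movie ratings from the years 1968 to 2000")
print(chart)
```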

R1.3. I'd die without my phone!
In a survey of over 2000 U.S. teenagers by Harris Interactive, 47% said that their social life would end or be worsened without their cell phone. One survey question asked the teens how important it is for their phone to have certain features. The figure below displays data on the percent who indicated that a particular feature is vital.

(a) Explain how the graph gives a misleading impression.
The graph uses pictures that are not properly sized. The "make and receive calls" picture should be only approximately twice as big as the camera picture, but it is clearly over four times bigger (perhaps 9 times bigger).

(b) Would it be appropriate to make a pie chart to display these data? Why or why not?
It would not be appropriate to make a pie chart, for two reasons. The first is that the percentages add up to more than 100%, which points to the second: the bars do not represent parts of one whole group. They are parts of separate groups.

(c) Make a graph of the data that isn't misleading.
[Graph: Vital features of cell phones according to a sample of U.S. teenagers]

R1.4. Facebook and age
Is there a relationship between Facebook use and age among college students? The following two-way table displays data for the 219 students who responded to the survey.

[Two-way table; surviving totals: rows 148 and 71, columns 82, 70, and 67, overall 219]

(a) What percent of the students who responded were Facebook users? Is this percent part of a marginal distribution or a conditional distribution? Explain.
148/219 = 67.58%
This is a marginal distribution because the numerator is a row total and the denominator is the overall total.

(b) What percent of the younger students in the sample were Facebook users? What percent of the Facebook users in the sample were younger students?
What percent of younger students were Facebook users? 78/82 = 95.12%
What percent of the Facebook users were younger students? 78/148 = 52.7%
The last two answers come from conditional distributions. They use a table entry divided by a row or column total.
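The difference between the marginal and conditional percents in R1.4 comes down to which total goes in the denominator. A small sketch using the counts quoted above (the variable names are illustrative only):

```python
# Counts quoted in the Facebook and age answers above.
younger_users = 78      # younger students who use Facebook (table entry)
users_total = 148       # all Facebook users (row total)
younger_total = 82      # all younger students (column total)
grand_total = 219       # everyone who responded

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole

# Marginal: a row total over the grand total.
marginal_users = percent(users_total, grand_total)
# Conditional: a table entry over a row or column total.
users_given_younger = percent(younger_users, younger_total)
younger_given_users = percent(younger_users, users_total)

print(f"{marginal_users:.2f}% of all students are Facebook users")
print(f"{users_given_younger:.2f}% of younger students are Facebook users")
print(f"{younger_given_users:.1f}% of Facebook users are younger students")
```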

R1.5. Facebook and age
Use the data in the previous exercise to determine whether there is an association between Facebook use and age. Give appropriate graphical and numerical evidence to support your answer.

Younger: 78/82 = 95%
Middle: 49/70 = 70%
Older: 21/67 = 31.3%

Let's make a segmented bar chart. (A triple bar for each age.)

[Segmented bar chart: percent (0% to 100%) by age group of student]
Age group    Facebook    No Facebook
Younger      95%         5%
Middle       70%         30%
Older        31.3%       68.7%

There does appear to be an association between age and Facebook status. From both the table and the graph above, we can see that as age increases, the percent of Facebook users decreases. For younger students, about 95% are members. That drops to 70% for middle students and drops even further to 31.3% for older students.

Density of the earth measurements
b) The distribution of earth density measurements is roughly symmetric. It is bell shaped. The center (mean) of the measurements is 5.45 units. The measurements spread from 4.88 to 5.85, which is a range of approximately 1 unit. There are no deviations from the general pattern. The data should be analyzed using the mean and standard deviation.

c) I estimate the earth's density to be approximately 5.45 times the density of water.
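Returning to the Facebook and age exercise (R1.5): the segments of each bar are the conditional percents within one age group. A minimal sketch, assuming the "No Facebook" share is simply whatever is left after the Facebook users in each group (the function name `conditional_percents` is hypothetical):

```python
# Facebook users and group sizes quoted in the R1.5 answer.
users = {"younger": 78, "middle": 49, "older": 21}
totals = {"younger": 82, "middle": 70, "older": 67}

def conditional_percents(users, totals):
    """Percent (Facebook, No Facebook) within each age group, to 0.1%."""
    result = {}
    for group, total in totals.items():
        yes = 100 * users[group] / total
        result[group] = (round(yes, 1), round(100 - yes, 1))
    return result

segments = conditional_percents(users, totals)
for group, (yes, no) in segments.items():
    print(f"{group:<8} Facebook {yes}%   No Facebook {no}%")
```

If the Facebook percent were about the same in every age group, that would suggest no association; here it falls steadily with age.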

Guinea pig survival times
Here are the survival times in days of 72 guinea pigs after they were injected with infectious bacteria in a medical experiment. Survival times, whether of machines under stress or cancer patients after treatment, usually have distributions that are skewed to the right.

(a) Make a histogram of the data and describe its main features. Does it show the expected right skew?
[Histogram: Guinea pig survival]

(b) Now make a boxplot of the data. Be sure to check for outliers.
[Boxplot: Guinea pig survival]

(c) Which measure of center and spread would you use to summarize the distribution: the mean and standard deviation or the median and IQR? Justify your answer.
The median and IQR should be used to summarize the distribution because the data are severely skewed to the right, and the mean and standard deviation are NON-RESISTANT to the influence of the outliers.

R1.8. Household incomes
Rich and poor households differ in ways that go beyond income. Following are histograms that compare the distributions of household size (number of people) for low-income and high-income households. Low-income households had annual incomes less than $15,000, and high-income households had annual incomes of at least $100,000.

... households consisted of two people?
20% of low-income households consist of two people. 34% of high-income households consist of two people.

What are the important differences between these two distributions? What do you think explains these differences?
The first difference is their shape. The low-income distribution is skewed to the right (most households are only of size 1 person). The high-income distribution is still right skewed but slightly more bell shaped. The center and range of the distributions seem to be approximately equal, at 3 persons for the mean and 6 persons for the range. I believe that the difference in shape is due to the fact that there are more families in the high-income distribution and more single people in the low-income distribution.

Do you like to eat tuna? Many people do. Unfortunately, some of the tuna that people eat may contain high levels of mercury. Exposure to mercury can be especially hazardous for pregnant women and small children. How much mercury is safe to consume? The Food and Drug Administration will take action (like removing the product from store shelves) if the mercury concentration in a six-ounce can of tuna is 1.00 ppm (parts per million) or higher. What is the typical mercury concentration in cans of tuna sold in stores? A study conducted by Defenders of Wildlife set out to answer this question. Defenders collected a sample of 164 cans of tuna from stores across the United States. They sent the selected cans to a laboratory that is often used by the Environmental Protection Agency for mercury testing. ... provide information about the mercury concentration in the sampled cans (in parts per million, ppm).

Interpret the standard deviation in context.
0.300 ppm is the standard deviation. The mercury concentration of the sampled cans typically varies from the mean of 0.286 ppm by about 0.3 ppm.

Determine whether there are any outliers.
IQR = 0.380 - 0.071 = 0.309
(1.5)(0.309) = 0.4635
Q3 + 0.4635 = 0.38 + 0.4635 = 0.8435. The maximum was 1.5, so there is at least one top outlier.
Q1 - 0.4635 = 0.071 - 0.4635 = -0.3925. The minimum was 0.012, so there are no bottom outliers.

Describe the shape, center, and spread of the distribution.
This distribution of mercury in cans is NOT symmetric; it is skewed to the right. The center (median) is at approximately 0.18 ppm. The spread (range) of the distribution is 1.488 ppm. There are potential outliers at the top values of the data set, and a few small gaps. This data should be analyzed using the five-number summary.

R1.10. Mercury
Is there a difference in the mercury concentration of light tuna and albacore tuna? Use the parallel boxplots and the computer output to write a few sentences comparing the two distributions.

The distributions of light and albacore tuna are different. The albacore tuna distribution is roughly symmetric. The light tuna distribution is very skewed to the right. The center (mean) of the albacore tuna distribution is approximately 0.401 ppm, and the center (mean) of the light tuna is lower at approximately 0.269 ppm. The medians follow this same pattern (0.400 vs 0.160 ppm). The range of the albacore tuna is approximately 0.560 ppm, and the range of the light tuna is much larger at 1.488 ppm. If you disregard some of the top potential outliers in the light tuna distribution, it seems that the light tuna has a smaller mercury concentration per can than does the albacore tuna.