Sampling, Frequency Distributions, and Graphs (12.1)


What are typical Statistical Methods?

Design: Plan how to obtain the data.
Collect the data, which is then subjected to statistical analysis. The analysis serves two related purposes: Description and Inference.
Description: Summarize the data using graphs and numerical summaries.
Inference: Use data from a random and representative sample (a small group of subjects) to draw conclusions about the population (all subjects) of interest.

Data sets consist of: The Population and the Sample

Experimental Units/Subjects - The people, animals, or objects in our study/experiment.
An EU/A Subject - One of what we are measuring in our study/experiment. A student, an airline flight, one flip of a coin.
Variable - The characteristic that we measure on each subject. (Think: measurements will vary from subject to subject; they are variable.)
Population - All subjects that we are interested in. All UCF psychology majors, all Delta domestic flights in 2008, all possible flips of a coin.
Sample - A subset of the population for which we have data. 18 UCF psychology majors randomly chosen by Student ID Number, 40 Delta domestic flights in 2008 randomly chosen by date and flight number, 100 flips of a coin.

Notice that we always want to use a random sample: a sample that is chosen from the population by some random method.
Random Sampling - Each member of the population has an equal chance of being included in the sample. Random samples tend to be representative of the population, so we can draw better conclusions.
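Random sampling as described above can be sketched in a few lines of Python. This is only an illustration: the population of student ID numbers, the function name simple_random_sample, and the seed are all made up for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects so that every subject is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1000..1499 (made up for illustration).
student_ids = list(range(1000, 1500))
sample = simple_random_sample(student_ids, 18, seed=1)
print(len(sample), "subjects sampled")
```

Fixing the seed only makes the draw repeatable; leaving it as None gives a fresh random sample each run.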

Summarizing Data: 1) Frequency Distributions

Frequency - The number of measurements/observations in a category (class).
Frequency Distribution - Shows how a data set is partitioned among several classes.

CLASS/CATEGORY    FREQUENCY

Ex) A group of people were randomly selected and asked the question: "Did you watch all, part, or none of the last football game?" The responses are summarized in the Frequency Distribution below.

GAME    FREQUENCY
None    31
Part    36
All     34

Ex) Pulse rate measurements, in beats per minute, were obtained from 37 randomly selected individuals. The results are summarized in the Frequency Distribution below.

PULSE RATE    FREQUENCY
60-69         12
70-79         14
80-89         11

Definitions:

Lower Class Limit - The smallest value in each category.
Upper Class Limit - The largest value in each category.
Class Boundary - The number that separates each class. (This number may or may not be observed in the data set.)
Class Midpoint = (Lower Class Limit + Upper Class Limit) / 2
Class Width - The difference between two consecutive: a) lower class limits, or b) lower class boundaries.

PULSE RATE    FREQUENCY
60-69         12
70-79         14
80-89         11

Steps to make a Frequency Distribution:
1. Decide on the optimal number of classes (categories). This should be a number between 5 and 20.
2. Calculate the class width = (Max - Min) / (# classes).
3. Starting at the lower class limit, add the class width to create categories.
4. Count the number of data observations falling within the categories; this is the frequency.
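The steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the commute times are made up, the function name frequency_distribution is hypothetical, each class includes its left boundary but not its right, and the largest observation is counted in the last class. It also assumes the data are not all identical, since the class width would then be zero.

```python
def frequency_distribution(data, num_classes):
    """Return (lower, upper, frequency) for equal-width classes.

    Each class includes its left boundary and excludes its right one,
    except the last class, which also includes the maximum.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes           # step 2: class width
    counts = [0] * num_classes
    for x in data:
        k = int((x - lo) / width)             # index of the class containing x
        counts[min(k, num_classes - 1)] += 1  # the maximum goes in the last class
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_classes)]

# Hypothetical commute times in minutes (made up for illustration).
times = [5, 8, 12, 15, 15, 21, 30, 45]
for lower, upper, freq in frequency_distribution(times, 4):
    print(f"{lower:g}-{upper:g}: {freq}")
```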

Ex) A group of seven randomly selected students listed their drive times from home to school, in minutes. The results are listed below.

10 1 0 11 30 40 80

Let's create four classes. Then the class width = (80 - 10) / 4 = 17.5.

Drive Time    FREQUENCY
10-27.5       4
27.5-45       2
45-62.5       0
62.5-80       1

Summarizing Data: 2) Histograms

Histogram - A bar chart in which the height of each bar shows how frequently observations fall within a subinterval. The left boundary point of each subinterval is included; the right boundary point is not included.

[Figure: general picture of a histogram]

Ex) Create a Frequency Histogram for the data set below.

Drive Time    FREQUENCY
10-27.5       4
27.5-45       2
45-62.5       0
62.5-80       1

Summarizing Data: 3) Polygons

Frequency Polygon - A plot of class midpoints vs. frequency. The plotted points are connected with a line.

Ex) Create a Frequency Polygon for the data below.

Drive Time    FREQUENCY
10-27.5       4
27.5-45       2
45-62.5       0
62.5-80       1
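The points of a frequency polygon are just (midpoint, frequency) pairs. The sketch below computes them for four classes of width 17.5 starting at 10. The function name polygon_points is hypothetical, and the frequencies are an assumption of this sketch, read from the drive-time table.

```python
def polygon_points(boundaries, frequencies):
    """Pair each class midpoint with its frequency."""
    midpoints = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    return list(zip(midpoints, frequencies))

# Class boundaries with width 17.5 starting at 10; the frequencies are
# an assumption of this sketch, read from the drive-time table.
points = polygon_points([10, 27.5, 45, 62.5, 80], [4, 2, 0, 1])
print(points)
```

Plotting these points and joining them with straight segments gives the polygon.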

Summarizing Data: 4) Stem-and-Leaf Plots

Stem-and-leaf plots:
1. Order your observations from smallest to largest.
2. Divide each measurement into two parts, the stem and the leaf.
3. Record the stem part of the measurement to the left of a vertical line and the leaf to the right of the vertical line.
4. Repeat for all the measurements.

Example: Grades on an exam for a small class are

45 65 7 76 8 86 89 91 94
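A minimal Python sketch of the four steps above, assuming whole-number observations where the ones digit is the leaf and everything to its left is the stem. The quiz scores and the function name stem_and_leaf are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit and above) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):        # step 1: order the observations
        stem, leaf = divmod(v, 10)  # step 2: split into stem and leaf
        plot[stem].append(leaf)     # steps 3-4: record every leaf next to its stem
    return dict(plot)

# Hypothetical two-digit quiz scores (made up for illustration).
scores = [58, 61, 67, 70, 73, 73, 88]
for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```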

Measures of Central Tendency (12.2)

Notation
n = # of observations in the data set
f = frequency of the data value
$x_1, x_2, x_3, \ldots, x_n$ = first, second, third, ..., last observation
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ = smallest observation, second smallest, ..., largest observation
$\Sigma$ = summation notation (capital sigma)

Describing Data
The Center of a data set can be described by the:

I. Mean, which is the average of all observations.
Mean = $\dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$
Mean of a Frequency Distribution = $\bar{x} = \dfrac{\sum x f}{n}$

II. Median, which is the observation in the middle of the data.
1. Order the observations from smallest to largest.
2. $M = x_{\left(\frac{n+1}{2}\right)}$ when n is odd; $M = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$ when n is even.

III. Mode, which is the most frequently occurring value. If more than one data value has the highest frequency, then each of these values is a mode.

IV. Midrange, which is the average of the lowest and highest data values.
Midrange = (lowest data value + highest data value) / 2
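The four measures of center can be written directly from the definitions above. This is a sketch with a made-up data set; the function names are chosen for the example. Python's statistics module offers similar functions, but spelling them out shows the odd/even rule for the median.

```python
from collections import Counter

def mean(data):
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # x_((n+1)/2) in 1-based counting
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of x_(n/2) and x_(n/2+1)

def modes(data):
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def midrange(data):
    return (min(data) + max(data)) / 2

# Hypothetical observations (made up for illustration).
values = [4, 9, 4, 7, 10, 2]
print(mean(values), median(values), modes(values), midrange(values))
```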

Example: The number of friends on Facebook for a sample of 5 female members is

3 55 60 10 7

a) Find the mean. Mean = $\dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$
b) Find the median. $M = x_{\left(\frac{n+1}{2}\right)}$ when n is odd.
c) Find the midrange. Midrange = (lowest data value + highest data value) / 2

Example: The number of friends on Facebook for a sample of 6 male members is

3 55 60 10 7 7

a) Find the mode.
b) Find the median. $M = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$ when n is even.

Ex. Find the mean and the mode for the items given in the frequency distribution.

Mean of a Frequency Distribution = $\bar{x} = \dfrac{\sum x f}{n}$

Hours Spent, x    Frequency, f
13 14 0 15 6 16 3 17 3 18 4 19 3 0 3 1 1 3 4 0 5 1
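As a sketch of the frequency-distribution formula, the code below multiplies each value by its frequency, adds the products, and divides by n, the sum of the frequencies. The (value, frequency) pairs and function names are hypothetical and are not the table from this exercise.

```python
def frequency_mean(table):
    """Mean of a frequency distribution: sum of x*f divided by n = sum of f."""
    n = sum(f for _, f in table)
    return sum(x * f for x, f in table) / n

def frequency_modes(table):
    """Every value whose frequency equals the highest frequency."""
    top = max(f for _, f in table)
    return [x for x, f in table if f == top]

# Hypothetical (value, frequency) pairs, made up for illustration.
table = [(1, 2), (2, 5), (3, 3)]
print(frequency_mean(table), frequency_modes(table))
```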

Ex. Find the mean, median, mode, and midrange for the displays below.
a) [display a]
b) [display b]
c) [display c]

Measures of Dispersion (12.3)

The Spread of the data set about the mean can be described by the:

I. Range = maximum - minimum = $x_{(n)} - x_{(1)}$

II. Variance, which is the average squared deviation from the mean. Its units are the units of the original data set, squared.
Variance = $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

III. Standard Deviation, which is the square root of the variance. Its units of measure are the same as those of the original data set.
Population Standard Deviation = $\sigma$
Standard Deviation = $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$, that is, $s = \sqrt{\dfrac{\sum (\text{data item} - \text{mean})^2}{n-1}}$

Example: Two very similar data sets. For each one, make a quick plot of its distribution and find the mean and range. Compare the two distributions. Then find the standard deviation using the formula. Repeat using your calculator.

Data Set One: 1, 1, 1, 4, 7, 7, 7
Data Set Two: 1, 3, 4, 4, 4, 5, 7
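The "by formula" part of this example can be checked with a short sketch. It applies the sample variance formula, with n - 1 in the denominator, to the two data sets above. The function names are chosen for this illustration only.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    return math.sqrt(sample_variance(data))

set_one = [1, 1, 1, 4, 7, 7, 7]
set_two = [1, 3, 4, 4, 4, 5, 7]
for name, data in (("one", set_one), ("two", set_two)):
    print(name, "range:", max(data) - min(data), "s:", round(sample_sd(data), 3))
```

Both sets share the same mean and range, so the standard deviation is what separates them: the set with more observations far from the mean has the larger s.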

Interpreting Standard Deviation, s
The larger the standard deviation, the more spread out the data set is.
s can never be negative.
s is very much affected by outliers.
s can only be zero if there is no variability in the data, that is, if all the observations are identical.

The Normal Distribution (12.4)

Common Distribution Shapes
Symmetric - The left and right sides of the distribution, when divided at the middle value, form mirror images.
Bimodal
Unimodal
Mound or Bell-Shaped
Skewed Left
Skewed Right

The Normal Probability Distribution
The Normal Distributions in this family are
o Bell-shaped
o Symmetric
o Centered at their mean = median = mode
o With spread given by their standard deviation s

Percentiles
For a set of ordered observations $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, the pth percentile is the value of x that is greater than p% of the measurements and is less than the remaining (100-p)%.

Example: Suppose that a score of 80 points on test one placed you at the 5th percentile in the distribution of test scores. Where does your score of 80 stand in relation to the scores of others who took the test?

Note: The Median is the same as the 50th Percentile. Along with the median there are two other important percentiles, called quartiles. Together, these quartiles divide the data set into four quarters.

Quartiles
Q1 = 25th percentile: 25% of observations lie below Q1 and 75% of observations lie above Q1. Q1 is the median for the lower half of the data.
Q2 = Median: 50% of observations lie below M and 50% of observations lie above M. Q2 is the median of the entire data set.
Q3 = 75th percentile: 75% of observations lie below Q3 and 25% of observations lie above Q3. Q3 is the median for the upper half of the data.
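The "median of each half" description of the quartiles can be sketched as follows. One assumption is made loudly here: when n is odd, the middle observation is left out of both halves. The notes do not say which convention to use, and calculators and software may differ. The data and function names are made up.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, whole set, and upper half.

    Assumption: for odd n the middle observation belongs to neither half.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]        # observations below the middle position
    upper = ordered[(n + 1) // 2:]   # observations above the middle position
    return median(lower), median(ordered), median(upper)

# Hypothetical observations (made up for illustration).
values = [2, 4, 5, 7, 9, 12, 15]
print(quartiles(values))
```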

Example: Find the quartiles of the distribution of the number of friends on Facebook for the sample of 5 female members. Data appears below in order.

3 55 60 10 7

Example: Repeat for the sample of 6 males, with the data listed below.

3 55 60 10 7 7

Margin of Error
If a statistic is obtained from a random sample of size n, there is a 95% probability that it lies within $\dfrac{1}{\sqrt{n}} \times 100\%$ of the true population percent.

ME = $\dfrac{1}{\sqrt{n}} \times 100\%$

Example: Using a random sample of 300 teachers, 85.4% say they work after school hours at least 10 hours a week.
a) Find the margin of error in this percent.
b) What would the margin of error be if they sampled 3000 teachers, finding the same percent?
c) Which sample size would give a better estimate of the population?
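A small sketch of the margin-of-error rule, assuming the formula is 1 over the square root of n, expressed as a percent. The sample sizes are made up to show how the margin shrinks as n grows, and the function name is hypothetical.

```python
import math

def margin_of_error_percent(n):
    """Approximate 95% margin of error in percentage points: 100 / sqrt(n)."""
    return 100 / math.sqrt(n)

# Hypothetical sample sizes (made up for illustration).
for n in (100, 400, 1600):
    print(n, round(margin_of_error_percent(n), 2))
```

Because n sits under a square root, the sample must be four times as large to cut the margin of error in half.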

Empirical Rule
For any BELL-SHAPED and SYMMETRIC DISTRIBUTION, you will find approximately:
68% of the observations within one standard deviation of the mean (within the interval mean ± s).
95% of the observations within two standard deviations of the mean (within the interval mean ± 2s).
99.7% of the observations within three standard deviations of the mean (within the interval mean ± 3s).
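The three intervals of the Empirical Rule are easy to tabulate. The sketch below uses a made-up mean of 50 and standard deviation of 5; the function name is hypothetical, and the percentages are the approximate ones from the rule itself.

```python
def empirical_intervals(mean, sd):
    """Return {percent: (low, high)} for 1, 2, and 3 standard deviations."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}

# Hypothetical bell-shaped data with mean 50 and standard deviation 5.
for pct, (low, high) in empirical_intervals(50, 5).items():
    print(f"about {pct}% of observations fall between {low} and {high}")
```

By symmetry, each interval splits evenly around the mean, which is how one-sided questions (for example, "more than one standard deviation above the mean") are answered.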

Example: The length of time required for an automobile driver to respond to a particular emergency situation was recorded for n=10 drivers. The mean response time was .8 seconds with a standard deviation of .2 seconds.
a) What is the probability it takes a driver more than .8 seconds to respond?
b) What is the probability it takes a driver more than 1 second to respond?
c) What is the probability it takes a driver less than .2 seconds to respond?
d) What is the probability it takes a driver between .4 and .8 seconds to respond?
e) What is the probability it takes a driver between .4 and .6 seconds to respond?
f) What is the probability it takes a driver between 1 and 1.4 seconds to respond?
g) What is the probability it takes a driver less than .8 seconds to respond?

Example: Assume IQ values for the whole population follow a bell-shaped and symmetric distribution with a mean of 100 points and standard deviation 10 points.
a) Sketch a graph of this distribution.
b) Between what two values will you find the central 68% of IQs? 95% of IQs? 99.7% of IQs?

The relative standing of an observation can be described by Standardized Observations, or z-scores.

$z = \dfrac{\text{observation} - \text{mean}}{\text{standard deviation}}$

z tells you how many standard deviations above or below the mean an observation x is. Positive z-scores indicate values above the mean; negative z-scores indicate values below the mean.

The distribution of z is normal with a mean of zero and standard deviation of 1. This is called the standard normal distribution.

Example: The distribution of IQ scores is approximately normal with mean 100 and standard deviation 10.
a) If Bubba has an IQ of 15, what is his z-score?
b) If Laura-Lynn has an IQ of 80, what is her z-score?

Problem Solving with the Normal Distribution (12.5)

Standard Normal Table
The standard normal table is Table 12.14, page 70, at the beginning of chapter 12. It gives areas under the normal curve to the left of the z-score; these are called cumulative probabilities. The z-scores appear on the margins of the table; the areas are in the center.

Remember, $z = \dfrac{x - \text{mean}}{s}$, and the area under the curve is the same thing as the probability of an event.

Steps for Calculating Probabilities given x
1. Identify the random variable, x, the mean, and the standard deviation.
2. Convert X = x to $Z = \dfrac{x - \text{mean}}{s}$.
3. Look for the z-score along the margin of the Table to find the corresponding percentile.

I find it helps to draw a picture.

Example: The distribution of IQ scores is approximately normal with mean 100 and standard deviation 10.
a) If Bubba has an IQ of 15, what's his z-score?
b) What percentage of people have IQs lower than 13?
c) What percentage of people have IQs lower than 89?
d) What percentage of people have IQs higher than 89?
e) What percentage of people have IQs between 89 and 13?
f) What percentage of people have IQs of exactly 89?

Example: The amount of daily cell phone usage time for teens in the U.S. is normally distributed with a mean of 4.46 hours and a standard deviation of 1.44 hours.
a) What percentage of these young Americans talk more than 3.74 hours daily?
b) What percentage talk less than 8.06 hours daily?
c) What percentage talk between 3.74 and 8 hours per day?
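For checking table lookups, parts a) and b) of the cell phone example can be sketched with Python's standard library. statistics.NormalDist plays the role of the standard normal table here. This is a checking aid under that assumption, not the method the notes ask for, and results may differ slightly from a rounded table.

```python
from statistics import NormalDist

mean, sd = 4.46, 1.44                  # values from the example above
usage = NormalDist(mu=mean, sigma=sd)

z_low = (3.74 - mean) / sd             # step 2: convert x to z
z_high = (8.06 - mean) / sd

p_more_than_low = 1 - usage.cdf(3.74)  # area to the right of x = 3.74
p_less_than_high = usage.cdf(8.06)     # area to the left of x = 8.06

print(round(z_low, 2), round(z_high, 2))
print(round(p_more_than_low, 4), round(p_less_than_high, 4))
```

The cdf gives the area to the left, just as the table does, so a "more than" question needs 1 minus that area.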

Exploring the Association between Two Quantitative Variables (Section 12.6)

Notation
For quantitative data, we label the explanatory variable x and the response variable y.

Example: Determine which variable should be explanatory (x) and which should be the response (y).
a) Father's height and son's height.
b) How much a car is worth and how old the car is.

Scatterplots
Plot of y vs. x, two quantitative variables, measured on the same individual.

Interpreting Scatterplots
Direction: positive or negative?
Linear trend? How strong? Or is it a curved trend?
Any outliers?
Do the points cluster in groups? Is there an explanation?

MGF 1106 CH 12 TEST FIVE

Examples: Interpret the following scatterplots.

1. Y = Selling price of a residential property in thousands of dollars; X = Square feet of living area (Mendenhall, Beaver, and Beaver, page 113).
[Figure: Scatterplot of Selling price vs Living area (ft sq)]

2. Y = percentage of adults with cellular phones in a country; X = country's Gross Domestic Product per capita (in thousands of US dollars).
[Figure: Scatterplot of Cellular vs GDP]

Correlation
We will use the symbol r to represent the sample's correlation coefficient. We will use the symbol $\rho$ to represent the population's correlation coefficient.

The correlation summarizes the direction and strength of the linear relationship between x and y.

$r = \dfrac{1}{n-1} \sum \left(\dfrac{x - \bar{x}}{s_x}\right) \left(\dfrac{y - \bar{y}}{s_y}\right)$

Comment: The text uses the equivalent formula:

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$

The two variables x and y have the same correlation regardless of which one is called the explanatory or the response variable.
r is always between -1 and +1.
r has no units.
Outliers have a strong effect on r.
Interpretation: positive/negative? Strong/weak?
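To see that the two formulas for r agree, here is a sketch that computes both on a small made-up data set. The function names correlation and correlation_sums, and the data, are hypothetical and chosen only for this illustration.

```python
import math

def correlation(xs, ys):
    """r as the average product of standardized x and y, dividing by n - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

def correlation_sums(xs, ys):
    """The equivalent formula written with raw sums."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx, sy = sum(xs), sum(ys)
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4), round(correlation_sums(xs, ys), 4))
```

Swapping the roles of xs and ys leaves the result unchanged, matching the remark that r does not depend on which variable is called explanatory.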

Examples

Regression Line
Used to predict the response variable y for a particular value of x. We call the predicted values $\hat{y}$.

$\hat{y} = mx + b$

where:

m is the slope (rise/run)
o The slope represents the average (or predicted) change in y for a one-unit change in x.
o $m = r \dfrac{s_y}{s_x}$
o Comment: Your text uses the equivalent formula $m = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$

b is the y-intercept (the point where the regression line crosses the y-axis)
o The y-intercept corresponds to the predicted value of y when x=0. We only interpret the y-intercept if: 1. x=0 makes sense AND 2. it is close to the values of x observed.
o $b = \bar{y} - m\bar{x}$
o Comment: Your text uses the equivalent formula $b = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$
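The slope and intercept formulas above can be sketched with the raw-sums versions. The data set and the function name regression_line are made up for illustration, and the prediction is made at an x value inside the observed range.

```python
def regression_line(xs, ys):
    """Least-squares slope m and intercept b from the raw-sums formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n            # same as ybar - m * xbar
    return m, b

# Hypothetical paired observations (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = regression_line(xs, ys)
predicted = m * 4 + b                  # predicted y when x = 4
print(round(m, 3), round(b, 3), round(predicted, 3))
```

The fitted line always passes through the point (mean of x, mean of y), which is a quick check on hand calculations.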

Example: Suppose the regression equation to predict the amount of money spent on groceries in a week based on the number of people in a household is $\hat{y} = 55.5 + 3.8x$.
a) Interpret the equation.
b) Predict the amount of money a household of six would spend on groceries in a week.
c) Roughly sketch the relationship between amounts spent on groceries and household size. Do you expect all households of 2 people to spend the same amount of money?

Example: The heights of 11 pairs of brothers and sisters were measured. The values are below. (Peter Dunn, USQ). We wish to predict the sister's height based on the brother's height.
a) Create a scatter plot. Interpret.
b) Calculate the regression equation by hand using $m = r \dfrac{s_y}{s_x}$ and $b = \bar{y} - m\bar{x}$. (r = .558).
c) Calculate the regression equation on your calculator.
d) Draw the regression equation on your scatter plot.
e) Interpret your regression equation.
f) What is the predicted average height for sisters with a brother who is 75 inches tall?
g) Calculate the correlation. Interpret.

Comments
There are many lines that, visually, seem to fit a scatterplot well. The least squares regression line is the line that minimizes the sum of the squared vertical distances between the observed points and the regression line.

[Figure: picture of the least squares regression line]

If the data contain outliers, then:
o Check the data and correct any typos.
o If there are still unusual observations, try to find out more about them. Do they belong in the data set? What makes them different? If they do not belong in the data set, you should delete the point before proceeding with the regression analysis.
o If the point is valid, conduct the regression analysis with and without that point. If the results are similar, you may use them. If they are different, then you should collect more data to find out the true relationship between x and y.

$R^2$: Coefficient of Determination

$R^2 = (r)^2 = (\text{correlation})^2$

It is easier to interpret than r, the correlation coefficient.
It is interpreted as the percent of variability in y explained by the linear regression on x.
$R^2$ is always between 0 and +1.
Interpretation: strong or weak?
To get back to r from $R^2$, take the square root. Determine the sign of r by either looking at a scatter plot or the slope.

Example:
a) If r = .3, what is $R^2$? What is $R^2$ when r = -.3?
b) Given $\hat{y} = 55.5 + 3.8x$ and $R^2$ = .84, what is r?
c) For the scatter plot below, we know that $R^2$ = .97. What is r?

[Figure: Scatterplot of Cellular vs GDP]
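Going back from the coefficient of determination to r takes a square root plus a sign. The sketch below takes the sign from the regression slope. The numbers and the function name r_from_r_squared are made up for illustration.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r: the magnitude is sqrt(R^2), the sign matches the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical values (made up for illustration).
print(r_from_r_squared(0.81, slope=1.5))
print(r_from_r_squared(0.81, slope=-1.5))
```

Squaring loses the sign, so r and -r give the same coefficient of determination; that is why the slope or the scatter plot is needed.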

Example: Reduced visual performance with increasing age has been a much-studied phenomenon in recent years. This decline is due partly to changes in optical properties of the eye itself and partly to neural degeneration throughout the visual system. As one aspect of this problem, the article "Morphometry of Nerve Fiber Bundle Pores in the Optic Nerve Head of the Human" presented the accompanying data on age (x) and percentage of the cribriform area of the lamina scleralis occupied by pores (y).
a) Create a scatter plot. Interpret.
b) Calculate the regression equation by hand using $m = r \dfrac{s_y}{s_x}$ and $b = \bar{y} - m\bar{x}$.
c) Calculate the regression equation on your calculator.
d) Draw the regression equation on your scatter plot.
e) Interpret your regression equation.
f) What is the predicted percentage of the cribriform area of the lamina scleralis occupied by pores for a 3 year old?
g) Calculate the correlation. Interpret.

Example: The authors of the paper "Weight-Bearing Activity during Youth Is a More Important Factor for Peak Bone Mass than Calcium Intake" studied a number of variables they thought might be related to bone mineral density (BMD). The accompanying data on x = weight at age 13 and y = bone mineral density at age 27 are consistent with summary quantities for women given in the paper.
a) Create a scatter plot. Interpret.
b) Calculate (by hand) a simple linear regression model that can be used to describe the relationship between weight at age 13 and BMD at age 27, using $m = r \dfrac{s_y}{s_x}$ and $b = \bar{y} - m\bar{x}$.
c) Calculate the regression equation on your calculator.
d) Draw the regression equation on your scatter plot.
e) Interpret your regression equation.
f) What is the predicted BMD for a 27-year-old who weighed 70 pounds at age 13?
g) Calculate the correlation. Interpret.