The empirical (68-95-99.7) rule

The empirical (68-95-99.7) rule
With a bell-shaped distribution, about 68% of the data fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations, and about 99.7% fall within 3 standard deviations.
What if the distribution is not bell-shaped? Another rule, Chebyshev's Rule, guarantees that at least 75% of the data lie within 2 standard deviations of the mean, regardless of the shape of the distribution, and at least 89% within 3 standard deviations.
week3 1
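The two rules can be compared numerically. The sketch below is illustrative only: it uses Python's standard `statistics.NormalDist` for the bell-shaped case and the usual Chebyshev bound 1 - 1/k^2 for the general case; the function names are made up for this example.

```python
from statistics import NormalDist

def normal_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal, N(0, 1)
    return z.cdf(k) - z.cdf(-k)

def chebyshev_bound(k: float) -> float:
    """Chebyshev's guaranteed minimum proportion within k standard deviations."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within(k):.4f}, Chebyshev at least {chebyshev_bound(k):.4f}")
```

For k = 1 the Chebyshev bound is 0, so it only becomes informative for k greater than 1.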

Linear transformations
A linear transformation changes the original variable x into a new variable x_new given by an equation of the form
$x_{new} = a + bx$
Example 1.19 on page 54 in IPS:
(i) A distance x measured in km can be expressed in miles as $x_{new} = 0.62x$.
(ii) A temperature x measured in degrees Fahrenheit can be converted to degrees Celsius by $x_{new} = \frac{5}{9}(x - 32) = -\frac{160}{9} + \frac{5}{9}x$.

Effect of a linear transformation
Multiplying each observation in a data set by a number b multiplies the measures of center (mean, median, and trimmed means) by b and the measures of spread (range, standard deviation, and IQR) by |b|, the absolute value of b.
Adding the same number a to each observation in a data set adds a to the measures of center and to the quartiles and percentiles, but does not change the measures of spread.
Linear transformations do NOT change the overall shape of a distribution.

Measure   x      x_new
Mean      x̄      a + b·x̄
Median    M      a + b·M
Mode      Mode   a + b·Mode
Range     R      |b|·R
IQR       IQR    |b|·IQR
Stdev     s      |b|·s
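A quick numerical check of these rules, as a minimal sketch. The data values and the constants a and b below are made up for illustration, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the textbook or MINITAB convention.

```python
from statistics import mean, median, stdev, quantiles

def iqr(data):
    """Interquartile range using the default quartile method of statistics.quantiles."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

def summarize(data):
    """Collect the measures of center and spread discussed above."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "iqr": iqr(data),
        "stdev": stdev(data),
    }

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]   # made-up observations
a, b = 32.0, -1.8                           # hypothetical constants; b < 0 shows why |b| matters
x_new = [a + b * xi for xi in x]

before, after = summarize(x), summarize(x_new)
for name in before:
    print(f"{name:>6}: x = {before[name]:.3f}, x_new = {after[name]:.3f}")
```

The centers move to a + b times the old value, while every measure of spread is multiplied by |b| = 1.8.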

Example 1
A sample of 20 employees of a company was taken and their salaries were recorded. Suppose each employee receives a $300 raise in salary for the next year. State whether the following statements are true or false.
a) The IQR of the salaries will
   i. be unchanged
   ii. increase by $300
   iii. be multiplied by $300
b) The mean of the salaries will
   i. be unchanged
   ii. increase by $300
   iii. be multiplied by $300

Nonlinear transformations
A very common nonlinear transformation in statistics is the logarithm transformation.
Recall: ln x = log_e x, where e is the base of the natural logarithm, e ≈ 2.718.
If measurements on a variable x have a right-skewed distribution, the distribution of ln x will often be roughly symmetric.
If measurements on a variable x have a left-skewed distribution, the distribution of ln x will be even more left-skewed.

Example 2 - Nonlinear transformations
[Figure: histogram for sales data (Frequency against Sales) and histogram for ln(sales) (Frequency against ln(sales))]
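The effect of the logarithm on a right-skewed variable can be illustrated with simulated data. This is only a sketch: the "sales" values below are generated from a lognormal distribution as an assumption (they are not the data behind the histograms), and skewness is measured as the average cubed z-score.

```python
import math
import random
from statistics import mean, pstdev

def skewness(data):
    """Average cubed z-score: near 0 for symmetric data, positive for right skew."""
    m, s = mean(data), pstdev(data)
    return mean(((x - m) / s) ** 3 for x in data)

random.seed(1)  # fixed seed so the run is repeatable
# Simulated right-skewed "sales" figures; not the data from the slide.
sales = [random.lognormvariate(5.0, 1.0) for _ in range(2000)]
log_sales = [math.log(x) for x in sales]

skew_raw = skewness(sales)
skew_log = skewness(log_sales)
print(f"skewness of sales:     {skew_raw:.2f}")
print(f"skewness of ln(sales): {skew_log:.2f}")
```

For data like these, the raw values are strongly right-skewed while the logged values are close to symmetric.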

Density curves
Using software, clever algorithms can describe a distribution in a way that is not feasible by hand, by fitting a smooth curve to the data in addition to or instead of a histogram. The curves used are called density curves. It is easier to work with a smooth curve, because a histogram depends on the choice of classes.
Density curve
A density curve is a curve that
is always on or above the horizontal axis, and
has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution.

The area under the curve and above any range of values is the relative frequency (proportion) of all observations that fall in that range of values.
Example: The curve below shows the density curve for scores in an exam; the area of the shaded region is the proportion of students who scored between 60 and 80.
[Figure: density curve of exam scores with the region between 60 and 80 shaded]
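To make "area equals proportion" concrete, the sketch below approximates areas under a density curve with the trapezoid rule. The exam-score curve is assumed to be N(70, 10) purely for illustration, since the actual curve is not given here.

```python
from statistics import NormalDist

def area_under(pdf, lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the area under pdf between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for i in range(1, steps):
        total += pdf(lo + i * width)
    return total * width

# Hypothetical exam-score density, N(70, 10); an assumption for this example.
scores = NormalDist(mu=70, sigma=10)

total_area = area_under(scores.pdf, 0, 140)      # essentially the whole curve
between_60_80 = area_under(scores.pdf, 60, 80)   # the shaded region

print(f"total area:            {total_area:.4f}")
print(f"area between 60 and 80: {between_60_80:.4f}")
```

Under this assumed curve, 60 and 80 are one standard deviation either side of the mean, so the shaded area is about 0.68.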

Median and mean of a density curve
The median of a distribution described by a density curve is the point that divides the area under the curve in half.
A mode of a distribution described by a density curve is a peak point of the curve, the location where the curve is highest.
Quartiles of a distribution can be roughly located by dividing the area under the curve into quarters as accurately as possible by eye.

Normal distributions
An important class of density curves is the symmetric, unimodal, bell-shaped curves known as normal curves. They describe normal distributions.
All normal distributions have the same overall shape. The exact density curve for a particular normal distribution is specified by giving its mean μ and its standard deviation σ.
The mean is located at the center of the symmetric curve and is the same as the median and the mode.
Changing μ without changing σ moves the normal curve along the horizontal axis without changing its spread.

The standard deviation σ controls the spread of a normal curve.

There are other symmetric bell-shaped density curves that are not normal, e.g. the t distribution.
The normal density curves are specified by a particular function. The height of a normal density curve at any point x is given by
$\frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
Notation: A normal distribution with mean μ and standard deviation σ is denoted by N(μ, σ).
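The height formula translates directly into code. The following sketch writes it out by hand (the function name `normal_pdf` is made up here) and compares it with the `pdf` method of Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the N(mu, sigma) density curve at x, straight from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z**2) / (sigma * math.sqrt(2 * math.pi))

# The curve is highest at the mean; a larger sigma gives a lower, wider curve.
peak_standard = normal_pdf(0.0, 0.0, 1.0)
peak_wide = normal_pdf(0.0, 0.0, 2.0)
library_value = NormalDist(64.5, 2.5).pdf(68.0)
formula_value = normal_pdf(68.0, 64.5, 2.5)

print(f"peak of N(0, 1): {peak_standard:.4f}")
print(f"peak of N(0, 2): {peak_wide:.4f}")
print(f"N(64.5, 2.5) at 68: formula {formula_value:.5f}, library {library_value:.5f}")
```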

The 68-95-99.7 rule
In the normal distribution with mean μ and standard deviation σ,
approx. 68% of the observations fall within σ of the mean μ,
approx. 95% of the observations fall within 2σ of the mean μ,
approx. 99.7% of the observations fall within 3σ of the mean μ.

Example 1.23 on p72 in IPS
The distribution of heights of women is approximately N(64.5, 2.5), that is, normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches.
The 68-95-99.7 rule says that the middle 95% (approx.) of women are between 59.5 and 69.5 inches tall. The other 5% have heights outside the range from 59.5 to 69.5 inches, and 2.5% of the women are taller than 69.5 inches.
Exercise:
1) The middle 68% (approx.) of women are between ____ and ____ inches tall.
2) ____% of the women are taller than ____.
3) ____% of the women are taller than 72.
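The intervals used in this example follow mechanically from μ and σ. A small sketch, with hypothetical helper names, that applies the rule to the N(64.5, 2.5) heights:

```python
def rule_interval(mu: float, sigma: float, k: int) -> tuple:
    """Interval mu - k*sigma to mu + k*sigma used by the 68-95-99.7 rule."""
    return (mu - k * sigma, mu + k * sigma)

def upper_tail_percent(middle_percent: float) -> float:
    """Approximate percent above the interval: half of what the middle leaves out."""
    return (100 - middle_percent) / 2

mu, sigma = 64.5, 2.5  # heights of women, in inches
for k, middle in ((1, 68), (2, 95), (3, 99.7)):
    low, high = rule_interval(mu, sigma, k)
    tail = upper_tail_percent(middle)
    print(f"about {middle}% between {low} and {high}; about {tail:.2f}% above {high}")
```

By symmetry, the percent taller than the upper end of each interval is half of the percent falling outside it.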

Standardizing and z-scores
If x is an observation from a distribution that has mean μ and standard deviation σ, the standardized value of x is given by
$z = \frac{x - \mu}{\sigma}$
A standardized value is often called a z-score. A z-score tells us how many standard deviations the original observation falls away from the mean of the distribution.
Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. Therefore, standardizing does not change the shape of a distribution, but it changes the mean to 0 and the standard deviation to 1.

Example 1.24 on p73 in IPS
The heights of women are approximately normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches. The standardized height is
$z = \frac{\text{height} - 64.5}{2.5}$
The standardized value (z-score) of height 68 inches is
$z = \frac{68 - 64.5}{2.5} = 1.4$
or 1.4 std. dev. above the mean. A woman 60 inches tall has standardized height
$z = \frac{60 - 64.5}{2.5} = -1.8$
or 1.8 std. dev. below the mean.
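The same calculation in code, as a minimal sketch. The two heights come from the example; the short list of heights used to standardize a whole data set is made up for illustration.

```python
from statistics import mean, stdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 64.5, 2.5
z_68 = z_score(68, mu, sigma)   # 1.4 standard deviations above the mean
z_60 = z_score(60, mu, sigma)   # 1.8 standard deviations below the mean
print(z_68, z_60)

# Standardizing a whole (made-up) data set with its own mean and standard deviation
heights = [60.0, 62.5, 64.0, 64.5, 66.0, 68.0, 70.0]
m, s = mean(heights), stdev(heights)
z_values = [z_score(h, m, s) for h in heights]
print(f"mean of z-scores: {mean(z_values):.3f}, stdev of z-scores: {stdev(z_values):.3f}")
```

The standardized values have mean 0 and standard deviation 1, while their ordering and relative spacing are unchanged.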

The standard normal distribution
The standard normal distribution is the normal distribution N(0, 1), that is, the mean is μ = 0 and the standard deviation is σ = 1.
If a random variable X has the normal distribution N(μ, σ), then the standardized variable
$Z = \frac{X - \mu}{\sigma}$
has the standard normal distribution.
Areas under a normal curve represent proportions of observations from that normal distribution. There is no simple formula for areas under a normal curve, so calculations use either software or a table of areas. The table and most software calculate one kind of area: cumulative proportions.
A cumulative proportion is the proportion of observations in a distribution that fall at or below a given value; it is also the area under the curve to the left of that value.

The standard normal tables
Table A gives cumulative proportions for the standard normal distribution. The table entry for each value z is the area under the curve to the left of z; the notation used is P(Z ≤ z), e.g. P(Z ≤ 1.4) = 0.9192.

Standard normal distribution
The table shows the area to the left of z under the standard normal curve.
For a negative number, -z:
Area below (-z) = Area above (z) = 1 - Area below (z)

The standard normal tables - Example
What proportion of the observations of a N(0,1) distribution takes values
a) less than z = 1.4?
b) greater than z = 1.4?
c) greater than z = -1.96?
d) between z = 0.43 and z = 2.15?
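Software returns the same cumulative proportions as Table A. As a sketch, the four parts of this example can be checked with `statistics.NormalDist` from the Python standard library; the variable names are arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1); Z.cdf(z) is the area to the left of z

less_than_1_4 = Z.cdf(1.4)                  # a) table entry for z = 1.4
greater_than_1_4 = 1 - Z.cdf(1.4)           # b) area above = 1 - area below
greater_than_neg_1_96 = 1 - Z.cdf(-1.96)    # c) same as the area below +1.96
between = Z.cdf(2.15) - Z.cdf(0.43)         # d) difference of two cumulative areas

print(f"a) {less_than_1_4:.4f}")
print(f"b) {greater_than_1_4:.4f}")
print(f"c) {greater_than_neg_1_96:.4f}")
print(f"d) {between:.4f}")
```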

Properties of the normal distribution
If a random variable Z has a N(0,1) distribution, then P(Z = z) = 0: the area under the curve above any single point is 0.
The area between any two points a and b (a < b) under the standard normal curve is given by
$P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$
As mentioned earlier, if a random variable X has a N(μ, σ) distribution, then the standardized variable
$Z = \frac{X - \mu}{\sigma}$
has a standard normal distribution, and any calculations about X can be done using the following rules:
P(X = k) = 0 for all k.
$P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right)$
$P(X \ge b) = 1 - P\left(Z \le \frac{b - \mu}{\sigma}\right)$
$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$
The solution to the equation P(X ≤ k) = p is $k = \mu + \sigma z_p$, where z_p is the value z from the standard normal table that has area (and cumulative proportion) p below it, i.e. z_p is the 100p-th percentile of the standard normal distribution.
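These rules can be sketched as four small functions that standardize first and then use only the standard normal curve. The function names are hypothetical, and the heights N(64.5, 2.5) from the earlier example are reused as test values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

def prob_at_most(a, mu, sigma):
    """P(X <= a) = P(Z <= (a - mu) / sigma)."""
    return Z.cdf((a - mu) / sigma)

def prob_at_least(b, mu, sigma):
    """P(X >= b) = 1 - P(Z <= (b - mu) / sigma)."""
    return 1 - Z.cdf((b - mu) / sigma)

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b), from the two standardized end points."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

def percentile(p, mu, sigma):
    """Solve P(X <= k) = p for k: k = mu + sigma * z_p."""
    return mu + sigma * Z.inv_cdf(p)

mu, sigma = 64.5, 2.5
print(f"P(X <= 68)           = {prob_at_most(68, mu, sigma):.4f}")
print(f"P(X >= 72)           = {prob_at_least(72, mu, sigma):.5f}")
print(f"P(59.5 <= X <= 69.5) = {prob_between(59.5, 69.5, mu, sigma):.4f}")
print(f"90th percentile      = {percentile(0.90, mu, sigma):.2f}")
```

Standardizing explicitly mirrors the table-based method; passing μ and σ to `NormalDist` directly would give the same numbers.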

Questions
1. The marks of STA221 students have a N(65, 15) distribution. Find the proportion of students having marks
(a) less than 50.
(b) greater than 80.
(c) between 50 and ...
2. Example 1.30 on page 79 in IPS: Scores on the SAT verbal test follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?
3. The time it takes to complete a stat220 term test is normally distributed with mean 100 minutes and standard deviation 14 minutes. How much time should be allowed if we wish to ensure that at least 9 out of 10 students (on average) can complete it? (final exam Dec. 2001)

4. General Motors of Canada has a deal: an oil filter and lube job in 25 minutes or the next one free. Suppose that you worked for GM and knew that the time needed to provide these services was approximately normal with mean 15 minutes and std. dev. 2.5 minutes. How many minutes would you have recommended to put in the ad above if it was decided that about 5 free services for 100 customers was reasonable?
5. In a survey of patients of a rehabilitation hospital the mean length of stay in the hospital was 12 weeks with a std. dev. of 1 week. The distribution was approximately normal.
a) Out of 100 patients, how many would you expect to stay longer than 13 weeks?
b) What is the percentile rank of a stay of 11.3 weeks?
c) What percentage of patients would you expect to be in longer than 12 weeks?
d) What is the length of stay at the 90th percentile?
e) What is the median length of stay?
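As one possible way to check answers to question 5 with software rather than Table A, the sketch below treats the length of stay as exactly N(12, 1), which the question states only approximately. Variable names are arbitrary.

```python
from statistics import NormalDist

stay = NormalDist(mu=12, sigma=1)  # length of stay in weeks, assumed exactly normal

longer_than_13 = 1 - stay.cdf(13)    # a) proportion; multiply by 100 patients
rank_of_11_3 = stay.cdf(11.3)        # b) cumulative proportion at 11.3 weeks
longer_than_12 = 1 - stay.cdf(12)    # c) proportion above the mean
p90 = stay.inv_cdf(0.90)             # d) 90th percentile
median_stay = stay.inv_cdf(0.50)     # e) the median equals the mean for a normal curve

print(f"a) about {100 * longer_than_13:.0f} of 100 patients")
print(f"b) about the {100 * rank_of_11_3:.0f}th percentile")
print(f"c) {100 * longer_than_12:.0f}%")
print(f"d) {p90:.2f} weeks")
print(f"e) {median_stay:.1f} weeks")
```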

Normal quantile plots and their use
A histogram or stemplot can reveal distinctly nonnormal features of a distribution.
If the stemplot or histogram appears roughly symmetric and unimodal, we use another graph, the normal quantile plot, as a better way of judging the adequacy of a normal model.
Any normal distribution produces a straight line on the plot.
Use of normal quantile plots:
If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are approximately normal.
Systematic deviations from a straight line indicate a nonnormal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
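The points of a normal quantile plot can be computed without plotting software. The sketch below pairs the sorted data with normal scores and summarizes "closeness to a straight line" by a correlation. It rests on assumptions: the plotting positions (i - 3/8)/(n + 1/4) are one common convention and may not match MINITAB's nscores exactly, the data are simulated, and the correlation is only a rough numerical stand-in for judging the plot by eye.

```python
import random
from statistics import NormalDist, mean

def normal_scores(n):
    """Normal scores for ranks 1..n, using (i - 3/8)/(n + 1/4) plotting positions."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def straightness(data):
    """Correlation between sorted data and normal scores (near 1 = nearly straight plot)."""
    ordered = sorted(data)
    return correlation(normal_scores(len(ordered)), ordered)

random.seed(2)  # fixed seed so the run is repeatable
normal_sample = [random.gauss(500, 20) for _ in range(200)]      # like N(500, 20)
skewed_sample = [random.expovariate(1.0) for _ in range(200)]    # right-skewed

r_normal = straightness(normal_sample)
r_skewed = straightness(skewed_sample)
print(f"normal data:       {r_normal:.3f}")
print(f"right-skewed data: {r_skewed:.3f}")
```

The pairs (normal score, sorted value) are exactly the points one would plot; the skewed sample bends away from a line, which shows up as a lower correlation.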

Histogram, the nscores plot and the normal quantile plot for data generated from a normal distribution, N(500, 20)
[Figure: histogram of value, plot of value against nscores, and MINITAB normal probability plot for value]

Histogram, the nscores plot and the normal quantile plot for data generated from a right-skewed distribution
[Figure: histogram of value, plot of value against nscores, and MINITAB normal probability plot for value]

Histogram, the nscores plot and the normal quantile plot for data generated from a left-skewed distribution
[Figure: histogram of value, plot of value against nscores, and MINITAB normal probability plot for value]

Histogram, the nscores plot and the normal quantile plot for data generated from a uniform distribution on (0, 5)
[Figure: histogram of value, plot of value against nscores, and MINITAB normal probability plot for value]

Question (similar to Q5 Term test Oct, 2000)
Below are 4 normal probability (quantile) plots and 4 histograms produced by MINITAB for some data sets. The histograms are not in the same order as the normal scores plots. Match the histograms with the nscores plots.
[Figure: four histograms of data (Frequency against data) and four plots of data against nscores]

Looking at data - relationships
Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.
When examining the relationship between two or more variables, we should first think about the following questions:
What individuals do the data describe?
What variables are present? How are they measured?
Which variables are quantitative and which are categorical?
Is the purpose of the study simply to explore the nature of the relationship, or do we hope to show that one variable can explain variation in the other?

Response and explanatory variables
A response variable measures an outcome of a study. An explanatory variable explains or causes changes in the response variable.
Explanatory variables are often called independent variables and response variables are called dependent variables. The idea behind this is that response variables depend on explanatory variables.
We usually call the explanatory variable x and the response variable y.

Scatterplot
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. Each individual in the data appears as a point in the plot fixed by the values of both variables for that individual.
Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a scatterplot.
Examining and interpreting scatterplots
Look for the overall pattern and for striking deviations from that pattern.
The overall pattern of a scatterplot can be described by the form, direction and strength of the relationship.
An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.

Example
There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. A data set contains information on yearly wine consumption (liters per person) and yearly deaths from heart disease (deaths per 100,000 people) in 19 developed nations. Answer the following questions.
What is the explanatory variable?
What is the response variable?
Examine the scatterplot below.

[Figure: scatterplot with axes Wine and Heart disease deaths]

Interpretation of the scatterplot
The pattern is fairly linear with a negative slope. There are no outliers.
The direction of the association is negative. This means that higher levels of wine consumption are associated with lower death rates.
This does not mean there is a causal effect. There could be lurking variables. For example, higher wine consumption could be linked to higher income, which would allow better medical care.
MINITAB command for scatterplot: Graph > Plot

Categorical variables in scatterplots
To add a categorical variable to a scatterplot, use a different colour or symbol for each category.
The scatterplot below shows the relationship between the world record times for the 10,000 m run and the year, for both men and women.
[Figure: scatterplot of Time (seconds) against Year, with separate symbols for women (F) and men (M)]

Categorical explanatory variables
Scatterplots display the association between two quantitative variables.
To display a relationship between a categorical explanatory variable and a quantitative response variable, make a side-by-side comparison of the distributions of the response for each category.
A back-to-back stemplot compares two distributions.
Side-by-side boxplots compare any number of distributions.

Example
We want to investigate the association between how much education a person has and his/her income. Education appears as a categorical variable: 1 = did not reach high school, 2 = some high school but no high school diploma, and so on up to 6 = postgraduate degree.
Order the categories and make side-by-side boxplots for the income.

The side-by-side boxplots show a strong positive association between education and earnings.

Correlation
A scatterplot displays the form, direction and strength of the relationship between two quantitative variables.
Correlation (denoted by r) measures the direction and strength of the linear relationship between two quantitative variables.
Suppose that we have data on variables x and y for n individuals. The correlation r between x and y is given by
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}$

Example
Family income and annual savings, in thousands of $, for a sample of eight families are given below.
[Table: savings, income, and MINITAB worksheet columns C3, C4 and C5]
r = (Sum of C5)/7
MINITAB command: Stat > Basic Statistics > Correlation
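The correlation formula can be sketched in code in both of its forms: the average product of standardized values, and the shortcut using the sum of products. The income and savings figures below are made up for illustration (they are not the values from the example), and the function names are hypothetical.

```python
from statistics import mean, stdev

def correlation_standardized(x, y):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(x)
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

def correlation_shortcut(x, y):
    """r from the shortcut form (sum of x*y minus n*xbar*ybar) / ((n - 1) * sx * sy)."""
    n = len(x)
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return (sum(a * b for a, b in zip(x, y)) - n * mx * my) / ((n - 1) * sx * sy)

# Made-up income and savings (thousands of $) for eight families
income = [20, 25, 30, 35, 40, 50, 60, 80]
savings = [1.0, 1.5, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0]

r1 = correlation_standardized(income, savings)
r2 = correlation_shortcut(income, savings)
print(f"standardized form: r = {r1:.4f}")
print(f"shortcut form:     r = {r2:.4f}")
```

With eight families the divisor is n - 1 = 7, matching the division by 7 in the example.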

Properties of correlation
Correlation requires both variables to be quantitative and makes no use of the distinction between explanatory and response variables.
Because r uses standardized values of the observations, it does not depend on the units of measurement of x and y. Correlation r has no unit of measurement.
Positive r indicates positive association between the variables and negative r indicates negative association.
Correlation measures the strength of only the linear relationship between two variables; it does not describe curved relationships!
r is always a number between -1 and 1. Values of r near 0 indicate a weak linear relationship. The strength of the linear relationship increases as r moves away from 0. Values of r close to -1 or 1 indicate that the points lie close to a straight line.
r is not resistant: r is strongly affected by a few outliers.
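Several of these properties can be checked numerically. The sketch below uses small made-up data sets and a hand-written `correlation` helper; it illustrates the properties rather than proving them.

```python
from statistics import mean

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

km = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # made-up distances
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]         # made-up responses, nearly linear in km
miles = [0.62 * d for d in km]               # same distances in different units

r_km = correlation(km, y)
r_miles = correlation(miles, y)              # changing units leaves r unchanged
r_swapped = correlation(y, km)               # swapping x and y leaves r unchanged

# A perfect curved relationship (y = x^2 on symmetric x) has r = 0
x_sym = [-3, -2, -1, 0, 1, 2, 3]
r_curved = correlation(x_sym, [v**2 for v in x_sym])

# One outlier added to the nearly linear data changes r drastically
r_outlier = correlation(km + [20.0], y + [0.0])

print(f"km: {r_km:.4f}, miles: {r_miles:.4f}, swapped: {r_swapped:.4f}")
print(f"curved: {r_curved:.4f}, with outlier: {r_outlier:.4f}")
```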


Question from Term test, summer 99
MINITAB analyses of math and verbal SAT scores are given below.
[MINITAB output: descriptive statistics (N, Mean, Median, TrMean, StDev, SE Mean, Minimum, Maximum) for Verbal, Math and GPA; stem-and-leaf displays of Verbal (N = 200) and Math (N = 200); histograms of Math and Verbal]

a) Find the 25th percentile, 75th percentile and the IQR of the math SAT scores.
b) You were one of the students of this study and your math SAT score was 532. What is your z-score and percentile standing?
c) If the math SAT scores were in fact left (negatively) skewed, but the mean was still 650, what could you say about the percentile standing of someone who obtains a score of 650?
d) What is the class width
   i) of the histogram for verbal SAT scores?
   ii) of the stemplot of the verbal SAT scores?
e) Describe both the verbal and math score distributions and compare one with the other.

g) Give a rough sketch of how a normal probability plot would look if the verbal scores were
   i. right (positively) skewed
   ii. uniform in shape
h) For verbal scores, aside from running through the data and tallying, can you determine the approx. percentage of scores which fall between 523 and 668? If so, give the percentage.

Question (Term Test May 98)
Descriptive statistics of scores of 3 groups of students are given below.
[MINITAB output: N, Mean, Median, TrMean and StDev of Post1 for groups B, D and S]
Using the information above, estimate the following in some reasonable way. State any assumptions that you have to make.
(a) The 90th percentile of the post1 scores using method B.
(b) The proportion of post1 scores that would be 7 or higher for those using method D.


More information

Chapter 6 Scatterplots, Association and Correlation

Chapter 6 Scatterplots, Association and Correlation Chapter 6 Scatterplots, Association and Correlation Looking for Correlation Example Does the number of hours you watch TV per week impact your average grade in a class? Hours 12 10 5 3 15 16 8 Grade 70

More information

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 140 est 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDI! Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 otal 75 Multiple choice questions (1 point each) For questions

More information

Statistics Lecture 3

Statistics Lecture 3 Statistics 111 - Lecture 3 Continuous Random Variables The probable is what usually happens. (Aristotle ) Moore, McCabe and Craig: Section 4.3,4.5 Continuous Random Variables Continuous random variables

More information

Ch. 3 Review - LSRL AP Stats

Ch. 3 Review - LSRL AP Stats Ch. 3 Review - LSRL AP Stats Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable lumber

More information

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures

More information

Sociology 6Z03 Review I

Sociology 6Z03 Review I Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

Complement: 0.4 x 0.8 = =.6

Complement: 0.4 x 0.8 = =.6 Homework The Normal Distribution Name: 1. Use the graph below 1 a) Why is the total area under this curve equal to 1? Rectangle; A = LW A = 1(1) = 1 b) What percent of the observations lie above 0.8? 1

More information

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II) The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STOR 155 Introductory Statistics Lecture 4: Displaying Distributions with Numbers (II) 9/8/09 Lecture 4 1 Numerical Summary for Distributions Center Mean

More information

Chapter 5. Understanding and Comparing. Distributions

Chapter 5. Understanding and Comparing. Distributions STAT 141 Introduction to Statistics Chapter 5 Understanding and Comparing Distributions Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 27 Boxplots How to create a boxplot? Assume

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Section 2.3: One Quantitative Variable: Measures of Spread

Section 2.3: One Quantitative Variable: Measures of Spread Section 2.3: One Quantitative Variable: Measures of Spread Objectives: 1) Measures of spread, variability a. Range b. Standard deviation i. Formula ii. Notation for samples and population 2) The 95% rule

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67 Chapter 6 The Standard Deviation as a Ruler and the Normal Model 1 /67 Homework Read Chpt 6 Complete Reading Notes Do P129 1, 3, 5, 7, 15, 17, 23, 27, 29, 31, 37, 39, 43 2 /67 Objective Students calculate

More information

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed

More information

Chapter 6 Group Activity - SOLUTIONS

Chapter 6 Group Activity - SOLUTIONS Chapter 6 Group Activity - SOLUTIONS Group Activity Summarizing a Distribution 1. The following data are the number of credit hours taken by Math 105 students during a summer term. You will be analyzing

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variable In this lecture: We shall look at two quantitative variables.

More information

Sampling, Frequency Distributions, and Graphs (12.1)

Sampling, Frequency Distributions, and Graphs (12.1) 1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

The Standard Deviation as a Ruler and the Normal Model

The Standard Deviation as a Ruler and the Normal Model The Standard Deviation as a Ruler and the Normal Model Al Nosedal University of Toronto Summer 2017 Al Nosedal University of Toronto The Standard Deviation as a Ruler and the Normal Model Summer 2017 1

More information

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points 1-13 13 14 3 15 8 16 4 17 10 18 9 19 7 20 3 21 16 22 2 Total 75 1 Multiple choice questions (1 point each) 1. Look at

More information

EQ: What is a normal distribution?

EQ: What is a normal distribution? Unit 5 - Statistics What is the purpose EQ: What tools do we have to assess data? this unit? What vocab will I need? Vocabulary: normal distribution, standard, nonstandard, interquartile range, population

More information

Section 3. Measures of Variation

Section 3. Measures of Variation Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Describing Distributions With Numbers

Describing Distributions With Numbers Describing Distributions With Numbers October 24, 2012 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Do

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3 Review Chapter 3: Examining Relationships 1. A study is conducted to determine if one can predict the yield of a crop based on the amount of yearly rainfall. The response variable in this study

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

Mrs. Poyner/Mr. Page Chapter 3 page 1

Mrs. Poyner/Mr. Page Chapter 3 page 1 Name: Date: Period: Chapter 2: Take Home TEST Bivariate Data Part 1: Multiple Choice. (2.5 points each) Hand write the letter corresponding to the best answer in space provided on page 6. 1. In a statistics

More information

Describing Distributions With Numbers Chapter 12

Describing Distributions With Numbers Chapter 12 Describing Distributions With Numbers Chapter 12 May 1, 2013 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary. 1.0 What Do We Usually Summarize? source: Prof.

More information

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution.

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution. MATH 382 Normal Distributions Dr. Neal, WKU Measurements that are normally distributed can be described in terms of their mean µ and standard deviation σ. These measurements should have the following properties:

More information

STATISTICS 1 REVISION NOTES

STATISTICS 1 REVISION NOTES STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Density Curves and the Normal Distributions. Histogram: 10 groups

Density Curves and the Normal Distributions. Histogram: 10 groups Density Curves and the Normal Distributions MATH 2300 Chapter 6 Histogram: 10 groups 1 Histogram: 20 groups Histogram: 40 groups 2 Histogram: 80 groups Histogram: 160 groups 3 Density Curve Density Curves

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest: 1 Chapter 3 - Descriptive stats: Numerical measures 3.1 Measures of Location Mean Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size Example: The number

More information

Chapters 1 & 2 Exam Review

Chapters 1 & 2 Exam Review Problems 1-3 refer to the following five boxplots. 1.) To which of the above boxplots does the following histogram correspond? (A) A (B) B (C) C (D) D (E) E 2.) To which of the above boxplots does the

More information

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data Finding Quartiles. Use the median to divide the ordered data set into two halves.. If n is odd, do not include the median in either half. If n is even, split this data set exactly in half.. Q1 is the median

More information

Math 082 Final Examination Review

Math 082 Final Examination Review Math 08 Final Examination Review 1) Write the equation of the line that passes through the points (4, 6) and (0, 3). Write your answer in slope-intercept form. ) Write the equation of the line that passes

More information

ACMS Statistics for Life Sciences. Chapter 11: The Normal Distributions

ACMS Statistics for Life Sciences. Chapter 11: The Normal Distributions ACMS 20340 Statistics for Life Sciences Chapter 11: The Normal Distributions Introducing the Normal Distributions The class of Normal distributions is the most widely used variety of continuous probability

More information

Unit 2: Numerical Descriptive Measures

Unit 2: Numerical Descriptive Measures Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48

More information

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved Chapter 3 Numerically Summarizing Data Section 3.1 Measures of Central Tendency Objectives 1. Determine the arithmetic mean of a variable from raw data 2. Determine the median of a variable from raw data

More information

Practice problems from chapters 2 and 3

Practice problems from chapters 2 and 3 Practice problems from chapters and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,

More information

[ ESS ESS ] / 2 [ ] / ,019.6 / Lab 10 Key. Regression Analysis: wage versus yrsed, ex

[ ESS ESS ] / 2 [ ] / ,019.6 / Lab 10 Key. Regression Analysis: wage versus yrsed, ex Lab 1 Key Regression Analysis: wage versus yrsed, ex wage = - 4.78 + 1.46 yrsed +.126 ex Constant -4.78 2.146-2.23.26 yrsed 1.4623.153 9.73. ex.12635.2739 4.61. S = 8.9851 R-Sq = 11.9% R-Sq(adj) = 11.7%

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Units. Exploratory Data Analysis. Variables. Student Data

Units. Exploratory Data Analysis. Variables. Student Data Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as

More information

Math 10 - Compilation of Sample Exam Questions + Answers

Math 10 - Compilation of Sample Exam Questions + Answers Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the

More information

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 Spoiled ballots are a real threat to democracy. Below are

More information

Lecture 10/Chapter 8 Bell-Shaped Curves & Other Shapes. From a Histogram to a Frequency Curve Standard Score Using Normal Table Empirical Rule

Lecture 10/Chapter 8 Bell-Shaped Curves & Other Shapes. From a Histogram to a Frequency Curve Standard Score Using Normal Table Empirical Rule Lecture 10/Chapter 8 Bell-Shaped Curves & Other Shapes From a Histogram to a Frequency Curve Standard Score Using Normal Table Empirical Rule From Histogram to Normal Curve Start: sample of female hts

More information

Introduction to Statistics

Introduction to Statistics Introduction to Statistics Data and Statistics Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing,

More information

The Normal Distribution. Chapter 6

The Normal Distribution. Chapter 6 + The Normal Distribution Chapter 6 + Applications of the Normal Distribution Section 6-2 + The Standard Normal Distribution and Practical Applications! We can convert any variable that in normally distributed

More information

Range The range is the simplest of the three measures and is defined now.

Range The range is the simplest of the three measures and is defined now. Measures of Variation EXAMPLE A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test.

More information

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature

More information

Chapter 5: Exploring Data: Distributions Lesson Plan

Chapter 5: Exploring Data: Distributions Lesson Plan Lesson Plan Exploring Data Displaying Distributions: Histograms Interpreting Histograms Displaying Distributions: Stemplots Describing Center: Mean and Median Describing Variability: The Quartiles The

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information