Chapter 1 Describing Data


1 Chapter 1 Describing Data

2 Variable Basics Def: A Variable is any characteristic of an individual. Def: Individuals are the objects described by data. Note the term individual is somewhat flawed. Sometimes groups of things constitute an individual.

3 Variable Basics Broad categories of variables: Categorical (AKA Qualitative): groups or categories. Quantitative (AKA Numerical): variables for which arithmetic operations make sense (summing, averaging, etc.). Continuous: No theoretical end to the ability to subdivide gradations. Discrete: Variables whose possible values can be listed one by one (a finite or countable set). Often things we count (baskets made, failures experienced, dots on a die).

4 Levels of Measurement* Remember NOIR: Nominal ("Names"), Ordinal ("Order of things"), Interval ("Same gaps between marks on a scale"), Ratio ("Twice is nice": there's a true zero, and doubling really means doubling!) * Not officially part of the AP curriculum

5 Variability Variables vary! That's why we call them variables. Variability describes how far the values tend to fall from the middle. (Much more on this later.) A distribution of a variable is some kind of display that tells us: what values the variable takes on, and how often those values occur, or how probable they are to occur.

6 Graphs involving Categorical Variables Bar Graphs: Bars don't touch. Horizontal axis is a categorical variable. Vertical axis is a numerical variable (usually a count or a percent). See p. 8 for how to construct a bar graph.

7 Graphs involving Categorical Variables Pie Charts: Must account for 100% of the data. Can use an "other" category if necessary. The only trick to constructing a pie chart is to equate percent with central angle: Θ = (% / 100) × 360 degrees.
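The percent-to-angle conversion can be sketched in a few lines of Python. The category names and percentages below are made up for illustration; they are not from the slides.

```python
def central_angle(percent):
    """Central angle in degrees for a slice covering `percent` of the data."""
    return percent * 360 / 100

# Hypothetical percentages; they must account for 100% of the data.
shares = {"A": 50, "B": 25, "C": 15, "Other": 10}
angles = {name: central_angle(p) for name, p in shares.items()}

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

Since the percents sum to 100, the angles sum to 360, which makes a quick check on the arithmetic.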

8 Displays of quantitative variables Two proto-histograms: 1. Dotplot (see p. 11). Easy! Dotplots preserve your data; it's all still there. 2. Stemplot. All data is preserved. Final digits are the leaves. Must have a key: 5 | 4 = 54 apples. Use a back-to-back stemplot to compare two distributions (see p. 58 for a good example). Space digits the same laterally. Then, when you're done, you can tilt your head to the side and see a sort of histogram. This shows the shape of the distribution.

9 Describing Data SOCS/U Shape Outliers Center Spread Unusual Features

10 Measures of Center (AKA "Measures of Central Tendency"): Mean, Median, Mode

11 Side Trip The 5-number summary Min Q1 (The Median of the lower half) Median (AKA Q2) Q3 (The Median of the upper half) Max (AKA Q4)
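As a rough sketch, the 5-number summary can be computed in Python using the "median of each half" rule from the slide. Leaving the middle value out of both halves when n is odd is an assumption here (it is one common classroom convention), and the data values are only illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n, the middle value belongs to neither half.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([2, 3, 3, 4, 5, 8, 10, 13]))
```

Software packages often use slightly different quartile rules, so results may differ from this sketch on small data sets.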

12 The -ile suffix: the number at or below which a given fraction of the data falls. E.g.: The 75th percentile is the number at or below which 75 percent of all values in the data fall. E.g.: The 3rd quartile is the number at or below which three quarters of all the data fall.
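To make the "at or below" wording concrete, here is a small Python sketch that counts what percent of a data set lies at or below a given value. The scores are hypothetical.

```python
def percent_at_or_below(data, value):
    """Percent of observations that are less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90]  # hypothetical test scores
print(percent_at_or_below(scores, 80))      # 6 of the 8 scores are at or below 80
```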

13 Hot Calculator Move! 1-Var Stats (Stat > Calc > 1-Var Stats) gives you summary info for your data, including the mean, standard deviation and the 5-number summary.

14 Histograms (See text) Consider home prices in a nice neighborhood 2, 3, 3, 4, 5, 8, 10, 13 Task: Build a 4-class histogram

15 Histograms 2, 3, 3, 4, 5, 8, 10, 13 Class width: Range divided by number of classes, then round up to the next integer. (13 - 2)/4 = 2.75. Round up to 3.
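The class-width rule and the resulting class counts can be sketched in Python as follows. Two assumptions are made that the slide does not state: "round up to the next integer" is read as floor + 1 (so the maximum always lands inside the last class), and each class includes its lower boundary but not its upper one.

```python
import math

prices = [2, 3, 3, 4, 5, 8, 10, 13]
num_classes = 4

raw_width = (max(prices) - min(prices)) / num_classes  # (13 - 2) / 4 = 2.75
width = math.floor(raw_width) + 1                      # next integer up: 3

start = min(prices)
counts = [0] * num_classes
for x in prices:
    # Class k covers start + k*width up to (but not including) start + (k+1)*width.
    counts[(x - start) // width] += 1

for k, count in enumerate(counts):
    low = start + k * width
    print(f"{low} to under {low + width}: {count}")
```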

16 In which class is the 40th percentile? [Histogram; class counts from left to right: n=13, n=17, n=5, n=1, n=0, n=2, n=4, n=0, n=1]
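One way to answer this kind of question is to accumulate the class counts until the running total reaches 40 percent of the data. The Python sketch below assumes the n values on the slide are the class counts read left to right.

```python
from itertools import accumulate

counts = [13, 17, 5, 1, 0, 2, 4, 0, 1]   # class counts, left to right (assumed order)
total = sum(counts)                       # 43 observations
target = 0.40 * total                     # 40% of the data: about 17.2 observations

cumulative = list(accumulate(counts))     # running totals: 13, 30, 35, ...
class_index = next(i for i, c in enumerate(cumulative) if c >= target)

print(f"The 40th percentile falls in class number {class_index + 1} from the left")
```

The first class holds only 13 of the 43 values (about 30 percent), so the 40 percent mark is reached inside the second class.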

17 Measures of Spread (AKA "Measures of Dispersion"): Range (Max - Min = Range). Interquartile Range (Q3 - Q1 = IQR). Standard Deviation (more to follow).

18 Histograms Warmup Consider the following data regarding muffins produced by students for a class: 4, 5, 6, 7, 8, 10, 12, 18, 22, 24, 36, 30, 31, 16, 17, 19, 20, 38, 39, 45, 50 Task: Build a 5-class histogram with the smallest class starting at 4. First: Establish the width of each class algebraically. (Here I shifted to the whiteboard for a demonstration.)

19 Histograms Warmup Consider the following data regarding muffins produced by students for a class: 4, 5, 6, 7, 8, 10, 12, 18, 22, 24, 36, 30, 31, 16, 17, 19, 20, 38, 39, 45, 50 Task: Build a 5-class histogram with the smallest class starting at 4. First: Establish the width of each class algebraically. After you build a histogram, build a cumulative relative frequency ogive for the same data.
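For checking the warmup afterwards, here is a Python sketch that tallies the five classes and the cumulative relative frequencies an ogive would plot. It assumes the same conventions as the earlier class-width rule (round up to the next integer, lower boundary included in each class); the whiteboard demonstration may have used different boundaries.

```python
import math
from itertools import accumulate

muffins = [4, 5, 6, 7, 8, 10, 12, 18, 22, 24, 36, 30, 31,
           16, 17, 19, 20, 38, 39, 45, 50]
num_classes = 5
start = 4

raw_width = (max(muffins) - min(muffins)) / num_classes  # 46 / 5 = 9.2
width = math.floor(raw_width) + 1                        # next integer up: 10

counts = [0] * num_classes
for x in muffins:
    counts[(x - start) // width] += 1

# Cumulative relative frequency at the upper boundary of each class.
cum_rel = [c / len(muffins) for c in accumulate(counts)]

for k in range(num_classes):
    low = start + k * width
    print(f"{low} to under {low + width}: count {counts[k]}, "
          f"cumulative relative frequency {cum_rel[k]:.3f}")
```

The ogive is then the line graph of these cumulative values against the upper class boundaries, starting from 0 at the lower boundary of the first class.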

20 Outliers! Defined: Any observation that's not part of the overall pattern. The 1.5 × IQR Rule: On the high side, a number is an outlier if it is greater than Q3 + 1.5 × IQR. On the low side, a number is an outlier if it is lower than Q1 - 1.5 × IQR.

21 Outliers Data: 4, 3, 6, 8, 2, 1, 16, 7 Task: Determine whether 16 is an outlier. Establish the 5-number summary: Min = ___ Q1 = ___ Med = ___ Q3 = ___ Max = ___ Then: 1.5 × IQR = ___ and Q3 + 1.5 × IQR = ___ So, 16 (is / is not) an outlier.

22 Outliers Data: -1, 4, 6, 8, 7, 6, 9, 5 Task: Determine whether -1 is an outlier. Establish the 5-number summary: Min = ___ Q1 = ___ Med = ___ Q3 = ___ Max = ___ Then: 1.5 × IQR = ___ and Q1 - 1.5 × IQR = ___ So, -1 (is / is not) an outlier.
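After working the two outlier exercises by hand, the 1.5 × IQR rule can be checked with a short Python sketch. The quartiles here use the median-of-halves rule from the 5-number summary slide; the function names are illustrative only.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    return median(lower), median(upper)

def outlier_fences(data):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def is_outlier(value, data):
    low, high = outlier_fences(data)
    return value < low or value > high

first = [4, 3, 6, 8, 2, 1, 16, 7]
second = [-1, 4, 6, 8, 7, 6, 9, 5]
print(outlier_fences(first), is_outlier(16, first))
print(outlier_fences(second), is_outlier(-1, second))
```

In the first data set Q1 = 2.5 and Q3 = 7.5, so the high fence is 15 and 16 lies beyond it. In the second, Q1 = 4.5 and Q3 = 7.5, so the low fence is 0 and -1 lies below it.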

23 While we're at it: Boxplots. Four elements to a successful (fully creditworthy) boxplot: a scale distinct from the plot itself; the boxplot itself; labels along the top of the boxplot (Min, Q1, Median, Q3, Max); values along the bottom. Whiteboard demo.

24 Variance, then Standard Deviation Standard Deviation: The most common measure of spread. Appropriate when you've chosen the mean as your measure of center. This implies an approximately symmetric distribution. Variance: The waypoint on the way to calculating standard deviation.

25 Variance For a Sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ For a Population: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

26 Standard Deviation For a Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ For a Population: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
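The two pairs of formulas differ only in the divisor, which a short Python sketch makes visible. The data are the five numbers used in the later linear-transformation exercise; treating the same list as both a sample and a population is purely for comparison.

```python
import math
from statistics import pvariance, variance

data = [2, 5, 7, 6, 4]
n = len(data)

x_bar = sum(data) / n                              # mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations

sample_var = sum_sq_dev / (n - 1)   # divide by n - 1 when the data are a sample
pop_var = sum_sq_dev / n            # divide by N when the data are the whole population

sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

print(f"sample:     variance {sample_var:.2f}, SD {sample_sd:.2f}")
print(f"population: variance {pop_var:.2f}, SD {pop_sd:.2f}")

# The standard library's statistics module implements the same two definitions.
print(variance(data), pvariance(data))
```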

27 Why n-1 for a sample Edward Sapir(?) Possibly the most frequently asked and least frequently answered question is why does the definition of the standard deviation involve division by n-1, when n might seem the obvious choice. This is a question which perplexes introductory statistics students and calculator manufacturers alike. The explanations given in calculator manuals tend to range from obscure to fanciful, and both options are given on the keypad, usually labelled σn-1 and σn to add to the confusion. (The symbol σ is reserved for the standard deviation of a random variable or a population/distribution, an entity which lecturers valiantly try but usually fail to keep distinct from its sample counterpart.) Australian students first meet the standard deviation in secondary school, where the definition given does indeed involve division by n. This definition is preferred to avoid the question about the n-1 being raised, it would seem. Secondary school teachers have a hard enough life as it is. And it must be remembered that the difference between the two definitions is largely academic for all but the smallest of sample sizes (say, less than 10 observations). So, don't get too agitated by the revamped definition. The truth can be told but the telling usually quells the desire to know. If the fire still burns in your belly, read on. The most widely accepted explanation involves the concept of unbiasedness, and I see some have stopped reading already. If you fire arrows at a target and consistently hit a mark 5 cm to the left of the bullseye, there is something wrong with your aim. It shows a bias. The definition of the sample variance which involves division by n has this flaw. It consistently underestimates the variance of the population/distribution from which the sample was drawn. The n-1 formula fixes the astigmatism. Compelling and relatively simple as this argument is, it doesn't quite ring true.
Both definitions of the sample standard deviation produce biased estimates of the σ of the population/distribution, although the n-1 alternative is less biased. If you're after unbiasedness, why not use a definition which gives you unbiasedness where it's needed - on the original scale of measurement, rather than on the squared scale? Such a contender exists, but it involves gamma functions in the definition, and I see quite a few more people have drifted away. (Gamma functions extend the concept of factorials to non-integers.) The real reason is a simple housekeeping issue. If you deal with the n-1 straight away in the definition of the standard deviation, it doesn't keep popping up in every subsequent procedure involving the standard deviation, to the increasing annoyance of all concerned. The subsequent procedures in question involve the definition of the t and χ² distributions, where the issue of degrees of freedom arises. Degrees of freedom means what it says - in how many independent directions can you move at once. If you're a point moving on a page, you are moving in two dimensions and you have correspondingly two degrees of freedom: the freedom to move up the page and the freedom to move across it. Any motion on a page can be described in terms of these two independent motions. Now consider a sample of size n. It inhabits an n-dimensional space. There are n degrees of freedom in total. Each sample member is free to take any value it likes, independently from all the others. If, however, you fix the sample mean, then the sample values are constrained to have a fixed sum. You can let n-1 of them roam free, but the value of the remaining sample value is determined by the fixed sum. The space inhabited by the deviations from the sample mean is thus (n-1)-dimensional rather than n-dimensional, since the deviations must sum to zero.
The sum of the squared deviations, although looking like a sum of n things, is actually a sum of only n-1 independent things, and its natural divisor - its degrees of freedom - is also n-1. But wait, there's more. Degrees of freedom will return to haunt you if and when you do analysis of variance (ANOVA to its friends). You will be ahead of the game if you grasp the concept now. Degrees of freedom can neither be created nor destroyed. You start off with n, the sample size. You use up a few trying to estimate the structure of the mean. For example, the mean could be a straight line, as in simple linear regression. You need two degrees of freedom to estimate the two characteristics possessed by all straight lines - a slope and an intercept. These characteristics are called parameters. So, two degrees of freedom have gone into the mean. This is the signal. Everything else in this model is noise. The remaining n-2 degrees of freedom go into estimating the one parameter which describes the noise - the variance. If you're not part of the solution (the signal or mean) you're part of the problem (the noise). The simplest model is the one which says the mean is a single constant, ably estimated by the sample mean. Everything else is just inexplicable variation about that constant. That's n-1 degrees of freedom's worth of noise, all kindly donated to the sample variance. Kim Anderson, mayor of Naples, Florida in Servant Leadership by Robert Greenleaf

28 Why n-1 for a sample? In other words: It produces an unbiased estimator of the population variance (and a less biased estimate of the population SD), even though you only have a sample. It relates to the number of degrees of freedom the formula enjoys. If the mean is fixed in place, only n-1 of the values could be filled in freely. That last value would have to respond to all those others. DON'T WORRY ABOUT IT!
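For anyone who does want to worry about it, the bias claim can be checked exactly on a tiny example. The Python sketch below is an illustration under stated assumptions, not part of the original slides: it treats the five numbers 2, 5, 7, 6, 4 as a whole population, lists every possible sample of size 2 drawn with replacement, and averages the two competing variance formulas over all of those samples. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [2, 5, 7, 6, 4]   # treated as the entire population (assumption)
n = 2                          # sample size

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# True population variance (divide by N).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:     ", float(sigma2))
print("average of divide-by-n:  ", float(avg_div_n))
print("average of divide-by-n-1:", float(avg_div_n_minus_1))
```

On average the divide-by-n version comes out too small by the factor (n-1)/n, while the divide-by-(n-1) version averages out to exactly the population variance.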

29 Important-to-know stuff about variance and standard deviation The sum of all deviations will always = 0, so if we merely averaged them, the average would be 0. (How boring!) This is why it makes sense to square the deviations before averaging them. However, squaring makes the units of variance the squares of the data's units. Hmmm. Dollars squared, grams squared: it's wacky. After squaring the deviations and averaging them, the final square root "unsquares" the result. This returns the units of standard deviation back to the same units as the data. This alone is a good reason to use standard deviation instead of variance as our measure of spread.

30 Linear Transformations (Adding a constant) Exercise (All hands): Take the numbers 2, 5, 7, 6, 4 (a population). Calculate mean: ___ and Std Dev: ___ Now add 2 to each value. Calculate mean: ___ and Std Dev: ___ Conclusion: ___

31 Linear Transformations (Adding a constant) Exercise (All hands): Take the numbers 2, 5, 7, 6, 4. Calculate mean: ___ and Std Dev: ___ Now add 2 to each value. Calculate mean: ___ and Std Dev: ___ Conclusion: Adding a number raises the mean by that amount, but the standard deviation remains unchanged.

32 Linear Transformations (Multiplying by a constant) Exercise (All hands): Now take the same numbers: 2, 5, 7, 6, 4. Originally, mean is 4.8, std dev (pop) is 1.72. Multiply each value by 3. Calculate mean: ___ and Std Dev: ___ How does this compare to the original mean and standard deviation?

33 Linear Transformations (Multiplying by a constant) Exercise (All hands): Now take the same numbers: 2, 5, 7, 6, 4. Originally, mean is 4.8, std dev (pop) is 1.72. Multiply each value by 3. Calculate mean: ___ and Std Dev: ___ Both Mean and Standard Deviation went up by a factor of 3.
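Both exercises can be checked with Python's standard statistics module. This is a minimal sketch using the population standard deviation (`pstdev`), matching the "(pop)" note on the slide.

```python
from statistics import mean, pstdev

data = [2, 5, 7, 6, 4]
shifted = [x + 2 for x in data]   # add a constant
scaled = [3 * x for x in data]    # multiply by a constant

for label, values in [("original", data), ("add 2", shifted), ("times 3", scaled)]:
    print(f"{label:9s} mean = {mean(values):.2f}  std dev = {pstdev(values):.2f}")
```

Adding 2 moves the mean from 4.8 to 6.8 and leaves the standard deviation at about 1.72; multiplying by 3 triples both, to 14.4 and about 5.16.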

34 Linear Transformation Example 1 You are performing a physics experiment in which you measure temperatures in degrees Celsius. The mean temperature is 47, with a standard deviation of 13. Another physicist is interested in your research but wants all values in kelvins. What are the new mean and standard deviation?

35 Linear Transformation Example 1 You are performing a physics experiment in which you measure temperatures in degrees Celsius. The mean temperature is 47, with a standard deviation of 13. K = C +273, so add 273 to mean, and leave standard deviation alone. μ = 320, and σ = 13

36 Linear Transformation Example 2 Same setup: You are performing a physics experiment in which you measure temperatures in degrees Celsius. The mean temperature is 47, with a standard deviation of 13. Now, a different physicist wants results in degrees Fahrenheit. Conversion: F = (9/5)C +32

37 Linear Transformation Example 2 Conversion: F = (9/5)C + 32 First, take care of the multiplication step: Mean = (9/5)(47) = 84.6 degrees F. Standard Deviation = (9/5)(13) = 23.4 degrees F. Next, take care of the + 32: Add 32 to the mean: 84.6 + 32 = 116.6 degrees F. Leave Standard Deviation unchanged = 23.4 degrees F.
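Both temperature examples follow one rule: if every value x becomes a·x + b, the mean becomes a·mean + b and the standard deviation becomes |a|·SD. The Python sketch below applies that rule to the summary numbers alone; the function name is illustrative.

```python
def transform_summary(mean, sd, a, b):
    """Mean and SD after every value x is replaced by a*x + b."""
    return a * mean + b, abs(a) * sd

c_mean, c_sd = 47, 13   # summary statistics in degrees Celsius

k_mean, k_sd = transform_summary(c_mean, c_sd, 1, 273)     # K = C + 273
f_mean, f_sd = transform_summary(c_mean, c_sd, 9 / 5, 32)  # F = (9/5)C + 32

print(f"Kelvin:     mean {k_mean}, SD {k_sd}")
print(f"Fahrenheit: mean {f_mean:.1f}, SD {f_sd:.1f}")
```

The added constant b never touches the standard deviation, and the absolute value matters only when the multiplier is negative.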

38 Describing a set of Data Example: Lengths of spiders The distribution of spider lengths is approximately symmetric with a mean of 1.7 centimeters and a standard deviation of 0.3 cm. There are two outliers among the data at 3.3 and 3.7 cm, possibly accounted for by the fact that these two were found in a different area. There is a gap in the data between 0.8 and 1.0 cm which cannot be accounted for by any known flaw in sampling.

39 Example: Problem 12 from HW 1-1 The distribution of percents of those aged 65 and older in each state is approximately symmetric, with a mean of 12.7 percent and a standard deviation of 1.94 percentage points. There are two outliers, one at the high end and one at the low end. Alaska has only 5.5 percent of its population 65 or over, and Florida has 18.3 percent in that age group. On further review, the median and 5-number summary may be more appropriate than the mean and standard deviation. There are two gaps in the data. No state has a percent 65 and over between 6 and 8 percent, or between 16 and 18 percent, so with the exception of the outliers, the states' percents 65 and over run from about 9 to 15 percent of their populations.

40 When Describing: Address each element of SOCS/U, not necessarily in SOCS/U order. Speak to context. Ex: "The distribution of spider lengths is ...", not simply "The distribution is ...". Write in English, not bulletized grunts.

Name SUMMARY/QUESTIONS TO ASK IN CLASS AP STATISTICS CHAPTER 1: NOTES CUES. 1. What is the difference between descriptive and inferential statistics?

Name SUMMARY/QUESTIONS TO ASK IN CLASS AP STATISTICS CHAPTER 1: NOTES CUES. 1. What is the difference between descriptive and inferential statistics? CUES 1. What is the difference between descriptive and inferential statistics? 2. What is the difference between an Individual and a Variable? 3. What is the difference between a categorical and a quantitative

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types

More information

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II) The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STOR 155 Introductory Statistics Lecture 4: Displaying Distributions with Numbers (II) 9/8/09 Lecture 4 1 Numerical Summary for Distributions Center Mean

More information

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations: Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number

More information

Section 3. Measures of Variation

Section 3. Measures of Variation Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation? Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization. Statistical Tools in Evaluation HPS 41 Fall 213 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

SESSION 5 Descriptive Statistics

SESSION 5 Descriptive Statistics SESSION 5 Descriptive Statistics Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart ST2001 2. Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart Summary of Previous Lecture u A study often involves taking a sample from a population that

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

P8130: Biostatistical Methods I

P8130: Biostatistical Methods I P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data

More information

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Lecture 2 and Lecture 3

Lecture 2 and Lecture 3 Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation Chapter Four Numerical Descriptive Techniques 4.1 Numerical Descriptive Techniques Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

20 Hypothesis Testing, Part I

20 Hypothesis Testing, Part I 20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information
