1 Measures of the Center of a Distribution

Size: px
Start display at page:

Download "1 Measures of the Center of a Distribution"

Transcription

1 1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects of unimodal distributions that we will often want to measure are center (what is a typical value? where do the values cluster?), and the amount of variation (are the data tightly clustered around a central value, or more spread out?) Two widely used measures of center are the mean and the median. You are probably already familiar with both. The mean is calculated by adding all the values of a variable and dividing by the number of values. Our usual notation will be to denote the n values as x 1, x 2,... x n, and the mean of these values as x. Then the formula for the mean becomes n x = x i. n The median is a value that splits the data in half half of the values are smaller than the median and half are larger. By this definition, there could be more than one median (when there are an even number of values). This ambiguity is removed by taking the mean of the two middle numbers (after sorting the data). Whereas x denotes the mean of the n numbers x 1,..., x n, we use x to denote the median of these numbers. The mean and median are easily computed in R. > x=c(1,1,5,20,10) > mean(x) [1] 7.4 > median(x) [1] Comparing mean and median While both the mean and median provide a measure of the center of a distribution, they measure different things and sometimes one measure is better than the other. If a distribution is (approximately) symmetric, the mean and median will be (approximately) the same. If the distribution is not symmetric, however, the mean and median may be very different. For example, if we begin with a symmetric distribution and add in one additional value that is very much larger than the other values (an outlier), then the median will not change very much (if at all), but the mean will increase substantially. Because of this, we say that the median is resistant to outliers while the mean is not. A similar thing happens with a skewed, unimodal distribution. If a distribution is positively skewed, the large values in the tail of the distribution increase the mean (as compared to a symmetric distribution) but not the median, so the mean will be larger than the median. Similarly, the mean of a negatively skewed distribution will be smaller than the median. Consider the data on the populations of the 3,141 county equivalents in the United States. From R we see the great difference in the mean county population and the median county population. Note that the largest county, Los Angeles County with over 9 million people, alone contributes over 3,000 people to the mean. Whether a resistant measure is desirable or not depends on context. If we are looking at the income of employees of a local business, the median may give us a much better indication of what a typical worker earns, since there may be a few large salaries (the business owner s, for example) that inflate the mean. This is also why the government reports median household income and median housing costs. The median county population perhaps tells us more about what a typical county looks like than does the mean. On the other hand, if we are ultimately interested in the total, the mean is more useful. For example, the median of daily sales of a Hallmark Card store will likely be smaller than the mean as there are several days of card sales that are outliers (e.g., the few days before Mother s Day). The mean daily sales for a year allows us also to compute the total sales on the year and comparing the means of two different stores allows us to determine which store has greater sales overall.

2 1.2 The trimmed mean There is another measure of center that is less well known and represents a kind of compromise between the mean and the median. In particular, it is more sensitive to the the extreme values of a distribution than the median is, but less sensitive than the mean. The idea of a trimmed mean is very simple. Before calculating the mean, we remove the largest and smallest values from the data. The percentage of the data removed from each end is called the trimming percentage. The 10% trimmed mean is the mean of the middle 80% of the data (after removing the largest and smallest 10%). A trimmed mean is calculated in R by setting the trim argument of mean(), e.g. mean(x,trim=.10). Although a trimmed mean in some sense combines the advantages of both the mean and median, it is less common than either the mean or the median. This is partly due the mathematical theory that has been developed for working with the median and especially the mean of sample data. The 10% trimmed mean of county populations is 38,234 which is much closer in size to the median than to the mean. We note in passing that there are some complications in defining the trimmed mean. For instance in the case of the example above, the 10% trimmed mean of the county populations should be computed by removing the largest most populus and the least populus counties from the data. But this doesn t make any sense so that there needs to be some convention for dealing with these fractions of data points. In fact there are several different conventions but we will let R handle the details of the computation and not be too concerned about the differences. With large datasets, the differences between the various definitions of, say, the 10% trimmed mean are small. In some sports, the trimmed mean is used to compute a competitors score based on the scores given by individual judges. Diving, figure skating, and gymnastics are three sports that use a trimmed mean to compute a competitors final score. (Diving uses the middle three scores when there are five judges which amounts to computing the 20% trimmed mean.) 2 Measures of Dispersion It is often useful to characterize a distribution in terms of its center, but that is not the whole story. Consider the distributions depicted in the histograms below A B 0.15 Density In each case the mean and median are approximately 10, but the distributions clearly have very different shapes. The difference is that distribution B is much more spread out. Almost all of the data in distribution A are quite close to 10; a much larger proportion of distribution B is far away from 10. The intuitive (and not very precise) statement in the preceding sentence can be quantified by means of quantiles. The idea of quantiles is probably familiar to you since percentiles are a special case of quantiles. Definition 2.1 (Quantile). Let p [0, 1]. A p-quantile of a quantitative distribution is a number q such that the (approximate) proportion of the distribution that is less than q is p.

3 So for example, the.2-quantile divides a distribution into 20% below and 80% above. The.2-quantile is also called the 20th percentile. The median is just the.5-quantile (and the 50th percentile). While the definition of quantile above seems clear, it does have the same complication as that of the definition of trimmed mean. Suppose your data set has 15 values. What is the.30-quantile? 30% of the data would be (.30)(15) = 4.5 values. Of course, there is no number that has 4.5 values below it and 10.5 values above it. This is the reason for the parenthetical word approximate in Definition??. Different methods have been proposed for giving quantiles a single value, and R implements 9 different methods! The next example illustrates the default computation of the.25-quantile for datasets of size 5, 6, 7 and 8. > quantile(1:5,.25) 2 > quantile(1:6,.25) 2.25 > quantile(1:7,.25) 2.5 > quantile(1:8,.25) 2.75 Fortunately, for large data sets, the differences between the various different quantile methods are usually small, so we will just let R compute quantiles for us using the default method of the quantile() function. The difference between the first and third quartiles is often used as a simple measure of dispersion. This measure is called the inter-quartile range and abbreviated IQR. The IQR of the Old Faithful eruption times is = > quantile(faithful$eruptions,.75) 75% > quantile(faithful$eruptions,.25) > IQR(faithful$eruptions) [1] Note that since IQR depends only on the middle 50% of data, it is a measure of dispersion that is resistant to outliers. Especially for hand computation, yet another method of computing the quartiles based on the median is popular. With this method, the first- and third-quartiles (the.25-quantile and the.75-quantile respectively) are called the lower hinge and the upper hinge respectively. These are computed by the following definition. Definition 2.2 (hinges). Suppose that a variable x 1,..., x n has an even number of values, say n = 2k. Then the lower hinge is the median of the smallest k values and the upper hinge is the median of the largest k values. If the variable has an odd number of values, n = 2k + 1, then the lower hinge is the median of the smallest k + 1 values and the upper hinge is the median of the largest k + 1 values. In other words, the lower hinge is the median of the lower half of the data with the middle point included in that half if there are an odd number of data points. Similarly for the upper hinge. A very common and useful description of the variability in a distribution is the five number summary. The five number summary consists of the minimum, lower hinge, median, upper hinge, and maximum of the distribution. The five number summary is computed by the R function fivenum().

4 The five-number summary is often presented graphically by means of a boxplot. The standard R function is boxplot(). A boxplot of the Sepal.Width of the iris data is generated by > boxplot(iris$sepal.width,horizontal=t) The sides of the box are drawn at the hinges. The median is represented by a dot or line in the box. In some boxplots, the whiskers extend out to the maximum and minimum values. However the boxplot that we are using here attempts to identify outliers. Outliers are values that are unusually large or small and are indicated by a special symbol beyond the whiskers. The whiskers are then drawn from the box to the largest and smallest non-outliers. One common rule for automating outlier detection for boxplots is the 1.5 IQR rule. This is the default rule in both boxplot functions in R. Under this rule, any value that is more than 1.5 IQR away from the box is marked as an outlier. Indicating outliers in this way is useful since it allows us to see if the whisker is long only because of a few extreme values. A boxplot gives us some idea of the symmetry and general dispersion of a variable but it certainly doesn t give us as much information about the shape of a distribution as a histogram. 2.1 Variance and Standard Deviation Another important way to measure the dispersion of a distribution is by comparing each value to the center of the distribution. If the distribution is spread out, these differences will tend to be large, otherwise these differences will be small. To get a single number, we could simply add up all of the deviation from the mean: total deviation from the mean = (x i x). The trouble with this is that the total deviation from the mean is always 0. deviations and the positive deviations always exactly cancel out. To fix this problem we might consider taking the absolute value of the deviations from the mean: total absolute deviation from the mean = x i x. The problem is that the negative This number will only be 0 if all of the data values are equal to the mean. Even better would be to divide by the number of data values. Otherwise large data sets will have large sums even if the values are all close to the mean. mean absolute deviation mean absolute deviation = 1 x i x. n This is a reasonable measure of the dispersion in a distribution, but we will not use it very often. There is another measure that is much more common, namely the variance, which is defined by variance variance = VAR(x) = 1 n 1 (x i x) 2.

5 You will notice two differences from the mean absolute deviation. First, instead of using an absolute value to make things positive, we square the deviations from the mean. One advantage of squaring over the absolute value is that it is much easier to do calculus with a polynomial than with functions involving absolute values. The second difference is that we divide by n 1 instead of by n. There is a good reason for this, even though dividing by n seems more natural. We will get to that reason later. However the following principle helps to remember the number n 1. If we consider the n 1 deviations (x 1 x), (x 2 x),..., (x n x), any one of them is determined from the other n 1 (because of the property of the mean that the sums of these deviations is 0). Therefore there are only n 1 degrees of freedom in these numbers rather than the n degrees of freedom in the data. Indeed, n 1 is called degrees of freedom for this reason. Because the squaring changes the units of this measure, the square root of the variance, called the standard deviation, is commonly used in place of the variance. standard deviation = SD(x) = VAR(x). We will sometimes use the notation s x and s 2 x for the standard deviation and variance respectively. The subscript x refers to the particular variable for which we are computing the variance or standard deviation and we sometimes omit it (and write s or s 2 ) when it is clear what variable is involved. All of these quantities are easy to compute in R.

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value

More information

Lecture 2 and Lecture 3

Lecture 2 and Lecture 3 Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.

More information

Unit 2: Numerical Descriptive Measures

Unit 2: Numerical Descriptive Measures Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations: Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number

More information

2.1 Measures of Location (P.9-11)

2.1 Measures of Location (P.9-11) MATH1015 Biostatistics Week.1 Measures of Location (P.9-11).1.1 Summation Notation Suppose that we observe n values from an experiment. This collection (or set) of n values is called a sample. Let x 1

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

Chapter 1 - Lecture 3 Measures of Location

Chapter 1 - Lecture 3 Measures of Location Chapter 1 - Lecture 3 of Location August 31st, 2009 Chapter 1 - Lecture 3 of Location General Types of measures Median Skewness Chapter 1 - Lecture 3 of Location Outline General Types of measures What

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 The breakfast cereal data Study collected data on nutritional

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information

P8130: Biostatistical Methods I

P8130: Biostatistical Methods I P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed

More information

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest: 1 Chapter 3 - Descriptive stats: Numerical measures 3.1 Measures of Location Mean Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size Example: The number

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Chapter 5. Understanding and Comparing. Distributions

Chapter 5. Understanding and Comparing. Distributions STAT 141 Introduction to Statistics Chapter 5 Understanding and Comparing Distributions Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 27 Boxplots How to create a boxplot? Assume

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

Section 3. Measures of Variation

Section 3. Measures of Variation Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Chapter 4.notebook. August 30, 2017

Chapter 4.notebook. August 30, 2017 Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami Unit Two Descriptive Biostatistics Dr Mahmoud Alhussami Descriptive Biostatistics The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

CHAPTER 2: Describing Distributions with Numbers

CHAPTER 2: Describing Distributions with Numbers CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring

More information

Describing Distributions

Describing Distributions Describing Distributions With Numbers April 18, 2012 Summary Statistics. Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Are Summary Statistics?

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

Chapter 6 Group Activity - SOLUTIONS

Chapter 6 Group Activity - SOLUTIONS Chapter 6 Group Activity - SOLUTIONS Group Activity Summarizing a Distribution 1. The following data are the number of credit hours taken by Math 105 students during a summer term. You will be analyzing

More information

Σ x i. Sigma Notation

Σ x i. Sigma Notation Sigma Notation The mathematical notation that is used most often in the formulation of statistics is the summation notation The uppercase Greek letter Σ (sigma) is used as shorthand, as a way to indicate

More information

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population . Measures of Central Tendency: Mode, Median and Mean Average a single number that is used to describe the entire sample or population. Mode a. Easiest to compute, but not too stable i. Changing just one

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The

More information

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1

More information

Range The range is the simplest of the three measures and is defined now.

Range The range is the simplest of the three measures and is defined now. Measures of Variation EXAMPLE A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test.

More information

Descriptive Statistics-I. Dr Mahmoud Alhussami

Descriptive Statistics-I. Dr Mahmoud Alhussami Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.

More information

Describing Distributions With Numbers

Describing Distributions With Numbers Describing Distributions With Numbers October 24, 2012 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Do

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

MAT Mathematics in Today's World

MAT Mathematics in Today's World MAT 1000 Mathematics in Today's World Last Time 1. Three keys to summarize a collection of data: shape, center, spread. 2. Can measure spread with the fivenumber summary. 3. The five-number summary can

More information

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008 DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008 Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe

More information

Chapter 1:Descriptive statistics

Chapter 1:Descriptive statistics Slide 1.1 Chapter 1:Descriptive statistics Descriptive statistics summarises a mass of information. We may use graphical and/or numerical methods Examples of the former are the bar chart and XY chart,

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

Week 1: Intro to R and EDA

Week 1: Intro to R and EDA Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for

More information

Chapter 1. Looking at Data

Chapter 1. Looking at Data Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

Exercises from Chapter 3, Section 1

Exercises from Chapter 3, Section 1 Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median

More information

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature

More information

MATH4427 Notebook 4 Fall Semester 2017/2018

MATH4427 Notebook 4 Fall Semester 2017/2018 MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their

More information

1.3.1 Measuring Center: The Mean

1.3.1 Measuring Center: The Mean 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar) of a set of observations, add their values and divide by the number of observations. If the n observations

More information

Description of Samples and Populations

Description of Samples and Populations Description of Samples and Populations Random Variables Data are generated by some underlying random process or phenomenon. Any datum (data point) represents the outcome of a random variable. We represent

More information

MATH 117 Statistical Methods for Management I Chapter Three

MATH 117 Statistical Methods for Management I Chapter Three Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

Describing Distributions With Numbers Chapter 12

Describing Distributions With Numbers Chapter 12 Describing Distributions With Numbers Chapter 12 May 1, 2013 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary. 1.0 What Do We Usually Summarize? source: Prof.

More information

STA 218: Statistics for Management

STA 218: Statistics for Management Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Problem How much do people with a bachelor s degree

More information

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data Finding Quartiles. Use the median to divide the ordered data set into two halves.. If n is odd, do not include the median in either half. If n is even, split this data set exactly in half.. Q1 is the median

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts. Slide 1 Slide 2 Daphne Phillip Kathy Slide 3 Pick a Brick 100 pts 200 pts 500 pts 300 pts 400 pts 200 pts 300 pts 500 pts 100 pts 300 pts 400 pts 100 pts 400 pts 100 pts 200 pts 500 pts 100 pts 400 pts

More information

Chapter 1: Introduction. Material from Devore s book (Ed 8), and Cengagebrain.com

Chapter 1: Introduction. Material from Devore s book (Ed 8), and Cengagebrain.com 1 Chapter 1: Introduction Material from Devore s book (Ed 8), and Cengagebrain.com Populations and Samples An investigation of some characteristic of a population of interest. Example: Say you want to

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,

More information

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types

More information

Measures of disease spread

Measures of disease spread Measures of disease spread Marco De Nardi Milk Safety Project 1 Objectives 1. Describe the following measures of spread: range, interquartile range, variance, and standard deviation 2. Discuss examples

More information

Exploratory data analysis: numerical summaries

Exploratory data analysis: numerical summaries 16 Exploratory data analysis: numerical summaries The classical way to describe important features of a dataset is to give several numerical summaries We discuss numerical summaries for the center of a

More information

MgtOp 215 Chapter 3 Dr. Ahn

MgtOp 215 Chapter 3 Dr. Ahn MgtOp 215 Chapter 3 Dr. Ahn Measures of central tendency (center, location): measures the middle point of a distribution or data; these include mean and median. Measures of dispersion (variability, spread):

More information

1.3: Describing Quantitative Data with Numbers

1.3: Describing Quantitative Data with Numbers 1.3: Describing Quantitative Data with Numbers Section 1.3 Describing Quantitative Data with Numbers After this section, you should be able to MEASURE center with the mean and median MEASURE spread with

More information

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 Spoiled ballots are a real threat to democracy. Below are

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Measures of the Location of the Data

Measures of the Location of the Data Measures of the Location of the Data 1. 5. Mark has 51 films in his collection. Each movie comes with a rating on a scale from 0.0 to 10.0. The following table displays the ratings of the aforementioned

More information

GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards

GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards B Graphs and Statistics, Lesson 2, Central Tendency and Dispersion (r. 2018) GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards Next Generation Standards S-ID.A.2 Use statistics

More information

Chapter 1 Descriptive Statistics

Chapter 1 Descriptive Statistics MICHIGAN STATE UNIVERSITY STT 351 SECTION 2 FALL 2008 LECTURE NOTES Chapter 1 Descriptive Statistics Nao Mimoto Contents 1 Overview 2 2 Pictorial Methods in Descriptive Statistics 3 2.1 Different Kinds

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

Lecture 2. Descriptive Statistics: Measures of Center

Lecture 2. Descriptive Statistics: Measures of Center Lecture 2. Descriptive Statistics: Measures of Center Descriptive Statistics summarize or describe the important characteristics of a known set of data Inferential Statistics use sample data to make inferences

More information

Resistant Measure - A statistic that is not affected very much by extreme observations.

Resistant Measure - A statistic that is not affected very much by extreme observations. Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Spring 2015: Lembo GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Descriptive statistics concise and easily understood summary of data set characteristics

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Statistical Concepts. Constructing a Trend Plot

Statistical Concepts. Constructing a Trend Plot Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

Introduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51

Introduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51 Introduction ECN 102: Analysis of Economic Data Winter, 2011 J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, 2011 1 / 51 Contact Information Instructor: John Parman Email: jmparman@ucdavis.edu

More information