1 Measures of the Center of a Distribution
|
|
- Oliver Austin
- 5 years ago
- Views:
Transcription
1 1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects of unimodal distributions that we will often want to measure are center (what is a typical value? where do the values cluster?), and the amount of variation (are the data tightly clustered around a central value, or more spread out?) Two widely used measures of center are the mean and the median. You are probably already familiar with both. The mean is calculated by adding all the values of a variable and dividing by the number of values. Our usual notation will be to denote the n values as x 1, x 2,... x n, and the mean of these values as x. Then the formula for the mean becomes n x = x i. n The median is a value that splits the data in half half of the values are smaller than the median and half are larger. By this definition, there could be more than one median (when there are an even number of values). This ambiguity is removed by taking the mean of the two middle numbers (after sorting the data). Whereas x denotes the mean of the n numbers x 1,..., x n, we use x to denote the median of these numbers. The mean and median are easily computed in R. > x=c(1,1,5,20,10) > mean(x) [1] 7.4 > median(x) [1] Comparing mean and median While both the mean and median provide a measure of the center of a distribution, they measure different things and sometimes one measure is better than the other. If a distribution is (approximately) symmetric, the mean and median will be (approximately) the same. If the distribution is not symmetric, however, the mean and median may be very different. For example, if we begin with a symmetric distribution and add in one additional value that is very much larger than the other values (an outlier), then the median will not change very much (if at all), but the mean will increase substantially. Because of this, we say that the median is resistant to outliers while the mean is not. A similar thing happens with a skewed, unimodal distribution. If a distribution is positively skewed, the large values in the tail of the distribution increase the mean (as compared to a symmetric distribution) but not the median, so the mean will be larger than the median. Similarly, the mean of a negatively skewed distribution will be smaller than the median. Consider the data on the populations of the 3,141 county equivalents in the United States. From R we see the great difference in the mean county population and the median county population. Note that the largest county, Los Angeles County with over 9 million people, alone contributes over 3,000 people to the mean. Whether a resistant measure is desirable or not depends on context. If we are looking at the income of employees of a local business, the median may give us a much better indication of what a typical worker earns, since there may be a few large salaries (the business owner s, for example) that inflate the mean. This is also why the government reports median household income and median housing costs. The median county population perhaps tells us more about what a typical county looks like than does the mean. On the other hand, if we are ultimately interested in the total, the mean is more useful. For example, the median of daily sales of a Hallmark Card store will likely be smaller than the mean as there are several days of card sales that are outliers (e.g., the few days before Mother s Day). The mean daily sales for a year allows us also to compute the total sales on the year and comparing the means of two different stores allows us to determine which store has greater sales overall.
2 1.2 The trimmed mean There is another measure of center that is less well known and represents a kind of compromise between the mean and the median. In particular, it is more sensitive to the the extreme values of a distribution than the median is, but less sensitive than the mean. The idea of a trimmed mean is very simple. Before calculating the mean, we remove the largest and smallest values from the data. The percentage of the data removed from each end is called the trimming percentage. The 10% trimmed mean is the mean of the middle 80% of the data (after removing the largest and smallest 10%). A trimmed mean is calculated in R by setting the trim argument of mean(), e.g. mean(x,trim=.10). Although a trimmed mean in some sense combines the advantages of both the mean and median, it is less common than either the mean or the median. This is partly due the mathematical theory that has been developed for working with the median and especially the mean of sample data. The 10% trimmed mean of county populations is 38,234 which is much closer in size to the median than to the mean. We note in passing that there are some complications in defining the trimmed mean. For instance in the case of the example above, the 10% trimmed mean of the county populations should be computed by removing the largest most populus and the least populus counties from the data. But this doesn t make any sense so that there needs to be some convention for dealing with these fractions of data points. In fact there are several different conventions but we will let R handle the details of the computation and not be too concerned about the differences. With large datasets, the differences between the various definitions of, say, the 10% trimmed mean are small. In some sports, the trimmed mean is used to compute a competitors score based on the scores given by individual judges. Diving, figure skating, and gymnastics are three sports that use a trimmed mean to compute a competitors final score. (Diving uses the middle three scores when there are five judges which amounts to computing the 20% trimmed mean.) 2 Measures of Dispersion It is often useful to characterize a distribution in terms of its center, but that is not the whole story. Consider the distributions depicted in the histograms below A B 0.15 Density In each case the mean and median are approximately 10, but the distributions clearly have very different shapes. The difference is that distribution B is much more spread out. Almost all of the data in distribution A are quite close to 10; a much larger proportion of distribution B is far away from 10. The intuitive (and not very precise) statement in the preceding sentence can be quantified by means of quantiles. The idea of quantiles is probably familiar to you since percentiles are a special case of quantiles. Definition 2.1 (Quantile). Let p [0, 1]. A p-quantile of a quantitative distribution is a number q such that the (approximate) proportion of the distribution that is less than q is p.
3 So for example, the.2-quantile divides a distribution into 20% below and 80% above. The.2-quantile is also called the 20th percentile. The median is just the.5-quantile (and the 50th percentile). While the definition of quantile above seems clear, it does have the same complication as that of the definition of trimmed mean. Suppose your data set has 15 values. What is the.30-quantile? 30% of the data would be (.30)(15) = 4.5 values. Of course, there is no number that has 4.5 values below it and 10.5 values above it. This is the reason for the parenthetical word approximate in Definition??. Different methods have been proposed for giving quantiles a single value, and R implements 9 different methods! The next example illustrates the default computation of the.25-quantile for datasets of size 5, 6, 7 and 8. > quantile(1:5,.25) 2 > quantile(1:6,.25) 2.25 > quantile(1:7,.25) 2.5 > quantile(1:8,.25) 2.75 Fortunately, for large data sets, the differences between the various different quantile methods are usually small, so we will just let R compute quantiles for us using the default method of the quantile() function. The difference between the first and third quartiles is often used as a simple measure of dispersion. This measure is called the inter-quartile range and abbreviated IQR. The IQR of the Old Faithful eruption times is = > quantile(faithful$eruptions,.75) 75% > quantile(faithful$eruptions,.25) > IQR(faithful$eruptions) [1] Note that since IQR depends only on the middle 50% of data, it is a measure of dispersion that is resistant to outliers. Especially for hand computation, yet another method of computing the quartiles based on the median is popular. With this method, the first- and third-quartiles (the.25-quantile and the.75-quantile respectively) are called the lower hinge and the upper hinge respectively. These are computed by the following definition. Definition 2.2 (hinges). Suppose that a variable x 1,..., x n has an even number of values, say n = 2k. Then the lower hinge is the median of the smallest k values and the upper hinge is the median of the largest k values. If the variable has an odd number of values, n = 2k + 1, then the lower hinge is the median of the smallest k + 1 values and the upper hinge is the median of the largest k + 1 values. In other words, the lower hinge is the median of the lower half of the data with the middle point included in that half if there are an odd number of data points. Similarly for the upper hinge. A very common and useful description of the variability in a distribution is the five number summary. The five number summary consists of the minimum, lower hinge, median, upper hinge, and maximum of the distribution. The five number summary is computed by the R function fivenum().
4 The five-number summary is often presented graphically by means of a boxplot. The standard R function is boxplot(). A boxplot of the Sepal.Width of the iris data is generated by > boxplot(iris$sepal.width,horizontal=t) The sides of the box are drawn at the hinges. The median is represented by a dot or line in the box. In some boxplots, the whiskers extend out to the maximum and minimum values. However the boxplot that we are using here attempts to identify outliers. Outliers are values that are unusually large or small and are indicated by a special symbol beyond the whiskers. The whiskers are then drawn from the box to the largest and smallest non-outliers. One common rule for automating outlier detection for boxplots is the 1.5 IQR rule. This is the default rule in both boxplot functions in R. Under this rule, any value that is more than 1.5 IQR away from the box is marked as an outlier. Indicating outliers in this way is useful since it allows us to see if the whisker is long only because of a few extreme values. A boxplot gives us some idea of the symmetry and general dispersion of a variable but it certainly doesn t give us as much information about the shape of a distribution as a histogram. 2.1 Variance and Standard Deviation Another important way to measure the dispersion of a distribution is by comparing each value to the center of the distribution. If the distribution is spread out, these differences will tend to be large, otherwise these differences will be small. To get a single number, we could simply add up all of the deviation from the mean: total deviation from the mean = (x i x). The trouble with this is that the total deviation from the mean is always 0. deviations and the positive deviations always exactly cancel out. To fix this problem we might consider taking the absolute value of the deviations from the mean: total absolute deviation from the mean = x i x. The problem is that the negative This number will only be 0 if all of the data values are equal to the mean. Even better would be to divide by the number of data values. Otherwise large data sets will have large sums even if the values are all close to the mean. mean absolute deviation mean absolute deviation = 1 x i x. n This is a reasonable measure of the dispersion in a distribution, but we will not use it very often. There is another measure that is much more common, namely the variance, which is defined by variance variance = VAR(x) = 1 n 1 (x i x) 2.
5 You will notice two differences from the mean absolute deviation. First, instead of using an absolute value to make things positive, we square the deviations from the mean. One advantage of squaring over the absolute value is that it is much easier to do calculus with a polynomial than with functions involving absolute values. The second difference is that we divide by n 1 instead of by n. There is a good reason for this, even though dividing by n seems more natural. We will get to that reason later. However the following principle helps to remember the number n 1. If we consider the n 1 deviations (x 1 x), (x 2 x),..., (x n x), any one of them is determined from the other n 1 (because of the property of the mean that the sums of these deviations is 0). Therefore there are only n 1 degrees of freedom in these numbers rather than the n degrees of freedom in the data. Indeed, n 1 is called degrees of freedom for this reason. Because the squaring changes the units of this measure, the square root of the variance, called the standard deviation, is commonly used in place of the variance. standard deviation = SD(x) = VAR(x). We will sometimes use the notation s x and s 2 x for the standard deviation and variance respectively. The subscript x refers to the particular variable for which we are computing the variance or standard deviation and we sometimes omit it (and write s or s 2 ) when it is clear what variable is involved. All of these quantities are easy to compute in R.
Describing Distributions with Numbers
Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value
More informationLecture 2 and Lecture 3
Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.
More informationUnit 2: Numerical Descriptive Measures
Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48
More informationQUANTITATIVE DATA. UNIVARIATE DATA data for one variable
QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE
More informationChapter 3. Data Description
Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.
More informationMeasures of center. The mean The mean of a distribution is the arithmetic average of the observations:
Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number
More information2.1 Measures of Location (P.9-11)
MATH1015 Biostatistics Week.1 Measures of Location (P.9-11).1.1 Summation Notation Suppose that we observe n values from an experiment. This collection (or set) of n values is called a sample. Let x 1
More informationCHAPTER 1. Introduction
CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing
More informationChapter 1 - Lecture 3 Measures of Location
Chapter 1 - Lecture 3 of Location August 31st, 2009 Chapter 1 - Lecture 3 of Location General Types of measures Median Skewness Chapter 1 - Lecture 3 of Location Outline General Types of measures What
More informationLecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)
Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning
More informationElementary Statistics
Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:
More informationSTP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
More informationStatistics and parameters
Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize
More informationChapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.
Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 The breakfast cereal data Study collected data on nutritional
More information1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.
1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions
More informationChapter 2: Tools for Exploring Univariate Data
Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More informationLecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:
Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots
More informationP8130: Biostatistical Methods I
P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data
More information1. Exploratory Data Analysis
1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be
More informationShape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays
Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed
More informationPerhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:
1 Chapter 3 - Descriptive stats: Numerical measures 3.1 Measures of Location Mean Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size Example: The number
More informationDescribing Distributions with Numbers
Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them
More informationHistograms allow a visual interpretation
Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called
More informationUnit 2. Describing Data: Numerical
Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient
More informationChapter 5. Understanding and Comparing. Distributions
STAT 141 Introduction to Statistics Chapter 5 Understanding and Comparing Distributions Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 27 Boxplots How to create a boxplot? Assume
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationSection 3. Measures of Variation
Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The
More informationDescribing distributions with numbers
Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
More informationDescribing distributions with numbers
Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
More informationChapter 4.notebook. August 30, 2017
Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationUnit Two Descriptive Biostatistics. Dr Mahmoud Alhussami
Unit Two Descriptive Biostatistics Dr Mahmoud Alhussami Descriptive Biostatistics The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are
More informationF78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives
F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested
More informationCHAPTER 2: Describing Distributions with Numbers
CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring
More informationDescribing Distributions
Describing Distributions With Numbers April 18, 2012 Summary Statistics. Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Are Summary Statistics?
More informationChapter 4. Displaying and Summarizing. Quantitative Data
STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range
More informationLecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)
Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.
More informationFurther Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data
Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)
More informationChapter 6 Group Activity - SOLUTIONS
Chapter 6 Group Activity - SOLUTIONS Group Activity Summarizing a Distribution 1. The following data are the number of credit hours taken by Math 105 students during a summer term. You will be analyzing
More informationΣ x i. Sigma Notation
Sigma Notation The mathematical notation that is used most often in the formulation of statistics is the summation notation The uppercase Greek letter Σ (sigma) is used as shorthand, as a way to indicate
More information3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population
. Measures of Central Tendency: Mode, Median and Mean Average a single number that is used to describe the entire sample or population. Mode a. Easiest to compute, but not too stable i. Changing just one
More informationMath 140 Introductory Statistics
Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The
More informationMath 140 Introductory Statistics
Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The
More informationSummarizing and Displaying Measurement Data/Understanding and Comparing Distributions
Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to
More informationCS 147: Computer Systems Performance Analysis
CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1
More informationRange The range is the simplest of the three measures and is defined now.
Measures of Variation EXAMPLE A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test.
More informationDescriptive Statistics-I. Dr Mahmoud Alhussami
Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.
More informationDescribing Distributions With Numbers
Describing Distributions With Numbers October 24, 2012 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Do
More information3.1 Measure of Center
3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects
More informationTOPIC: Descriptive Statistics Single Variable
TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency
More informationMAT Mathematics in Today's World
MAT 1000 Mathematics in Today's World Last Time 1. Three keys to summarize a collection of data: shape, center, spread. 2. Can measure spread with the fivenumber summary. 3. The five-number summary can
More informationDEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008
DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008 Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe
More informationChapter 1:Descriptive statistics
Slide 1.1 Chapter 1:Descriptive statistics Descriptive statistics summarises a mass of information. We may use graphical and/or numerical methods Examples of the former are the bar chart and XY chart,
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1
More informationWeek 1: Intro to R and EDA
Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for
More informationChapter 1. Looking at Data
Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationExercises from Chapter 3, Section 1
Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median
More informationObjective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.
Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The
More informationAP Final Review II Exploring Data (20% 30%)
AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure
More informationReview for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data
Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature
More informationMATH4427 Notebook 4 Fall Semester 2017/2018
MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their
More information1.3.1 Measuring Center: The Mean
1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar) of a set of observations, add their values and divide by the number of observations. If the n observations
More informationDescription of Samples and Populations
Description of Samples and Populations Random Variables Data are generated by some underlying random process or phenomenon. Any datum (data point) represents the outcome of a random variable. We represent
More informationMATH 117 Statistical Methods for Management I Chapter Three
Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationDescribing Distributions With Numbers Chapter 12
Describing Distributions With Numbers Chapter 12 May 1, 2013 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary. 1.0 What Do We Usually Summarize? source: Prof.
More informationSTA 218: Statistics for Management
Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Problem How much do people with a bachelor s degree
More informationFinding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data
Finding Quartiles. Use the median to divide the ordered data set into two halves.. If n is odd, do not include the median in either half. If n is even, split this data set exactly in half.. Q1 is the median
More informationMATH 1150 Chapter 2 Notation and Terminology
MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the
More informationSlide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.
Slide 1 Slide 2 Daphne Phillip Kathy Slide 3 Pick a Brick 100 pts 200 pts 500 pts 300 pts 400 pts 200 pts 300 pts 500 pts 100 pts 300 pts 400 pts 100 pts 400 pts 100 pts 200 pts 500 pts 100 pts 400 pts
More informationChapter 1: Introduction. Material from Devore s book (Ed 8), and Cengagebrain.com
1 Chapter 1: Introduction Material from Devore s book (Ed 8), and Cengagebrain.com Populations and Samples An investigation of some characteristic of a population of interest. Example: Say you want to
More informationStatistics for Managers using Microsoft Excel 6 th Edition
Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,
More informationWhat is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected
What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types
More informationMeasures of disease spread
Measures of disease spread Marco De Nardi Milk Safety Project 1 Objectives 1. Describe the following measures of spread: range, interquartile range, variance, and standard deviation 2. Discuss examples
More informationExploratory data analysis: numerical summaries
16 Exploratory data analysis: numerical summaries The classical way to describe important features of a dataset is to give several numerical summaries We discuss numerical summaries for the center of a
More informationMgtOp 215 Chapter 3 Dr. Ahn
MgtOp 215 Chapter 3 Dr. Ahn Measures of central tendency (center, location): measures the middle point of a distribution or data; these include mean and median. Measures of dispersion (variability, spread):
More information1.3: Describing Quantitative Data with Numbers
1.3: Describing Quantitative Data with Numbers Section 1.3 Describing Quantitative Data with Numbers After this section, you should be able to MEASURE center with the mean and median MEASURE spread with
More informationChapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.
Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 Spoiled ballots are a real threat to democracy. Below are
More informationLecture 1: Descriptive Statistics
Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics
More informationMeasures of the Location of the Data
Measures of the Location of the Data 1. 5. Mark has 51 films in his collection. Each movie comes with a rating on a scale from 0.0 to 10.0. The following table displays the ratings of the aforementioned
More informationGRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards
B Graphs and Statistics, Lesson 2, Central Tendency and Dispersion (r. 2018) GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards Next Generation Standards S-ID.A.2 Use statistics
More informationChapter 1 Descriptive Statistics
MICHIGAN STATE UNIVERSITY STT 351 SECTION 2 FALL 2008 LECTURE NOTES Chapter 1 Descriptive Statistics Nao Mimoto Contents 1 Overview 2 2 Pictorial Methods in Descriptive Statistics 3 2.1 Different Kinds
More informationExample 2. Given the data below, complete the chart:
Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is
More informationLecture 2. Descriptive Statistics: Measures of Center
Lecture 2. Descriptive Statistics: Measures of Center Descriptive Statistics summarize or describe the important characteristics of a known set of data Inferential Statistics use sample data to make inferences
More informationResistant Measure - A statistic that is not affected very much by extreme observations.
Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)
More informationSTAT 200 Chapter 1 Looking at Data - Distributions
STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the
More informationThe empirical ( ) rule
The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%
More informationADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes
We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More information2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS
Spring 2015: Lembo GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Descriptive statistics concise and easily understood summary of data set characteristics
More informationare the objects described by a set of data. They may be people, animals or things.
( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms
More informationOne-Sample Numerical Data
One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationStatistical Concepts. Constructing a Trend Plot
Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable
More informationSections 2.3 and 2.4
1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences
More informationStat 101 Exam 1 Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative
More informationCHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.
(c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals
More informationIntroduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51
Introduction ECN 102: Analysis of Economic Data Winter, 2011 J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, 2011 1 / 51 Contact Information Instructor: John Parman Email: jmparman@ucdavis.edu
More information