Basic Business Statistics: Concepts and Applications. Mark L. Berenson, David M. Levine. Sixth Edition. Prentice Hall, Upper Saddle River, New Jersey


Basic Business Statistics: Concepts and Applications, Sixth Edition
Mark L. Berenson, David M. Levine
Department of Statistics and Computer Information Systems, Baruch College, City University of New York
Prentice Hall, Upper Saddle River, New Jersey

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA
Berenson, Mark L.
Basic business statistics : concepts and applications / Mark L. Berenson, David M. Levine. 6th ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-303009-1
1. Commercial statistics. 2. Statistics. I. Levine, David M.
HF1017.B38 1996
519.5 dc 20
94-12551 CIP

Acquisitions Editor: Tom Tucker
Production Editor: Katherine Evancie
Managing Editor: Joyce Turner
Cover Designer: Sue Behnke
Interior Design: Ed Smith
Design Director: Patricia H. Wosczyk
Buyer: Marie McNamara
Assistant Editor: Diane Peirano
Production Assistant: Renee Pelletier
Marketing Manager: Susan McLaughlin
Cover art: Marjory Dressler

© 1996, 1992, 1989, 1986, 1983, 1979 by Prentice Hall, Inc.
Simon & Schuster/A Viacom Company
Upper Saddle River, New Jersey 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
10 9 8 7 6 5 4
ISBN 0-13-303009-1

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

Chapter 4 Summarizing and Describing Numerical Data

Measures of Variation

Using the ordered array of tuition rates charged (in thousands of dollars) to out-of-state residents from our sample of six Pennsylvania schools:

4.9 6.3 7.7 8.9 10.3 11.7

the range is 11.7 - 4.9 = 6.80 thousand dollars.

The range measures the total spread in the batch of data. Although the range is a simple, easily calculated measure of total variation in the data, its distinct weakness is that it fails to take into account how the data are actually distributed between the smallest and largest values. This can be observed from Figure 4.4. Thus, as evidenced in scale C, it would be improper to use the range as a measure of variation when either one or both of its components are extreme observations.

[Figure 4.4 Comparing three data sets with the same range (Scales A, B, and C).]

The Interquartile Range

The interquartile range (also called midspread) is the difference between the third and first quartiles in a batch of data. That is,

$$\text{Interquartile range} = Q_3 - Q_1 \qquad (4.5)$$

This simple measure considers the spread in the middle 50% of the data and thus is in no way influenced by any extreme values that may occur.
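As a rough illustration, the range and the interquartile range of the six tuition rates can be computed in a few lines of Python. The `quartile` helper is a hypothetical name, and its positioning rule (take the value at ordered position k(n + 1)/4, averaging the two neighbours when the position falls exactly halfway and otherwise using the nearest whole position) is an assumption that is not stated in this section. It is chosen only because it reproduces the quartiles 6.3 and 10.3 that the text uses for these data; other software may apply a different quartile rule and return slightly different values.

```python
# Ordered tuition rates (thousands of dollars) for the six Pennsylvania schools.
tuition = sorted([4.9, 6.3, 7.7, 8.9, 10.3, 11.7])


def quartile(ordered, k):
    """Return the k-th quartile (k = 1 or 3) of an already sorted list.

    Assumed rule (not taken from this section): use ordered position
    k * (n + 1) / 4; average the two neighbours when the position ends
    in .5, otherwise take the value at the nearest whole position.
    """
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    position = k * (n + 1) / 4
    whole = int(position)            # whole part of the 1-based position
    fraction = position - whole      # always 0, .25, .5 or .75
    if fraction == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    nearest = whole + 1 if fraction > 0.5 else whole
    return ordered[nearest - 1]      # convert 1-based position to index


# Range: largest minus smallest observation.
data_range = tuition[-1] - tuition[0]

# Interquartile range: third quartile minus first quartile, Equation (4.5).
q1, q3 = quartile(tuition, 1), quartile(tuition, 3)
iqr = q3 - q1

print(f"range = {data_range:.2f}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.2f}")
```

Rounded to two decimals, the sketch gives a range of 6.80 and an interquartile range of 4.00 thousand dollars.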

For the Pennsylvania tuition rate data we have

$$\text{Interquartile range} = Q_3 - Q_1 = 10.3 - 6.3 = 4.0 \text{ thousand dollars}$$

This is the range in tuition rates for the middle group of Pennsylvania schools.

4.5.3 The Variance and the Standard Deviation

Although the range is a measure of the total spread and the interquartile range is a measure of the middle spread, neither of these measures of variation takes into consideration how the observations distribute or cluster. Two commonly used measures of variation that do take into account how all the values in the data are distributed are the variance and its square root, the standard deviation. These measures evaluate how the values fluctuate about the mean.

Defining the Sample Variance

The sample variance is roughly (or almost) the average of the squared differences between each of the observations in a batch of data and the mean. Thus, for a sample containing $n$ observations, $X_1, X_2, \ldots, X_n$, the sample variance (given by the symbol $S^2$) can be written as

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$

Using our summation notation, the above formulation can be more simply expressed as

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \qquad (4.6)$$

where
$\bar{X}$ = sample arithmetic mean
$n$ = sample size
$X_i$ = $i$th value of the random variable $X$
$\sum_{i=1}^{n} (X_i - \bar{X})^2$ = summation of all the squared differences between the $X_i$ values and $\bar{X}$

Had the denominator been $n$ instead of $n - 1$, the average of the squared differences around the mean would have been obtained. However, $n - 1$ is used here because of certain desirable mathematical properties possessed by the statistic $S^2$ that make it appropriate for statistical inference (see Chapter 9). If the sample size is large, division by $n$ or $n - 1$ makes little difference.

Defining the Sample Standard Deviation

The sample standard deviation (given by the symbol $S$) is simply the square root of the sample variance. That is,

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad (4.7)$$

Computing $S^2$ and $S$

To compute the variance we
1. Obtain the difference between each observation and the mean
2. Square each difference
3. Add the squared results together
4. Divide the summation by $n - 1$

To compute the standard deviation, we merely take the square root of the variance.

For our sample of six Pennsylvania schools, the raw (tuition rate) data (in thousands of dollars) were

10.3 4.9 8.9 11.7 6.3 7.7

and $\bar{X} = 8.30$ thousand dollars. The sample variance is computed as

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = \frac{31.84}{5} = 6.368 \text{ (in squared thousands of dollars)}$$

and the sample standard deviation is computed as

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{6.368} = 2.52 \text{ thousand dollars}$$

Obtaining $S^2$ and $S$

Since in the preceding computations we are squaring the differences, neither the variance nor the standard deviation can ever be negative. The only time $S^2$ and $S$ could be zero would be when there was no variation at all in the data, that is, when each observation in the sample was exactly the same. In such an unusual case the range would also be zero. But numerical data are inherently variable, not constant. Any random phenomenon of interest that we could think of usually takes on a variety of values. For example, colleges and universities charge different rates of tuition for out-of-state residents just as people have different IQs, incomes, weights, heights, ages, pulse rates, etc. It is because numerical data inherently vary that it becomes so important to study not only measures (of central tendency) that summarize the data but also measures (of variation) that reflect how the numerical data are dispersed.
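The four computing steps listed above translate directly into code. The following is a minimal Python sketch; `sample_variance` and `sample_std` are illustrative names that do not come from the text, and the standard library functions `statistics.variance` and `statistics.stdev` (which also divide by n - 1) are used only as a cross-check.

```python
import math
import statistics

# Raw tuition rates (thousands of dollars) for the six Pennsylvania schools.
tuition = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]


def sample_variance(values):
    """Sample variance following the four steps in the text (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: differences from the mean
    squared = [d ** 2 for d in deviations]    # step 2: square each difference
    total = sum(squared)                      # step 3: add the squared results
    return total / (n - 1)                    # step 4: divide by n - 1


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


s2 = sample_variance(tuition)
s = sample_std(tuition)
print(f"S^2 = {s2:.3f}, S = {s:.2f}")

# Cross-check against the standard library implementations.
assert math.isclose(s2, statistics.variance(tuition))
assert math.isclose(s, statistics.stdev(tuition))
```

For the tuition data this agrees with the hand computation: a variance of 6.368 squared thousands of dollars and a standard deviation of 2.52 thousand dollars.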

What the Variance and the Standard Deviation Indicate

The variance and the standard deviation measure the "average" scatter around the mean, that is, how larger observations fluctuate above it and how smaller observations distribute below it.

The variance possesses certain useful mathematical properties. However, its computation results in squared units: squared thousands of dollars, squared dollars, squared inches, etc. Thus, for practical work our primary measure of variation will be the standard deviation, whose value is in the original units of the data: thousands of dollars, dollars, inches, etc.

In the Pennsylvania tuition rate sample the standard deviation is 2.52 thousand dollars. This tells us that the majority of the tuition rates in this sample are clustering within 2.52 thousand dollars around the mean of 8.30 thousand dollars (that is, between 5.78 and 10.82 thousand dollars).

Why We Square the Deviations

The formulas for variance and standard deviation could not merely use

$$\sum_{i=1}^{n} (X_i - \bar{X})$$

as a numerator, because you may recall that the mean acts as a balancing point for observations larger and smaller than it. Therefore, the sum of the deviations about the mean is always zero; that is,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

To demonstrate this, let us again refer to the Pennsylvania tuition rate data:

10.3 4.9 8.9 11.7 6.3 7.7

Therefore,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = (10.3 - 8.3) + (4.9 - 8.3) + (8.9 - 8.3) + (11.7 - 8.3) + (6.3 - 8.3) + (7.7 - 8.3) = 0$$

This is depicted in the accompanying dot scale diagram displayed in Figure 4.5.

[Figure 4.5 The mean as a balancing point: tuition rates at six Pennsylvania schools.]

As already noted, three of the observations are smaller than the mean and three are larger. Although the sum of the six deviations (2.0, -3.4, 0.6, 3.4, -2.0, and -0.6) is zero, the sum of the squared deviations allows us to study the variation in the data. Hence we use

$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$

when computing the variance and standard deviation. In the squaring process, observations that are farther from the mean get more weight than observations closer to the mean. The respective squared deviations for the Pennsylvania tuition rate data are

4.00 11.56 0.36 11.56 4.00 0.36

We note that the fourth observation ($X_4 = 11.7$ thousand dollars) is 3.4 thousand dollars higher than the mean, and the second observation ($X_2 = 4.9$ thousand dollars) is 3.4 thousand dollars lower. In the squaring process both these values contribute substantially more to the calculation of $S^2$ and $S$ than do the other observations in the sample, which are closer to the mean.

Therefore we may generalize as follows:
1. The more spread out or dispersed the data are, the larger will be the range, the interquartile range, the variance, and the standard deviation.
2. The more concentrated or homogeneous the data are, the smaller will be the range, the interquartile range, the variance, and the standard deviation.
3. If the observations are all the same (so that there is no variation in the data), the range, interquartile range, variance, and standard deviation will all be zero.

Computing $S^2$ and $S$: "Hand-Held Calculator" Formulas

The formulas for variance and standard deviation, Equations (4.6) and (4.7), are definitional formulas, but they are often not practical to use even with a hand-held calculator. For our Pennsylvania tuition rate data the mean, 8.30 thousand dollars, is not an integer. For these more typical situations, where the observations and the mean are unlikely to be integers, the following "hand-held calculator" formulas for the variance and the standard deviation are given for practical use:

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \qquad (4.8)$$

$$S = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}} \qquad (4.9)$$

where
$\sum_{i=1}^{n} X_i^2$ = summation of the squares of the individual observations
$n\bar{X}^2$ = sample size times the square of the sample mean

The hand-held calculator formulas, Equations (4.8) and (4.9), are algebraically equivalent to the definitional formulas, Equations (4.6) and (4.7). Since the denominators are the same, it is easy to show through expansion and the use of summation rules (see Appendix B) that

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$

Moreover, since $S^2$ (and $S$) can never be negative, the summation of squares, $\sum_{i=1}^{n} X_i^2$, must always equal or exceed $n\bar{X}^2$, the sample size times the square of the sample mean.

Returning to the Pennsylvania tuition rate data, the variance and standard deviation are recomputed using Equations (4.8) and (4.9) as follows:

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} = \frac{(10.3^2 + 4.9^2 + \cdots + 7.7^2) - 6(8.3^2)}{6 - 1} = \frac{(106.09 + 24.01 + \cdots + 59.29) - 6(68.89)}{5} = \frac{445.18 - 413.34}{5} = \frac{31.84}{5} = 6.368 \text{ (in squared thousands of dollars)}$$

and

$$S = \sqrt{6.368} = 2.52 \text{ thousand dollars}$$

4.5.4 The Coefficient of Variation

Unlike the previous measures we have studied, the coefficient of variation is a relative measure of variation. It is expressed as a percentage rather than in terms of the units of the particular data.
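The algebraic claims made in this part of the section can be checked numerically. The Python sketch below, which uses illustrative variable names of our own, confirms for the tuition data that the deviations about the mean sum to zero (up to floating-point rounding) and that the "hand-held calculator" form of Equation (4.8) gives the same variance as the definitional form of Equation (4.6). Its last lines compute a coefficient of variation on the assumption that it is defined in the usual way, as the standard deviation divided by the mean and multiplied by 100%; that formula does not appear in this excerpt, so treat those lines as a guess at what follows rather than as the authors' definition.

```python
import math

# Raw tuition rates (thousands of dollars) for the six Pennsylvania schools.
tuition = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
n = len(tuition)
mean = sum(tuition) / n

# The deviations about the mean cancel out, so their sum is zero
# (only approximately so in floating-point arithmetic).
deviation_sum = sum(x - mean for x in tuition)

# Definitional form, Equation (4.6): squared deviations over n - 1.
var_definitional = sum((x - mean) ** 2 for x in tuition) / (n - 1)

# "Hand-held calculator" form, Equation (4.8):
# sum of squares minus n times the squared mean, over n - 1.
sum_of_squares = sum(x ** 2 for x in tuition)
var_shortcut = (sum_of_squares - n * mean ** 2) / (n - 1)

std = math.sqrt(var_shortcut)

# ASSUMPTION: coefficient of variation = (S / mean) * 100%, the usual
# definition; the formula is not given in this excerpt.
cv_percent = std / mean * 100

print(f"sum of deviations  = {deviation_sum:.6f}")
print(f"S^2 (definitional) = {var_definitional:.3f}")
print(f"S^2 (shortcut)     = {var_shortcut:.3f}")
print(f"S                  = {std:.2f}")
print(f"CV (assumed form)  = {cv_percent:.1f}%")
```

One design point worth noting: the shortcut form subtracts two large, nearly equal quantities, so in floating-point arithmetic it can lose precision when the mean is large relative to the spread. It is convenient on a calculator, but the definitional form is generally the safer choice in code.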