Measures of Variation

Chapter 5: Measures of Variation from Statistical Analysis in the Behavioral Sciences by James Raymondo Second Edition 97814669676 01 Copyright Property of Kendall Hunt Publishing

CHAPTER 5 Measures of Variation

Key Concepts
Deviation from Mean
Range
Interquartile Range
Semi-Interquartile Range
Variance
Standard Deviation
Explaining Variance

Introduction

Measures of variation refer to a group of statistics that is intended to provide us with information on how a set of scores is distributed. An examination of measures of variation is a logical extension of any description of a data set using the measures of central tendency that we examined in the previous chapter. Consider a case where there are two sections of a course in statistics, and you are told that each section is taught by the same professor, each section has an enrollment of 15 students, and the mean and median score on a recent examination is 80 in both sections of the course. Without any additional information you would be tempted to conclude that the performance of the students in the two sections of the course is reasonably similar. As a matter of fact, all of the information up to this point would suggest that the performance of the students in the two sections is identical.

Now suppose that you are shown the actual performance of each student on the examination in both sections of the course (see Figure 5.1). Clearly the performance of the students in the two sections is radically different. The score of 80 is not only the mean of section 1, but also a score that seems to be more representative of the performance of the entire class. While not all of the students scored 80, more were at that score than any other, and the number of students scoring higher or lower than 80 falls off the further the scores deviate from 80.

[Figure 5.1 Graph showing distributions for the two sections. Panels: Section 1 and Section 2. Horizontal axis: score on examination; vertical axis: frequency (f). Each panel: n = 15; mean = 80; median = 80.]

The performance of the students in section 2 is far different even though the distribution has the same mean and median score as the first distribution. In section 2 the mean of 80 is not at all representative of the typical performance. In fact, only one student earned a score at the mean. Seven students earned perfect scores of 100, while the remaining seven students earned a very low score of only 60.

Just as the mean is a single number that is designed to tell us where the central point of a distribution of scores is located, measures of variation are single numbers that are designed to tell us how the individual scores are distributed. By examining both a measure of central tendency such as the mean, and an appropriate measure of variation, we will be able to know not only where the central point of a distribution is located, but also whether it tends to look more like the distribution of scores in section 1 or more like the distribution of scores in section 2.

The measures of variation examined in this chapter can be divided into two groups. The first group of statistics measures variation in a distribution in terms of the distance from the smaller scores to the higher scores. Included in this group of measures of variation is the range, which is a simple measure of the variation in a distribution computed by examining the distance from the smallest score to the largest score. Also included in this group of statistics are the interquartile range (IQR), and the semi-interquartile range (SIQR).
These latter two measures of variation are often used in educational research. The second group of statistics measures variation in terms of a summary measure of each score's deviation from the mean. The two statistics of this type that we will examine are the variance and the standard deviation. The measures of variation based on deviation from the mean tend to be more useful, and are fundamental concepts in behavioral science research.

The Range, Interquartile Range, and Semi-interquartile Range

Range

The simplest measure of variation in a distribution of data is the range. The range is defined as the distance from the smallest observed score to the largest observed score in a set of data. For raw data, the range may be computed by subtracting the lower limit of the smallest observed score from the upper limit of the largest observed score. Consider the set of raw data below:

X: 2 4 7 8 10 1[?] 1[?] 18 21

The range is $(21.5 - 1.5) = 20$.
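The real-limits rule for the range can be sketched in a few lines of Python. This is only an illustration: the function name `value_range` and the `unit` parameter (one unit of measurement, assumed to be 1 for whole-number scores) are our own labels, and only the smallest and largest scores of the example are used because the range depends on nothing else.

```python
def value_range(scores, unit=1):
    """Distance from the lower limit of the smallest score
    to the upper limit of the largest score."""
    half_unit = unit / 2  # real limits extend half a unit beyond each score
    upper_limit = max(scores) + half_unit
    lower_limit = min(scores) - half_unit
    return upper_limit - lower_limit


# Smallest and largest observed scores from the example above.
example_range = value_range([2, 21])
print(example_range)  # 20.0
```

Passing a different `unit` lets the same sketch handle data recorded in coarser units of measurement.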

A simple alternative is to subtract the smallest observed score from the largest observed score, and then add 1 additional unit of measurement to compensate for the upper and lower limits. In this case, the range is

$(21 - 2) + 1 = 20$

In either case, the set of data ranges over 20 units, just as if you began reading a book on page 2 and continued through page 21 you would have read a total of 20 pages.

Keep in mind when using the alternative method of subtracting the smallest observed score from the largest observed score that we add 1 unit of measurement, not just 1. For example, if we had the following data on income:

Income: 12,000 14,000 18,000 [?] 46,000 58,000

The range would be computed as:

$(58{,}000 - 12{,}000) + 1{,}000 = 47{,}000$

It would not be correct to compute the range as:

$(58{,}000 - 12{,}000) + 1 = 46{,}001$

The computation is similar for data organized in a simple or full frequency distribution as illustrated below.

X     f     Cf
12    7     20
10    3     13
 8    3     10
 7    [?]    7
 6    [?]   [?]
      Σf = 20

In a simple frequency distribution where the interval size i = 1, we do not lose any precision in measurement. In this case the range would be computed as $(12.5 - 5.5) = 7$ or, alternatively, $(12 - 6) + 1 = 7$.

In a frequency distribution where the interval size i > 1, we follow the same general procedure, but use the lower limit of the smallest interval and the upper limit of the largest interval as our parameters for the computation of the range. For the data below:

X        f     Cf
25-29     9    75
20-24    16    66
15-19    20    50
10-14    16    30
 5-9     10    14
 0-4      4     4
         Σf = 75

the range would be computed as $[29.5 - (-0.5)] = 30$ or, alternatively, $(29 - 0) + 1 = 30$.

Interquartile range

The interquartile range (IQR) is defined as the distance from the 75th percentile to the 25th percentile in a set of data. In the previous chapter on central tendency we saw that a few extreme scores at one end of the distribution can bias a measure such as the mean. The same situation is true for a measure of variation such as the range. A few extreme scores at one or the other end of a distribution will affect the size of the range. The IQR is an alternative measure of variation that eliminates the effect of the extreme scores in a distribution by reporting the range between the 75th and 25th percentiles. In effect, the IQR represents the range of the middle 50% of the distribution, and ignores the top 25% and bottom 25% of the data that may be subject to extreme scores. Computing the IQR is as simple as subtracting the 25th percentile from the 75th percentile:

$IQR = P_{75} - P_{25}$

The formula for finding a particular percentile from Chapter 3 is provided below, along with the frequency distribution previously used to illustrate the procedure. To compute the IQR we will need to find both the 25th percentile and the 75th percentile.

X        f     Cf
25-29    20    125
20-24    22    105
15-19    28     83
10-14    20     55
 5-9     25     35
 0-4     10     10
         Σf = 125

The general formula for finding a given percentile is given by the equation below.

$$P_X = LL + \left(\frac{F_P - CF_{below}}{f_{int}}\right) i$$

where
$P_X$ = the desired percentile
$F_P$ = the number of frequencies below the desired percentile
$CF_{below}$ = the value from the cumulative frequency column for the interval just below $F_P$
$LL$ = the lower limit of the interval containing $F_P$
$f_{int}$ = the number of frequencies in the $F_P$ interval
$i$ = the interval size.

For the 25th percentile: $F_P = (.25)(125) = 31.25$

For the 75th percentile: $F_P = (.75)(125) = 93.75$
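The percentile formula can also be sketched as a short Python function. This is a hedged illustration rather than part of the original procedure: the name `grouped_percentile`, its argument layout, and the `half_unit` parameter (assumed to be 0.5 because the scores are whole numbers) are our own choices, and the frequencies are entered as read from the frequency distribution above.

```python
def grouped_percentile(p, intervals, size, half_unit=0.5):
    """Percentile for grouped data: P = LL + ((F_P - CF_below) / f_int) * i.

    intervals: (lowest score in the interval, frequency) pairs, lowest interval first.
    size: the interval size i.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    n = sum(f for _, f in intervals)
    f_p = p * n      # number of frequencies below the desired percentile
    cf_below = 0     # cumulative frequency below the current interval
    for lowest_score, f_int in intervals:
        if f_int > 0 and cf_below + f_int >= f_p:
            lower_limit = lowest_score - half_unit
            return lower_limit + (f_p - cf_below) / f_int * size
        cf_below += f_int
    raise ValueError("the distribution contains no observations")


# Intervals 0-4, 5-9, ..., 25-29 with their frequencies (total of 125).
table = [(0, 10), (5, 25), (10, 20), (15, 28), (20, 22), (25, 20)]

p25 = grouped_percentile(0.25, table, size=5)
p75 = grouped_percentile(0.75, table, size=5)
iqr = p75 - p25
print(round(p25, 2), round(p75, 2), round(iqr, 2))  # 8.75 21.94 13.19
```

The function carries full precision through every step. A hand calculation that rounds the ratio to two decimals before multiplying by the interval size will land a few hundredths away for the 75th percentile.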

The intervals containing the 25th and 75th percentiles are indicated below:

X        f     Cf
25-29    20    125
20-24    22    105   <-- ($F_P$ = 93.75 is in this interval)
15-19    28     83   <-- ($CF_{below}$ = 83)
10-14    20     55
 5-9     25     35   <-- ($F_P$ = 31.25 is in this interval)
 0-4     10     10   <-- ($CF_{below}$ = 10)
         Σf = 125

To find the 25th percentile we simply substitute appropriate values into the formula as follows:

$$P_{25} = 4.5 + \left(\frac{31.25 - 10}{25}\right) 5$$
$$P_{25} = 4.5 + \left(\frac{21.25}{25}\right) 5$$
$$P_{25} = 4.5 + (.85)(5)$$
$$P_{25} = 4.5 + 4.25 = 8.75$$

To find the 75th percentile we simply substitute appropriate values into the formula as follows:

$$P_{75} = 19.5 + \left(\frac{93.75 - 83}{22}\right) 5$$
$$P_{75} = 19.5 + \left(\frac{10.75}{22}\right) 5$$
$$P_{75} = 19.5 + (.49)(5)$$
$$P_{75} = 19.5 + 2.45 = 21.95$$

Having computed both the 25th and the 75th percentiles, we may now compute the IQR:

$IQR = 21.95 - 8.75 = 13.2$

The resulting value of 13.2 for the IQR indicates that there is a range of 13.2 points for the middle 50% of the distribution. The advantage of the IQR over the simple range is that any bias that might result from a few extremely high scores or a few extremely low scores (or both) has been eliminated.

Semi-interquartile range

The final range-based measure of variation presented in this chapter is the semi-interquartile range. The concept of an interquartile range suggests a measure of variation based on a quartile, or 25% of the distribution; however, the IQR actually represents the range of the middle 50% of the distribution. The SIQR is an alternative to the IQR that comes closer to representing a quartile or 25% size range in the distribution. The SIQR is simply the IQR divided by 2.

$$SIQR = \frac{IQR}{2}$$

In the case of our previous example, the SIQR is

$$SIQR = \frac{13.2}{2} = 6.6$$

The IQR and the SIQR are widely used in education research where there always seems to be one or two students at each extreme of the distribution.

The Variance and the Standard Deviation

Variance

The variance and the standard deviation are two measures of variation that are based on the concept of deviation from the mean. For any distribution of scores measured on a continuous scale we can compute a mean, and then measure the distance of each score from the mean. For example, the set of 6 scores presented below has a mean equal to 8.

X: 13 11 9 7 5 3

We may then define deviation from the mean ($d_i$) as the distance of each score from the mean, or

$d_i = X_i - \bar{X}$

Using our set of 6 scores, we may then calculate the deviation from the mean for each score.

X     d_i
13    (13 - 8) = 5
11    (11 - 8) = 3
 9    (9 - 8) = 1
 7    (7 - 8) = -1
 5    (5 - 8) = -3
 3    (3 - 8) = -5

One way we might construct a summary measure of variation in a distribution of scores is to compute the average deviation of each score from the mean. To do this, we would simply sum the individual deviations from the mean which we have just calculated, and then divide by the number of observations we have.

$$\frac{\Sigma d_i}{n} = \frac{5 + 3 + 1 + (-1) + (-3) + (-5)}{6}$$
$$\frac{\Sigma d_i}{n} = \frac{0}{6} = 0$$

It may seem quite logical to construct a measure of variance by calculating the average deviation from the mean for a set of scores, but there is one small problem. The sum of the deviations from the mean for all distributions is always the same thing: 0.

$\Sigma d_i = 0$

One solution to this problem is to base our measure of variance on the squared deviation from the mean. By squaring the result of $(X - \bar{X})$, we will eliminate the negative numbers, and prevent the negative deviations and positive deviations from canceling each other out. Applying this strategy to our original distribution will give us the following result:

X     (X - X̄)           (X - X̄)²
13    (13 - 8) = 5       (5)² = 25
11    (11 - 8) = 3       (3)² = 9
 9    (9 - 8) = 1        (1)² = 1
 7    (7 - 8) = -1       (-1)² = 1
 5    (5 - 8) = -3       (-3)² = 9
 3    (3 - 8) = -5       (-5)² = 25

Now if we want to construct a measure of variation that gives us a single number representing the average variation of each score in a distribution, we can use the mean (or average) of the squared deviations of each score from the distribution mean. We need only sum the squared deviations from the mean, and then divide by the number of observations.

$$\frac{\Sigma (X - \bar{X})^2}{n} = \frac{25 + 9 + 1 + 1 + 9 + 25}{6}$$
$$\frac{\Sigma (X - \bar{X})^2}{n} = \frac{70}{6} = 11.67$$

The resulting value indicates that the mean squared deviation of each score from the distribution mean is 11.67; or stated differently, on average, the distance squared of each score from the mean is 11.67 units. The statistic we call the variance represents the mean squared deviation from the mean for a set of data. The logical formula for the variance is simply:

$$\text{Variance} = \frac{\Sigma (X - \bar{X})^2}{n}$$

where
$X$ = each value of X in the distribution
$\bar{X}$ = the mean of the distribution
$n$ = the sample size.
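The steps of the logical formula can be traced in a short Python sketch using the six scores from the example. The function name `logical_variance` is our own label, chosen only for illustration.

```python
def logical_variance(scores):
    """Mean of the squared deviations from the mean (denominator n)."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / n


scores = [13, 11, 9, 7, 5, 3]
mean = sum(scores) / len(scores)

# The raw deviations always cancel out, so their sum is zero.
deviations = [x - mean for x in scores]
print(sum(deviations))  # 0.0

# Squaring removes the signs, giving a usable measure of variation.
variance = logical_variance(scores)
print(round(variance, 2))  # 11.67
```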

What does the variance really tell us? Recall the situation from earlier in the chapter where we had two distributions representing the performance of two sections of a class on an exam with the same number of observations, the same mean, and the same median. Yet, we could see by simple inspection that the two distributions were very different. The variance of the distribution will tell us how representative the mean is of each of the scores in the distribution. The closer each individual score is to the mean, the smaller the variance will be. If each score is at the mean in a distribution the variance will equal zero, indicating that there is no variation from the mean across the entire distribution. The farther each of the individual scores is from the mean, the greater the variance will be, indicating that the mean is not as typical of the individual scores in the distribution. Examine the three simple distributions below.

 A     B     C
10    10     8
 8    10     6
 6     6     6
 4     2     6
 2     2     4

Each distribution contains n = 5 observations, and each distribution has a mean equal to 6. Yet the observations differ with respect to how much the individual scores vary from the mean. Since the variance represents the average squared deviation from the mean, which distribution would you expect to have the greatest variance? Which distribution should have the smallest variance? The mean value of 6 seems to be most typical of the scores in distribution C, so it should have the smallest variance. The scores in distribution B are much farther from the mean value of 6, so it should have the largest variance. The scores in distribution A appear somewhat in between, and should have a variance between that of distribution B and distribution C.
Let's calculate the variance for each distribution below:

 A    (X - X̄)   (X - X̄)²      B    (X - X̄)   (X - X̄)²      C    (X - X̄)   (X - X̄)²
10       4         16         10       4         16          8       2          4
 8       2          4         10       4         16          6       0          0
 6       0          0          6       0          0          6       0          0
 4      -2          4          2      -4         16          6       0          0
 2      -4         16          2      -4         16          4      -2          4
Σ(X - X̄)² = 40                Σ(X - X̄)² = 64                Σ(X - X̄)² = 8

Variance: A: 40/5 = 8; B: 64/5 = 12.8; C: 8/5 = 1.6

As suspected, the variance in distribution C is the smallest. For distribution C the average squared deviation from the mean is 1.6 units. Distribution B has the largest variance with an average squared deviation of 12.8 units. Distribution A is in between the two with an average squared deviation from the mean of 8 units.

It is important to realize that the size of the variance does not have any special underlying standard interpretation. The value of the variance does not have a special meaning like your blood pressure, where you know that you are in reasonably good condition with a systolic blood pressure of 120 and a diastolic pressure of 80. There is no normal or abnormal range for the variance. What the variance is telling you is simply what the squared distance is from the typical or average score to the mean. Variance, along with the mean, can then allow you to have an idea of what a particular distribution might look like, and will allow you to judge how well the mean serves as a measure of central tendency. Consider the case of the three simple distributions we just examined. Typically, information for such distributions would not provide individual scores, but would be presented in summary form as follows:

Distribution Summary Statistics
                A       B       C
Sample Size:    5       5       5
Mean:           6       6       6
Variance:       8.0    12.8     1.6

Even though we do not have the individual scores available, we can reach some fairly accurate conclusions about what each of these distributions would look like. For example, we can see that all three distributions are of the same size (n = 5), and that all three have the same mean ($\bar{X}$ = 6). The fact that the variance for distribution C is only 1.6 indicates that most of the individual scores in the distribution should be very close to the value of 6. After all, a distribution where every score is equal to the mean will have a variance of zero (remember that variance can never be negative). Similarly, we would assume that the individual scores in distribution B must be much more diverse or spread out around the mean since the variance is so much larger.

A computational formula for variance

Up to this point we have utilized a logical formula for the variance that is useful for demonstrating how the variance is computed, but requires us to go through some unnecessary steps.
A computational formula may be used, which simplifies the calculations, especially when a larger data set is involved. The computational formula presented below may look a little more difficult at first, but with a little experience using it you will likely find it to be much easier.

$$\text{Variance} = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n}$$

where
$\Sigma X^2$ = the sum of the X's squared
$(\Sigma X)^2$ = the quantity, sum of X's, squared
$n$ = the sample size.

Notice that the computational formula for variance contains both the "sum of X's squared" term and the "quantity, sum of X's, squared" term. The term in the numerator,

$$\Sigma X^2 - \frac{(\Sigma X)^2}{n}$$

represents the sum of the squared deviations from the mean that was formerly written as $\Sigma (X - \bar{X})^2$. This term is also sometimes referred to as simply the sum of squares, and will play a role in several statistical procedures that will be examined in later chapters.
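As a quick check that the two formulas agree, the sketch below applies both to the three example distributions A, B, and C, entered as read from the earlier table. The function names are our own and serve only as an illustration.

```python
def logical_variance(scores):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def computational_variance(scores):
    """(sum of X squared - (sum of X) squared / n) / n."""
    n = len(scores)
    sum_x = sum(scores)                           # sum of X
    sum_x_squared = sum(x * x for x in scores)    # sum of X squared
    sum_of_squares = sum_x_squared - sum_x ** 2 / n
    return sum_of_squares / n


distributions = {
    "A": [10, 8, 6, 4, 2],
    "B": [10, 10, 6, 2, 2],
    "C": [8, 6, 6, 6, 4],
}

results = {
    name: (logical_variance(x), computational_variance(x))
    for name, x in distributions.items()
}
for name, (logical, computational) in results.items():
    print(name, logical, computational)
```

Both columns should print 8.0, 12.8, and 1.6 for A, B, and C. Note that swapping the two sum terms, or forgetting to divide the squared sum by n, produces a negative "variance", which is an immediate sign of an error.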

Let's demonstrate that the computational formula for the variance will provide the same results that we previously obtained for our three distributions.

 A     X²        B     X²        C     X²
10    100       10    100        8     64
 8     64       10    100        6     36
 6     36        6     36        6     36
 4     16        2      4        6     36
 2      4        2      4        4     16

A: ΣX = 30, ΣX² = 220, (ΣX)² = 900
B: ΣX = 30, ΣX² = 244, (ΣX)² = 900
C: ΣX = 30, ΣX² = 188, (ΣX)² = 900

Variance Computation

Distribution A:
$$\frac{220 - \frac{900}{5}}{5} = \frac{220 - 180}{5} = \frac{40}{5} = 8$$

Distribution B:
$$\frac{244 - \frac{900}{5}}{5} = \frac{244 - 180}{5} = \frac{64}{5} = 12.8$$

Distribution C:
$$\frac{188 - \frac{900}{5}}{5} = \frac{188 - 180}{5} = \frac{8}{5} = 1.6$$

In each case we obtain the same result for the variance with the computational formula that we previously obtained with the logical formula. Remember that the variance will never be negative. If your calculation of the variance results in a negative number, you can be sure that you have made an error somewhere. Confusing the $\Sigma X^2$ with the $(\Sigma X)^2$ in the computational formula is a common mistake that will result in a negative number, as is neglecting to divide $(\Sigma X)^2$ by n before subtracting the result from $\Sigma X^2$.

Some important terminology and symbols for variance

At this point we need to introduce some terminology, and appropriate symbols for the variance. There are three situations where we might want to calculate the variance. The logical formula and its computational alternative that we have been using are appropriate for two of the three situations. Recall from Chapter 1 the distinction between a population representing the entire collection of units of interest in a research project, and a sample that is the smaller subgroup that we select for actual observations. The variance for a population is represented by the lower case Greek letter sigma squared, or $\sigma^2$. The population variance $\sigma^2$ may be calculated using the formula that we have worked with up to this point.
The variance for a sample of data, when we are only interested in describing the sample, is represented by the upper case letter $S^2$, and it too may be calculated with the formula that we have worked with up to this point.

However, if you also recall from Chapter 1, we made a distinction between descriptive statistics that are used to describe a set of data and inferential statistics that are used to infer something about a population parameter by the observation of sample statistics. We often have an interest in doing just that in behavioral science research, or at the very least we are interested in being able to generalize our results to a larger population. It turns out that if you were to actually know the value of a population's variance, and then take a series of samples from the population and compare the sample variances computed for each sample to the actual population variance, you would find that the sample variances tend to underestimate the true size of the population variance. The sample variance is sometimes referred to as a biased estimator of the population variance, and the direction of the bias is to underestimate the true size of the population variance.

We can reduce the bias of the estimate of the population variance when using data from a sample by making a slight adjustment in the formula for the sample variance when we intend it to serve as an estimate of the population variance. Since the direction of the bias is to underestimate the true population variance, we can increase the size of the estimated variance by using the value n - 1 in the denominator of the variance formula in place of the usual denominator n. We will then use the lower case letter $s^2$ to represent a sample variance that is being used as an estimate of the population variance. It might be useful at this point to review the computational formulas for the three situations for computing variance.
Symbol: $\sigma^2$. Situation: population variance. Computational formula:
$$\sigma^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}$$

Symbol: $S^2$. Situation: sample variance. Computational formula:
$$S^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n}$$

Symbol: $s^2$. Situation: sample variance used to estimate the population variance $\sigma^2$. Computational formula:
$$s^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}$$

[Figure 5.2 Diagram of population and sample with the three formulas. Describing a population or a sample variance: $\sigma^2$ (population) and $S^2$ (sample). Estimating the population variance by observation of a sample: $s^2$.]

n - 1 as degrees of freedom

The use of n - 1 in the denominator of the formula for variance is the result of the concept of degrees of freedom. One of the things that we observed when looking at deviation from the mean was that the sum of the deviations from the mean always equaled zero. However, when dealing with sample data from a population there is no guarantee that the sum of the deviations of the individual sample scores from the population mean will actually equal zero. (Don't be confused by the fact that the sum of the deviations of the sample scores from the sample mean will equal zero. What we are concerned with here is whether or not the sum of the deviations of the sample scores from the population mean will equal zero.)

Consider this example of a population of N = 10 with a mean μ = 30. (You may want to verify that the sum of the deviations of the population scores from the population mean is actually zero.) Suppose I select a sample of n = 5 scores from the population as indicated below.

[Figure 5.3 Illustration of population data and sample drawn from it. Population scores: 10, 15, 20, 25, 30, 35, 35, 40, 45, 45; N = 10; μ = 30. Sample scores: 10, 15, 20, 30, 45; n = 5; sample mean = 24.]

We can calculate a sample mean for the five scores that were selected from the population, and if we calculate the sum of the deviations from the sample mean for the five scores in the sample we will in fact find the sum equal to zero. But is the sum of the deviations of the sample scores from the population mean equal to zero? No! The sum of the deviations from the sample scores to the population mean is actually -30. How many of the sample scores would I have to change (or control) if I wanted to force the sum of the deviations of the sample scores from the population mean to equal zero? The answer is that only one of the sample scores must be controlled. I can always ensure that the sum of the deviations of the sample scores from the population mean will equal zero if I can control one of the sample scores. It does not matter which score I choose to control. I simply must be sure that the value of the score I choose will result in a deviation from the population mean that, when added to the other deviations from the population mean, will give me a sum of zero. Since I must control only one score, it means that the other scores are free to take on any value. In other words, I have n - 1 degrees of freedom. The concept of degrees of freedom will be examined again when we begin our investigation of hypothesis testing in later chapters.

It is easy to be confused by the concept of degrees of freedom, or the difference between describing a sample variance through the use of one formula, and estimating the population variance from sample data using a different formula. Like many of the concepts in statistics, it takes time for reflection and experience before everything falls into place. At this point, the most important thing to keep in mind is what variance tells us about a distribution of scores.
Remember, the larger the variance the less likely that the individual scores of a distribution are close to the mean, and the less likely that the mean is a good indicator of what the typical score in a distribution was. The smaller the variance the more likely that the individual scores of the distribution are close to the mean, and the more likely that the mean is a good indicator of the typical score in a distribution.

We have examined the logical and computational formulas for calculating the variance when using raw data or a set of individual scores. We will follow a pattern similar to what we did in Chapter 4 on measures of central tendency and also briefly examine the method for computing variance when data are presented in a simple frequency distribution of size i = 1, and a frequency distribution when i > 1. It is useful to examine these two situations since we do not always control the way data are presented to us. But before moving to these other approaches I want to introduce the concept of standard deviation, which is closely related to variance.
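The degrees-of-freedom argument and the two sample formulas can be illustrated with a short Python sketch. It assumes the sample scores 10, 15, 20, 30, 45 (as read from the population example) and the population mean of 30; the function name `variance` and its `estimate_population` flag are our own illustrative choices.

```python
def variance(scores, estimate_population=False):
    """Sum of squares divided by n (describing the scores)
    or by n - 1 (estimating the population variance)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    denominator = n - 1 if estimate_population else n
    return sum_of_squares / denominator


mu = 30                        # population mean from the example
sample = [10, 15, 20, 30, 45]  # sample scores, assumed as read from the example
sample_mean = sum(sample) / len(sample)

# Deviations cancel around the sample mean but not around the population mean.
around_sample_mean = sum(x - sample_mean for x in sample)
around_population_mean = sum(x - mu for x in sample)
print(around_sample_mean, around_population_mean)  # 0.0 -30

# Controlling a single score is enough to force the second sum to zero;
# the other n - 1 scores stay free to vary.
controlled_score = mu * len(sample) - sum(sample[:-1])
controlled_sample = sample[:-1] + [controlled_score]
print(controlled_score, sum(x - mu for x in controlled_sample))  # 75 0

# Dividing by n - 1 gives the larger value used as the population estimate.
descriptive = variance(sample)
estimate = variance(sample, estimate_population=True)
print(descriptive, estimate)  # 154.0 192.5
```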

Standard Deviation

Many more people have heard the term standard deviation than actually know what it means. But what is a standard deviation? The standard deviation of a set of data is simply the square root of the variance. Just as the variance can be defined as the mean of the squared deviations from the mean for a set of data, the standard deviation can be defined as the square root of the mean of the squared deviations from the mean for a set of data.

You have every reason to be wondering why we care about the standard deviation when we already know the variance. There are really two reasons. First, recall that we were not able to base a measure of variation in a set of data on the simple deviations from the mean, because the sum of the simple deviations from the mean is always the same thing: zero. We eliminated that problem by squaring the deviations from the mean, and using the sum of the squared deviations as the basis for our measurement of variance. But in solving one problem we artificially inflated our measure of variance when we squared all of the deviations from the mean. In one sense you can think of the standard deviation as a measure of variation that is more in line with what we were intending to measure in the first place, since by taking the square root of the variance we are unsquaring the deviations from the mean. The information conveyed by the variance and the standard deviation is essentially the same, but the standard deviation is usually a much smaller value. There is an exception, of course, when the variance is less than 1.00, since the square root of a number greater than zero and less than 1.00 is larger than the original number. For example, the square root of 0.36 is the larger value 0.60, but in most cases the variance is a much larger number, and we will find it much easier to work with the smaller value of a standard deviation.

However, it is the second reason for working with the standard deviation that is much more important. It turns out that by knowing the mean and the standard deviation for certain types of distributions we can also know a great deal about how the individual observations in that distribution are organized. The mean and the standard deviation will tell us a great deal about any distribution that is normal in form. We discussed the idea of a distribution being normal in form, or having a bell-shaped curve, in Chapter 3, and we will be examining the concept of a normal distribution in great detail in the next chapter. Furthermore, the standard deviation, and the related concept of the standard error that is based on the standard deviation, will be key terms when we begin our investigation of hypothesis testing in later chapters.

Just as we represented the population variance by the symbol lower case sigma squared (σ²), we represent the population standard deviation by the lower case sigma (σ). Similarly, a sample standard deviation is represented by the upper case S, and a standard deviation that is being used to infer a population standard deviation is symbolized by a lower case s. We will still have the same problem with standard deviation that we had with variance when we attempt to use sample data to estimate the population standard deviation. We will use the same technique of altering the denominator of the formula by using N − 1 in place of the usual denominator N when we wish to estimate the population standard deviation. With this in mind, we can present both the logical formula and the computational formula for the standard deviation.

Logical Formula for Standard Deviation:

$$\sigma \text{ or } S = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}}$$

Computational Formula for Standard Deviation:

$$\sigma \text{ or } S = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}}$$

When using data from a sample to estimate a population standard deviation we would simply substitute N − 1 in the denominator of the formula. For example, the computational formula would become:

Computational Formula for Sample Standard Deviation used to estimate the Population Standard Deviation σ:

$$s = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N - 1}}$$

Technically, the correction factor of N − 1 is sufficient to adjust for the bias when using the sample variance s² as an estimate of the population variance σ². However, the sample-based standard deviation s will still be a biased estimator of the population standard deviation σ even when using the correction factor of N − 1. This is especially true when working with a very small sample size. Fortunately, as the sample size grows the bias becomes very slight, and because many research applications in the behavioral sciences involve a large sample size we need not worry in most cases.

Those of you with statistical function calculators might take a moment to examine your function keys. Many statistical calculators will automatically compute the mean and standard deviation for a set of data, and some will give you a choice of how you want the standard deviation computed. You may see two keys marked σn and σn−1, providing you with a choice of a descriptive or an inferential computation of the standard deviation.

Computing the standard deviation is relatively simple. Just find the variance, and then take the square root. For example, we previously computed the variance for three simple distributions, and obtained the following results:

        A                B                C
    X      X²        X      X²        X      X²
   10     100       10     100        8      64
    8      64       10     100        6      36
    6      36        6      36        6      36
    4      16        2       4        6      36
    2       4        2       4        4      16
  ΣX = 30          ΣX = 30          ΣX = 30
  ΣX² = 220        ΣX² = 244        ΣX² = 188
  (ΣX)² = 900      (ΣX)² = 900      (ΣX)² = 900

S² = 8; S² = 12.8; and S² = 1.6.

Standard Deviation Computation

Distribution A:

$$S^2 = \frac{220 - \frac{900}{5}}{5} = \frac{220 - 180}{5} = \frac{40}{5} = 8 \qquad S_A = \sqrt{8} = 2.83$$

Distribution B:

$$S^2 = \frac{244 - \frac{900}{5}}{5} = \frac{244 - 180}{5} = \frac{64}{5} = 12.8 \qquad S_B = \sqrt{12.8} = 3.58$$

Distribution C:

$$S^2 = \frac{188 - \frac{900}{5}}{5} = \frac{188 - 180}{5} = \frac{8}{5} = 1.6 \qquad S_C = \sqrt{1.6} = 1.26$$
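The arithmetic for distributions A, B, and C can be replayed in a few lines of Python. This is only a sketch: the function names `variance_computational` and `variance_logical` are made up for this illustration, and the scores are the five values listed for each distribution. It uses the descriptive formulas (denominator N), then takes the square root to get the standard deviation.

```python
from math import sqrt

def variance_computational(scores):
    """Descriptive variance: (sum of X^2 - (sum of X)^2 / N) / N."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / n

def variance_logical(scores):
    """Descriptive variance: mean of the squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

# Scores for the three distributions (each has sum 30 and N = 5).
distributions = {
    "A": [10, 8, 6, 4, 2],
    "B": [10, 10, 6, 2, 2],
    "C": [8, 6, 6, 6, 4],
}

results = {}
for name, scores in distributions.items():
    var = variance_computational(scores)
    # The logical and computational formulas should agree up to rounding error.
    agrees = abs(var - variance_logical(scores)) < 1e-9
    results[name] = (var, sqrt(var))
    print(f"{name}: variance = {var:.2f}, standard deviation = {sqrt(var):.2f}, "
          f"formulas agree: {agrees}")
```

Run as written, this reports variances of 8.00, 12.80, and 1.60 and standard deviations of 2.83, 3.58, and 1.26, matching the hand computation.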

To compute the standard deviation we simply took the square root of each of the observed variances.

Computing Variance and Standard Deviation for Frequency Distributions

As we have seen earlier, we do not always control the way data are presented to us. On occasion we may be presented with data already in the form of a frequency distribution with no access to the original raw scores, and yet we may still want to compute a mean, or a variance and standard deviation. Variance and standard deviation may be computed for frequency distributions by making a simple adjustment in the formula, following the same pattern we used when computing the mean for frequency distributions in Chapter 4. To compute the variance we will need to alter our original computational formula for raw data:

$$S^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}$$

as follows:

$$S^2 = \frac{\Sigma fX^2 - \frac{(\Sigma fX)^2}{N}}{N}$$

where
fX = each value of X times its frequency
fX² = each value of X squared times its frequency

We must simply use the number of frequencies f in each interval to weight the value of ΣX² and (ΣX)² in each interval. Consider the simple frequency distribution below, which we used in Chapter 4 to compute a mean. We can add the appropriate columns to the frequency distribution to obtain the necessary sums.

    X      f      fX     fX²
   11      1      11     121
    9      5      45     405
    8      4      32     256
    5      3      15      75
    3      3       9      27
        Σf = 16   ΣfX = 112   ΣfX² = 884

Note that the fX² column represents the product of f and X², and as such you may obtain it two ways. Consider the top two entries in the fX² column of the frequency distribution above. You may first square each value of X, and then multiply the result by the appropriate value of f as follows:

11² = 121, fX² = 1 × 121 = 121
9² = 81, fX² = 5 × 81 = 405

and so on, or you might see that you already have the value of fX, and may simply multiply that column by the value of X. Since multiplication is commutative, it does not matter in what order you perform the computation. That is:

$$fX^2 = f \cdot X \cdot X = (fX) \cdot X$$

So we can also obtain the fX² column as follows:

X = 11, f = 1, fX = 11, fX² = fX · X = 11 × 11 = 121
X = 9, f = 5, fX = 45, fX² = fX · X = 45 × 9 = 405

Use whichever method is easier for you. To complete the computation for the variance we need only plug the numbers into the formula. Keep in mind that N is equal to the total number of observations in our sample, not the number of intervals in which they happen to be categorized. In this case, N = 16, not 5.

$$S^2 = \frac{\Sigma fX^2 - \frac{(\Sigma fX)^2}{N}}{N} = \frac{884 - \frac{(112)^2}{16}}{16} = \frac{884 - \frac{12544}{16}}{16} = \frac{884 - 784}{16} = \frac{100}{16} = 6.25$$

So our variance is equal to 6.25. To compute the standard deviation we would take the square root of the variance.

$$S = \sqrt{6.25} = 2.5$$

Notice that we are assuming that we are interested in describing the observed sample, and are not attempting to estimate a population variance or standard deviation. This is evident from our use of N = 16 in the formula instead of N − 1 = 15.

We follow the same general procedure when working with a frequency distribution where the interval size is greater than 1. The formula for variance must be adjusted just as it was when working with the simpler frequency distribution. We may write the formula as before as:

$$S^2 = \frac{\Sigma fX^2 - \frac{(\Sigma fX)^2}{N}}{N}$$

where X = the interval midpoint.

We will again use one of the frequency distributions from Chapter 4. Our first step is to find the midpoint of each interval. From that point on we are simply repeating the procedure that we followed

for the frequency distribution where i = 1. We will need an fX column and an fX² column to compute the variance.

  Interval     f     X (midpoint)     fX       fX²
   45-49       6         47          282     13,254
   40-44       8         42          336     14,112
   35-39      12         37          444     16,428
   30-34      10         32          320     10,240
   25-29       9         27          243      6,561
   20-24       5         22          110      2,420
           Σf = 50              ΣfX = 1,735   ΣfX² = 63,015

$$S^2 = \frac{63015 - \frac{(1735)^2}{50}}{50} = \frac{63015 - \frac{3010225}{50}}{50} = \frac{63015 - 60204.5}{50} = \frac{2810.5}{50} = 56.21$$

We have computed the variance, and may now compute the standard deviation by simply taking the square root of 56.21.

$$S = \sqrt{56.21} = 7.50$$

Variance as Prediction Error (or Cabo San Lucas Here I Come!)

We have examined the idea of variation in data in several different ways, such as the range, the IQR, the SIQR, variance, and standard deviation. Toward that end, we have spent a great deal of time doing a variety of computations. While it is important to be able to take a statistical formula, apply it to a set of data, and generate the correct result, it is much more important to know why we do it. In other words, what does the resulting statistic mean? What does it tell us about a set of data that we did not know before?

Up until now I have stressed the idea that variance is important because it tells us something about the way that individual scores are distributed around their mean. By knowing the size of the variance we know how representative the mean is of the individual scores in the distribution. The variance viewed in those terms is an important piece of information, but in the behavioral sciences we can, and often do, look at variance in another way. We look at variance as a type of prediction error, and try to find ways to reduce the size of prediction error. Or you might think of the process as simply trying to do a better job of predicting the value of some variable that we consider important in our research. Suppose we have a set of data representing annual income in thousands of dollars for a group of 10 individuals.

  Income (× $1,000)
        30
        20
        50
        15
        12
        75
        40
        15
        22
        16
    ΣX = 295

We could calculate the mean income and would find that it is 29.5, or $29,500 per year. By examining the 10 scores in the distribution you can see that there is variation present. That is, not everyone has an income of 29.5; the incomes vary, some higher and some lower.

Now suppose I told you that I had the income of each person written on a piece of paper, and that I was going to draw the pieces of paper at random and let you guess what the person's income was. The only restriction is that you have to make the same guess each time. You are free to choose any of the incomes represented, or any other value for that matter, as your guess. All 10 incomes will eventually be selected, and I am going to measure how well you guess by comparing your guess to the income that is chosen. Since some of your guesses will be too high and others too low, I am going to square the difference between the actual income and your guess to eliminate any negative numbers. After all 10 incomes have been selected, I will calculate the mean of your squared differences as an indication of how well you have done.

What income value would you choose as your guess? You want to select a value to guess each time that will give you the smallest amount of squared error possible. (Choose wisely because there might be a prize in this for you if you win. I'm thinking maybe some nice luggage and a trip to Cabo San Lucas, but I haven't made up my mind yet.)

If you examine the distribution you will notice that there are two scores at 15, and you might be tempted to select the value 15 as your guess since your squared difference for those two cases would be zero. However, your mean squared difference across all 10 incomes would be 575.9 if you select the value 15 as your guess. (You might want to verify this by subtracting 15 from each of the observed incomes, squaring the difference, and then averaging the 10 results.) Is that a winning performance? I doubt it; it seems high. Selecting the mode of 15 did not seem to be a wise strategy.

What if you selected the median income as your guess? If you rearrange the numbers you will find the median to be equal to 21. If 21 becomes your guess, you will not be exactly right on any of the 10 incomes, but your mean squared difference across all 10 incomes will be 437.9, which is much improved over the strategy of selecting 15, but is it the best you could do?

One final strategy might be to select the mean value of 29.5 as your guess. Again, you will not be exactly right on any of the 10 incomes, but how would you do across all 10? By selecting the mean of 29.5 as your guess, your mean squared difference would be 365.65. That is the best we have seen yet, and in fact, there is no other guess that would be any better. Now if you think about how we have measured

the accuracy of each guess, you might recognize that the value of 365.65 represents something else we have examined in this chapter. We took each score, subtracted your guess of the mean, squared the difference, and then calculated the mean of the 10 squared deviations. In other words, we computed the variance.

  Income (× $1,000)    (X − X̄)     (X − X̄)²
        30               0.5          0.25
        20              −9.5         90.25
        50              20.5        420.25
        15             −14.5        210.25
        12             −17.5        306.25
        75              45.5      2,070.25
        40              10.5        110.25
        15             −14.5        210.25
        22              −7.5         56.25
        16             −13.5        182.25
                     Σ(X − X̄)² = 3,656.50

$$S^2 = \frac{3656.50}{10} = 365.65$$

Since there is no other single value that would serve as a better guess of the individual scores than the mean, we can think of the variance as the maximum amount of prediction error that we would have to accept when trying to predict an individual score. Or to think of it a different way, your best guess of a score in a distribution is the mean, assuming that no additional information is available to you. (For those of you who originally guessed the mean, the prize committee informs me that neither the trip to Cabo nor the luggage is available. You do win our home game, allowing you hours of fun guessing anyone's income you please.)

Now let's change the rules a little bit. Suppose that before I have you guess the income value, I am willing to give you one piece of additional information. It would be in your best interest to ask for something that might help you better predict income. What sorts of things (variables) are related to income? You can probably think of many things, but certainly education is a key variable that helps determine one's income. Suppose I am willing to tell you whether the income I have selected is that of a person who has a college degree or not. Let's also suppose that I am willing to let you provide two different incomes as your guess: one for the college graduates, and one for the noncollege graduates.

(I hope you understand that by changing the rules of the game the trip to Cabo San Lucas is definitely out of the question for this round. No, you're not going to get the luggage either!) As you might suspect, your best strategy for guessing has changed. In the light of this new information, your best strategy is to guess the mean of the college graduates as the selected income when you know I have selected a college graduate, and to guess the mean of the noncollege graduates when you know I have selected a noncollege graduate. Let's look at the income distribution again with the new information added.
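The one-guess version of the game can be replayed numerically before the rules change. The sketch below is illustrative only: `mean_squared_error` is a name chosen for this example, the ten incomes are keyed in from the income table, and the grid of candidate guesses (0.0 to 100.0 in steps of 0.1) is an arbitrary choice used to look for anything that beats the mean.

```python
from statistics import median, mode

# Annual incomes in thousands of dollars, keyed in from the example table.
incomes = [30, 20, 50, 15, 12, 75, 40, 15, 22, 16]

def mean_squared_error(guess, scores):
    """Mean of the squared differences between one fixed guess and each score."""
    return sum((score - guess) ** 2 for score in scores) / len(scores)

mean_income = sum(incomes) / len(incomes)
candidates = {
    "mode": mode(incomes),
    "median": median(incomes),
    "mean": mean_income,
}
errors = {label: mean_squared_error(guess, incomes)
          for label, guess in candidates.items()}

for label, guess in candidates.items():
    print(f"guess the {label} ({guess}): mean squared error = {errors[label]:.2f}")

# Brute-force check: try every guess from 0.0 to 100.0 in steps of 0.1
# and keep the one with the smallest mean squared error.
grid = [tenths / 10 for tenths in range(0, 1001)]
best_guess = min(grid, key=lambda guess: mean_squared_error(guess, incomes))
print(f"best guess on the grid: {best_guess}")
```

The three strategies give mean squared errors of 575.90 (mode), 437.90 (median), and 365.65 (mean), and the grid search lands on 29.5, the mean. A grid search is not a proof, but it is consistent with the claim that no single guess does better than the mean.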

  College Graduate    Income (× $1,000)
       YES                  30
       NO                   20
       YES                  50
       NO                   15
       NO                   12
       YES                  75
       YES                  40
       NO                   15
       YES                  22
       NO                   16

  College Graduates' Mean = 43.4
  Noncollege Graduates' Mean = 15.6

The mean income of the college graduates is 43.4, or $43,400, and the mean income of the noncollege graduates is 15.6, or $15,600. How well can you guess income now if you guess the college graduate mean of 43.4 when you know the individual has a college degree, and guess the noncollege graduate mean of 15.6 when you know the individual does not have a college degree? We will substitute the appropriate group mean into the calculation of (X − X̄) and (X − X̄)², and then compute what we can think of as a modified variance (symbolized by S′²).

  College Graduate    Income (× $1,000)    (X − X̄)                  (X − X̄)²
       YES                  30             (30 − 43.4) = −13.4        179.56
       NO                   20             (20 − 15.6) = 4.4           19.36
       YES                  50             (50 − 43.4) = 6.6           43.56
       NO                   15             (15 − 15.6) = −0.6           0.36
       NO                   12             (12 − 15.6) = −3.6          12.96
       YES                  75             (75 − 43.4) = 31.6         998.56
       YES                  40             (40 − 43.4) = −3.4          11.56
       NO                   15             (15 − 15.6) = −0.6           0.36
       YES                  22             (22 − 43.4) = −21.4        457.96
       NO                   16             (16 − 15.6) = 0.4            0.16
                                                     Σ(X − X̄)² = 1,724.40

$$S'^2 = \frac{1724.40}{10} = 172.44$$

When we use the mean income of the college graduates to guess a college graduate's income, and the mean income of the noncollege graduates to guess a noncollege graduate's income, and then find the mean squared deviation, we arrive at a modified measure of variance: one that uses the specific group mean in place of the overall group mean. In this case using educational level to help predict income results in a modified variance of 172.44, which we can compare with our previous variance of 365.65. By using the appropriate mean income for each group we have been able to reduce the amount

of variance by 193.21 points (365.65 − 172.44), or we can express that difference as a percentage of the original variance and say that we have reduced the variance by 52.8%. Using educational level to help predict income has explained over half (52.8%) of the variance in income. Not everyone in our small sample of 10 individuals has the same income; variation is present. By knowing the individual's educational level we are able to explain, or account for, over half of the variation in income.

This idea of being able to explain or reduce variance in one variable by knowing the value of a second variable is one of the more important concepts in statistical analysis in the behavioral sciences. We will deal with this concept again when looking at the interpretation of correlation between two variables in Chapter 9, and in assessing the quality of a linear regression analysis in Chapter 10.

Computer Applications

1. Select several variables from the GSS data set, and generate descriptive statistics.
2. Be sure to click on the Options button and request the variance and range in addition to the default statistics of mean, standard deviation, minimum, and maximum.
3. Enter data from one of the short examples from the text and compare the results to SPSS. Is SPSS using the variance formula with N as the denominator, or N − 1?

How to do it

Open the GSS data set, and then click on Analyze, Descriptive Statistics, and then Descriptives. Highlight the desired variables and select them by clicking on the direction arrow. Click on Options to request additional statistics such as the variance and range. Click on OK to run the procedure.

Clear the GSS data set by clicking on File, New, and then Data. Use the new empty Data Editor screen to input data from one of the simple examples from the text. Run the descriptive statistics procedure with variance requested, and determine the formula used.

Summary of Key Points

Measures of variation are a group of statistics that indicate how a set of scores is distributed. Some measures of variation indicate the variation from the bottom or lower end of the distribution to the top or upper end of the distribution, while other measures of variation indicate the variation of each score from a central point such as the mean. The former techniques typically measure the range of the distribution, while the latter techniques typically measure deviation from the mean.

Range
The range is the distance from the smallest score in a distribution to the largest score. It is one of the simplest measures of variation.

Interquartile Range
The interquartile range is the distance from the 75th percentile to the 25th percentile in a distribution.

Semi-Interquartile Range
The semi-interquartile range is the interquartile range divided by 2.

Deviation from the Mean
Deviation from the mean is a measure of each score's distance from the mean. The sum of the deviations from the mean is always zero.

Variance
The variance is the average of the squared deviations from the mean for a set of scores. We square the deviations to keep the positive and negative deviations from canceling each other out. The smaller the variance, the closer the scores of a distribution are to their mean.

Standard Deviation
The standard deviation is the square root of the variance.

Degrees of Freedom
The number of values in a sample that are free to take on any value and still represent an unbiased estimate of a population parameter.

Normal in Form
A distribution of scores whose shape approximates that of a normal or bell-shaped curve.
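For readers working outside SPSS, the same two checks can be sketched in Python. This is an illustration under stated assumptions, not a description of what SPSS does: it uses the standard library's `statistics.pvariance` (denominator N) and `statistics.variance` (denominator N − 1), the ten incomes and college indicators are keyed in from the income example, and the helper `average` and the variable names are invented for this sketch.

```python
from statistics import pvariance, variance

# Incomes in thousands of dollars and college-degree indicators,
# keyed in from the income example.
incomes = [30, 20, 50, 15, 12, 75, 40, 15, 22, 16]
college = [True, False, True, False, False, True, True, False, True, False]

def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Check 1: which denominator is in use?
overall_variance = pvariance(incomes)  # divides by N (descriptive)
sample_variance = variance(incomes)    # divides by N - 1 (inferential)
print(f"variance with N:     {overall_variance:.2f}")
print(f"variance with N - 1: {sample_variance:.2f}")

# Check 2: modified variance, using each person's own group mean as the guess.
group_means = {
    flag: average([x for x, c in zip(incomes, college) if c == flag])
    for flag in (True, False)
}
modified_variance = average(
    [(x - group_means[c]) ** 2 for x, c in zip(incomes, college)]
)
explained = (overall_variance - modified_variance) / overall_variance

print(f"group means: college = {group_means[True]:.1f}, "
      f"noncollege = {group_means[False]:.1f}")
print(f"modified variance: {modified_variance:.2f}")
print(f"share of variance explained: {explained:.1%}")
```

Comparing a tool's reported variance against both values is one way to answer the question in item 3: a result matching the N figure is descriptive, and a result matching the N − 1 figure is inferential. The second half reproduces the modified variance of 172.44 and the 52.8% reduction from the text.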

Reduce Variance
The ability to reduce the amount of error when predicting a variable by making use of information obtained from a second variable. Reducing or explaining variance is an important concept that is central to several statistical procedures.

Questions and Problems for Review

1. Compute the range for the following sets of data:
A. 4 6 8 30 38 49 6 67 7
B. 0.0 1. 3.0 4. 8.9 10.0

2. Under what circumstances would it be wise to compute the IQR or the SIQR?

3. Examine the frequency distribution below. Do you think it would be better to report the range for these data, or the IQR? Why?
f Cf 9 99 3 13 90 94 3 13 8 89 0 19 80 84 109 7 79 9 84 70 74 4 6 69 0 31 60 64 4 11 9 7 0 4 Σf 13

4. Compute the range, IQR, and SIQR for the data in Problem 3 above.

5. Examine the summary information presented below. What can you conclude about the income distribution for each occupational category? For which groups does the mean seem to be a better measure of central tendency? For which groups is the mean less indicative of overall group income?

  Income
  Occupation     Mean ($)     Standard Deviation ($)
  Accountant     3,00         8,00
  Attorney       7,00         3,100
  Engineer       7,800        1,100
  Evangelist     66,600       7,800
  Physician      108,000      8,00
  Psychic        3,700        4,0

6. Compute the variance and standard deviation for the two sections of the statistics class illustrated in Figure 5.1.

7. Compute the variance and standard deviation for the simple frequency distribution below.
0 18 4 1 7 1 4 10 3 f Σf 0

8. What does it mean to be able to explain variance?

9. Scores for the verbal section of the SAT are presented below for a group of 10 students.
A. Compute the variance for the entire group.
B. Compute the mean score of the females, and the mean score of the males.

  Gender     Verbal SAT
  Female     00
  Female     60
  Female     48
  Female     70
  Male       40
  Male       39
  Male       700
  Male       630
  Male       40
  Male       8

10. How much of the variance in the SAT scores in Problem 9 can be explained by gender? (Hint: you will need to use the mean score of the females when predicting a female score, and the mean score of the males when predicting a male score, to compute a modified variance.)