
Collecting, Displaying, and Analyzing your Data

Descriptive Statistics

Earlier sections explained how to choose a sample, how to collect and organize data from the sample, and how to display your data. In this section, you will learn how to analyze the data using some basic tools from descriptive statistics. You will not have to perform any complicated calculations by hand; however, it is vital that you understand the concepts of measuring central tendency, position, dispersion, and outliers because these are tools you can use to analyze the data from your own research. You also need to understand the relationship between these measurements and the shape of a frequency distribution. The measures of central tendency, dispersion, and the shape of a frequency distribution are the three basic building blocks we will use to construct all of the sophisticated concepts (like statistical tests) in other chapters.

Measures of Central Tendency

When analyzing a large and diverse population, it is helpful to try to measure characteristics about its center or middle. The two main measures of central tendency are the median and the mean.

Suppose that you have a list of scores from a test given in a music class. Then the sample is the class and the number of students in your class is the sample size. The sample size is represented by the variable $n$.

The median is the midpoint (or middle) of the scores when listed from smallest to greatest. That means 50% of the scores are less than or equal to the median and 50% of the scores are greater than or equal to the median. As you can tell from the definition, you can only calculate a median if the data can be ordered or ranked from smallest to greatest. Thus, you can only calculate a median if the variable is quantitative (ordinal, interval, or ratio level). It would not make sense to calculate a median for a nominal variable.

The mode is the score that occurs most frequently. It is possible for there to be more than one mode. For example, suppose that among ten test scores, one score in the 80s and one score in the 90s each occur three times, and no other score occurs that often. Those two scores are the most frequently occurring scores, so in this case there are two modes.
Unlike the other measures of central tendency, you can calculate the mode for a variable with a nominal level of measurement.

The sample mean is often called the sample average. You probably remember from high school that you calculate the sample mean by adding all of the scores in the sample and then dividing by the number of people in the sample. So, if there are $n$ students in the sample,

$$\text{sample mean} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where $x_1$ is the score of the 1st student, $x_2$ is the score of the 2nd student, etc.

This is the formula that we will use throughout the book; however, it will be written with slightly different notation. In addition to being shorter to write, the symbols in the notation correspond to the functions used by computers to calculate these statistics. The symbol for the sample mean (the sample average) is $\bar{X}$. The Greek letter capital sigma, $\sum$, means summation (add things together). $n$ (called the sample size) is the number of pieces of data you plan to add together. So the symbol $\sum x$ means $x_1 + x_2 + x_3 + \dots + x_n$, and

$$\bar{X} = \frac{\sum x}{n}$$

Example: for a sample of five scores, the average is

$$\bar{X} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$$
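The three measures of central tendency described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (they are not the scores from the text), and the variable names are chosen for this sketch.

```python
from statistics import median, multimode

# Hypothetical test scores for a class of ten students (not from the text).
scores = [70, 75, 80, 80, 80, 85, 90, 90, 90, 100]

n = len(scores)                  # sample size n
sample_mean = sum(scores) / n    # X-bar = (sum of x) / n
sample_median = median(scores)   # midpoint of the ordered scores
modes = multimode(scores)        # every score tied for most frequent

print("n =", n)
print("mean =", sample_mean)
print("median =", sample_median)
print("modes =", modes)
```

With an even number of scores, `median` averages the two middle scores, and `multimode` returns every value that ties for the highest frequency, which matches the idea that a sample can have more than one mode.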

In order to calculate a mean, the distances between the values of the scores must be measured precisely. Thus, you can only calculate the sample mean for variables that have an interval or a ratio level of measurement.

Measures of Position

Suppose you have a list of scores from a test given in a music class. The minimum is the lowest score in the class and the maximum is the highest score in the class. They are abbreviated Min and Max.

The percentile ranking of a score tells you what percentage of the class scored less than that score. Thus, if Suzie's score on a test was at the 72nd percentile, then 72% of the class scored less than she did. In other words, Suzie scored better than 72% of the people who took the same test.

The 25th percentile is called the first quartile, the 50th percentile is called the second quartile (this is also the median), and the 75th percentile is called the third quartile. The symbols for the 1st, 2nd, and 3rd quartile are $Q_1$, $Q_2$, and $Q_3$ respectively. 25% of the scores are less than $Q_1$, 25% of the scores are between $Q_1$ and $Q_2$, 25% of the scores are between $Q_2$ and $Q_3$, and 25% of the scores are greater than $Q_3$. Thus, the quartiles partition the scores into four groups with an equal number of students in each group.

All of these measures of position only apply to scores that can be ordered or ranked, thus you can only calculate them for quantitative variables.

Measures of Dispersion

Suppose that you have a list of scores from a test given in a music class. The range of the scores is the distance from the lowest score to the highest score; it measures the overall distance of the spread of scores in the class. The formula is:

Range = Maximum − Minimum

The inner quartile range is the distance from the first quartile to the third quartile; it measures the distance across the middle 50% of the data (from the 25th percentile to the 75th percentile). The abbreviation for the inner quartile range is I.Q.R. The formula is:

I.Q.R. = $Q_3 - Q_1$

The range calculates the overall distance of the spread of data and is a somewhat crude measurement of the dispersion of the data (how spread out the data is).
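The measures of position and the two distance-based measures of dispersion can be sketched as follows. The scores are hypothetical, `percentile_rank` is a helper name invented for this sketch, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other software may use a slightly different quartile convention and give slightly different values.

```python
from statistics import quantiles

# Hypothetical test scores (not from the text), listed smallest to greatest.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

def percentile_rank(data, score):
    """Percentage of the class that scored less than the given score."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

minimum, maximum = min(scores), max(scores)

# Quartiles Q1, Q2 (the median), and Q3; the method is an assumption.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

score_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # I.Q.R. = Q3 - Q1

print("Min, Max:", minimum, maximum)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Range:", score_range)
print("I.Q.R.:", iqr)
print("Percentile rank of 75:", round(percentile_rank(scores, 75), 1))
```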
The sample variance is a more refined measurement of dispersion. A large variance means that the scores are spread out with many scores far away from the mean. A small variance means that only a few of the scores are far away from the mean (the scores are more clumped together or clustered around the center). The symbol for sample variance is $S^2$.

For example, a data set of five identical scores has variance zero (there is no variation), a data set with very little variation has a variance close to zero, and the variance grows as the scores become more spread out.

The formula for sample variance is:

$$S^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$$
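A minimal sketch of the variance calculation follows. It assumes the $n - 1$ denominator, which is the convention used by Python's `statistics.variance` and by Excel's sample-variance function; the three data sets are hypothetical and are not the ones from the text.

```python
from math import sqrt
from statistics import variance

def sample_variance(scores):
    """Sum of squared distances from the mean, divided by n - 1 (assumed)."""
    n = len(scores)
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Hypothetical data sets, ordered from no spread to more spread.
identical = [3, 3, 3, 3, 3]
clustered = [3, 3, 3, 3, 4]
spread = [1, 2, 3, 4, 5]

for data in (identical, clustered, spread):
    s2 = sample_variance(data)
    print(data, "variance =", round(s2, 3), "standard deviation =", round(sqrt(s2), 3))

# The hand-written function agrees with the standard library.
print(abs(sample_variance(spread) - variance(spread)) < 1e-9)
```

The variance grows from the first data set to the last because the scores sit farther from their mean.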

The worked examples apply the formula for sample variance first to a data set of five scores and then to a data set of four scores that is more spread out. In each case, we find the sample size $n$ and the average $\bar{X}$, add up the squared differences $(x - \bar{X})^2$, and divide. The variance of the second data set is larger than the variance of the first because the data in the second set is more spread out.

In the formula for variance, $(x - \bar{X})$ is the difference between a score and the average. In the formula, this is squared so that the number will never be negative. Notice that the further away the score is from the average, the larger this number will be. When we add up all of the squared distances between each score and the average, the number will be smaller if the scores are close to the average and larger if the scores are far away from the average. Thus, the variance is smaller if the scores are clumped closer together and larger if the scores are more spread out.

The sample standard deviation is simply the square root of the sample variance. The symbol for sample standard deviation is $S$ (which explains why the symbol for sample variance is $S^2$).

$$\text{Sample Standard Deviation} = S = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$$

The range and the I.Q.R. can be applied to any quantitative variable, but both the sample variance and the sample standard deviation have the sample mean in their calculation, so they only apply to variables that have an interval or a ratio level of measurement.

A box & whiskers graph shows the dispersion of the scores by displaying each quartile. The left whisker goes from the minimum to the first quartile, $Q_1$. The box goes from $Q_1$ to $Q_3$. The box has a vertical line inside of it indicating the location of $Q_2$. The right whisker goes from $Q_3$ to the maximum. The range is the distance between the tips of the two whiskers. The I.Q.R. is the length of the box. The lowest 25% of the scores are in the left whisker, the middle 50% of the scores are inside the box (25% on each side of the line), and the highest 25% of the scores are in the right whisker.
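[Figure: box & whiskers graph of grades, marking the minimum, $Q_1$, $Q_2$ (the median), $Q_3$, and the maximum, with the range and the inner quartile range labeled.]

Range = Maximum − Minimum (the total length of the graph). I.Q.R. = $Q_3 - Q_1$ (the length of the box). 25% of the grades were between the minimum and $Q_1$, 25% of the grades were between $Q_1$ and $Q_2$, 25% of the grades were between $Q_2$ and $Q_3$, and 25% of the grades were between $Q_3$ and the maximum.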

If the minimum or the maximum score is really far away from the median, there will be some scores that are considered outliers. The most widely used definition for outlier is any score whose distance from the box is more than 1½ times the Inner Quartile Range. So, if a score is less than $Q_1 - 1.5 \times \text{I.Q.R.}$, it is an outlier. Or, if a score is greater than $Q_3 + 1.5 \times \text{I.Q.R.}$, it is an outlier.

In the graph below, these two boundaries are computed from $Q_1$ and $Q_3$: any score less than the left boundary is an outlier and any score more than the right boundary is an outlier. Whenever there are outliers, we draw the whisker to the boundary for outliers. Any scores that are outliers (outside of these boundaries) are drawn as single points.

[Figure: box & whiskers graph with outliers, marking the minimum outliers, the left boundary for outliers ($Q_1 - 1.5 \times \text{I.Q.R.}$), $Q_1$, $Q_2$, $Q_3$, the right boundary for outliers ($Q_3 + 1.5 \times \text{I.Q.R.}$), and the maximum outliers.]

Shapes of Frequency Distributions

The frequency distribution for a variable shows the frequencies for each of its values. Bar graphs are often used to illustrate frequency distributions. The shapes of these distributions tell us many things about our sample. Below are bar graphs illustrating the most common shapes of frequency distributions. Notice that different experiments can result in different distributions.

A die was rolled repeatedly and the outcomes were recorded. On the right is a frequency graph showing the frequency for each value. While there are fluctuations due to random chance, each value appears approximately equally often. This is a uniform frequency distribution.

[Figure: bar graph of Frequency by Outcome for Rolling a Die.]

Fifty persons were given a coin and told to keep flipping the coin until it came up heads. We recorded how many times each person had to flip the coin. The graph on the right shows a decreasing frequency distribution (lower numbered values occur more frequently than high numbered values).

[Figure: bar graph of Frequency by # of Coin Flips Until a "Head" Appears.]
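The two experiments above can be imitated with a short simulation. This is a sketch under stated assumptions: the number of die rolls (120), the fixed random seed, and the helper names `flips_until_head` and `print_bars` are all choices made for this illustration, and the counts it produces are not the data behind the graphs in the text.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch gives repeatable results

# Experiment 1: roll a die 120 times (the number of rolls is an assumption).
die_freq = Counter(rng.randint(1, 6) for _ in range(120))

def flips_until_head(rng):
    """Count coin flips until the first head; each flip is heads half the time."""
    flips = 1
    while rng.random() < 0.5:  # treat this outcome as tails and flip again
        flips += 1
    return flips

# Experiment 2: fifty persons each flip a coin until it comes up heads.
coin_freq = Counter(flips_until_head(rng) for _ in range(50))

def print_bars(title, freq):
    """Print a frequency distribution as a simple text bar graph."""
    print(title)
    for value in sorted(freq):
        print(f"{value:>2} | {'#' * freq[value]}")

print_bars("Outcome for rolling a die", die_freq)
print_bars("# of coin flips until a head appears", coin_freq)
```

The die bars should come out roughly level (a uniform shape), while the coin bars should tend to shrink as the number of flips increases (a decreasing shape), with fluctuations due to random chance.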

Fifty persons ran a dash and their times were recorded (rounded to the nearest second). The graph on the right shows a bell-shaped and symmetric distribution. It is bell-shaped because values in the middle occur more frequently than lower or higher values. It is symmetric because the mean and mode are close to the median (which is the middle). Many commonly used distributions have this shape, including the most used distribution of all, the normal distribution (discussed in another chapter).

[Figure: bar graph of Frequency by Time for Dash (sec).]

A professor was teaching a freshman class in choral music at her college. She noticed that half of her students had attended high schools with choral music programs and half attended schools that did not have a choral music program. She gave a test during the first week of class and the results are shown on the graph at the right. The tallest bar indicates the mode. Notice that in this case, two bars tie for tallest, so there are two modes. This distribution is called a bimodal frequency distribution.

[Figure: bar graph of Frequency by Score on Choral Test.]

Ms. Smith gave a test in her music class. The scores are rounded percentages. The graph on the right shows a bell-shaped distribution that is left-skewed. It is bell-shaped because values near the middle occur more frequently than lower or higher values. It is left-skewed because the mean (average score) $\bar{X}$ is to the left of the median (which is the middle). You can estimate that it is left-skewed because there is a tail to the left of the mode.

[Figure: bar graph of Frequency by Grade on First Test.]

Later, Ms. Smith gave a retake of the test. The scores are again rounded percentages. The graph on the right shows a bell-shaped distribution that is right-skewed. It is bell-shaped because values near the middle occur more frequently than lower or higher values. It is right-skewed because the mean (average score) $\bar{X}$ is to the right of the median (which is the middle). You can estimate that it is right-skewed because there is a tail to the right of the mode.

[Figure: bar graph of Frequency by Grade on Retake Test.]

The tallest bar indicates the most frequent occurrence. Thus, the mode for each test is the score under its tallest bar. If you add up all of the heights of the bars for one test, you will get the number of students in the class, which is the sample size $n$. On both tests, the same number of scores are less than and greater than a single middle score. In each case, that score is the midpoint for the data, so the two tests have the same median.

You can eye-ball the skewness of a distribution by looking to see if there is a long tail on either side of the mode. If there is a tail to the left (as on the first test), then the distribution is left-skewed. If there is a tail to the right (as on the retake test), then the distribution is right-skewed.

To make a more precise calculation of the skewness of a distribution, you will need to calculate the sample mean. You can calculate the sample mean from the graph by adding up all of the scores and dividing by $n$. (Remember that the height of the bar indicates how many of each score you will need to add.) On the first test, the mean was to the left of the median, thus the distribution is left-skewed. On the retake test, the mean was to the right of the median, thus the distribution for the retake test is right-skewed.

Some distributions are more skewed than others, and if you want to measure the magnitude of the skewness, you can use the formula below.

$$\text{The Coefficient for Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{X}}{S} \right)^3$$

$n$ is the sample size, $S$ is the sample standard deviation, and $\bar{X}$ is the mean. The symbol $\sum$ means that we are going to sum up terms of the form $\left( \frac{x - \bar{X}}{S} \right)^3$. Notice that if the score $x$ is less than the mean $\bar{X}$, we will be adding a negative term. If the score $x$ is greater than the mean $\bar{X}$, we will be adding a positive term. Thus, if enough scores are far enough to the left of the mean, this coefficient will be negative, indicating that the distribution is left-skewed. If enough scores are far enough to the right of the mean, this coefficient will be positive, indicating that the distribution is right-skewed. The positive or negative sign indicates the direction of skewness; the size of the number after the sign indicates the degree to which the distribution is skewed.
The more skewed the distribution is, the greater this number will be. In the previous example, the coefficient for skewness was negative for the first test (left) and positive for the retake test (right), and it was larger in size for the first test. Thus, the results of the first test were skewed to a greater degree than the results for the retake test. You will not have to memorize this formula or perform these calculations by hand. This function is included in Microsoft Excel, so we will let the computer calculate this for us.

You can think of skewness as pulling your data in either the positive or negative direction. Some statistics are more likely to be affected by the skewing of data than others. For example, take a data set of five scores whose median and mean are equal. If we increase only the last score, the median stays the same but the mean $\bar{X}$ increases. Thus, when we added to the last score in the first data set, we skewed the data to the right and the mean was pulled out to the right. However, the median is a more resilient measure of central tendency and it did not change.
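The skewness calculation can be sketched in Python as follows. The formula coded here is the adjusted coefficient $\frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{X}}{S} \right)^3$ with $S$ computed using the $n - 1$ denominator, which is an assumption about the text's convention. The two frequency tables are hypothetical (they are not Ms. Smith's results), and `expand` and `skewness` are names invented for this sketch.

```python
from statistics import mean, median, stdev

def expand(freq):
    """Turn a {score: number of students} table into a list of scores."""
    return [score for score, count in sorted(freq.items()) for _ in range(count)]

def skewness(scores):
    """Coefficient for skewness: n/((n-1)(n-2)) * sum(((x - mean)/S)**3)."""
    n = len(scores)
    x_bar = sum(scores) / n
    s = stdev(scores)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in scores)

# Hypothetical bar heights: a tail to the left, and its mirror image.
left_tail = expand({40: 1, 50: 1, 60: 2, 70: 4, 80: 6, 90: 4})
right_tail = expand({40: 4, 50: 6, 60: 4, 70: 2, 80: 1, 90: 1})

for name, scores in (("left tail", left_tail), ("right tail", right_tail)):
    print(name, "mean =", round(sum(scores) / len(scores), 2),
          "median =", median(scores), "skewness =", round(skewness(scores), 3))

# The median is resilient: raising only the last score moves the mean, not the median.
before = [1, 2, 3, 4, 5]
after = [1, 2, 3, 4, 50]
print(mean(before), median(before), mean(after), median(after))
```

The sign of the coefficient follows the side of the tail, and the last line shows the mean being pulled to the right while the median stays put.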

This has consequences that affect how we interpret our data. For example, the variable Annual Income has a distribution that is right-skewed (U.S. Census).

[Figure: bar graph of Percentage of Population by Annual Income bracket.]

This corresponds to the box & whiskers graph shown at the right. The two data points at the far right are only known to be greater than the top income bracket; their precise values are not known.

[Figure: box & whiskers graph of Annual Income.]

As you can see from the box & whiskers graph, there are many outliers in the positive direction. In fact, no income at the low end is an outlier, but anyone earning more than the right boundary for outliers is an outlier. These outliers heavily influence the calculation of the mean. The mean income $\bar{X}$ is larger than the median income, so more than half of the population makes less than the mean.

What might be a good definition of middle class? One definition is the Inner Quartile, because it gives us the middle 50% of the population. People whose income is in the bottom 25% of the population make less than $Q_1$ per year and the people whose income is in the top 25% of the population make more than $Q_3$ per year, which means the middle 50% of the population earns between $Q_1$ and $Q_3$ per year.
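The income discussion combines the outlier boundaries, the inner quartile, and the comparison of mean and median. A minimal sketch with made-up incomes (not Census data) is shown below; the quartile method is again the "inclusive" convention of `statistics.quantiles`, which is an assumption.

```python
from statistics import mean, median, quantiles

# Hypothetical annual incomes in dollars (invented for this sketch).
incomes = [12_000, 18_000, 22_000, 25_000, 30_000, 34_000,
           40_000, 48_000, 60_000, 75_000, 250_000]

q1, q2, q3 = quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Boundaries for outliers: 1.5 times the I.Q.R. beyond each end of the box.
left_boundary = q1 - 1.5 * iqr
right_boundary = q3 + 1.5 * iqr
outliers = [x for x in incomes if x < left_boundary or x > right_boundary]

x_bar = mean(incomes)
share_below_mean = 100 * sum(1 for x in incomes if x < x_bar) / len(incomes)

print("Middle 50% earns between", q1, "and", q3)
print("Outlier boundaries:", left_boundary, right_boundary)
print("Outliers:", outliers)
print("Median:", median(incomes), "Mean:", round(x_bar))
print("Percent below the mean:", round(share_below_mean, 1))
```

In this made-up sample the single very large income is the only outlier, it pulls the mean well above the median, and most of the sample earns less than the mean. The left boundary is negative, so even an income of zero would not count as an outlier here.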