Descriptive Statistics
Chapter 5, pp. 110-130

Graphing data
- Frequency distributions
  - Bar graphs
    - Qualitative variable (categories)
    - Bars don't touch
  - Histograms, frequency polygons
    - Quantitative variable (ordinal, interval, or ratio scale)
- Others:
  - Pie chart
  - Stem and leaf
  - Scatterplot

Example grade distribution
Class interval frequency distribution

Grade   Frequency
A       1
A-      2
B+      3
B       7
B-      8
C+      6
C       3
C-      2
D       1
F       1
N = 34
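
A minimal sketch of how a frequency distribution like this one could be tallied from raw data. The raw list of letter grades is rebuilt here from the counts on the slide purely for illustration; the names `grade_order`, `raw_grades`, and `frequency` are made up for this example.

```python
from collections import Counter

# Grade categories and counts copied from the example distribution.
grade_order = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]
counts = [1, 2, 3, 7, 8, 6, 3, 2, 1, 1]

# Rebuild an illustrative raw list of letter grades, then tally it.
raw_grades = [g for g, n in zip(grade_order, counts) for _ in range(n)]
frequency = Counter(raw_grades)

for grade in grade_order:
    # One "#" per student gives a quick text version of a bar graph.
    print(f"{grade:<3}{frequency[grade]:>2} {'#' * frequency[grade]}")
print("N =", sum(frequency.values()))
```

Listing the categories in `grade_order` keeps the rows in grade order rather than in order of frequency.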

Graphs! Read the X and Y axes carefully
[Figure: "Death Rates in America." Y axis: number per 100,000 population. X axis: year, 1980-1989. Lines: age 1-4 and age 15-24.]

Example bar graph
Rauscher, Shaw, & Ky (1993): Mozart effect
N = 36 college students
[Figure: bar graph; bar values 119, 111, and 110.]

Lots of cool graphs!
Florence Nightingale's coxcomb diagram
- Blue: died of sickness
- Red: died of wounds
- Black: died of other causes

Graph interpretation
- Be careful to read the values on each axis: graphs can be deceiving!
[Figure labels: reminiscence bump; recency effect]

Descriptive statistics
- Data collected in a study = raw data
- Reports of a study = summary data
- Descriptive statistics provide that summary
- Measures of central tendency: describe the "middleness" of a distribution of scores
  - Mean
  - Median
  - Mode
- Measures of variation: describe the width or dispersion of a distribution
  - Range
  - Standard deviation
  - Variance

Descriptive statistics: measures of central tendency
Mean
- Mean for a population = sum of scores / number of scores in the distribution
  $\mu = \frac{\sum X}{N}$
- Mean for a sample = sum of scores / number of scores in the distribution
  $M$ or $\bar{X} = \frac{\sum X}{N}$

Mean as the balance point
The mean balances the distances (or deviations) of all scores

Score (X)   Mean   Distance from mean (X - M)
2           5      -3
2           5      -3
6           5       1
10          5       5

ΣX = 20, N = 4, M = 20 / 4 = 5
Σ(X - M) = 0
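
A short sketch of the balance-point idea using the four scores from this slide: the deviations from the mean cancel out exactly. The variable names are illustrative only.

```python
scores = [2, 2, 6, 10]

n = len(scores)
mean = sum(scores) / n                   # M = ΣX / N

# Distance of each score from the mean (negative = below the mean).
deviations = [x - mean for x in scores]

print("M =", mean)
print("Deviations:", deviations)
print("Sum of deviations:", sum(deviations))
```

The negative deviations (below the mean) and the positive ones (above it) add up to zero, which is what "balance point" means.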

Effect of changing 1 score

Scores: 26 29 31 32 34 35 38 40 42 83
ΣX = 390, M = ΣX / N = 390 / 10 = 39

Scores: 26 29 31 32 34 35 38 40 42 33
ΣX = 340, M = ΣX / N = 340 / 10 = 34

The mean is not a robust statistic: it is highly influenced by a single outlier score
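
The same comparison as a quick sketch, using Python's standard `statistics` module. The two lists are the slide's scores, differing only in the last value.

```python
from statistics import mean

with_outlier = [26, 29, 31, 32, 34, 35, 38, 40, 42, 83]
without_outlier = [26, 29, 31, 32, 34, 35, 38, 40, 42, 33]

m_with = mean(with_outlier)        # 390 / 10
m_without = mean(without_outlier)  # 340 / 10

print("Mean with the 83:", m_with)
print("Mean with the 33:", m_without)
print("Shift caused by one score:", m_with - m_without)
```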

Adding a constant

X:       26 29 31 32 34 35 38 40 42 83
         ΣX = 390, M = 390 / 10 = 39
X + 10:  36 39 41 42 44 45 48 50 52 93
         Σ(X + 10) = 490, M = 490 / 10 = 49

If you add, subtract, multiply, or divide all scores by a constant, the same change is made to M
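
A small check of this rule for all four operations, assuming a constant of 10 as on the slide. The dictionary `results` and the name `k` are just for illustration.

```python
from statistics import mean

scores = [26, 29, 31, 32, 34, 35, 38, 40, 42, 83]
k = 10                      # the constant applied to every score
m = mean(scores)            # original mean

# Mean of the transformed scores for each operation.
results = {
    "add": mean([x + k for x in scores]),
    "subtract": mean([x - k for x in scores]),
    "multiply": mean([x * k for x in scores]),
    "divide": mean([x / k for x in scores]),
}

print("Original M:", m)
for operation, new_mean in results.items():
    print(f"{operation:<9} {k}: M = {new_mean}")
```

In each case the new mean equals the old mean with the same operation applied to it (39 + 10, 39 - 10, 39 × 10, 39 / 10).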

Descriptive statistics: measures of central tendency
Median
- Middle score in the distribution
- Order scores from highest to lowest
- If N is an even number, average the two middle scores

Calculating the median for RTs

Scores:    512 587 590 578 567 533 573 529 577 572 572 591 575 577 534
Sorted:    512 529 533 534 567 572 572 573 575 577 577 578 587 590 591
           Median = 573, Mean = 564.47

Add Hi X:  512 587 590 578 567 533 573 529 899 572 572 591 575 577 534
           Median = 573, Mean = 585.93

Add Lo X:  512 587 590 578 567 533 573 529 177 572 572 591 575 577 534
           Median = 572, Mean = 537.80

Median is a robust statistic!
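
A sketch that reproduces the three rows of this slide with Python's `statistics` module. The helper `replace_ninth` is a made-up name; it swaps the ninth score (the first 577) for an extreme value, as the slide does.

```python
from statistics import mean, median

rts = [512, 587, 590, 578, 567, 533, 573, 529, 577,
       572, 572, 591, 575, 577, 534]

def replace_ninth(scores, new_value):
    """Return a copy of scores with the ninth score replaced."""
    changed = list(scores)
    changed[8] = new_value
    return changed

datasets = {
    "original": rts,
    "add hi (899)": replace_ninth(rts, 899),
    "add lo (177)": replace_ninth(rts, 177),
}

# summary maps each label to (median, mean rounded to 2 decimals).
summary = {label: (median(data), round(mean(data), 2))
           for label, data in datasets.items()}

for label, (med, avg) in summary.items():
    print(f"{label:<13} median = {med}  mean = {avg}")
```

The mean moves by more than 20 ms in each direction, while the median changes by at most 1 ms.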

Descriptive statistics: measures of central tendency
Mode
- Score that occurs with the greatest frequency

Example grade distribution
Grade (frequency): A (1), A- (2), B+ (3), B (7), B- (8), C+ (6), C (3), C- (2), D (1), F (1)
N = 34
M = 80.38
Median = 81
Mode = B-

Can have 2+ modes
[Figure: sample grade distribution with 2 modes. X axis: grade, A to F. Y axis: frequency, 0 to 7.]
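
A minimal sketch of finding more than one mode with `statistics.multimode` (available in Python 3.8 and later). The grade list below is hypothetical, invented only to produce a tie; it is not the data behind the figure.

```python
from statistics import multimode

# Hypothetical grades in which B and C tie for the highest frequency.
grades = ["A", "B", "B", "B", "C", "C", "C", "D"]

modes = multimode(grades)   # every value tied for the top count
print("Modes:", modes)
print("Number of modes:", len(modes))
```

`multimode` returns the tied values in the order they first appear in the data, so a distribution with a single mode yields a one-item list.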

Types of distributions
- Normal distribution
  - Bell-shaped
  - Symmetrical
  - Only 1 mode
  - Mean, median, and mode all equal
- Kurtosis: spread of the distribution (how flat or peaked)
  - Mesokurtic: medium peak (like the normal distribution)
  - Leptokurtic: tall and thin
  - Platykurtic: flat and broad

Measures of central tendency
Indicators of the shape of the distribution: how the mean, median, and mode change with the shape of the distribution
- Normal distribution
- Positive skew: tail to positive scores
- Negative skew: tail to negative scores
[Figures: positive skew; negative skew]

Which measure of central tendency to use?
- If interval or ratio data and normally distributed: use the mean
- If interval or ratio data and there are outliers or a skewed distribution: use the median
- If nominal data: use the mode
But that's not enough info

Measures of variation
Range
- Difference between the lowest and highest scores in a distribution
- = maximum score - minimum score
- Easily distorted by an outlier (low or high score)
Standard deviation
- Average distance of scores in a distribution from the mean
- But the sum of the deviations from the mean = zero! So:
  - Average deviation: use absolute values
  - Standard deviation: use squared deviation scores
- For a population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
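
A sketch of both ideas, reusing scores from earlier slides (the ten scores from the outlier example and the four scores from the balance-point example). Treating the four scores as a whole population is an assumption made here so that the σ formula applies.

```python
import math

# Range = maximum score - minimum score; one outlier changes it a lot.
with_outlier = [26, 29, 31, 32, 34, 35, 38, 40, 42, 83]
without_outlier = [26, 29, 31, 32, 34, 35, 38, 40, 42, 33]
range_with = max(with_outlier) - min(with_outlier)            # 83 - 26
range_without = max(without_outlier) - min(without_outlier)   # 42 - 26

# Raw deviations cancel out, so use absolute or squared values instead.
scores = [2, 2, 6, 10]
n = len(scores)
mu = sum(scores) / n
deviations = [x - mu for x in scores]
avg_abs_dev = sum(abs(d) for d in deviations) / n          # average deviation
sigma = math.sqrt(sum(d ** 2 for d in deviations) / n)     # standard deviation

print("Range with / without outlier:", range_with, range_without)
print("Sum of deviations:", sum(deviations))
print("Average deviation:", avg_abs_dev)
print("Standard deviation:", round(sigma, 2))
```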

Example grade distribution
Grade (frequency): A (1), A- (2), B+ (3), B (7), B- (8), C+ (6), C (3), C- (2), D (1), F (1)
N = 34
M = 80.38
Median = 81
Mode = B-
s = 7.92

M - s = 72.5
M = 80.38
M + s = 88.3
Note: most scores are within 8 points of the mean

Calculating standard deviation (σ)
1. Calculate deviation scores (score - mean)
2. Square the deviations
3. Sum the squared deviations
4. Divide by N
   - N = number of scores
   - This step = variance
5. Take the square root of that value

RTs (X)    X - M     (X - M)^2
512       -52.47     2753.101
587        22.53      507.6009
590        25.53      651.7809
578        13.53      183.0609
567         2.53        6.4009
533       -31.47      990.3609
573         8.53       72.7609
529       -35.47     1258.121
577        12.53      157.0009
572         7.53       56.7009
572         7.53       56.7009
591        26.53      703.8409
575        10.53      110.8809
577        12.53      157.0009
534       -30.47      928.4209

Avg = 564.4667
Sum of (X - M)^2 = 8593.734
Variance: sum divided by N = 572.9156
SD: square root of the variance = 23.93565
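
The five steps can be sketched directly in Python with the RT scores from the slide. Because this sketch keeps the unrounded mean rather than deviations rounded to two decimals, the sum of squares may differ from the slide's value in the last decimal place.

```python
import math

rts = [512, 587, 590, 578, 567, 533, 573, 529, 577,
       572, 572, 591, 575, 577, 534]

n = len(rts)
m = sum(rts) / n

deviations = [x - m for x in rts]         # step 1: score - mean
squared = [d ** 2 for d in deviations]    # step 2: square the deviations
sum_sq = sum(squared)                     # step 3: sum the squared deviations
variance = sum_sq / n                     # step 4: divide by N (variance)
sd = math.sqrt(variance)                  # step 5: take the square root

print(f"Mean = {m:.4f}")
print(f"Sum of squared deviations = {sum_sq:.3f}")
print(f"Variance = {variance:.4f}")
print(f"SD = {sd:.5f}")
```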

Calculating standard deviation (s)
1. Calculate deviation scores (score - mean)
2. Square the deviations
3. Sum the squared deviations
4. Divide by N or N - 1
   - This step = variance
   - Use N for a population
   - Use N - 1 to estimate the population from a sample
5. Take the square root of that value

RTs (X)    X - M     (X - M)^2
512       -52.47     2753.101
587        22.53      507.6009
590        25.53      651.7809
578        13.53      183.0609
567         2.53        6.4009
533       -31.47      990.3609
573         8.53       72.7609
529       -35.47     1258.121
577        12.53      157.0009
572         7.53       56.7009
572         7.53       56.7009
591        26.53      703.8409
575        10.53      110.8809
577        12.53      157.0009
534       -30.47      928.4209

Avg = 564.4667
Sum of (X - M)^2 = 8593.734
Variance: sum divided by N - 1 = 613.8381
SD: square root of the variance = 24.77576
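
Python's `statistics` module offers both versions of step 4: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by N - 1. A quick sketch on the same RT scores:

```python
from statistics import pstdev, pvariance, stdev, variance

rts = [512, 587, 590, 578, 567, 533, 573, 529, 577,
       572, 572, 591, 575, 577, 534]

pop_var = pvariance(rts)    # sum of squares / N
pop_sd = pstdev(rts)
samp_var = variance(rts)    # sum of squares / (N - 1)
samp_sd = stdev(rts)

print(f"Divide by N:     variance = {pop_var:.4f}, SD = {pop_sd:.5f}")
print(f"Divide by N - 1: variance = {samp_var:.4f}, SD = {samp_sd:.5f}")
```

The N - 1 values are the larger ones, since the same sum of squares is divided by 14 instead of 15.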

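Measures of variation
Standard deviation
- Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
- Sample (when estimating the population): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
Variance
- Population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
- Sample (when estimating the population): $s^2 = \frac{\sum (X - M)^2}{N - 1}$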

Why use N - 1?
- A sample tends to be less variable than the population it comes from
- Dividing by a smaller number makes the variance (and SD) larger, which yields a more conservative estimate
- Use N - 1 so you can make conclusions about the population (not just describe your sample)
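
One way to see this is a small exhaustive demonstration rather than a proof. The sketch treats the four scores from the balance-point slide as an entire population (an assumption made only for illustration), draws every possible sample of size 2 with replacement, and averages the sample variances computed each way.

```python
from itertools import product
from statistics import mean

population = [2, 2, 6, 10]   # treated as a whole population for illustration
mu = mean(population)
pop_variance = mean([(x - mu) ** 2 for x in population])

def sample_variance(sample, denominator):
    """Sum of squared deviations from the sample mean, over the denominator."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / denominator

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_using_n = mean([sample_variance(s, 2) for s in samples])          # divide by N
avg_using_n_minus_1 = mean([sample_variance(s, 1) for s in samples])  # divide by N - 1

print("Population variance:", pop_variance)
print("Average sample variance, dividing by N:", avg_using_n)
print("Average sample variance, dividing by N - 1:", avg_using_n_minus_1)
```

Dividing by N underestimates the population variance on average, while dividing by N - 1 matches it across all possible samples.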

Thank you, Excel!
For example, if the data are in column B from row 1 to 20:
- Sum: =sum(b1:b20)
- Mean: =average(b1:b20)
- Median: =median(b1:b20)
- Mode: =mode(b1:b20)
- Maximum score: =max(b1:b20)
- Minimum score: =min(b1:b20)
- Range: subtract the minimum score from the maximum score