Summary tables and charts

Data Analysis

1. Summary tables and charts

1.1 Organizing numerical data: histograms and frequency tables

In this lecture, we will study descriptive statistics. By descriptive statistics, we refer to methods involving the collection, presentation, and characterization of a set of data in order to describe the various features of that set of data properly.

Data are usually provided in raw form. Example: grades for an assignment in mgts3000 are collected for a class of 40 students. For group 1 (20 students) of the class, these grades, given on a scale from 0 to 10, are:

7.2 4.9 6.4 4.8 4.6 6.0 5.4 4.6 3.8 3.8 6.6 8.0 2.4 7.0 4.9 3.6 7.4 5.8 8.6 3.9

For group 2 (20 students as well), the grades are:

3.4 3.9 5.0 3.9 8.0 3.5 4.9 4.? 3.9 4.8 5.9 3.9 9.7 4.7 10 4.7 9.? 4.7 3.5 5.2

The data as presented above are of little interest. We first need to order the data. The ordered arrays are:

For group 1: 2.4 3.6 3.8 3.8 3.9 4.6 4.6 4.8 4.9 4.9 5.4 5.8 6 6.4 6.6 7 7.2 7.4 8 8.6

For group 2: 3.4 3.5 3.5 3.9 3.9 3.9 3.9 4.? 4.7 4.7 4.7 4.8 4.9 5 5.2 5.9 8 9.? 9.7 10

We can already read the data a little better (for example, the minimum and maximum values are visible at once). We still need some basic tools to understand the distribution of the data set. The first tool that we can use is the stem-and-leaf display.

1.1.1 Stem-and-leaf display

The stem-and-leaf display separates entries into leading digits, or stems, and trailing digits, or leaves. It is useful for showing the distribution of the data and how skewed it is.
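The ordering step and the stem-and-leaf split described above can be sketched in a few lines of Python. This is only an illustration on the group 1 grades: the function name `stem_and_leaf` and its `ordered` flag are hypothetical, and the sketch assumes every grade has a single decimal digit, so that the integer part is the stem and the decimal digit is the leaf.

```python
from collections import defaultdict

# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]


def stem_and_leaf(values, ordered=True):
    """Return one text line per stem (integer part) with its leaves (first decimal)."""
    leaves = defaultdict(list)
    for v in (sorted(values) if ordered else values):
        tenths = round(v * 10)              # 7.2 -> 72, avoids float truncation
        leaves[tenths // 10].append(tenths % 10)
    return [f"{stem} | {''.join(str(d) for d in leaves[stem])}"
            for stem in sorted(leaves)]


# ordered=False keeps the leaves in data order; ordered=True sorts them (revised display).
for line in stem_and_leaf(GROUP_1, ordered=True):
    print(line)
```

Rounding `v * 10` before splitting is a deliberate choice: a plain `int(v * 10)` could truncate a value such as 4.9 to the wrong leaf because of floating-point representation.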

Figure 1: The stem-and-leaf display (group 1)

2 | 4
3 | 8869
4 | 98669
5 | 48
6 | 406
7 | 204
8 | 06

In the revised stem-and-leaf display, the trailing digits are ordered from the smallest to the largest one.

Figure 2: The revised stem-and-leaf display (group 1)

2 | 4
3 | 6889
4 | 66899
5 | 48
6 | 046
7 | 024
8 | 06

(a) Interpret the above display.
(b) Form the stem-and-leaf display for group 2.
(c) Form the revised display and interpret your findings.

Frequency distributions and relative frequency distributions

Frequency distributions are summary tables in which data are arranged into conveniently established, numerically ordered groupings or categories.

The number of classes: it depends on the number of observations in the data. It should be large enough to discriminate the data but not so large that the frequency distribution becomes useless. Nonetheless, two rules have been proposed in the literature to define the optimal number of classes.

Sturges' rule: this rule suggests using as the number of classes C the closest integer to the value $1 + 3.3\log_{10}(n)$. For instance, for n = 20, $C \approx 1 + 3.3 \times 1.30 = 5$ (5.29 rounded to the lower value). For n = 100, $1 + 3.3\log_{10}(100) = 7.6$, so $C \approx 8$.

Square root of n rule: a second rule is $C = \sqrt{n}$. For instance, for n = 20, $C \approx \sqrt{20} = 4.47$ (either 4 or 5 classes). For n = 100, $C = 10$.

Notice that there is no definitive rule to apply in choosing C. It is mostly a matter of judgment.
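The two rules for the number of classes are simple enough to check numerically. The sketch below is only an illustration: the function names are hypothetical, Sturges' rule is rounded to the closest integer as stated above, and the square-root rule is returned unrounded because the text leaves the rounding direction to judgment.

```python
import math


def sturges_classes(n):
    """Closest integer to 1 + 3.3 * log10(n)."""
    return round(1 + 3.3 * math.log10(n))


def sqrt_rule_classes(n):
    """Square-root rule, left unrounded so the reader can pick the neighbouring integer."""
    return math.sqrt(n)


for n in (20, 100):
    print(n, sturges_classes(n), round(sqrt_rule_classes(n), 2))
```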

The width of classes: it is better to have classes of equal width. To determine the width of each class, we divide the range of the data by the number of class groupings desired. The range is equal to the distance between the largest observation and the smallest one.

Boundaries of classes refer to the upper and lower limits of a class. Classes should not overlap (no point should belong to two different classes at the same time).

The class midpoint is the point located halfway between the boundaries of each class and is representative of the data within that class.

The relative frequency distribution summarizes the percentage distribution of the data. It scales each frequency by the total sum of frequencies so that the relative frequencies (or proportions) add up to 100%. Dividing the frequency of each class by the total number of observations forms the relative frequency distribution.

Example:
(a) Form the frequency table for group 1 grades.
(b) Compute the relative frequencies.
(c) Compute the cumulative frequencies and the cumulative relative frequencies.

Class of grades   Bin   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
0-2                 2           0                   0%                      0                              0%
2-4                 4           5                  25%                      5                             25%
4-6                 6           8                  40%                     13                             65%
6-8                 8           6                  30%                     19                             95%
8-10               10           1                   5%                     20                            100%
sum                            20                 100%

In the relative frequency column, each entry corresponds to the frequency of that class divided by the total number of observations. For instance, for the class 2-4, the relative frequency is 25% = 5/20. The formula giving the relative frequency of a class is:

$$f_k = \frac{n_k}{\sum_{i=1}^{C} n_i} = \frac{n_k}{n}$$

where $f_k$ is the relative frequency of class k and $n_k$ is the frequency of class k, that is, the total number of observations belonging to it.

In the cumulative frequency column, we compute the sum of frequencies from the smallest observation up to the upper limit of that interval or class. For instance, the cumulative frequency of the class 4-6 is 0 + 5 + 8 = 13. It is also equal to 8 plus the cumulative frequency of the previous class (5). The formula leading to the cumulative relative frequency $F_k$ is:

$$F_k = \frac{X_k}{X_C} = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{C} n_i}$$

Here $X_k$ is the cumulative frequency of class k, whereas $X_C$ is the cumulative frequency of the last class C, that is, the total number of observations. The cumulative relative frequency does the same thing with relative frequencies. Computing the relative frequencies, cumulative frequencies and cumulative relative frequencies gives the table above.

Polygons

Polygons perform the same kind of analysis as histograms. The only distinction is that a polygon joins the midpoints of consecutive classes by straight lines. In the case of the previous example, the histogram and the polygon are the following:

Figure 3: Histogram for group 1 grades (frequency and cumulative % by bin)

Figure 4: Polygon for group 1 grades (frequency plotted at the class midpoints 1, 3, 5, 7, 9)
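The frequency table, and the midpoints that a polygon joins, can be rebuilt from the raw grades with a short Python sketch. The names `frequency_table`, `BOUNDS` and the dictionary keys are hypothetical. The sketch assumes, consistently with the counts in the table, that a grade sitting exactly on a boundary (such as 6.0 or 8.0) belongs to the class whose upper limit it equals.

```python
# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]
BOUNDS = [0, 2, 4, 6, 8, 10]          # class boundaries 0-2, 2-4, ..., 8-10


def frequency_table(values, bounds):
    """Return one row per class with n_k, f_k, X_k, F_k and the class midpoint."""
    n = len(values)
    rows, cumulative = [], 0
    for lower, upper in zip(bounds, bounds[1:]):
        first = lower == bounds[0]      # only the first class includes its lower limit
        count = sum(1 for v in values
                    if (lower <= v if first else lower < v) and v <= upper)
        cumulative += count
        rows.append({
            "class": f"{lower}-{upper}",
            "midpoint": (lower + upper) / 2,
            "n_k": count,               # frequency
            "f_k": count / n,           # relative frequency
            "X_k": cumulative,          # cumulative frequency
            "F_k": cumulative / n,      # cumulative relative frequency
        })
    return rows


rows = frequency_table(GROUP_1, BOUNDS)
for row in rows:
    print(row)

# Points joined by the polygon: (class midpoint, frequency).
polygon_points = [(row["midpoint"], row["n_k"]) for row in rows]
```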

Contingency tables

A contingency table allows comparing two distributions simultaneously. To create a contingency table, we put the groups in rows and the classes in columns. In the case of the assignment grades, the contingency table would look like this:

Bin        2   4   6   8   10
group 1    0   5   8   6    1
group 2    0   7   9   1    3

Question: Compare the distribution of the two groups.

2. Summarizing data

In the previous class, we learned how to summarize numerical data by tables and by charts. In order to make sense of these data, we need to compute and summarize the key features of the data set under analysis. We will thus explore:
- the central tendency of the data
- its variation
- its shape

2.1 Measures of central tendency

2.1.1 The mean

The mean is the most common measure of central tendency. The arithmetic mean is most usually called the average. It is calculated by summing all observations and dividing the sum by the number of items observed:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $\bar{x}$ is the arithmetic mean, $n$ is the sample size, and $x_i$ is the value of observation $i$.

In the case of the assignment grades for group 1,

$$\bar{x} = \frac{7.2 + 4.9 + \dots + 3.9}{20} = 5.485$$

Computing the average using grouped data: when the data have already been grouped (transformed from raw data to a frequency table), the mean is computed as follows:

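$$\bar{x} = \sum_{i=1}^{C} f_i m_i$$

where $f_i$ is the relative frequency of class $i$, $m_i$ is the midpoint of class $i$, and $C$ is the number of classes.

Example: using the frequency table for group 1's grades

Class of grades   Bin   Midpoint   Frequency   Relative frequency
0-2                 2          1           0                   0%
2-4                 4          3           5                  25%
4-6                 6          5           8                  40%
6-8                 8          7           6                  30%
8-10               10          9           1                   5%
sum                                       20                 100%

$$\bar{x} = 1 \times 0 + 3 \times 0.25 + 5 \times 0.40 + 7 \times 0.30 + 9 \times 0.05 = 5.3$$

Notice that there may be some loss of precision in using the grouped data to compute the mean (5.3 against 5.485 from the raw data). However, it is sometimes justifiable as it saves considerable computational time.

Properties of the mean
- If $y_i = a + b x_i$, then $\bar{y} = a + b\bar{x}$.

2.1.2 Median

The median denotes the middle value in an ordered sequence of data. Unlike the mean, the median is unaffected by extreme observations. Thus, whenever an extreme observation is present, it is more appropriate to use the median as a measure of central tendency.

How to calculate the median:
1. Arrange the data in an ordered increasing array.
2. If the sample size $n$ is odd, the median corresponds to the numerical value of the $\frac{n+1}{2}$-th ordered observation. If $n$ is even, the median is the average of the $\frac{n}{2}$-th and $(\frac{n}{2}+1)$-th observations in the ordered array.

Using grouped data

The median is located in the class where the cumulative relative frequency exceeds 50% for the first time. To compute the median, use this formula:

$$\text{Median} = B_m + (B_{m+1} - B_m)\,\frac{50\% - F_m}{F_{m+1} - F_m}$$

where $B_m$ and $B_{m+1}$ are the lower and upper boundaries of the class containing the median, and $F_m$ and $F_{m+1}$ are the cumulative relative frequencies reached at those two boundaries.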
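Both grouped-data formulas above can be checked with a small Python sketch. The function names and the `(lower, upper, frequency)` tuple layout are hypothetical. The interpolation follows the median formula as written, with $F_m$ taken as the cumulative relative frequency just before the median class.

```python
# Frequency table for group 1: (lower boundary, upper boundary, frequency).
CLASSES = [(0, 2, 0), (2, 4, 5), (4, 6, 8), (6, 8, 6), (8, 10, 1)]


def grouped_mean(classes):
    """Sum of relative frequency times class midpoint."""
    n = sum(freq for _, _, freq in classes)
    return sum((lo + hi) / 2 * freq / n for lo, hi, freq in classes)


def grouped_median(classes):
    """Linear interpolation inside the first class whose cumulative share exceeds 50%."""
    n = sum(freq for _, _, freq in classes)
    cum_before = 0.0
    for lo, hi, freq in classes:
        cum_after = cum_before + freq / n
        if cum_after > 0.5:
            return lo + (hi - lo) * (0.5 - cum_before) / (cum_after - cum_before)
        cum_before = cum_after
    raise ValueError("cumulative relative frequency never exceeds 50%")


print(round(grouped_mean(CLASSES), 4))
print(round(grouped_median(CLASSES), 4))
```

On the group 1 table this gives a grouped mean of 5.3 and a grouped median of 5.25, both of which differ slightly from the raw-data values because grouping discards the position of each grade inside its class.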

Class of grades   Bin   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
0-2                 2           0                   0%                      0                              0%
2-4                 4           5                  25%                      5                             25%
4-6                 6           8                  40%                     13                             65%
6-8                 8           6                  30%                     19                             95%
8-10               10           1                   5%                     20                            100%
sum                            20                 100%

Using group 1's grades data, we notice from the frequency table that the cumulative relative frequency exceeds 50% for the first time in the class of grades 4-6. The median thus belongs to this class. Using the linear interpolation proposed, the median is given by:

Example: Median = 4 + (6 - 4) × (50% - 25%) / (65% - 25%) = 5.25

The median computed from the raw data for group 1 is equal to 5.15 (as given by Excel): 5.15 = (4.9 + 5.4)/2, where 4.9 and 5.4 are the respective values of the 10th and 11th observations in the ordered array.

The rationale behind using the median is:
- It is not affected by extreme values (usually called outliers).
- When you take any observation at random, it is just as likely to exceed the median as it is to be exceeded by it.

2.1.3 The mode

The mode corresponds to the most frequent observation. It can be obtained from the ordered array of data. The stem-and-leaf display also helps to find the mode. The mode is mainly applicable to discrete data. It is applicable to continuous observations only after modification of the data set and rounding of the observations. More important than the mode as a single value is the modal class, which refers to the class with the highest frequency (or equivalently the highest relative frequency).

Example: By looking at the revised stem-and-leaf display for group 1 (Figure 2), we notice that three values each occur twice and may therefore constitute the mode for group 1: 3.8, 4.6 and 4.9. The modal class is the class 4-6, as it has the highest frequency with 8 observations.

The mode is useful, as it is insensitive to outliers. However, it has limited applicability, as it is not well suited to continuous data or to data sets characterized by a big range.

2.1.4 The midrange

The midrange is equal to the average of the smallest observation and the largest one.
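The raw-data median, the mode and the midrange can be sketched as follows. The function names are hypothetical, and `modes` returns every value tied for the highest count rather than a single mode, which is an illustrative choice for data where several grades repeat equally often.

```python
from collections import Counter

# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]


def median(values):
    """Middle value of the ordered array, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values sharing the highest number of occurrences, in increasing order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Average of the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


print(median(GROUP_1), modes(GROUP_1), midrange(GROUP_1))
```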

2.1.5 Quartiles and percentiles

Quartiles are descriptive measures that split the data set into 4 groups.

Q1, the first quartile, is the observation such that 75% of the observations are bigger and 25% of the observations are smaller.

Q2, the second quartile, is the observation such that 50% of the observations are bigger and 50% of the observations are smaller. Question: Does this remind you of anything?

Q3, the third quartile, is the observation such that 25% of the observations are bigger and 75% of the observations are smaller. Question: what would be Q4?

How to compute these measures? For Q1, put the data in an ordered array, then choose the value corresponding to the $I\!\left(\frac{n+1}{4}\right)$-th ordered observation. By $I$, we mean the value of $\frac{n+1}{4}$ rounded up to the next integer. For instance, let $n$ be the sample size. If n = 40, 41, 42 or 43, Q1 corresponds to the 11th observation in the ordered increasing array. In a similar way, Q3 corresponds to the $I\!\left(\frac{3(n+1)}{4}\right)$-th observation in the ordered increasing array.

To be much more rigorous, quartiles are given by this formula:

$$Q_3 = q_3 + (q_{3+} - q_3)\,\frac{75\% - F_{q_3}}{F_{q_{3+}} - F_{q_3}}$$

where
$Q_3$ is the numerical value corresponding to the third quartile,
$q_3$ is the last observation having a cumulative relative frequency below 75%,
$q_{3+}$ is the first observation having a cumulative relative frequency above 75%,
$F_{q_3}$ is the cumulative relative frequency of $q_3$,
$F_{q_{3+}}$ is the cumulative relative frequency of $q_{3+}$.

In a similar way, Q1 is given by the following formula:

$$Q_1 = q_1 + (q_{1+} - q_1)\,\frac{25\% - F_{q_1}}{F_{q_{1+}} - F_{q_1}}$$

The terms in this formula have a similar meaning to those in the formula giving $Q_3$.

2.2 Measures of variation

It is also necessary to analyze the amount of dispersion or spread characterizing a set of data. Two data sets may have similar central tendency measures but differ considerably in terms of variation.

Figure 3: Variability of the dispersion for 3 distributions (A, B and C) having the same mean

For instance, notice in the above graph that data set A has a distribution much tighter around the common mean µ than data set B. Data set C has the most dispersed data. Thus, it is necessary to introduce some measures of variation to better characterize the distribution of a data set.

2.2.1 The range

The range provides a rough idea about the variation in the data set. It is the difference between the largest value and the smallest one:

$$\text{Range} = X_{\max} - X_{\min}$$

The range measures the total spread in the data set. Its main weakness is that it is highly sensitive to outliers (extreme values). To cope with this problem, the interquartile range is proposed as an alternative.

2.2.2 Interquartile range

The formula yielding the interquartile range is:

$$\text{Interquartile range} = Q_3 - Q_1$$

The interquartile range captures the spread in the middle 50% of the data.

For example, for the assignment grades, the range and the interquartile range are:

For group 1: Range = 8.6 - 2.4 = 6.2; Interquartile range = Q3 - Q1 = 6.7 - 4.425 = 2.275
For group 2: Range = 6.6; Interquartile range = 1.475
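The range and interquartile range for group 1 can be reproduced with the sketch below. The quartile convention is an assumption: the function interpolates linearly at position $1 + p(n-1)$ in the ordered array, which is chosen here only because it yields the quartiles 4.425 and 6.7 used in the group 1 example. It does not implement the rounded-up $I((n+1)/4)$ rule, which would give a different Q1 on these data. The name `quantile_inclusive` is hypothetical.

```python
# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]


def quantile_inclusive(values, p):
    """Linear interpolation at zero-based position p * (n - 1) of the ordered array."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower]
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


q1 = quantile_inclusive(GROUP_1, 0.25)
q3 = quantile_inclusive(GROUP_1, 0.75)
data_range = max(GROUP_1) - min(GROUP_1)
iqr = q3 - q1
print(round(q1, 3), round(q3, 3), round(data_range, 3), round(iqr, 3))
```

Other quartile conventions exist and give slightly different values on small samples, so the convention should be stated whenever quartiles are reported.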

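2.2.3 The standard deviation

The standard deviation is the most commonly used measure of dispersion, along with the variance. The variance is defined as follows:

$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the arithmetic mean, $n$ is the sample size, and $x_i$ is the value corresponding to the $i$-th observation.

Question: Why do we divide by $(n-1)$ and not by $n$? When we divide by $n-1$, we get an unbiased estimator of $\sigma^2$, the true value of dispersion in the population.

The standard deviation is the square root of the variance:

$$S = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}}$$

Computing $S^2$ and $S$: the manual (and long) way to compute the variance and the standard deviation is:
1. Compute $\bar{x}$.
2. Compute $x_i - \bar{x}$.
3. Compute $(x_i - \bar{x})^2$.
4. Compute $\sum (x_i - \bar{x})^2$.
5. Dividing by $n-1$, we obtain $s^2$.
6. To get $s$, take the square root of $s^2$.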
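The six steps above translate almost line for line into Python. This is a sketch on the group 1 grades; the function name `sample_variance_steps` and the returned tuple layout are hypothetical.

```python
import math

# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]


def sample_variance_steps(values):
    """Follow the six manual steps and return (mean, sum of squares, s^2, s)."""
    n = len(values)
    mean = sum(values) / n                       # step 1: the mean
    deviations = [x - mean for x in values]      # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]       # step 3: squared deviations
    total = sum(squared)                         # step 4: sum of squared deviations
    s2 = total / (n - 1)                         # step 5: unbiased variance
    s = math.sqrt(s2)                            # step 6: standard deviation
    return mean, total, s2, s


mean, total, s2, s = sample_variance_steps(GROUP_1)
print(round(mean, 3), round(total, 4), round(s2, 6), round(s, 6))
```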

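Example (assignment grades)

Manually, we would compute $s^2$ and $s$ for group 1 according to this table.

Observation    x_i    x_i - mean    (x_i - mean)^2
1              7.2       1.715        2.941225
2              4.9      -0.585        0.342225
3              6.4       0.915        0.837225
4              4.8      -0.685        0.469225
5              4.6      -0.885        0.783225
6              6.0       0.515        0.265225
7              5.4      -0.085        0.007225
8              4.6      -0.885        0.783225
9              3.8      -1.685        2.839225
10             3.8      -1.685        2.839225
11             6.6       1.115        1.243225
12             8.0       2.515        6.325225
13             2.4      -3.085        9.517225
14             7.0       1.515        2.295225
15             4.9      -0.585        0.342225
16             3.6      -1.885        3.553225
17             7.4       1.915        3.667225
18             5.8       0.315        0.099225
19             8.6       3.115        9.703225
20             3.9      -1.585        2.512225
Sum          109.7       0 (up to rounding error)   51.3655

$\bar{x}$ = 5.485
$\sum (x_i - \bar{x})^2$ = 51.3655
Biased variance (dividing by $n$) = 51.3655 / 20 = 2.568275
Unbiased variance $S^2$ (dividing by $n-1$) = 51.3655 / 19 = 2.703447
Biased standard deviation = 1.602584
Unbiased standard deviation $S$ = 1.644216

Intuition behind using the standard deviation

Consider the following measure of dispersion:

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})$$

Question: Do you see any problem with using this measure of dispersion?

The problem is that negative deviations offset positive ones. Consider this simple example: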

[Figure: observations $X_1$ to $X_6$ plotted on an axis running from 3 to 7]

In this case, the sum of deviations is equal to zero, which would suggest that there is no variation among the data (that is, that all observations are equal to 5, which is the mean).

Suppose we take this measure instead:

$$\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

This measure certainly handles dispersion in a much better way. However, it gives the same weight (importance) to all deviations. Outliers are not penalized enough. When we square deviations, we ensure (1) that positive and negative deviations will not offset each other and (2) that we capture more easily the effect of outliers or extreme observations.

Properties of the variance and standard deviation:
- If $y = a + x$, then $S_y^2 = S_x^2$ and $S_y = S_x$.
- If $y = bx$, then $S_y^2 = b^2 S_x^2$ and $S_y = |b|\,S_x$.
- Question: What if $y = a + bx$?
- If $z = x + y$, then $S_z^2 = S_x^2 + S_y^2$, provided $x$ and $y$ are uncorrelated.

Try to prove these properties.

2.2.4 Coefficient of variation

The coefficient of variation (CV) is given by the following formula:

$$CV = \frac{SD}{\bar{x}}$$

The CV is a relative measure of variation. It is particularly useful when comparing the variability of two or more data sets that are expressed in different units of measurement or scales. For example, the first data set is given in kilos and a second one in pounds. Also, when computing the standard deviation, one can be misled by the unit of measurement even with the same sample.

Example: 1 kilo = 2.2 pounds, so SD($x$ in pounds) = 2.2 × SD($x$ in kilos).

More generally, $Var(\lambda x) = \lambda^2 Var(x)$ and $SD(\lambda x) = |\lambda|\,SD(x)$. However, with the CV, $CV(\lambda x) = CV(x)$ for $\lambda > 0$. See proof below.

$$CV(\lambda x) = \frac{SD(\lambda x)}{E(\lambda x)} = \frac{\lambda\,SD(x)}{\lambda\,E(x)} = \frac{SD(x)}{E(x)} = CV(x) \qquad (\lambda > 0)$$

2.3 Measures of shape

Shape refers to the appearance of the distribution function of the data. A distribution can be symmetrical, skewed to the left or skewed to the right. A first and fast way to check for skewness is to compare the mean and the median. If the mean is bigger than the median, the distribution is positively or right-skewed. If the mean is almost equal to the median, the distribution is said to be symmetrical. If the mean is smaller than the median, the distribution is negatively or left-skewed.

2.3.1 The coefficient of skewness

The skewness is given by the following formula:

$$\text{Skewness} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3$$

The rule of thumb is:
If skewness < 0, the distribution is negatively skewed.
If skewness > 0, the distribution is positively skewed.
If skewness = 0, the distribution is symmetrical.

The graph below summarizes the previous discussion.

Figure 5: The various types of skewness in a distribution. Panels: right or positively skewed (mean > median, skewness > 0); symmetrical (mean = median, skewness = 0); left or negatively skewed (mean < median, skewness < 0).
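The scale invariance of the CV and the sign rule for skewness can both be checked numerically. The sketch below makes two assumptions: the CV uses the sample standard deviation (dividing by $n-1$), and the skewness is the average cubed deviation from the mean, where the normalization by $n$ is a guess that does not affect the sign. The function names are hypothetical.

```python
import statistics

# Group 1 grades as listed in the text.
GROUP_1 = [7.2, 4.9, 6.4, 4.8, 4.6, 6.0, 5.4, 4.6, 3.8, 3.8,
           6.6, 8.0, 2.4, 7.0, 4.9, 3.6, 7.4, 5.8, 8.6, 3.9]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def skewness(values):
    """Average cubed deviation from the mean (normalization by n is an assumption)."""
    mean = statistics.mean(values)
    return sum((x - mean) ** 3 for x in values) / len(values)


# Rescaling by a positive constant (here 2.2, as in the kilos-to-pounds example)
# changes the standard deviation but leaves the CV unchanged.
rescaled = [2.2 * x for x in GROUP_1]
cv_original = coefficient_of_variation(GROUP_1)
cv_rescaled = coefficient_of_variation(rescaled)
print(round(cv_original, 6), round(cv_rescaled, 6))

# Group 1 has mean 5.485 above median 5.15, so a positive skewness is expected.
print(skewness(GROUP_1) > 0)
```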

2.3.2 Detecting outliers: the box-and-whisker plot

Inner fences are frontiers that set limits to the normal values. All values that lie beyond the inner fences are defined as outliers, which are extreme values that may affect the validity of the central tendency measure used to characterize the data set. The formulas leading to the upper and lower inner fences are:

Upper inner fence = $Q_3 + 1.5 \times (Q_3 - Q_1)$
Lower inner fence = $Q_1 - 1.5 \times (Q_3 - Q_1)$

To further discriminate, we define outer fences as:

Upper outer fence = $Q_3 + 3 \times (Q_3 - Q_1)$
Lower outer fence = $Q_1 - 3 \times (Q_3 - Q_1)$

Values that fall beyond the outer fences are considered to be severe outliers, whereas values lying between the inner and outer fences are moderate outliers. The diagram for the box-and-whisker plot is the following.

Figure 6: The box-and-whisker plot and classifying outliers. From top to bottom: severe outliers above $Q_3 + 3(Q_3 - Q_1)$; moderate outliers between $Q_3 + 1.5(Q_3 - Q_1)$ and $Q_3 + 3(Q_3 - Q_1)$; normal observations between the inner fences, with the box marking $Q_3$, the median and $Q_1$; moderate outliers between $Q_1 - 3(Q_3 - Q_1)$ and $Q_1 - 1.5(Q_3 - Q_1)$; severe outliers below $Q_1 - 3(Q_3 - Q_1)$.

The box-and-whisker plot also helps in describing the shape of the distribution. See Figure 7 below.
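The fence rules above amount to a three-way classification, sketched here in Python. The function names and labels are hypothetical. The quartiles are the group 1 values Q1 = 4.425 and Q3 = 6.7, and the test values 5.0, 11.0 and 14.0 are made-up numbers chosen only to exercise each branch (two of them lie outside the 0 to 10 grading scale).

```python
def fences(q1, q3):
    """Inner and outer fences built from the interquartile range."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def classify(value, q1, q3):
    """Label a value as normal, a moderate outlier, or a severe outlier."""
    f = fences(q1, q3)
    if value < f["lower_outer"] or value > f["upper_outer"]:
        return "severe outlier"
    if value < f["lower_inner"] or value > f["upper_inner"]:
        return "moderate outlier"
    return "normal"


Q1, Q3 = 4.425, 6.7                      # group 1 quartiles from the text
for hypothetical_value in (5.0, 11.0, 14.0):
    print(hypothetical_value, classify(hypothetical_value, Q1, Q3))
```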

Figure 7: Identifying skewness with the box-and-whisker plot. Panels: right skewed, symmetrical, left skewed, with the upper and lower extreme values marked.

Exercise
(a) Form the box-and-whisker plot for group 1 and group 2 grades. Do you detect any outliers in each group?
(b) Describe the shape of the distribution for each group.
(c) Is there another way to obtain these same results about the shape of the distribution?

2.4 Calculating descriptive summary measures from a population

Now that we know how to characterize a sample, that is, its statistics, we can proceed to characterize the whole population. The resulting measures, called parameters, are computed in the same way as for a sample.

The population mean, µ, is given by

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The population standard deviation, σ, is given by

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$$

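Notice that the formula giving the sample standard deviation is slightly different:

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

This small difference, dividing by $(n-1)$ instead of $n$, is justified by the unbiasedness of the estimator $S^2$. In fact, as $n$ becomes larger, the sample size increases to a point where the sample chosen covers the whole population. By the law of large numbers, as $n \to \infty$,

$$\bar{x} \to \mu \qquad \text{and} \qquad \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \to \sigma^2$$

Consequently, taking $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ and using $\bar{x} \to \mu$,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \approx \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \to \sigma^2 \qquad (n \to \infty)$$

that is, as $n \to \infty$, $S \to \sigma$.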
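The unbiasedness claim can be illustrated without simulation by enumerating every possible sample from a tiny population. The sketch below assumes sampling with replacement, so that all ordered samples are equally likely. The three-value population and the sample size of 2 are hypothetical choices made only to keep the enumeration small.

```python
from itertools import product
from statistics import mean, pvariance

POPULATION = [2.4, 4.9, 8.6]     # hypothetical tiny population
SAMPLE_SIZE = 2


def sum_of_squared_deviations(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)


# All equally likely ordered samples of size 2 drawn with replacement (9 of them).
samples = list(product(POPULATION, repeat=SAMPLE_SIZE))

# Average of each estimator over every possible sample.
avg_dividing_by_n = mean(
    sum_of_squared_deviations(s) / SAMPLE_SIZE for s in samples)
avg_dividing_by_n_minus_1 = mean(
    sum_of_squared_deviations(s) / (SAMPLE_SIZE - 1) for s in samples)

sigma_squared = pvariance(POPULATION)    # true population variance
print(round(sigma_squared, 6),
      round(avg_dividing_by_n_minus_1, 6),
      round(avg_dividing_by_n, 6))
```

Averaged over all samples, dividing by $n-1$ recovers the population variance exactly, while dividing by $n$ underestimates it by the factor $(n-1)/n$, which shrinks towards 1 as the sample size grows.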