Quantitative Tools for Research


Quantitative Tools for Research
KASHIF QADRI
Descriptive Analysis
Lecture Week 4

Overview
- Measurement of Central Tendency / Location
  - Mean, Median & Mode
  - Quantiles (Quartiles, Deciles, Percentiles)
- Measurement of Dispersion
  - Range & Quartile Deviation
  - Variance and Standard Deviation
- The concept of outliers

Descriptive Analysis
- Describing the characteristics of the data
- Revealing the distribution
- Leading towards further analysis
- Concluding on the basis of descriptive analysis

Measures of Central Tendency

Central Tendency
All the values in a data set tend towards its center.

Central Tendency
The values in a data set tend towards a central value; this is called central tendency. It summarizes a data set in a single value. The methods used to measure central tendency are called Measures of Central Tendency, Location, Position or simply Averages.

The Arithmetic Mean
The arithmetic mean, or simply the mean, is the most widely used average. It is defined as the sum of the observations divided by the number of observations. It is denoted by AM, $\mu$ or $\bar{x}$.

The Arithmetic Mean
Mean = Sum of observations / No. of observations

Let the observations be $x_1, x_2, x_3, \ldots, x_n$. Then the arithmetic mean ($\mu$ or $\bar{x}$) is:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Here $\mu$ denotes the population mean and $\bar{x}$ the sample mean.

Example
Find the AM of: 4, 7, -2, 0 and 8.
Mean = (4 + 7 - 2 + 0 + 8) / 5 = 17 / 5 = 3.4

Weighted Mean
Weights are assigned to the values according to their relative importance. The weighted mean is then obtained as:

$$WM = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum wx}{\sum w}$$

Example
A student scored 45, 80 and 60 in three quizzes. The weights of these quizzes are 1, 2 and 5 respectively. Find the weighted score of this student.

Solution
Weighted Mean = (w1*x1 + w2*x2 + w3*x3) / (w1 + w2 + w3)
WM = (1 * 45 + 2 * 80 + 5 * 60) / (1 + 2 + 5)
WM = 505 / 8
Weighted Mean = 63.125
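The weighted-mean calculation above can be sketched in a few lines of Python. This is only an illustration; the function name `weighted_mean` is a hypothetical helper, not something from the lecture.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Quiz scores and weights from the example above.
quiz_wm = weighted_mean([45, 80, 60], [1, 2, 5])
print(quiz_wm)  # 63.125
```

With equal weights the function reduces to the ordinary arithmetic mean.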

Combined Mean
The combined mean of k groups, where group i has $n_i$ values and mean $m_i$, can be obtained by:

$$CM = \frac{n_1 m_1 + n_2 m_2 + \cdots + n_k m_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum nm}{\sum n}$$

Example
The means of three samples are given below. Find the combined mean.

Sample #   No. of values   Mean
1          32              1158
2          17              1897
3          26              1453

Solution
CM = (n1*m1 + n2*m2 + n3*m3) / (n1 + n2 + n3)
CM = (32 * 1158 + 17 * 1897 + 26 * 1453) / (32 + 17 + 26)
   = (37056 + 32249 + 37778) / 75
   = 107083 / 75
Combined mean = 1427.77

Properties of Arithmetic Mean
1. The sum of the deviations of all observations from the AM is always zero.
2. The sum of the squared deviations of the observations from the AM is a minimum, i.e. smaller than the sum of squared deviations from any other value.
3. Linear Transformation
4. Change of Origin & Scale
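As a rough numerical check, the combined-mean example and the first two properties can be reproduced in Python. The helper names `combined_mean` and `sum_sq_dev` are hypothetical, and property 2 is only spot-checked against a few arbitrary alternative centers rather than proved.

```python
def combined_mean(sizes, means):
    """Combine group means, weighting each mean by its group size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


cm = combined_mean([32, 17, 26], [1158, 1897, 1453])
print(round(cm, 2))  # 1427.77

# Properties 1 and 2, checked on the earlier data 4, 7, -2, 0, 8.
data = [4, 7, -2, 0, 8]
mean = sum(data) / len(data)


def sum_sq_dev(center):
    """Sum of squared deviations of `data` from `center`."""
    return sum((x - center) ** 2 for x in data)


# Property 1: deviations from the mean cancel out (up to float rounding).
sum_dev = sum(x - mean for x in data)
print(abs(sum_dev) < 1e-9)  # True

# Property 2: no other tried center gives a smaller sum of squares.
print(all(sum_sq_dev(mean) <= sum_sq_dev(c) for c in (0, 3, 3.5, 5)))  # True
```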

Linear Transformation
If there is a linear relationship between two variables X and Y, i.e. Y = a + bX, where a and b are constants and b ≠ 0, then:

$$\bar{y} = a + b\,\bar{x}$$

Example
The average salary of workers in a factory is $580. If their salary is raised by 2% and a further bonus of $50 is given to each, find the new average salary.
Mean of X = 580, a = 50, b = 1.02; Mean of Y = ?
Formula: Mean of Y = a + b * Mean of X = 50 + 1.02 * 580 = $641.6
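The rule can be checked numerically by transforming every value and comparing the two means. The lecture gives only the average salary, so the five individual salaries below are hypothetical values chosen to average 580; a = 50 and b = 1.02 follow the example.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical salaries (not from the lecture) whose mean is 580.
salaries = [500, 560, 580, 620, 640]
a, b = 50, 1.02  # bonus of 50 and a 2% raise

# Route 1: transform each salary, then average.
direct = mean([a + b * x for x in salaries])
# Route 2: apply the rule mean(Y) = a + b * mean(X).
via_rule = a + b * mean(salaries)

print(round(direct, 2), round(via_rule, 2))  # 641.6 641.6
print(math.isclose(direct, via_rule))        # True
```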

The Median
The median is the value that divides the ordered data set into two equal parts, i.e. it is the middle observation of the ordered data.

[Figure: data line running from 0% to 100%, with the median marked at 50%.]

Median
- The median is the middle observation of the arranged data.
- It divides the ordered data into two halves.
- 50% of the values lie below the median and 50% lie above it.

Median: for an odd number of observations
Formula: Median = $\frac{n + 1}{2}$th observation

Example
Find the median of: 2, -6, 0, 11, 7, 5 and -1
Here n = 7, so the median is the (7 + 1)/2 = 4th observation.
Arranged data: -6, -1, 0, 2, 5, 7, 11
Median = 2

Question
Find the median of the following data: 17, -13, 21, 9, 0, -8, 13, 7, 2
Solution
Arranged data: -13, -8, 0, 2, 7, 9, 13, 17, 21
Median = 5th observation = 7

Median: for an even number of observations
Formula: Median = AM of the (n/2)th and (n/2 + 1)th observations

Example
Find the median of: 4, -6, 0, 7, 4, 2, -9, 10
Here n = 8, so the median is the AM of the 4th and 5th observations.
Ordered data: -9, -6, 0, 2, 4, 4, 7, 10
Median = (2 + 4)/2 = 3

Question
Find the median of the following data: 45, 10, 36, 28, 17, 32, 11, 37, 22, 41
Ordered data: 10, 11, 17, 22, 28, 32, 36, 37, 41, 45
Median = (28 + 32)/2 = 30
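The odd and even rules can be combined into one small function. This is a minimal sketch with a hypothetical name, `median`; Python's standard library also offers `statistics.median`, which gives the same results for these examples.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th observation, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: AM of the (n/2)-th and (n/2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, -6, 0, 11, 7, 5, -1]))      # 2
print(median([4, -6, 0, 7, 4, 2, -9, 10]))   # 3.0
```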

Quantiles
1. The values that partition the ordered data set are called quantiles.
2. There are three kinds of quantiles:
   a) Quartiles (divide the data into 4 parts)
   b) Deciles (divide the data into 10 parts)
   c) Percentiles (divide the data into 100 parts)

Quartiles
[Figure: ordered data line split at Q1, Q2 and Q3.]
Q1 = Lower Quartile
Q2 = Median
Q3 = Upper Quartile

Quartiles
There are three quartiles: Q1, Q2 and Q3.
- Q1 is the value below which 25% of the data lie.
- Q2 is the value below which 50% of the data lie.
- Q3 is the value below which 75% of the data lie.

Q1 = $\frac{n + 1}{4}$th observation and Q3 = $\frac{3(n + 1)}{4}$th observation

Example
Find the lower and upper quartiles of the data given below.
4, 3, 9, 0, 1, 6, 8, 4, 3, 0, 2, 10, 13
Arranged in ascending order (n = 13):
0, 0, 1, 2, 3, 3, 4, 4, 6, 8, 9, 10, 13

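Solution
Lower Quartile (Q1): position = (13 + 1)/4 = 3.5, rounded to the 4th observation, so Q1 = 2.
Upper Quartile (Q3): position = 3(13 + 1)/4 = 10.5, rounded to the 11th observation, so Q3 = 9.

Deciles
Deciles divide the ordered data set into ten parts.
[Figure: data line with D1, D2, D5 and D9 marked at 10%, 20%, 50% and 90%.]
General formula: $D_i$ = $\frac{i(n + 1)}{10}$th observation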

Example
Find the 4th decile of the following series: 3, 6, 9, ..., 894, 897, 900
Here n = 300 (i.e. 300 observations).
Position of D4 = 4(n + 1)/10 = 4(301)/10 = 120.4, rounded to 120.
Hence D4 is the 120th observation, i.e. D4 = 360.

Percentiles
Percentiles partition the ordered data set into 100 parts.
[Figure: data line with P1, P50 and P99 marked at 1%, 50% and 99%.]
General formula: $P_i$ = $\frac{i(n + 1)}{100}$th observation
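Quartiles, deciles and percentiles all use the same position rule, i(n + 1)/k with k = 4, 10 or 100, so one function can cover them. The sketch below assumes the rounding used in the worked examples (nearest whole observation, halves rounded up); other textbooks and software interpolate between observations instead and may give slightly different values. The name `quantile_by_position` is hypothetical, and clamping the position to the range 1..n is an added safeguard, not part of the lecture.

```python
def quantile_by_position(values, i, parts):
    """i-th quantile when the ordered data are split into `parts` parts.

    Position = i * (n + 1) / parts, rounded to the nearest whole
    observation with halves rounded up.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Integer form of floor(position + 0.5); avoids float rounding surprises.
    position = (2 * i * (n + 1) + parts) // (2 * parts)
    position = min(max(position, 1), n)  # assumption: clamp to a valid observation
    return ordered[position - 1]  # positions are 1-based


marks = [4, 3, 9, 0, 1, 6, 8, 4, 3, 0, 2, 10, 13]
print(quantile_by_position(marks, 1, 4))   # 2  (Q1)
print(quantile_by_position(marks, 3, 4))   # 9  (Q3)

series = list(range(3, 901, 3))  # 3, 6, 9, ..., 900 (300 values)
print(quantile_by_position(series, 4, 10))  # 360 (D4)
```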

Example
Find the 78th percentile of the following series: 3, 6, 9, ..., 894, 897, 900
Position of P78 = 78(n + 1)/100 = 78(301)/100 = 234.78, rounded to 235.
Hence P78 is the 235th observation, i.e. P78 = 705.

The Mode
- The mode is the most frequent observation in the data, i.e. the value that is repeated the largest number of times.
- If two values are repeated the same (largest) number of times, both of them are modes.
- If all the values are repeated an equal number of times, there is no mode.

Example
Find the mode of:
a) 1, 2, 1, 3, 2, 3, 0, 1, 4, 5, 2, 3, 3
   Mode = 3
b) 2, 2, 1, 3, 5, 0, 5, 0, 0, 4, 1, 6, 1
   Modes = 0 and 1. Why? Both are repeated an equal (largest) number of times.
c) 1, 2, 3, 5, 4, 7, 0, 6, 8, 9, 15
   Mode: none. Why? Every value occurs only once in the data.

When to Use?
Mean
- The most generally used measure of central tendency
- Popular because of its mathematical properties
- Can be misleading when observations are widely scattered or extreme values are present
Median
- A strong measure for ordered data
- Good for qualitative ordered data
- Not affected by extreme values
- Lacks mathematical properties
Mode
- Helpful if the values are close together and repeated
- Lacks mathematical treatment
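The mode rules given earlier (several modes when values tie for the highest count, no mode when every value occurs equally often) can be sketched as follows. The function name `modes` is hypothetical, and treating a data set with a single distinct value as having that value as its mode is an assumption that the lecture does not address.

```python
from collections import Counter


def modes(values):
    """Return the modes in ascending order; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Every distinct value equally frequent: no mode by the lecture's rule.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 1, 3, 2, 3, 0, 1, 4, 5, 2, 3, 3]))   # [3]
print(modes([2, 2, 1, 3, 5, 0, 5, 0, 0, 4, 1, 6, 1]))   # [0, 1]
print(modes([1, 2, 3, 5, 4, 7, 0, 6, 8, 9, 15]))        # []
```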

Measurement of Dispersion

Definition
If the values in a data set are widely scattered, central tendency alone will not describe the data adequately, so a measure of spread is also applied to the data. The additional information that measures how scattered a data set is, is called dispersion.

Dispersion is especially useful when two data sets are to be compared. There are two types of dispersion: absolute and relative. Absolute dispersion is expressed in the same units as the data.

The Range (R)
- A measure of variation
- The difference between the largest and smallest observations: Range = $x_m - x_0$
- Ignores how the data are distributed

[Figure: two data sets with different distributions plotted on a scale from 7 to 12; both have Range = 12 - 7 = 5.]

Formula
Range = $x_m - x_0$
Coefficient of Range = $\frac{x_m - x_0}{x_m + x_0}$

Example
Find the Range and Coefficient of Range of: 2, 5, 6, 10, -4, -3, 0, 5, 11
Here $x_m$ = 11 and $x_0$ = -4.
R = 11 - (-4) = 15
Coefficient of Range = (11 - (-4)) / (11 + (-4)) = 15 / 7 = 2.14

Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25, and Set B contains the marks of the same students in English out of 100.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The range and coefficient of range are calculated as:

                       Range           Coefficient of Range
Set A (Mathematics)    20 - 10 = 10    (20 - 10)/(20 + 10) = 0.33
Set B (English)        50 - 30 = 20    (50 - 30)/(50 + 30) = 0.25
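Both measures are easy to compute directly. The sketch below uses hypothetical helper names and reproduces the figures for Set A, Set B and the earlier nine-value example; note that the coefficient is undefined when the largest and smallest values sum to zero.

```python
def value_range(values):
    """Absolute dispersion: largest minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative dispersion: (x_m - x_0) / (x_m + x_0)."""
    largest, smallest = max(values), min(values)
    if largest + smallest == 0:
        raise ValueError("coefficient of range is undefined when x_m + x_0 = 0")
    return (largest - smallest) / (largest + smallest)


set_a = [10, 15, 18, 20, 20]   # Mathematics, out of 25
set_b = [30, 35, 40, 45, 50]   # English, out of 100

print(value_range(set_a), round(coefficient_of_range(set_a), 2))  # 10 0.33
print(value_range(set_b), round(coefficient_of_range(set_b), 2))  # 20 0.25
```

Set B has the larger range, but Set A has the larger coefficient of range, i.e. more spread relative to the size of its values.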

Quartile Deviation
Quartile Deviation (QD) is half of the difference between the upper and lower quartiles. It is also called the semi-interquartile range.

Formula
$$QD = \frac{Q_3 - Q_1}{2}$$

Example
The students of a class scored the following marks in a certain quiz. Find the quartile deviation and coefficient of quartile deviation of the marks.
40, 12, 27, 11, 5, 33, 45, 21, 37 and 43
Ordered data: 5, 11, 12, 21, 27, 33, 37, 40, 43, 45

Solution
No. of observations = n = 10
Position of Q1 = (n + 1)/4 = 11/4 = 2.75, rounded to the 3rd observation, so Q1 = 12.
Position of Q3 = 3(n + 1)/4 = 33/4 = 8.25, rounded to the 8th observation, so Q3 = 40.
QD = (Q3 - Q1)/2 = (40 - 12)/2 = 14
Coefficient of QD = (Q3 - Q1)/(Q3 + Q1) = (40 - 12)/(40 + 12) = 28/52 = 0.54

Variance
- An important measure of variation
- Shows variation about the arithmetic mean

For the population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$

For the population, use N in the denominator. For the sample, use n - 1 in the denominator.

Standard Deviation
- The most important measure of variation
- Shows variation about the mean

For the population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
For the sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

The same denominators apply: N for the population and n - 1 for the sample.

Example
Find the variance and standard deviation of the following data: -4, 0, 6, 10 and 23
a) Use the actual formula
b) Use the short-cut formula

Solution (m = mean = 35/5 = 7):

X        X - m    (X - m)²    X²
-4       -11      121         16
0        -7       49          0
6        -1       1           36
10       3        9           100
23       16       256         529
Total:
35       0        436         681

Calculations

a) Actual formula
m = ΣX/n = 35/5 = 7
S² = Σ(X - m)²/n = 436/5 = 87.2
S = √87.2 = 9.3381

b) Short-cut formula
S² = ΣX²/n - (ΣX/n)² = 681/5 - (35/5)² = 136.2 - (7)² = 87.2
S = 9.3381

Note that this worked example divides by n, i.e. it treats the five values as a complete population. With the sample denominator n - 1, the variance would be 436/4 = 109.

[Figure: Comparison of three data sets plotted on a scale from 11 to 21. Data A: Mean = 15.5, s = 3.338. Data B: Mean = 15.5, s = 0.9258. Data C: Mean = 15.5, s = 4.57.]
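The actual and short-cut calculations can be reproduced with a short script. This is a sketch with hypothetical function names; the `sample` flag switches between the divisor n used in the worked example and the divisor n - 1 from the sample formula.

```python
import math


def variance(values, sample=False):
    """Mean squared deviation from the mean (divisor n, or n - 1 if sample=True)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def variance_shortcut(values):
    """Short-cut form with divisor n: sum(X^2)/n - (sum(X)/n)^2."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2


data = [-4, 0, 6, 10, 23]
print(round(variance(data), 4))             # 87.2
print(round(variance_shortcut(data), 4))    # 87.2
print(round(math.sqrt(variance(data)), 4))  # 9.3381
print(variance(data, sample=True))          # 109.0
```

The short-cut form is convenient by hand, but with very large values it can lose precision in floating-point arithmetic, so the deviation-based form is usually the safer choice in code.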

Coefficient of Variation
- A measure of relative variation
- Always expressed as a percentage (%)
- Shows variation relative to the mean
- Used to compare 2 or more groups

Formula (for sample):
$$CV = \frac{SD}{\text{Mean}} \times 100$$

Example
For the data of the previous example (SD = 9.3381, Mean = 7):
CV = 9.3381 × 100 / 7 = 133.4%

In comparing two data sets, the data set with the smaller CV is considered more consistent.
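A minimal sketch of the CV calculation follows. It assumes the divisor n for the standard deviation, matching the worked example rather than the n - 1 sample formula, and the function name `coefficient_of_variation` is hypothetical.

```python
import math


def coefficient_of_variation(values):
    """CV in percent: standard deviation (divisor n) relative to the mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100


cv = coefficient_of_variation([-4, 0, 6, 10, 23])
print(round(cv, 1))  # 133.4
```

Because the CV divides by the mean, it is most meaningful for data measured on a scale where the mean is positive and well away from zero.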

Comparing CVs
Stock A: Average price last year = $50, Standard deviation = $5
Stock B: Average price last year = $100, Standard deviation = $5
Coefficient of Variation:
Stock A: CV = 10%
Stock B: CV = 5%

Example
The units produced by two workers last week are given below. Which worker is more consistent in terms of production?

Day        Mon   Tue   Wed   Thu   Fri
Worker A   26    55    0     112   51
Worker B   45    36    68    22    57

Solution

Worker A            Worker B
X       X²          X       X²
26      676         45      2025
55      3025        36      1296
0       0           68      4624
112     12544       22      484
51      2601        57      3249
Total:
244     18846       228     11678

Calculations

Mean
Worker A: Mean = ΣX / n = 244 / 5 = 48.8
Worker B: Mean = ΣX / n = 228 / 5 = 45.6

Variance, using S² = ΣX²/n - (ΣX/n)²
Worker A: S² = 18846/5 - (244/5)² = 3769.2 - 2381.44 = 1387.76
Worker B: S² = 11678/5 - (228/5)² = 2335.6 - 2079.36 = 256.24

Standard Deviation
Worker A: S = √1387.76 = 37.25
Worker B: S = √256.24 = 16.01

Coefficient of Variation, CV = S × 100 / Mean
Worker A: CV = 37.25 × 100 / 48.8 = 76.33%
Worker B: CV = 16.01 × 100 / 45.6 = 35.11%

Summary of Results
                 Worker A    Worker B
Mean             48.8        45.6
Variance (S²)    1387.76     256.24
SD (S)           37.25       16.01
CV               76.33%      35.11%

Result: Here CV(B) < CV(A), hence worker B is more consistent in his performance.

Inter Quartile Range
IQR = Q3 - Q1
Example: the quartiles of a data set are Q1 = 25 and Q3 = 38. Find the IQR.
IQR = 38 - 25 = 13

Concept of Outlier
- Extreme values in a data set are called outliers.
- Outliers can be difficult to identify, especially when the data set is large.
- Outliers affect our results.
- Hence we detect outliers, remove them and then perform our analysis.

How to find outliers?
The lower and upper limits of a data set are:
Lower limit = Q1 - 1.5 × IQR
Upper limit = Q3 + 1.5 × IQR

Example
The following is the weekly TV viewing time (in hours) of 20 people. Find the outliers in the data, if any.
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21

Q1 = 23, Q3 = 36.5
IQR = 36.5 - 23 = 13.5
Lower limit = Q1 - 1.5 × IQR = 23 - 1.5 × 13.5 = 2.75 hours
Upper limit = Q3 + 1.5 × IQR = 36.5 + 1.5 × 13.5 = 56.75 hours

There is only one outlier: 66. A weekly viewing time of 66 hours is outside the usual pattern of the data.
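The limit check can be sketched as below. The quartiles are passed in as given in the example (Q1 = 23, Q3 = 36.5) rather than computed, because different quartile conventions can give slightly different values; the function names are hypothetical.

```python
def iqr_limits(q1, q3, k=1.5):
    """Lower and upper limits: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Values lying below the lower limit or above the upper limit."""
    lower, upper = iqr_limits(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


hours = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
         34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

print(iqr_limits(23, 36.5))            # (2.75, 56.75)
print(find_outliers(hours, 23, 36.5))  # [66]
```

The smallest value, 5 hours, stays inside the lower limit of 2.75, so only 66 is flagged.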