Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System


1 PREP Course #10: Introduction to Exploratory Data Analysis and Data Transformations (Part 1) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

2 CME Disclosure Statement
The North Shore-LIJ Health System adheres to the ACCME's new Standards for Commercial Support. Any individuals in a position to control the content of a CME activity, including faculty, planners, and managers, are required to disclose all financial relationships with commercial interests. All identified potential conflicts of interest are thoroughly vetted by North Shore-LIJ for fair balance and scientific objectivity and to ensure appropriateness of patient care recommendations.
Course Director and Course Planners Kevin Tracey, MD, Cynthia Hahn, Emmelyn Kim, MPH, and Tina Chuck, MPH have nothing to disclose. Martin L. Lesser, PhD, EMT-CC has nothing to disclose.

3 Quick Review
Measures of location: mean, median, quartiles, quantiles
Measures of spread: range, standard deviation, interquartile range, interquantile range
Quick displays of data: stem-and-leaf plot, box (and whisker) plot

4 LOS (Length of Stay) for 54 Pneumonia Patients: Hypothetical Example
Observed data (days), n = 54.
[Frequency distribution table with columns LOS (days), Frequency, Percent, Cumulative Frequency, Cumulative Percent; the cell values were not preserved.]

5 SUMMARIZING DATA
Graphical methods: histograms, stem-and-leaf plots, box plots
Measures of location: mean, median, quartiles
Measures of spread: range (R), mean absolute deviation (MAD), variance ($s^2$), standard deviation (s or SD), interquartile range (IQR)

6 LOS for 54 Pneumonia Patients: Frequency Histogram

7 LOS for 54 Pneumonia Patients: Relative Frequency Histogram

8 Stem-and-Leaf Plot: LOS for 54 Pneumonia Patients
[Stem-and-leaf display with columns Stem, Leaf and # (count), shown next to a boxplot; the plotted values were not preserved.]

9 Constructing a Stem-and-Leaf Plot
Steps 1 and 2: enter the 1st and 2nd data points, 8.0 and 8.0.
Step 3: enter the 3rd data point, 4.0.
Step 4: enter the 4th data point, 11.0; and so on.
Continue to fill in the plot until all data points have been plotted. Note that the data do not have to be entered in sorted order.
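The construction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stem_and_leaf` and `render` are hypothetical, values are treated as non-negative whole numbers with the tens digit as stem and the units digit as leaf, and all sample values beyond the first four (8, 8, 4, 11) are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole-number values into stems (tens digit) and leaves (units digit)."""
    plot = defaultdict(list)
    for v in values:  # values may arrive in any order
        stem, leaf = divmod(int(v), 10)
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


def render(plot):
    """Format the display, keeping empty stems so that gaps stay visible."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)


# Hypothetical values; only the first four (8, 8, 4, 11) come from the slide.
sample = [8, 8, 4, 11, 23, 15, 6, 34]
print(render(stem_and_leaf(sample)))
```

Sorting the leaves is done once at the end, which mirrors the remark that the data need not be entered in sorted order.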

10 How Many Stem Lines? What Interval Between Stems?
Maximum number of stem lines: $L = [10 \times \log_{10} n]$, where $[x]$ is the greatest integer function.
Example: n = 54, so $L = [10 \times \log_{10} 54] = [17.3] = 17$.
[Table of L for various values of n; the values were not preserved.]
Interval size = range / L, rounded to the nearest power of 10.
Example: n = 54, L = 17, range = 34 - 2 = 32, so interval size = 32 / 17 = 1.9, rounded to 1.
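A minimal sketch of the stem-line rule, assuming that "nearest power of 10" means nearest on the log scale (the slide does not say which scale is meant). The names `max_stem_lines` and `interval_size` are hypothetical. Note that 10 x log10(54) is about 17.32, so the greatest-integer rule gives 17 lines.

```python
import math


def max_stem_lines(n):
    """L = [10 * log10(n)], with [x] the greatest integer not exceeding x."""
    return math.floor(10 * math.log10(n))


def interval_size(data_range, lines):
    """Range / L, snapped to a power of 10 (nearest on the log scale: an assumption)."""
    raw = data_range / lines
    return 10.0 ** round(math.log10(raw))


lines = max_stem_lines(54)
print(lines, interval_size(34 - 2, lines))
```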

11 LOS for 54 Pneumonia Patients
[Stem-and-leaf display with columns Stem, Leaf and # (count), shown next to a boxplot; the plotted values were not preserved.]

12 Computing the Mean
Suppose there are n observations: $X_1, X_2, \ldots, X_n$.
Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
FACTS: The mean measures the central tendency of the data. The mean is sensitive to extreme observations known as outliers.
Observed data (days), n = 54: $\bar{X}$ = mean = 576 / 54 = 10.7 days

13 Computing the Median
The median is the middle value that splits the data set into two equal parts.
To compute the median (M), arrange the $X_i$ in ascending order: $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}$, where $X_{(1)}$ = smallest value, $X_{(2)}$ = 2nd smallest value, ..., $X_{(n)}$ = largest value.
The median is defined as the middle observation, which corresponds to the ordered observation in position (n + 1) / 2 (its "depth").
If n is odd, the median falls exactly on the middle observation, $X_{((n+1)/2)}$.
If n is even, the median falls halfway between the two middle observations, $X_{(n/2)}$ and $X_{(n/2+1)}$. In other words, median = $(X_{(n/2)} + X_{(n/2+1)}) / 2$.
The median is said to be robust because it is not sensitive to outliers.

14 Computing the Median (continued)
Ordered data: n = 54.
Since n is even, M is the average of the middle two observations: depth of the median = (n + 1) / 2 = 55 / 2 = 27.5, so M = average of obs #27 and #28 = 8 days.
If n is odd, M is simply the middle observation, at depth (n + 1) / 2.
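The contrast between the mean (sensitive to outliers) and the median (robust) can be checked with a small sketch. The data below are hypothetical lengths of stay, not the 54 observations from the slides.

```python
from statistics import mean, median

# Hypothetical lengths of stay (days), not the slide's data.
los = [4, 6, 8, 8, 11]
los_with_outlier = los + [60]  # one extreme stay added

print(mean(los), median(los))
print(mean(los_with_outlier), median(los_with_outlier))
```

Adding the single extreme value more than doubles the mean, while the median stays at 8 days.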

15 Computing the Lower and Upper Quartiles ("Hinges")
The quartiles split the set of data into four equal parts.
Lower quartile $Q_1$ = median of lower half, at position (n + 1) / 4
Upper quartile $Q_3$ = median of upper half, at position 3(n + 1) / 4
Facts: The quartiles split the sample into quarters. Half of the observations lie between Q1 and Q3. The quartiles are said to be robust because they are not sensitive to outliers. There are several different methods for computing quartiles.
To compute the quartiles, refer to the ordered data:
Q1 = lower quartile: position = (total obs + 1) / 4 = (54 + 1) / 4 = 13.75 => average of obs #13 and #14 = 6 days
Q3 = upper quartile: position = 3 * (total obs + 1) / 4 = 3 * (54 + 1) / 4 = 41.25 => average of obs #41 and #42 = 12.5 days
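The position rules for the median and quartiles can be sketched as follows. This follows the convention used in the worked example (a fractional position is resolved by averaging the two neighbouring observations); other quartile methods interpolate differently. The function names are hypothetical.

```python
import math


def value_at_depth(sorted_x, depth):
    """Observation at a 1-based position; a fractional position averages its two neighbours."""
    n = len(sorted_x)
    lo = max(1, math.floor(depth))
    hi = min(n, math.ceil(depth))
    return (sorted_x[lo - 1] + sorted_x[hi - 1]) / 2


def median_and_quartiles(x):
    """Return (Q1, median, Q3) using positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    s = sorted(x)
    n = len(s)
    q1 = value_at_depth(s, (n + 1) / 4)
    med = value_at_depth(s, (n + 1) / 2)
    q3 = value_at_depth(s, 3 * (n + 1) / 4)
    return q1, med, q3


# Stand-in data: the integers 1..54, so each value equals its own rank.
print(median_and_quartiles(range(1, 55)))
```

With n = 54 the positions are 13.75, 27.5 and 41.25, so the sketch averages observations 13 and 14, 27 and 28, and 41 and 42.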

16 A measure of location, alone, does not adequately describe a set of data!!

17 Same Location, Different Spread

18 Computing Measures of Spread
Suppose there are n observations, $X_1, X_2, \ldots, X_n$.
Range: $R = X_{\max} - X_{\min}$
Mean absolute deviation: $\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|$
Variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$ (many texts divide by n - 1 instead of n for a sample)
Standard deviation: $\mathrm{SD} = s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$
Interquartile range: $\mathrm{IQR} = Q_3 - Q_1$
FACTS: The range, MAD, variance, SD and IQR all measure the amount of variation (spread) in the data. All measures except the MAD and IQR are sensitive to extreme observations known as outliers. MAD and IQR are robust measures of spread.
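A minimal sketch of these spread measures, using a small hypothetical data set. The divisor for the variance is left as a parameter (`ddof`, a name borrowed from common numerical libraries and used here only as an assumption) because texts differ on dividing by n or n - 1. The IQR is omitted since it needs the quartiles.

```python
import math


def spread_summary(x, ddof=0):
    """Range, MAD, variance and SD; ddof=0 divides by n, ddof=1 by n - 1."""
    n = len(x)
    mean = sum(x) / n
    squared = sum((v - mean) ** 2 for v in x)
    variance = squared / (n - ddof)
    return {
        "range": max(x) - min(x),
        "mad": sum(abs(v - mean) for v in x) / n,
        "variance": variance,
        "sd": math.sqrt(variance),
    }


# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread_summary(data))
print(spread_summary(data, ddof=1))
```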

19 Summary for LOS Example
Observed data (days), n = 54
Location: $\bar{X}$ = 10.7 days, M = 8 days, $Q_1$ = 6 days, $Q_3$ = 12.5 days
Spread: R = 31 days, SD = 7.2 days, MAD = 5.1 days, IQR = 6.5 days

20 The Boxplot
The boxplot, sometimes known as a box-and-whisker plot, is a convenient way of depicting the distribution of data using measures of location and spread. The most important parts of a boxplot correspond to the lower and upper quartiles, the median, and the mean.

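21 Anatomy of a Boxplot
From top to bottom:
Upper inner fence: $Q_3 + 1.5 \times \mathrm{IQR}$
$Q_3$
Median
$Q_1$
Lower inner fence: $Q_1 - 1.5 \times \mathrm{IQR}$
The mean is marked with a "+".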
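The inner fences can be computed directly from the quartiles. The sketch below uses the quartiles reported for the LOS example (Q1 = 6, Q3 = 12.5); the function names and the list of candidate values are hypothetical.

```python
def inner_fences(q1, q3):
    """Return (lower, upper) inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outside_values(x, q1, q3):
    """Values falling beyond either inner fence."""
    lower, upper = inner_fences(q1, q3)
    return [v for v in x if v < lower or v > upper]


print(inner_fences(6, 12.5))
print(outside_values([2, 8, 22, 23, 34], 6, 12.5))  # hypothetical values
```

With IQR = 6.5 the fences fall at -3.75 and 22.25 days, so only stays longer than about 22 days would be flagged.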

22 Schematic Plots: Side-by-Side Boxplots
[Figure: LOS for four nursing stations; x-axis: Nursing Station (1 West, 2 South, 3 North, 4 East); y-axis: LOS.]

23 Salary Levels, by Gender

25 REFERENCES
Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.
Chambers JM, Cleveland WS, Kleiner B, Tukey PA. Graphical Methods for Data Analysis. Wadsworth Statistics/Probability Series. Duxbury Press.
Mosteller F, Tukey JW. Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods.
Velleman PF, Hoaglin DC. Applications, Basics, and Computing of Exploratory Data Analysis (A-B-Cs of EDA). PWS Publishers, Duxbury Press.

26 Introduction to Exploratory Data Analysis (EDA) Data Transformations Part 1

28 Why Transform* Data?
1. Classical Inference
a. To achieve homoscedasticity (ANOVA and the t-test assume equal variances)
b. To achieve normality
c. To straighten out plots
d. To conform to known physical laws
2. Exploratory Data Analysis (EDA)
a. To symmetrize/normalize
b. To explore data
c. To compare distributions
d. To linearize plots
e. To create confusion (??)
* EDAers use the word "re-express".

29 Displaying Data Using a Stem-and-Leaf Plot
LOS for 54 Pneumonia Patients: Hypothetical Example
Observed data (days), n = 54. [The data listing was not preserved.]

32 Displaying Data (The EDA Way)
1. Stem-and-Leaf Displays (Organize Data)
2. Letter-Value Displays (Summarize Data)
Example 1: Bilirubin of 95 Patients Who Underwent the Whipple Procedure, With Pathological Dx of Cancer
Bilirubin  Dx
14.5       Pancreas
13.1       Pancreas
8.1        Bile Duct
31.3       Ampulla
12.6       Pancreas
4.2        Other
22.2       Bile Duct

33 Whipple Procedure: Bilirubin of 95 Patients
[Stem-and-leaf display with columns Stem, Leaf and # (count); the plotted values were not preserved.]

34 Example 2: Zinc Levels in Patients With Epidermoid Cancer of the Head and Neck
Patients with stable nutritional status, n = 25: [stem-and-leaf display; multiply Stem.Leaf by 10**+1; values not preserved]
Patients with impaired nutritional status, n = 25: [stem-and-leaf display; multiply Stem.Leaf by 10**+1; values not preserved]

35 Letter-Value Displays
Depths ([x] = greatest integer function):
Median (M): d(M) = (n + 1) / 2
Hinges (H): d(H) = ([d(M)] + 1) / 2
Eighths (E): d(E) = ([d(H)] + 1) / 2
Sixteenths (D): d(D) = ([d(E)] + 1) / 2
Extremes (1): d(1) = 1
Mid-summaries:
med: Median
mid H: Mid-hinge = $(H_L + H_U) / 2$
mid E: Mid-eighth = $(E_L + E_U) / 2$
mid D: Mid-sixteenth = $(D_L + D_U) / 2$
mid 1: Mid-range = (min + max) / 2
Spreads:
H-spread: interquartile range = $H_U - H_L$
E-spread = $E_U - E_L$
D-spread = $D_U - D_L$
1-spread: range = max - min
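The depth recursion and the resulting letter-value display can be sketched as below. The function names are hypothetical, fractional depths are resolved by averaging the two neighbouring order statistics, and the sample data are made up.

```python
import math


def letter_value_depths(n, letters="MHED"):
    """Depths d(M) = (n+1)/2, then d(next) = ([d(previous)] + 1) / 2."""
    depths = {}
    d = (n + 1) / 2
    for letter in letters:
        depths[letter] = d
        d = (math.floor(d) + 1) / 2
    return depths


def letter_values(x):
    """Lower and upper letter values with their mid-summaries and spreads."""
    s = sorted(x)
    n = len(s)
    rows = {}
    for letter, d in letter_value_depths(n).items():
        lo, hi = math.floor(d), math.ceil(d)
        lower = (s[lo - 1] + s[hi - 1]) / 2  # depth d counted from the bottom
        upper = (s[n - lo] + s[n - hi]) / 2  # depth d counted from the top
        rows[letter] = {
            "depth": d,
            "lower": lower,
            "upper": upper,
            "mid": (lower + upper) / 2,
            "spread": upper - lower,
        }
    return rows


print(letter_value_depths(95))
for letter, row in letter_values(range(1, 10)).items():  # hypothetical data 1..9
    print(letter, row)
```

For n = 54 this recursion gives a hinge depth of 14, slightly different from the quartile position (n + 1) / 4 = 13.75, which is one instance of the several methods for computing quartiles.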

36 Letter-Value Displays for the Examples
[Three letter-value displays, for Bilirubin (n = 95), Zinc-Stable (n = 25) and Zinc-Impaired (n = 25), each with columns DEPTH, LOWER, UPPER, MID, SPREAD and rows M, H, E, D; the values were not preserved.]

37 Look at Skewness
Bilirubin, MID: M 9.9, H 9.1, E and D (values not preserved). Mid-summaries increasing => skewed RIGHT.
Zinc Stable, MID: M 94, H 93.5, E 93.5, D (value not preserved). Not much of a trend: fairly symmetric.
Zinc Impaired, MID: M 73, H 71, E 67.5, D (value not preserved). Mid-summaries decreasing => slightly skewed LEFT.

38 Choice of a Transformation
Ladder of powers: $X \to X^p$
p      Transformation    Name             "Naturals"
2      $X^2$             square
1      $X$               raw
1/2    $\sqrt{X}$        square root      counts
(0)    $\log X$          logarithm        biochemical measures
-1/2   $-1/\sqrt{X}$     reciprocal root
-1     $-1/X$            reciprocal       waiting times (=> rates)
-2     $-1/X^2$
Note: Use of a negative multiplier for p < 0 preserves the natural order.
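A short sketch of the ladder of powers, with the negative multiplier for p < 0 so that larger raw values stay larger after re-expression. The function name `reexpress` and the sample values are hypothetical; the natural log is used for p = 0, although any log base only changes the scale.

```python
import math


def reexpress(x, p):
    """Ladder-of-powers re-expression of a positive value x."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    if p == 0:
        return math.log(x)  # p = 0 stands for the logarithm
    y = x ** p
    return y if p > 0 else -y  # negative multiplier keeps the natural order


values = [1, 4, 9, 16]  # hypothetical data
for p in (2, 1, 0.5, 0, -0.5, -1, -2):
    print(p, [round(reexpress(v, p), 3) for v in values])
```

Every row of the output is increasing, which is the order-preserving property the note on the slide refers to.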

39 Effect of Transformation $X \to X^p$
p > 1: pulls in a stretched-out lower tail; stretches out a bunched-in upper tail.
p < 1: pulls in a stretched-out upper tail; stretches out a bunched-in lower tail.
[Two plots of $X^p$ against X, one for p > 1 and one for p < 1.]

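40 Effect of Transformation: An Example (Bilirubin Data)
[Table of mid-summaries for M, H, E, D in three columns: Mid-raw (p = 1), Mid-square-root (p = 1/2), Mid-log (ln, p = 0); the values were not preserved.]
Raw (p = 1): skewed right.
Square root (p = 1/2): about right? (symmetric?)
Log (p = 0): skewed left; seems to stretch out the lower tail too much!!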

41 Effect of Transformation for Bilirubin Data
[Figure panels: raw data, square root, log.]

42 STARTS
Problem:
Can't take log x for x <= 0.
Can't take even roots ($\sqrt{x}$, $\sqrt[4]{x}$, $\sqrt[6]{x}$, etc.) for x < 0.
Some solutions:
1. Use log(x + c) instead of log x (c is the "start"). c should be small compared to the typical size of the data values, e.g. log(x + 1/4), log(x + 1/2), log(x + 1).
2. If all x's are negative, it is easier and better to simply multiply by -1 first, then take logs or even roots.
3. If only some x's are negative, then adding a constant might be OK.
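Solutions 1 and 2 can be sketched as follows. The function names, the default start of 1/2 and the use of base-10 logs are assumptions made for illustration; the slide does not fix any of them.

```python
import math


def log_with_start(x, start=0.5):
    """Solution 1: log10(x + c), where c is the start."""
    shifted = x + start
    if shifted <= 0:
        raise ValueError("start is too small for this value")
    return math.log10(shifted)


def log_of_negatives(values):
    """Solution 2: when every value is negative, multiply by -1 first."""
    if any(v >= 0 for v in values):
        raise ValueError("expected all values to be negative")
    return [math.log10(-v) for v in values]


print([round(log_with_start(x), 3) for x in [0, 1, 5, 20]])  # hypothetical counts
print(log_of_negatives([-10, -100]))
```

A start that is small relative to typical data values changes the large observations very little while making the zeros usable.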

43 Comparing To The Normal Distribution
After transforming a data set to a (roughly) symmetric shape, can the new distribution be compared to normality?
Yes: compare spreads to normal spreads.
Spreads for the N(0,1) distribution:
Name   Spread
H      1.349
E      2.301
D      3.068
(See Velleman & Hoaglin for more)
If the distribution is normal, then the quotients (H-spread) / 1.349, (E-spread) / 2.301 and (D-spread) / 3.068 should be nearly equal.
If the quotients increase, then heavy tails. If the quotients decrease, then light tails.
Note: Can use (H-spread) / 1.349 as an estimate of $\sigma$.

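44 Comparison to Normality: An Example
Since $\sqrt{\text{Bilirubin}}$ is quite symmetric, we'll look at that.
$\sqrt{\text{Bilirubin}}$:
Letter   Spread   s = spread / normal spread
M        -        -
H        1.85     1.37 (= 1.85 / 1.349)
E        3.80     1.65 (= 3.80 / 2.301)
D        4.36     1.42 (= 4.36 / 3.068)
Also, look at zinc-stable: [table of Spread and s for M, H, E, D; the values were not preserved.]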
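The normal spreads and the quotients used in this comparison can be reproduced with the standard library. In the sketch below the names `normal_spread`, `NORMAL_SPREADS` and `pseudo_sigmas` are hypothetical; the three spreads passed in (1.85, 3.80, 4.36) are the ones quoted in the example.

```python
from statistics import NormalDist


def normal_spread(tail_area):
    """Distance between the upper and lower N(0,1) letter values for a given tail area."""
    z = NormalDist().inv_cdf(1 - tail_area)
    return 2 * z


# Hinges cut off 1/4 in each tail, eighths 1/8, sixteenths 1/16.
NORMAL_SPREADS = {
    "H": normal_spread(1 / 4),
    "E": normal_spread(1 / 8),
    "D": normal_spread(1 / 16),
}


def pseudo_sigmas(spreads):
    """Observed spread divided by the matching N(0,1) spread."""
    return {k: spreads[k] / NORMAL_SPREADS[k] for k in spreads}


print({k: round(v, 3) for k, v in NORMAL_SPREADS.items()})
print({k: round(v, 2) for k, v in pseudo_sigmas({"H": 1.85, "E": 3.80, "D": 4.36}).items()})
```

Roughly equal quotients are consistent with a normal shape; a rising or falling sequence points to heavier or lighter tails.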

45 Transformations Useful in Common Situations
A. AMOUNTS AND COUNTS: $\log x$, $x^{1/2}$, $x^{-1}$
Example: white blood counts, glucose levels, number of patients seen in clinic per month.
* log is especially useful if the ratio of the largest to smallest observation is large.
B. BALANCES (i.e., real numbers): often not transformed, but if necessary do it!!
Example: deviation from ideal body weight.
C. COUNTED FRACTIONS: $p = x/n$ or $p = (x - A)/(B - A)$
* Use folded values with transform(p) = f(p) - f(1 - p) [symmetry is natural]
froots: $\sqrt{p} - \sqrt{1 - p}$
flogs: $\operatorname{logit}(p) = \log p - \log(1 - p) = \log \frac{p}{1 - p}$
pluralities: $p - (1 - p) = 2p - 1$
Example: proportion of patients responding to treatment (rx); percentage of sperm with oval shape.
D. RANKS (i.e., 1, 2, 3, ..., n): similar to fractions.
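The folded re-expressions for counted fractions all have the form f(p) - f(1 - p), which can be written once and reused. This is a sketch with hypothetical function names; natural logs are assumed for the flog, and p must lie strictly between 0 and 1 for it to be defined.

```python
import math


def folded(f, p):
    """Generic folded value f(p) - f(1 - p)."""
    return f(p) - f(1 - p)


def froot(p):
    return folded(math.sqrt, p)


def flog(p):
    return folded(math.log, p)  # the logit: log(p / (1 - p))


def plurality(p):
    return folded(lambda q: q, p)  # p - (1 - p) = 2p - 1


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(froot(p), 3), round(flog(p), 3), round(plurality(p), 3))
```

Each folded value is 0 at p = 1/2 and changes sign when p is swapped with 1 - p, which is the natural symmetry the slide mentions.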

46 Another Example
Duration of operation (OPTIME) for 100 patients with epidural anesthesia (time recorded in minutes).
[Letter-value display with columns DEPTH, LOWER, UPPER, MID, SPREAD for M, H, E, D, and a stem-and-leaf display (multiply Stem.Leaf by 10**+1); the values were not preserved.]
** Stretched-out upper tail suggests $X^p$ with p < 1.

47 Since log OPTIME (p = 0) is slightly skewed right and $-100/\mathrm{OPTIME}$ (p = -1) is skewed left, a power between 0 and -1 might work.
Try p = -1/2, i.e., $-100/\sqrt{\mathrm{OPTIME}}$.
[Stem-and-leaf display; multiply Stem.Leaf by 10**+1; the values were not preserved.]
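The search for a power whose mid-summaries show no trend can be sketched as below. Everything here is an illustrative assumption: the function names, the use of only the M, H and E mid-summaries, and the made-up data (powers of 2, chosen because their logs are exactly symmetric). It is not the OPTIME data.

```python
import math


def reexpress(x, p):
    """Ladder-of-powers re-expression for positive x, order-preserving for p < 0."""
    if p == 0:
        return math.log(x)
    return x ** p if p > 0 else -(x ** p)


def mid_summaries(values):
    """Median, mid-hinge and mid-eighth from letter-value depths."""
    s = sorted(values)
    n = len(s)
    mids = []
    depth = (n + 1) / 2
    for _ in range(3):  # M, H, E
        lo, hi = math.floor(depth), math.ceil(depth)
        lower = (s[lo - 1] + s[hi - 1]) / 2
        upper = (s[n - lo] + s[n - hi]) / 2
        mids.append((lower + upper) / 2)
        depth = (math.floor(depth) + 1) / 2
    return mids


def mid_trend(values, p):
    """Mid-eighth minus median after re-expression: > 0 skewed right, < 0 skewed left."""
    mids = mid_summaries([reexpress(v, p) for v in values])
    return mids[-1] - mids[0]


times = [2 ** k for k in range(1, 10)]  # hypothetical right-skewed data
for p in (1, 0.5, 0, -0.5, -1):
    print(p, mid_trend(times, p))
```

When the trend changes sign between two rungs of the ladder, a power between them is a natural next candidate, which is the reasoning used for trying p = -1/2.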

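48 p = -1/2: $-100/\sqrt{\mathrm{OPTIME}}$
MID: M, H, E, D and one further row (values not preserved).
[Stem-and-leaf display; the values were not preserved.]

49 p = 0: log(OPTIME)
MID: M = 4.2, H (value not preserved), E = 4.3, D and one further row (values not preserved).
[Stem-and-leaf display; multiply Stem.Leaf by 10**-1.]
Pretty good!!?

50 p = -1: $-100/\mathrm{OPTIME}$
MID: M, H, E, D and one further row (values not preserved).
[Stem-and-leaf display; multiply Stem.Leaf by 10**-1.]
Now skewed to the low end.

51 p = 1/2: $\sqrt{\mathrm{OPTIME}}$
MID: M = 8.22, H = 8.62, E = 8.83, D (value not preserved), final row = 9.72.
[Stem-and-leaf display; the values were not preserved.]
Less skewness, but it still exists.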

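52 Comparing OPTIME Spreads to Normal Distribution
[Table of standardized spreads for H, E and D under each re-expression of OPTIME (raw, square root, log, $-100/\sqrt{\mathrm{OPTIME}}$, $-100/\mathrm{OPTIME}$); the values were not preserved. One column is annotated "about right?".]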

53 Graphical Comparison of OPTIME Spreads to Normal Distribution
[Figure with one panel per re-expression of OPTIME, including raw OPTIME and log(OPTIME).]

54 Example 4: Peak Common Bile Duct Pressure
During an operation, common bile duct pressure is measured every 2 minutes for 20 minutes. The ratio of pressure at time t to baseline (t = 0) is calculated. The peak ratio is recorded.
[Two letter-value displays of the peak ratio (columns MID, SPR, STD SPR; rows M, H, E, D), each with a stem-and-leaf display, one of them marked "Multiply Stem.Leaf by 10**-1"; the values were not preserved.]

55 A look ahead...
Variance stabilization
Straightening x-y plots
Interpretation and reporting

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System PREP Course #13: Introduction to Exploratory Data Analysis and Data Transformations (Part 2) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

Lecture 2 and Lecture 3

Lecture 2 and Lecture 3 Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

Units. Exploratory Data Analysis. Variables. Student Data

Units. Exploratory Data Analysis. Variables. Student Data Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Exploratory data analysis: numerical summaries

Exploratory data analysis: numerical summaries 16 Exploratory data analysis: numerical summaries The classical way to describe important features of a dataset is to give several numerical summaries We discuss numerical summaries for the center of a

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types

More information

CHAPTER 2: Describing Distributions with Numbers

CHAPTER 2: Describing Distributions with Numbers CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

MgtOp 215 Chapter 3 Dr. Ahn

MgtOp 215 Chapter 3 Dr. Ahn MgtOp 215 Chapter 3 Dr. Ahn Measures of central tendency (center, location): measures the middle point of a distribution or data; these include mean and median. Measures of dispersion (variability, spread):

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Mean 26.86667 Standard Error 2.816392 Median 25 Mode 20 Standard Deviation 10.90784 Sample Variance 118.981 Kurtosis -0.61717 Skewness

More information

Descriptive statistics

Descriptive statistics Patrick Breheny February 6 Patrick Breheny to Biostatistics (171:161) 1/25 Tables and figures Human beings are not good at sifting through large streams of data; we understand data much better when it

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

Performance of fourth-grade students on an agility test

Performance of fourth-grade students on an agility test Starter Ch. 5 2005 #1a CW Ch. 4: Regression L1 L2 87 88 84 86 83 73 81 67 78 83 65 80 50 78 78? 93? 86? Create a scatterplot Find the equation of the regression line Predict the scores Chapter 5: Understanding

More information

P8130: Biostatistical Methods I

P8130: Biostatistical Methods I P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations: Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number

More information

Review: Central Measures

Review: Central Measures Review: Central Measures Mean, Median and Mode When do we use mean or median? If there is (are) outliers, use Median If there is no outlier, use Mean. Example: For a data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3,

More information

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest: 1 Chapter 3 - Descriptive stats: Numerical measures 3.1 Measures of Location Mean Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size Example: The number

More information

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami
