BIOS 2041: Introduction to Statistical Methods


Abdus S Wahed*

*Some of the material in this chapter has been adapted from Dr. John Wilson's lecture notes for the same course.


Chapter 1 Introduction to Statistical Methods

1.1 What is Statistics?

Statistics: the science of making inferences about specific random phenomena based on limited sample materials. The discipline provides methods for answering questions such as:

- What effect does air pollution have on the residents of Pittsburgh?
- What proportion of Pittsburgh residents invest in stocks or bonds?
- Is drug A better than drug B in relieving certain asthma symptoms?
- Does vitamin A prevent cancer?
- Based on this quarter's performance of stock returns, what strategy will optimize the expected return in the next quarter?

A central task of statistical analysis is to draw a conclusion ("make inference") about a population of interest based on evidence in a sample from that population.

Population = the set of all subjects or individuals who could be measured for some variable of interest. Another viewpoint is that the population is the group about which you wish to draw a conclusion.
Example: All women in Allegheny County.

A parameter is a numeric characteristic of a population.
Example: Proportion of women in Allegheny County having a female relative who has been treated for breast cancer.

A sample is a subset of the population selected for study. The idea is that the sample will provide the information used in drawing the conclusion about the population.
Example: 200 Allegheny County women selected by random-digit telephone dialing.

A statistic is a numeric characteristic of a sample.
Example: The observed proportion of women with a female relative who had been treated for breast cancer is 35%.

Inference is the conclusion drawn about the population on the basis of the sample.
Example: The proportion of Allegheny County women having a female relative who has had breast cancer is 35%.

Another example:

- Population: All patients treated for Acute Myelocytic Leukemia (AML) who are in first complete remission (CR1).
- Parameter: Median duration of remission of treated AML patients in CR1.
- Sample: 35 AML/CR1 patients treated at the University of Pittsburgh Cancer Institute during 2007.
- Statistic: The median duration of CR1 in these 35 patients was 13 months.
- Inference: The median duration of CR1 in patients treated for AML is 13 months.

1.2 What is Biostatistics?

Biostatistics: the branch of statistics that applies statistical methods to medical and biological problems.

Biostatisticians help researchers (basic scientists, medical researchers, drug developers) from the inception of a study to its completion. The role of a biostatistician in the process is:

- To formulate the research question in concrete terms, i.e. as a hypothesis.
- To plan the experiment/study that will answer the research question accurately and efficiently, e.g.:
  - How many subjects (mice, patients, machines) will be needed to answer the research question?
  - How would, for example, subjects be assigned to different groups?
  - What data should be collected on each subject?
  - How would the data be verified and processed?
- To identify the issues with the data, e.g.:
  - How would the missing data be handled?
  - Are there measurement errors in the data? How are they going to be handled?
- To analyze the collected data to draw conclusions regarding the hypotheses.

Example 1.2.1. Drug development.

XYZ pharmaceuticals has been conducting research on developing drugs for hepatitis C (Hep C) treatment since 1990. Their basic science researchers have convinced the Food and Drug Administration (FDA) through phase I and II trials that they have discovered a new molecule of the standard interferon that can be administered once weekly instead of once daily, and they claim that the drug provides a better response rate than standard interferon. The company is planning to test the drug on a large cohort of hepatitis C patients. The statistician assigned to this study will generally start by asking basic questions like:

1. How would you quantify the response? (Usually a simplified answer would be: absence of Hep C virus in the serum 24 weeks after the end of the treatment.)
2. How much improvement do you expect in response rate among the users of the new drug compared to standard interferon users? (The Phase II trial would indicate some ballpark figure for this.)

Based on the answers, the statistician will:

- Formulate the hypothesis in quantitative terms:

  \[ H_0 : P_1 = P_2, \tag{1.2.1} \]

  where $P_1$ is the response rate in the standard interferon group and $P_2$ is the response rate for the new treatment (weekly interferon).
- Determine the number of patients to be recruited in the (standard) daily interferon group and in the (new) weekly interferon group.
- Make sure that patient safety and privacy are ensured in the protocol, keeping in mind the objective of the study.
- Devise a randomization scheme (possibly double-blinded) to assign treatments to patients so that the two groups are comparable with respect to patient characteristics.
- Suggest a data collection, verification and management plan.

  - How many sites will be used for patient recruitment?
  - What data needs to be collected?
  - What system will be used to transfer the data?
  - How will the data be processed?
  - What information should be presented to the DSMB (Data Safety Monitoring Board), and how often?
  - What criteria should be used to declare the new treatment significantly better?
  - How many interim analyses should be planned?
  - What criteria should be used for stopping the trial?

Finally, when the trial ends, the statistician will conduct/oversee the data analysis to arrive at a conclusion regarding the hypothesis.

In this course, we will mainly talk about:

- Statistical methods to analyze collected data so that specific questions of interest can be answered.
- Design issues, for example, sample size and power.

We will cover: Chapters 1-8 (in full), 10-11 (partial).


Chapter 2 Descriptive Statistics

In most cases data consist of many sample points. To interpret the data, the first task is to summarize them in some concise manner.

2.1 Types of data

Data collected, outcomes of experiments, etc. are often referred to as variables or outcomes, which come in several varieties. The type of outcome observed plays a role in determining which statistical procedures are appropriate.

Categorical (discrete): data can be assigned to discrete categories.
  a) Unordered
     i) Gender
     ii) Political party to which one belongs
     iii) Exposed vs not exposed
     iv) Disease or no disease
  b) Ordered
     i) Good-Better-Best classification
     ii) Number of times a patient is admitted to hospital for illness during a given year.

Continuous variables
  a) Ordinary or uncensored
     i) Standard scale measurements
        - height
        - weight
        - optical density
        - pH
     ii) Survival times that are actually observed.
  b) Censored data
     i) Survival time: it may be known only that the time is greater than some observed time.

Here are the first 10 records from a dataset:

Table 2.1: Several records from a dataset

  Obs   ID    AGE   SEX   LEADTYP   IQF
    1   101   1101    1         1    70
    2   102    905    1         1    85
    3   103   1101    1         1    86
    4   104    611    1         1    76
    5   105   1103    1         1    84
    6   106    606    1         2    96
    7   107    611    1         2    94
    8   108   1500    2         2    56
    9   109    702    2         2   115
   10   110    703    1         2    97

Many numerical and graphical techniques are available for the purpose of summarizing data. We will start with continuous variables.

2.2 Measures of Location

The first set of summary measures defines the center (or middle) of the sample data. Such measures are known as measures of location or measures of central tendency. We will start with the simplest of these measures, the arithmetic mean (or simply, the mean).

2.2.1 Arithmetic Mean

The arithmetic mean is the sum of the observations divided by the number of observations.

Formula: If $X$ is what is measured (observed) and $x_1, x_2, \ldots, x_n$ are the values of $n$ measurements, then the arithmetic mean is given by the formula:

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}. \tag{2.2.1} \]
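Formula (2.2.1) translates directly into code. The following is a minimal Python sketch, not part of the original notes: the function name `arithmetic_mean` is an illustrative choice, and the data are the 20 birthweights of Table 2.2 (Example 2.2.1). It also checks the scaling property of the mean, using grams to kilograms as the change of scale.

```python
def arithmetic_mean(values):
    """Arithmetic mean: the sum of the observations divided by their number."""
    if len(values) == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


# Birthweights (g) of the 20 infants listed in Table 2.2, in new-born order.
birthweights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

mean_g = arithmetic_mean(birthweights)

# Changing the scale by a factor c changes the mean by the same factor;
# here c = 1/1000 converts grams to kilograms.
mean_kg = arithmetic_mean([w / 1000 for w in birthweights])

print(f"mean = {mean_g:.1f} g = {mean_kg:.4f} kg")
```

The printed mean in grams agrees with the value 3166.9 g reported in (2.2.2).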

Example 2.2.1. Table 2.1 (Rosner)

Table 2.2: Sample of birthweights (g) of live-born infants born at a private hospital in San Diego, California, during a 1-week period.

  New-born  Weight (g)   New-born  Weight (g)   New-born  Weight (g)   New-born  Weight (g)
      1        3265          6        3323         11        2581         16        2759
      2        3260          7        3649         12        2841         17        3248
      3        3245          8        3200         13        3609         18        3314
      4        3484          9        3031         14        2838         19        3101
      5        4146         10        2069         15        3541         20        2834

$X$ = birthweights (g) of live-born infants.

\[ \bar{x} = \frac{3265 + 3260 + \cdots + 2834}{20} = 3166.9\ \text{g}. \tag{2.2.2} \]

Facts about mean:

- The arithmetic mean is easy to compute.
- If the sample points change in scale by a factor of $c$, the mean changes by a factor of $c$.
- In some cases it fails to reflect the center of the sample, specifically in the presence of unusually high or low values (outliers).

- It is the most widely used measure of location.

2.2.2 Median

Loosely speaking, the median is a number such that, in the ordered sample, half of the sample points lie below it and half above it.

Formula: If $n$ is odd, then the $\left(\frac{n+1}{2}\right)$-th largest observation is the median. Otherwise, the median is defined as the average of the $\left(\frac{n}{2}\right)$-th and $\left(\frac{n}{2}+1\right)$-th largest observations.

Example 2.2.2. Table 2.2 (Rosner). White blood cell counts ($\times 1000$) for a sample of 9 patients entering a hospital. The ordered sample is as follows:

3, 5, 7, 8, 8, 9, 10, 12, 35

Here, $n = 9$, and hence $\frac{n+1}{2} = 5$. The median white blood cell count for this sample is the 5th observation, which is 8000.
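The odd/even rule for the median can be sketched in a few lines of Python. This is an illustrative helper (the name `median` and the modified sample `wbc_trimmed` are assumptions made for the example, not part of the notes), applied to the white blood cell counts of Example 2.2.2. The hypothetical second sample replaces the extreme value 35 by 13 to show that the median does not move while the mean does.

```python
def median(values):
    """Median by the rule of Section 2.2.2 (the notes count positions from 1)."""
    if len(values) == 0:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the ((n+1)/2)-th ordered observation
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n/2)-th and (n/2 + 1)-th ordered observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# White blood cell counts (x1000) from Example 2.2.2.
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]
wbc_median = median(wbc)
wbc_mean = sum(wbc) / len(wbc)

# Hypothetical variation: the largest count is 13 instead of 35.
wbc_trimmed = wbc[:-1] + [13]
trimmed_median = median(wbc_trimmed)
trimmed_mean = sum(wbc_trimmed) / len(wbc_trimmed)

print(f"original: median = {wbc_median}, mean = {wbc_mean:.2f}")
print(f"modified: median = {trimmed_median}, mean = {trimmed_mean:.2f}")
```

In both samples the median is 8 (that is, 8000 cells), whereas the mean drops from about 10.78 to about 8.33 once the extreme value is reduced.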

Facts about median:

- The median is not highly influenced by extreme observations, unless there are only one or two data points.
- The median depends only on one or two middle observations and hence is less sensitive to the magnitude of the other observations in the sample.

2.2.3 Mode

The mode is the most frequently occurring value in the sample. In the above example, the mode white blood cell count is 8000, as it occurs more frequently than any other white blood cell count.

Facts about mode:

- If all the data points occur exactly the same number of times, then there is no mode.
- A sample with one mode is called unimodal; two modes, bimodal; three modes, trimodal; and so on.

2.2.4 Geometric Mean

The geometric mean is often used for summarizing ratios, percentages, indices, or other data sets bounded by zero. The geometric mean of $n$ positive numbers $x_1, x_2, \ldots, x_n$ is defined as the $n$-th root of their product.

Formula:

\[ GM = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}. \tag{2.2.3} \]

In Example 2.2.2, the geometric mean is $(3 \times 5 \times \cdots \times 35)^{1/9} = 8.59$.

Facts about geometric mean:

- Defined only for positive numbers.
- Usually, if a distribution on the positive axis is asymmetric, a log transformation is used to make it symmetric. For such distributions the geometric mean is used.
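The mode and the geometric mean can both be checked on the white blood cell counts of Example 2.2.2. The sketch below is illustrative only: the helper names `modes` and `geometric_mean` are assumptions, and the geometric mean is computed as the exponential of the mean of the logs, which is algebraically the same as (2.2.3) and mirrors the log transformation mentioned above. Treating a sample with a single distinct value as having that value as its mode is a choice made here, not something the notes specify.

```python
import math
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    if len(values) == 0:
        raise ValueError("the mode of an empty sample is undefined")
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean(log x))."""
    if len(values) == 0 or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty sample of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]  # counts (x1000), Example 2.2.2

print("modes:", modes(wbc))
print(f"geometric mean: {geometric_mean(wbc):.2f}")
print("bimodal example:", modes([1, 1, 2, 2, 3]))
print("no mode:", modes([1, 2, 3]))
```

Returning a list makes unimodal, bimodal and "no mode" samples distinguishable by the length of the result.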

2.3 Measures of Spread/Variation/Dispersion

Refer to Figure 2.4 (FOB).

2.3.1 Range

The range is the difference between the largest and the smallest observations. For the birthweights data in Table 2.1, the range is

Range $= 4146 - 2069 = 2077$ g.

For the data in Figure 2.4 (FOB), the range for the Autoanalyzer method is $226 - 177 = 49$ mg/dl, whereas that for the Microenzymatic method is $209 - 192 = 17$ mg/dl. Thus, one can claim that:

- The Microenzymatic method measures cholesterol levels more consistently than the Autoanalyzer method does. Or, equivalently,
- Measurements of cholesterol levels using the Microenzymatic method are more precise than those using the Autoanalyzer method. Or, equivalently,
- Microenzymatic cholesterol measurements have lower variability compared to Autoanalyzer cholesterol measurements.

Facts about range:

- Easy to compute.
- Depends highly on the extreme values.

2.3.2 Percentiles/Quantiles and Interquartile Range

The $100p$-th ($0 \le p \le 1$) percentile of a distribution is the value $V_p$ such that $100p\%$ of the sample points are less than or equal to $V_p$. The median is the 50th percentile.

For the birthweights data in Table 2.1, some of the percentiles are calculated as:

  Position  Percentile  How we calculated it from the ordered data
  10th      2670.0      n p = 20 × 0.10 = 2; the average of the 2nd and 3rd observations.
  25th      2839.5      n p = 20 × 0.25 = 5; the average of the 5th and 6th observations.
  50th      3246.5      n p = 20 × 0.50 = 10; the average of the 10th and 11th observations.
  75th      3403.5      n p = 20 × 0.75 = 15; the average of the 15th and 16th observations.
  90th      3629.0      n p = 20 × 0.90 = 18; the average of the 18th and 19th observations.
  99th      4146.0      n p = 20 × 0.99 = 19.8; the 20th observation.

Table 2.3: Percentiles for the Birthweights data in Table 2.1 (Rosner)

Facts about percentiles:

- Percentiles are also known as quantiles.
- Percentiles characterize the relative positioning of the observations in the sample.
- The spread of the distribution about the center can be characterized by specifying certain quantiles. For instance, the 25th and 75th percentiles tell us that the middle half of the sample points lies between these two values.
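The calculation column of Table 2.3 suggests a simple rule: if $np$ is a whole number $k$, average the $k$-th and $(k+1)$-th ordered observations; otherwise take the next observation above position $np$. The Python sketch below implements that rule as read from the table. The function name `percentile` and the choice to pass the percentile as a whole-number percentage (to avoid floating-point trouble with products such as 20 × 0.99) are assumptions made for this illustration; statistical packages often use other interpolation conventions and may give slightly different values.

```python
def percentile(values, pct):
    """Percentile by the rule used in Table 2.3; pct is an integer with 0 < pct < 100."""
    if len(values) == 0:
        raise ValueError("percentiles of an empty sample are undefined")
    if not 0 < pct < 100:
        raise ValueError("pct must lie strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # n * pct / 100 split into its whole part k and a remainder.
    k, remainder = divmod(n * pct, 100)
    if remainder == 0:
        # n*p is the whole number k: average the k-th and (k+1)-th observations.
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise take the (k+1)-th observation (index k when counting from 0).
    return ordered[k]


# Birthweights (g) from Table 2.2.
birthweights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

for pct in (10, 25, 50, 75, 90, 99):
    print(f"{pct:>2}th percentile: {percentile(birthweights, pct)}")
```

With 20 observations the 10th, 25th, 50th and 75th percentiles come out as 2670.0, 2839.5, 3246.5 and 3403.5 g, and the 99th percentile is the largest observation, 4146 g.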

The 25th percentile and the 75th percentile of a distribution are commonly referred to as the 1st (lower) and 3rd (upper) quartiles. Here are the percentiles for the cholesterol data in Figure 2.4 (FOB):

  Method   N   Lower Quartile   Median   Upper Quartile   IQR
  Auto     5            193.0    195.0            209.0   16.0
  Micro    5            197.0    200.0            202.0    5.0

Table 2.4: Percentiles for the Cholesterol data in Figure 2.4 (Rosner)

Interquartile range

The distance between the 1st quartile ($Q_1$) and the 3rd quartile ($Q_3$) is known as the interquartile range (IQR). The interquartile range is useful for comparing the spread of two distributions as well as for detecting outliers. The higher the IQR, the more variable the distribution is. For the cholesterol data, the IQRs for the Autoanalyzer method and the Microenzymatic method are respectively 16 and 5, which justifies our previous claim that the Autoanalyzer method is not as precise as the Microenzymatic method.
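The range and the IQR for the two cholesterol methods can be computed side by side. The sketch below is an illustration under stated assumptions: the Autoanalyzer readings are the five values used elsewhere in these notes, while the individual Microenzymatic readings are not listed explicitly and are reconstructed here from the quartiles in Table 2.4 and the deviations used in Section 2.3.4. The quartiles follow the same positional rule as Table 2.3, and all function names are hypothetical.

```python
def percentile(values, pct):
    """Positional percentile rule of Table 2.3; pct is an integer, 0 < pct < 100."""
    ordered = sorted(values)
    k, remainder = divmod(len(ordered) * pct, 100)
    if remainder == 0:
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[k]


def sample_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the 3rd and the 1st quartile."""
    return percentile(values, 75) - percentile(values, 25)


# Cholesterol readings (mg/dl).
auto = [177, 193, 195, 209, 226]
# Assumption: reconstructed from Table 2.4 and the deviations in Section 2.3.4.
micro = [192, 197, 200, 202, 209]

for name, sample in (("Auto", auto), ("Micro", micro)):
    print(f"{name:<5} range = {sample_range(sample):>2}  "
          f"Q1 = {percentile(sample, 25)}  median = {percentile(sample, 50)}  "
          f"Q3 = {percentile(sample, 75)}  IQR = {interquartile_range(sample)}")
```

Both measures point the same way: the Autoanalyzer sample has the larger range (49 against 17) and the larger IQR (16 against 5).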

For a positively skewed distribution, the distance between the median and the upper quartile is greater than the distance between the median and the lower quartile.

For a negatively skewed distribution, the distance between the median and the upper quartile is smaller than the distance between the median and the lower quartile. [Birthweights data (Table 2.1, FOB)]

For a symmetric distribution, the distance between the median and the upper quartile is approximately equal to the distance between the median and the lower quartile. [For the menstrual cycle data, Table 2.3 (FOB), $Q_1 = 28 = \text{Median}$, $Q_3 = 29$.]

Outliers

Outliers are extremely high or low values that are isolated from the overall distribution. Outliers in a data set can be identified based on the lower and upper quartiles.

Formula:

An observation $x$ can be treated as an outlier if either

1. $x > Q_3 + 1.5 \times \text{IQR}$, or
2. $x < Q_1 - 1.5 \times \text{IQR}$.

Formula: An observation $x$ is an extreme outlier if either

1. $x > Q_3 + 3 \times \text{IQR}$, or
2. $x < Q_1 - 3 \times \text{IQR}$.

Are there any outliers in the cholesterol data set?

2.3.3 Mean deviation

Let us look at the cholesterol data one more time.

[INSERT CHOLESTEROL FIGURE]

Look at how each observation differs from the mean, i.e., $x_1 - \bar{x}, x_2 - \bar{x}, x_3 - \bar{x}, \ldots, x_n - \bar{x}$. One way to measure the spread is to look at how the sample points in the data differ from the mean. However, the mean of these differences is zero for any data. For the autoanalyzer method sample, the differences are: $(177 - 200) = -23$, $(193 - 200) = -7$, $(195 - 200) = -5$, $(209 - 200) = 9$, and $(226 - 200) = 26$, and the mean difference is zero. The same is true for the microenzymatic method. Therefore the mean difference about the mean cannot be used to distinguish between samples based on spread.

What if we just take the average of the distances, instead of the differences, i.e., of $|x_1 - \bar{x}|, |x_2 - \bar{x}|, |x_3 - \bar{x}|, \ldots, |x_n - \bar{x}|$? The average of the distances from the mean is known as the mean deviation.

For the autoanalyzer method sample, the distances are 23, 7, 5, 9, and 26, with an average of 14. On the other hand, the mean deviation for the microenzymatic method is 4.4.
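Both observations above, that signed differences from the mean always average to zero and that the absolute differences do not, are easy to verify numerically. The following Python sketch is illustrative: `deviations` and `mean_deviation` are hypothetical helper names, and the Microenzymatic readings are an assumption reconstructed from the deviations quoted in Section 2.3.4 (they are not listed explicitly in these notes).

```python
def deviations(values):
    """Signed differences x_i - mean for every observation."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def mean_deviation(values):
    """Average absolute distance of the observations from their mean."""
    return sum(abs(d) for d in deviations(values)) / len(values)


# Cholesterol readings (mg/dl).
auto = [177, 193, 195, 209, 226]
# Assumption: reconstructed so that the deviations are -8, -3, 0, 2, 9.
micro = [192, 197, 200, 202, 209]

for name, sample in (("Auto", auto), ("Micro", micro)):
    devs = deviations(sample)
    print(f"{name:<5} deviations = {devs}  "
          f"mean of deviations = {sum(devs) / len(devs)}  "
          f"mean deviation = {mean_deviation(sample)}")
```

The signed deviations cancel in both samples, while the mean deviations (14 and 4.4) separate the two methods.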

2.3.4 Variance and Standard Deviation

In the definition of the mean deviation, we used absolute values of the differences between individual observations and the sample mean. Absolute values are sometimes difficult to deal with. Another measure of spread uses the squared deviations from the mean and averages them over the whole sample. The measure, known as the variance, is defined as:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}. \tag{2.3.1} \]

The use of $n-1$ instead of $n$ in the denominator has a special justification, which we will discuss in chapter 6. The standard deviation is defined as the positive square root of the variance:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}. \tag{2.3.2} \]

For the autoanalyzer method, the variance is

\[ s^2 = \frac{(-23)^2 + (-7)^2 + (-5)^2 + 9^2 + 26^2}{4} = 340. \]

For the microenzymatic method, the variance is

\[ s^2 = \frac{(-8)^2 + (-3)^2 + 0^2 + 2^2 + 9^2}{4} = 39.5. \]

The corresponding standard deviations are respectively $s = \sqrt{340} = 18.4$ and $s = \sqrt{39.5} = 6.3$. Thus the spread of the autoanalyzer measurements, as measured by the standard deviation, is approximately three times as large as that of the microenzymatic measurements.

Facts about variance and standard deviation:

- Variance and standard deviation remain unchanged when all the observations in the sample are shifted by the same constant. For example, the following two samples have the same variance (340) and standard deviation (18.4):
  Sample 1: 77, 93, 95, 109, 126
  Sample 2: 177, 193, 195, 209, 226
- The standard deviation has the same unit of measurement as the original sample.

- If the sample points change in scale by a factor of $c$, the variance changes by a factor of $c^2$ and the standard deviation changes by a factor of $c$.
- The standard deviation is the most widely used measure of spread (dispersion).

2.3.5 Coefficient of Variation

Suppose you are comparing two distributions having different means. How would you compare the variability of a sample with mean 10 and standard deviation 5 to a sample with mean 100 and standard deviation 5? Of course, the former is more variable, as the magnitude of the standard deviation relative to the mean is much higher for that sample than for the latter. The coefficient of variation is designed to account for the magnitude of the mean when assessing the spread. It is defined as:

\[ CV = \frac{s}{\bar{x}} \times 100\%. \tag{2.3.3} \]
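Definitions (2.3.1) to (2.3.3) and the shift and scale facts can be checked together. The Python sketch below is an illustration, not the course's software: the function names are hypothetical, the scale factor 10 is an arbitrary choice, and the Microenzymatic readings are an assumption reconstructed from the deviations used in Section 2.3.4.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1, as in (2.3.1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Positive square root of the sample variance, as in (2.3.2)."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean, as in (2.3.3)."""
    mean = sum(values) / len(values)
    return 100 * sample_sd(values) / mean


auto = [177, 193, 195, 209, 226]
micro = [192, 197, 200, 202, 209]      # assumption: reconstructed readings
shifted = [x - 100 for x in auto]      # Sample 1: 77, 93, 95, 109, 126
scaled = [10 * x for x in auto]        # hypothetical change of scale, c = 10

print(f"Auto:    s^2 = {sample_variance(auto):.1f}, s = {sample_sd(auto):.1f}, "
      f"CV = {coefficient_of_variation(auto):.1f}%")
print(f"Micro:   s^2 = {sample_variance(micro):.1f}, s = {sample_sd(micro):.1f}, "
      f"CV = {coefficient_of_variation(micro):.1f}%")
print(f"Shifted: s^2 = {sample_variance(shifted):.1f}")
print(f"Scaled:  s^2 = {sample_variance(scaled):.1f}, s = {sample_sd(scaled):.1f}")
```

Shifting leaves the variance at 340, while multiplying by 10 multiplies the variance by 100 and the standard deviation by 10, as the facts above state.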

For the cholesterol data in Table 2.4 (FOB), the coefficients of variation for the Autoanalyzer and Microenzymatic methods are respectively 9.2% and 3.1%.

2.4 Graphical Representation

2.4.1 Histogram

A histogram is a useful way of presenting data graphically. It presents frequencies (or relative frequencies) on the Y-axis against the data points on the X-axis. The frequencies along with the values are usually referred to as the frequency distribution or distribution. When the number of unique observations is too large, the range of the variable is divided into intervals and the number of observations belonging to each interval is reported.

Distributions having two approximately similar tails are called symmetric distributions. For such distributions Mean ≈ Median ≈ Mode.
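Grouping a variable into intervals and counting the observations in each one is the computation behind a histogram. The sketch below builds such a frequency distribution for the birthweights of Table 2.2 and prints a rough text histogram. It is only an illustration: the function name, the starting point of 2000 g and the interval width of 500 g are arbitrary assumptions, and a different choice of intervals would change the picture.

```python
from collections import Counter


def frequency_distribution(values, start, width):
    """Count observations in the intervals [start + k*width, start + (k+1)*width)."""
    if len(values) == 0 or width <= 0:
        raise ValueError("need a non-empty sample and a positive interval width")
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    table = []
    # Include empty intervals between the lowest and highest occupied ones.
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * width
        freq = counts.get(k, 0)
        table.append((lower, lower + width, freq, freq / n))
    return table


# Birthweights (g) from Table 2.2.
birthweights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

table = frequency_distribution(birthweights, start=2000, width=500)
for lower, upper, freq, rel in table:
    # Each row: interval, frequency, relative frequency, and a bar of stars.
    print(f"[{lower}, {upper})  {freq:>2}  {100 * rel:5.1f}%  {'*' * freq}")
```

Half of the 20 birthweights fall in the interval from 3000 g to just under 3500 g, with thinner tails on either side.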

[Figure: Histogram of Menstrual Cycle. Y-axis: Relative Frequency; X-axis: Time (days), 24 to 38.]

Figure 2.1: Distribution of time intervals between successive menstrual periods (days) of college women (Table 2.3; Rosner; Page 13). Mean=28.5; Median=28; Mode=28.

A distribution which has a longer tail on the right is called a positively skewed distribution. For such distributions, data points to the right of the median tend to be farther from the median in absolute value than points below the median, and Mean ≥ Median ≥ Mode.

Figure 2.2: Example of a distribution which is neither skewed, nor symmetric.

Distributions with a longer tail on the left are known as negatively skewed distributions. For such distributions Mean ≤ Median ≤ Mode.

For more examples of symmetric, positively skewed and negatively skewed distributions, refer to page 12 of FOB.

2.4.2 Stem-and-leaf Plot

A stem-and-leaf plot is similar to a histogram, but it stays closer to the actual data by using the observations from the actual sample. It shows the basic shape of the distribution just as a histogram does.

  Stem  Leaf          #
     4  1             1
     3  5566          4
     3  012223333     9
     2  68888         5
     2  1             1
  Multiply Stem.Leaf by 10**+3

Figure 2.3: Stem-and-leaf plot for the birthweights data in Table 2.1 (FOB).

2.4.3 Box plot

  Stem  Leaf                         #
    14  1                            1
    13
    13
    12  58                           2
    12  0                            1
    11  558                          3
    11  1124                         4
    10  55677778                     8
    10  00111124444                 11
     9  566666666777889999          18
     9  0111122223334444            16
     8  555555566666778888889999    24
     8  00000022334                 11
     7  5556666667778899            16
     7  012234                       6
     6
     6
     5  6                            1
     5  0                            1
     4  6                            1
  Multiply Stem.Leaf by 10**+1

Figure 2.4: Stem-and-leaf plot for the variable IQF from the dataset Lead in the case study described in section 2.9 (FOB).
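A stem-and-leaf display of the kind shown in Figure 2.3 can be rebuilt from the raw birthweights. The Python sketch below is a simplified illustration, not the statistical package that produced the figures: it assumes non-negative integer data, rounds each value to the nearest leaf unit (100 g here), splits every stem into an upper row (leaves 5 to 9) and a lower row (leaves 0 to 4), and, unlike Figure 2.4, does not print empty rows. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit):
    """Return (stem, leaves) rows, highest values first, two rows per stem."""
    rows = defaultdict(list)
    for v in values:
        # Round to the nearest leaf unit (halves round up), then split off the last digit.
        scaled = (v + leaf_unit // 2) // leaf_unit
        stem, leaf = divmod(scaled, 10)
        rows[(stem, leaf >= 5)].append(leaf)
    return [(stem, "".join(str(leaf) for leaf in sorted(leaves)))
            for (stem, _upper), leaves in sorted(rows.items(), reverse=True)]


# Birthweights (g) from Table 2.2.
birthweights = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
                2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

plot = stem_and_leaf(birthweights, leaf_unit=100)
print("Stem  Leaf          #")
for stem, leaves in plot:
    print(f"{stem:>4}  {leaves:<12} {len(leaves):>2}")
print("Multiply Stem.Leaf by 10**+3")
```

With a leaf unit of 100 g the stems are thousands of grams, so the top row "4 | 1" stands for the 4146 g infant, rounded to 4100 g.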

Figure 2.5: Box plot for the variable IQF from the dataset Lead in the case study described in section 2.9 (FOB) by exposure type.

[Figure: side-by-side box plots of IQF (vertical axis, about 50 to 140) for LEAD_TYP = 1 and LEAD_TYP = 2, with isolated points marked beyond the whiskers.]
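The numbers behind a box plot can be computed without drawing anything. The sketch below assumes the common convention, which these notes do not spell out, that the box runs from $Q_1$ to $Q_3$ with a line at the median and that points beyond the outlier limits of Section 2.3.2 are plotted individually. Because the IQF values by exposure type are not reproduced here, the example uses the cholesterol readings (the Microenzymatic values are reconstructed, hence an assumption) and the white blood cell counts of Example 2.2.2. All function names are hypothetical, and the quartiles follow the positional rule of Table 2.3.

```python
def percentile(values, pct):
    """Positional percentile rule of Table 2.3; pct is an integer, 0 < pct < 100."""
    ordered = sorted(values)
    k, remainder = divmod(len(ordered) * pct, 100)
    if remainder == 0:
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[k]


def box_plot_summary(values):
    """Quartiles, IQR and the observations flagged by the 1.5 and 3 IQR rules."""
    q1, med, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    outliers = [x for x in sorted(values)
                if x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr]
    extreme = [x for x in outliers
               if x > q3 + 3 * iqr or x < q1 - 3 * iqr]
    return {"min": min(values), "q1": q1, "median": med, "q3": q3,
            "max": max(values), "iqr": iqr,
            "outliers": outliers, "extreme_outliers": extreme}


auto = [177, 193, 195, 209, 226]
micro = [192, 197, 200, 202, 209]       # assumption: reconstructed readings
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]    # counts (x1000), Example 2.2.2

for name, sample in (("Auto", auto), ("Micro", micro), ("WBC", wbc)):
    print(name, box_plot_summary(sample))
```

Under these rules neither cholesterol sample contains an outlier, whereas the white blood cell count of 35 lies beyond both the 1.5 IQR and the 3 IQR limits.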