Exploring, summarizing and presenting data
Example

Patient Nr  Gender  Age  Weight  Height  PAVK-Grade  Walking Distance  Physical Functioning Scale  Total Cholesterol  Triglycerides
01          m       65   90      185     II b        200               70                          179                84
02          m       70   75      170     II b        100               45                          185                59
03          m       98   110     186     II b        150               75                          175                87
04          f       50   75      162     II b        20                10                          215                196
05          m       79   78      163     IV          20                00                          221                330
06          f       68   92      164     III         200               55                          200                189
07          f       56   68      161     II b        50                25                          185                39
08          m       63   82      168     IV          10                00                          196                75
09          m       70   72      177     III         50                15                          187                174
10          f       79   60      155     III         100               30                          177                105
11          m       51   48      180     II b        200               50                          239                88
12          m       63   72      166     II b        100               10                          184                153
13          f       70   74      158     II b        200               45                          137                294
14          m       55   85      181     II b        50                25                          183                101
15          m       46   98      174     II b        100               80                          124                160
16          f       62   67      151     IV          100               20                          183                86
17          f       60   77      158     II b        100               15                          189                120
18          f       85   68      159     II b        30                25                          195                76
19          m       67   87      173     II b        20                10                          211                121
20          m       80   95      181     III         5                 00                          201                158
21          f       54   90      160     III         10                00                          216                173
22          m       61   75      179     II b        100               50                          219                47
23          f       57   62      160     IV          40                25                          208                92
24          m       68   79      178     III         50                25                          190                149
25          m       81   92      170     II b        50                55                          248                369

Scales
- Nominal scale
- Ordinal scale
- Numerical scale

Nominal Scale
The values of any two study units can be classified only as identical or non-identical.
- hair colour
- place of birth
- blood group
- Binary (dichotomous) variables: gender, rhesus factor, ...

Ordinal Scale
Observations are still classified, but some observations have "more" of a property or are "greater than" other observations.
- school grades
- stage of breast cancer
- side effect of a drug (mild, average, severe)
- pain scores
- ...

Numerical Scale
- continuous (e.g. age, height: measurements)
- discrete (e.g. number of fractures, number of children: counts)
Examples: weight, body temperature, blood pressure, serum cholesterol, ...

Types of Data
Qualitative data (categorical variables):
- Nominal scale
- Ordinal scale
Quantitative data:
- Discrete variables
- Continuous variables

Examples
Protein measured in urine:
- Spontaneous urine sample using test strips (neg., pos.: +, ++, +++)
- 24-hour urine sample: urine protein in g/24 hours
Smoking:
- Consumed tobacco in g/day
- Number of cigarettes smoked per day
- Non-smoker, smoker

Criteria for measurements
- Reliability
- Validity
- Ease of use

Reliability
[Figure: a reliable versus an unreliable measurement]

Validity
[Figure: a valid versus a not valid measurement]

Descriptive Statistics
- Exploring and presenting data in the form of graphs
- Summarizing: data reduction (mean, variance etc.)
- Presenting data in the form of tables

Frequency
Qualitative data:
- absolute and relative frequency
Quantitative data:
- define class intervals
- determine the number of class intervals
There should be enough class intervals to show the shape of the distribution, but not so many that minor fluctuations become noticeable.

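
As a minimal sketch of the class-interval idea, the following Python snippet groups the ages of the 25 example patients into class intervals and prints the absolute and relative frequency of each interval. The interval width of 10, the starting point of 40 and the helper name `class_frequencies` are assumptions chosen for illustration; they are not taken from the slides.

```python
from collections import Counter

# Ages of the 25 patients from the example table.
ages = [65, 70, 98, 50, 79, 68, 56, 63, 70, 79, 51, 63, 70,
        55, 46, 62, 60, 85, 67, 80, 54, 61, 57, 68, 81]


def class_frequencies(values, start, width):
    """Count values per half-open class interval [lower, lower + width)."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


width = 10  # assumed class width
freq = class_frequencies(ages, start=40, width=width)
n = len(ages)
for lower, count in freq.items():
    upper = lower + width - 1  # ages are whole years
    print(f"{lower}-{upper}: absolute {count}, relative {count / n:.2f}")
```

Trying a much smaller width (for example 2) on the same data illustrates the warning above: with only 25 observations, most intervals then hold zero or one patient and the overall shape is lost.
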
Graphs
- Barchart
- Piechart
- Histogram
- Box-and-whisker plot
- Scatterplot
- Time series plot
- ...

Barchart
Number of decayed teeth in pupils

Decayed teeth  Absolute frequency  Percentage  Cumulative percentage
0              25                  33.3        33.3
1              26                  34.7        68.0
2              9                   12.0        80.0
3              7                   9.3         89.3
4              2                   2.7         92.0
5              4                   5.3         97.3
6              1                   1.3         98.7
7              1                   1.3         100.0
Total          75                  100.0

[Figure: bar chart of absolute frequency against number of decayed teeth in pupils]

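
The percentage and cumulative percentage columns follow directly from the absolute frequencies. The sketch below recomputes them in Python; the variable names are illustrative, and rounding to one decimal place is an assumption that matches the precision shown on the slide.

```python
from itertools import accumulate

# Absolute frequencies for 0..7 decayed teeth among 75 pupils (barchart slide).
absolute = [25, 26, 9, 7, 2, 4, 1, 1]

total = sum(absolute)
cumulative_counts = list(accumulate(absolute))

# Relative frequencies in percent, rounded to one decimal place.
percent = [round(100 * a / total, 1) for a in absolute]
cumulative_percent = [round(100 * c / total, 1) for c in cumulative_counts]

for teeth, (a, p, c) in enumerate(zip(absolute, percent, cumulative_percent)):
    print(f"{teeth}: {a:3d} {p:5.1f} {c:6.1f}")
```

The cumulative percentages are computed from the cumulative counts rather than by adding the rounded percentages, which avoids accumulating rounding error.
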
Piechart
[Figure: pie chart of PAVK-Grade: II b 50%, III 26%, IV 24%]

Histogram and cumulative distribution
[Figure: histogram of relative frequency (left) and cumulative distribution F(x) (right) of FT3, with class intervals of width 0.5 from 1 to 6]

Histogram 1
[Figure: histogram of triglycerides (mg / 100 ml), y-axis: frequency; Mean = 129, Std.dev. = 38.83, N = 80]

Histogram 2
[Figure: histogram of total cholesterol (mg / 100 ml), y-axis: frequency; Mean = 220, Std.dev. = 92.46, N = 80]

Histogram 3
[Figure: histogram of systolic blood pressure (mm Hg), y-axis: frequency; Mean = 162, Std.dev. = 21.97, N = 80]

Types of Distribution
a) unimodal
b) skewed positively
c) skewed negatively
d) bimodal
e) trapezoid
f) truncated
g) L-shaped
h) J-shaped
i) U-shaped

Scatterplot
[Figure: scatterplot of HDL (y-axis, 0 to 200) against LDL (x-axis, 0 to 250)]

Summarizing Data
Common statistics are used to summarize data and describe certain attributes of a set of data.
Measures of location (the central tendency):
- Mean
- Median, quantile
- Mode
Measures of dispersion (the spread of data):
- Variance, standard deviation
- Range
- Interquartile range

Mean
Mean = arithmetic mean
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
Note: The mean is sensitive to extreme values.

Example
Values: 1, 2, 30
$$ \bar{x} = \frac{1 + 2 + 30}{3} = 11 $$
[Figure: number line showing the values 1, 2 and 30 and the mean at 11]

Variance, standard deviation
$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
The variance of a data set is essentially the arithmetic mean of the squared differences between the observations and the mean (with divisor $n - 1$ instead of $n$).
$$ s = \sqrt{s^2} $$
The standard deviation is primarily used to describe data. It is the square root of the variance.
In many circumstances the large majority (about 95%) of a set of observations will lie within two standard deviations of the mean. This depends on the shape of the distribution; it holds for the normal distribution.
Normal range: $\bar{x} - 2s$ to $\bar{x} + 2s$

Example
The number of cows owned by 4 farmers in each of 3 villages

                     village 1    village 2    village 3
observations         3, 6, 7, 4   5, 5, 5, 5   0, 0, 0, 20
mean                 5            5            5
standard deviation   1.8          0            10.0

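
The village example can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module, whose `stdev` function uses the divisor n - 1; the dictionary layout and variable names are illustrative.

```python
import statistics

# Number of cows owned by 4 farmers in each of 3 villages (from the example).
villages = {
    "village 1": [3, 6, 7, 4],
    "village 2": [5, 5, 5, 5],
    "village 3": [0, 0, 0, 20],
}

summary = {}
for name, cows in villages.items():
    mean = statistics.mean(cows)
    # statistics.stdev is the sample standard deviation (divisor n - 1).
    sd = statistics.stdev(cows)
    summary[name] = (mean, round(sd, 1))
    print(f"{name}: mean = {mean}, s = {sd:.1f}")
```

All three villages share the same mean of 5 cows, yet the standard deviations differ widely, which is why a measure of location alone does not describe the data.
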
Time Series Plot
[Figure: time series plot, R-TCI induction of anaesthesia; x-axis: time course (min); all data points: n = 30]

Geometric mean
The geometric mean is generally used with data measured on a logarithmic scale.
$$ G = \sqrt[n]{x_1 x_2 \cdots x_n} $$
$$ \log G = \frac{1}{n} \sum_{i=1}^{n} \log x_i $$
The logarithm of the geometric mean is equal to the mean of the logarithms of the observations.

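
To illustrate that both formulas give the same result, the sketch below computes the geometric mean once as the n-th root of the product and once by back-transforming the mean of the logarithms. The three values are hypothetical and chosen only so that the result (10) is easy to verify by hand.

```python
import math
import statistics

# Hypothetical positive measurements spanning several orders of magnitude.
values = [1, 10, 100]
n = len(values)

# Definition: n-th root of the product of the observations.
g_root = math.prod(values) ** (1 / n)

# Equivalent: back-transform the arithmetic mean of the logarithms.
g_log = math.exp(sum(math.log(v) for v in values) / n)

# Standard-library routine for comparison (Python 3.8+).
g_lib = statistics.geometric_mean(values)

print(g_root, g_log, g_lib)
```

The three results agree up to floating-point rounding. For long data sets the logarithmic form is usually preferable, because the raw product can overflow.
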
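Median
The median is the central value of the distribution. With the observations sorted in ascending order, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$:
if n is odd: $\tilde{x} = x_{((n+1)/2)}$
if n is even: $\tilde{x} = \frac{1}{2} \left( x_{(n/2)} + x_{(n/2+1)} \right)$
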
Mean - Median
Example: n = 3, values: 1, 2, 30
median: $\tilde{x} = 2$
mean: $\bar{x} = 11$
[Figure: number line showing the values 1, 2 and 30]

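
The example can be reproduced with Python's `statistics` module. The second data set, in which the extreme value 30 is replaced by 3, is an added assumption used only to show how differently the two measures react.

```python
import statistics

values = [1, 2, 30]
mean = statistics.mean(values)
median = statistics.median(values)
print(f"mean = {mean}, median = {median}")

# Hypothetical variant: replace the extreme value 30 by 3.
variant = [1, 2, 3]
variant_mean = statistics.mean(variant)
variant_median = statistics.median(variant)
print(f"mean = {variant_mean}, median = {variant_median}")
```

The mean drops from 11 to 2 while the median stays at 2, which shows the sensitivity of the mean to extreme values.
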
Skewness by mean, median and mode
- skewed negatively: $\bar{x} < Me < Mo$
- skewed positively: $Mo < Me < \bar{x}$

Quantiles
The α-quantile
The median is only a special case of a statistic based on rank order.
α-quantile $x_\alpha$: the value such that at least α · 100% of the measurements are smaller than or equal to $x_\alpha$.
- 1st quartile (α = 0.25)
- 2nd quartile or median (α = 0.5)
- 3rd quartile (α = 0.75)
- Percentiles (centiles)

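Quantiles
The α-quantile $x_\alpha$
Calculation: compute α·n and determine the rank m in the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.
- If α·n is not an integer, then m is the next integer following α·n and $x_\alpha = x_{(m)}$.
- If α·n is an integer, then m = α·n and $x_\alpha = \frac{x_{(m)} + x_{(m+1)}}{2}$.
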
Quantiles
Data (n = 16, in order of observation):
5, 2, 2, 6, 7, 2, -40, 2, 3, 2, 1, 1, 12, 3, 4, 0
Sorted:
-40, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6, 7, 12
Q1 = 1.5, Me = 2.0, Q3 = 4.5
Interquartile range = Q3 - Q1 = 3

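
The calculation rule for the α-quantile can be sketched in Python and checked against the 16 data values. The function name `quantile` and the small tolerance used to decide whether α·n is an integer are assumptions of this sketch. Library routines such as `statistics.quantiles` use different interpolation rules and may give slightly different values.

```python
import math

data = [5, 2, 2, 6, 7, 2, -40, 2, 3, 2, 1, 1, 12, 3, 4, 0]


def quantile(values, alpha):
    """alpha-quantile by the rank rule from the slides, for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xs = sorted(values)
    pos = alpha * len(xs)
    if abs(pos - round(pos)) < 1e-9:
        # alpha * n is an integer: average x_(m) and x_(m+1).
        m = int(round(pos))
        return (xs[m - 1] + xs[m]) / 2
    # Otherwise m is the next integer following alpha * n; return x_(m).
    m = math.ceil(pos)
    return xs[m - 1]


q1 = quantile(data, 0.25)
me = quantile(data, 0.5)
q3 = quantile(data, 0.75)
print(f"Q1 = {q1}, Me = {me}, Q3 = {q3}, IQR = {q3 - q1}")
```

The list indices are shifted by one because Python counts from 0, whereas the ranks $x_{(1)}, \dots, x_{(n)}$ count from 1.
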
Interquartile Range
The central 50% range is sometimes used to describe variability.
IQR = 3rd quartile - 1st quartile

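Box-and-Whisker Plot
[Figure: box-and-whisker plot marking, from top to bottom, the maximum, 3rd quartile, median, 1st quartile and minimum]

Example Box-and-Whisker Plot
[Figure: box-and-whisker plots of one-second capacity (L) by age group (5-8 yrs, 9-12 yrs, 13-16 yrs) and gender (female, male); group sizes N = 104, 100, 152, 170, 49, 51]
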
"In colourful pictures little clarity, much error and a little truth." (J. W. v. Goethe)

Presentation of Results
Numerical Presentation
A data summary should not consist of the mean (or median) alone; some indication of variability should also be provided.
E.g.: "... the mean diastolic blood pressure was 102.3 mm Hg (SD 11.9)."
- mean: quote it to one extra decimal place compared with the raw data (depending on the amount of data)
- standard deviation: display it with the same precision as the mean or with one more decimal place

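
The precision rule can be sketched as a small formatting helper. The function name `summarize` and its `raw_decimals` parameter are hypothetical. The sketch quotes the mean with one decimal place more than the raw data and the standard deviation with the same precision as the mean, which is one of the two options named above.

```python
import statistics


def summarize(values, raw_decimals):
    """Format 'mean (SD s)' with one more decimal place than the raw data."""
    decimals = raw_decimals + 1
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    return f"{mean:.{decimals}f} (SD {sd:.{decimals}f})"


# Whole-number raw data (cows in village 1 of the earlier example).
text = summarize([3, 6, 7, 4], raw_decimals=0)
print(text)
```
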
Tables

Variable            Mean (SD)
Age                 67.8 (10.8)
Total Cholesterol   213.3 (41.1)
Triglycerides       129.4 (72.0)

Variable     Category   Frequency (%)
Gender       f          35 (46)
             m          41 (54)
PAVK-Grade   II b       38 (50)
             III        20 (26)
             IV         18 (24)