Exploring, summarizing and presenting data. Berghold, IMI, MUG


Example
Patient Nr  Gender  Age  Weight  Height  PAVK-Grade  Walking Distance  Physical Functioning Scale  Total Cholesterol  Triglycerides
01          m       65   90      185     II b        200               70                          179                84
02          m       70   75      170     II b        100               45                          185                59
03          m       98   110     186     II b        150               75                          175                87
04          f       50   75      162     II b        20                10                          215                196
05          m       79   78      163     IV          20                00                          221                330
06          f       68   92      164     III         200               55                          200                189
07          f       56   68      161     II b        50                25                          185                39
08          m       63   82      168     IV          10                00                          196                75
09          m       70   72      177     III         50                15                          187                174
10          f       79   60      155     III         100               30                          177                105
11          m       51   48      180     II b        200               50                          239                88
12          m       63   72      166     II b        100               10                          184                153
13          f       70   74      158     II b        200               45                          137                294
14          m       55   85      181     II b        50                25                          183                101
15          m       46   98      174     II b        100               80                          124                160
16          f       62   67      151     IV          100               20                          183                86
17          f       60   77      158     II b        100               15                          189                120
18          f       85   68      159     II b        30                25                          195                76
19          m       67   87      173     II b        20                10                          211                121
20          m       80   95      181     III         5                 00                          201                158
21          f       54   90      160     III         10                00                          216                173
22          m       61   75      179     II b        100               50                          219                47
23          f       57   62      160     IV          40                25                          208                92
24          m       68   79      178     III         50                25                          190                149
25          m       81   92      170     II b        50                55                          248                369

Scales
- Nominal scale
- Ordinal scale
- Numerical scale

Nominal Scale
The values of any two study units can only be classified as identical or non-identical.
- hair colour
- place of birth
- blood group
- binary (dichotomous) variables: gender, rhesus factor, ...

Ordinal Scale
Observations are still classified, but some observations have "more" of a property or are "greater than" other observations.
- school grades
- stage of breast cancer
- side effect of a drug (mild, moderate, severe)
- pain scores
- ...

Numerical Scale
- continuous (measurements, e.g. age, height)
- discrete (counts, e.g. number of fractures, number of children)
Further examples: weight, body temperature, blood pressure, serum cholesterol, ...

Types of Data
- Qualitative data (categorical variables)
  - Nominal scale
  - Ordinal scale
- Quantitative data
  - Discrete variables
  - Continuous variables

Examples
Protein measured in urine:
- spontaneous urine using test strips (neg., pos.: +, ++, +++)
- 24-hour urine sample: protein in g/24 hours
Smoking:
- consumed tobacco in g/day
- number of cigarettes smoked per day
- non-smoker, smoker

Criteria for measurements
- Reliability
- Validity
- Ease of use

Reliability (figure: reliable vs. unreliable measurements)

Validity (figure: valid vs. not valid measurements)

Descriptive Statistics
- Exploring and presenting data in the form of graphs
- Summarizing: data reduction (mean, variance etc.)
- Presenting data in the form of tables

Frequency
- Qualitative data: absolute and relative frequency
- Quantitative data: define class intervals and determine the number of class intervals
There should be enough class intervals to show the shape of the distribution, but not so many that minor fluctuations become noticeable.
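As a rough illustration of class intervals, the sketch below bins the ages from the example patient table into intervals of width 10. The function name `class_counts`, the starting point 40 and the width 10 are assumptions chosen for this illustration; the slides do not prescribe them.

```python
from collections import Counter

# Ages of the 25 patients in the example table.
ages = [65, 70, 98, 50, 79, 68, 56, 63, 70, 79, 51, 63, 70,
        55, 46, 62, 60, 85, 67, 80, 54, 61, 57, 68, 81]

def class_counts(values, start, width):
    """Count values per class interval [lower, lower + width).

    Returns a dict mapping each lower class limit to its absolute frequency.
    Classes without observations are simply absent from the result.
    """
    counts = Counter(start + width * ((v - start) // width) for v in values)
    return dict(sorted(counts.items()))

width = 10  # assumed class width
counts = class_counts(ages, start=40, width=width)
n = len(ages)
for lower, k in counts.items():
    # absolute and relative frequency per class interval
    print(f"{lower} to <{lower + width}: {k} ({k / n:.2f})")
```

Trying a much smaller width (for example 2) on the same data shows the "minor fluctuations" the slide warns about.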

Graphs
- Bar chart
- Pie chart
- Histogram
- Box-and-whisker plot
- Scatterplot
- Time series plot
- ...

Bar chart: number of decayed teeth in pupils
Decayed teeth  Absolute frequency  Percentage  Cumulative percentage
0              25                  33.3        33.3
1              26                  34.7        68.0
2              9                   12.0        80.0
3              7                   9.3         89.3
4              2                   2.7         92.0
5              4                   5.3         97.3
6              1                   1.3         98.7
7              1                   1.3         100.0
total          75                  100.0
(figure: bar chart of absolute frequency against number of decayed teeth in pupils)
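The percentages and cumulative percentages in the decayed-teeth table can be recomputed from the absolute frequencies. The following is a minimal sketch; the function name `frequency_table` is an assumption made for illustration.

```python
# Absolute frequencies from the slide: number of decayed teeth -> number of pupils.
counts = {0: 25, 1: 26, 2: 9, 3: 7, 4: 2, 5: 4, 6: 1, 7: 1}

def frequency_table(counts):
    """Return rows of (value, absolute, percentage, cumulative percentage)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / total, 1),
            # cumulate the counts, not the rounded percentages
            round(100 * running / total, 1),
        ))
    return rows

for row in frequency_table(counts):
    print(row)
```

Cumulating the counts rather than the already rounded percentages avoids accumulating rounding error in the last column.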

Pie chart: PAVK-Grade
- II b: 50%
- III: 26%
- IV: 24%

Histogram and cumulative distribution (figure: relative frequency and cumulative distribution function F(x) of FT3, class intervals of width 0.5 from 1 to 6)

Histogram 1 (figure: frequency of triglycerides in mg/100 ml; mean = 129, std. dev. = 38.83, N = 80)

Histogram 2 (figure: frequency of total cholesterol in mg/100 ml; mean = 220, std. dev. = 92.46, N = 80)

Histogram 3 (figure: frequency of systolic blood pressure in mmHg; mean = 162, std. dev. = 21.97, N = 80)

Types of Distribution
a) unimodal
b) skewed positively
c) skewed negatively
d) bimodal
e) trapezoid
f) truncated
g) L-shaped
h) J-shaped
i) U-shaped

Scatterplot (figure: HDL plotted against LDL)

Summarizing Data
Common statistics are used to summarize data and to describe certain attributes of a data set.
- Measures of location (central tendency): mean; median, quantiles; mode
- Measures of dispersion (spread of the data): variance, standard deviation; range; interquartile range

Mean
Mean = arithmetic mean
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
Note: the mean is sensitive to extreme values.

Example
Values: 1, 2, 30
$$ \bar{x} = \frac{1 + 2 + 30}{3} = 11 $$
(figure: number line with the values 1, 2 and 30 and the mean $\bar{x} = 11$)
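The sensitivity of the mean to a single extreme value can be checked directly. This is a small sketch; the comparison data set 1, 2, 3 is an assumption added for contrast and does not appear on the slides.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

print(mean([1, 2, 30]))  # slide example with the extreme value 30
print(mean([1, 2, 3]))   # assumed comparison data without an extreme value
```

Replacing 30 by 3 moves the mean from 11 to 2, although two of the three observations are unchanged.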

Variance, standard deviation
$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
The variance of a data set is, apart from the divisor n - 1 instead of n, the arithmetic mean of the squared differences between the observations and the mean.
$$ s = \sqrt{s^2} $$
The standard deviation is the square root of the variance and is primarily used to describe data. In many circumstances the large majority (about 95%) of a set of observations lies within two standard deviations of the mean; this depends on the shape of the distribution and holds for the normal distribution.
Normal range: $\bar{x} - 2s$ to $\bar{x} + 2s$

Example
Number of cows owned by 4 farmers in each of 3 villages
                    village 1    village 2    village 3
observations        3, 6, 7, 4   5, 5, 5, 5   0, 0, 0, 20
mean                5            5            5
standard deviation  1.8          0            10.0
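The three villages have the same mean but very different spread, which can be verified with the sample variance formula (divisor n - 1). A minimal sketch; the function names are chosen here for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

villages = {
    "village 1": [3, 6, 7, 4],
    "village 2": [5, 5, 5, 5],
    "village 3": [0, 0, 0, 20],
}
for name, cows in villages.items():
    m = sum(cows) / len(cows)
    print(f"{name}: mean = {m:.0f}, s = {sample_sd(cows):.1f}")
```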

Time Series Plot (figure: R-TCI, induction of anaesthesia, plotted over the time course in minutes; all data points: n = 30)

Geometric mean
The geometric mean is generally used with data measured on a logarithmic scale.
$$ G = \sqrt[n]{x_1 x_2 \cdots x_n} $$
$$ \log G = \frac{1}{n} \sum_{i=1}^{n} \log x_i $$
The logarithm of the geometric mean is equal to the mean of the logarithms of the observations.
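Both forms of the geometric mean (n-th root of the product, and exponential of the mean logarithm) give the same value. The sketch below checks this on the hypothetical values 1, 10, 100, which are not from the slides; all values must be positive.

```python
import math

def geometric_mean(values):
    """Geometric mean via logarithms: exp of the mean of log(x_i)."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))

values = [1, 10, 100]  # hypothetical data spanning several orders of magnitude
g_log = geometric_mean(values)
g_root = math.prod(values) ** (1 / len(values))  # n-th root of the product

print(g_log, g_root)
print(sum(values) / len(values))  # arithmetic mean, for comparison
```

The logarithmic form is usually preferable in practice because the plain product can overflow or underflow for long data series.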

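Median
The median is the central value of the distribution. With the ordered observations $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$:
if n is odd: $$ \tilde{x} = x_{((n+1)/2)} $$
if n is even: $$ \tilde{x} = \frac{1}{2} \left( x_{(n/2)} + x_{(n/2+1)} \right) $$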

Mean - Median
Example: n = 3, values: 1, 2, 30
median: $\tilde{x} = 2$
mean: $\bar{x} = 11$
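The odd and even cases of the median definition translate into a few lines of code. A minimal sketch; the even-sized data set 1, 2, 3, 30 is an assumption added to exercise the second case.

```python
def median(values):
    """Median of a sample using the odd/even rule on the ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)-th ordered value (1-based rank)
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n / 2)-th and (n / 2 + 1)-th ordered values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

values = [1, 2, 30]
print(median(values))             # median of the slide example
print(sum(values) / len(values))  # mean of the slide example
print(median([1, 2, 3, 30]))      # assumed even-sized example
```

Unlike the mean, the median of 1, 2, 30 stays at 2 no matter how large the third value becomes.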

Skewness by mean, median and mode
- skewed negatively: $\bar{x} < Me < Mo$
- skewed positively: $Mo < Me < \bar{x}$

Quantiles: the α-quantile
The median is only a special case of a statistic based on rank order.
α-quantile $x_\alpha$: the value such that at least α · 100% of the measurements are smaller than or equal to $x_\alpha$.
- 1st quartile (α = 0.25)
- 2nd quartile or median (α = 0.5)
- 3rd quartile (α = 0.75)
- Percentiles (centiles)

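Quantiles: calculation of the α-quantile $x_\alpha$
Compute α · n and determine the rank m in the ordered sample $x_{(1)} \le \dots \le x_{(n)}$:
- if α · n is not an integer, then m is the next integer following α · n and $x_\alpha = x_{(m)}$
- if α · n is an integer, then m = α · n and $x_\alpha = \frac{x_{(m)} + x_{(m+1)}}{2}$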


Quantiles
Observed values (n = 16): 5, 2, 2, 6, 7, 2, -40, 2, 3, 2, 1, 1, 12, 3, 4, 0
Ordered values: -40, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6, 7, 12
Q1 = 1.5, Me = 2.0, Q3 = 4.5
Interquartile range = Q3 - Q1 = 3
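The quartiles of this data set follow from the rank rule for the α-quantile (take the next rank if α · n is not an integer, average two neighbouring ordered values if it is). The sketch below is one way to code that rule; the function name `quantile` and the tolerance check for floating-point products are assumptions of this illustration. Statistical software often uses other, interpolating definitions and may therefore return slightly different quartiles.

```python
import math

def quantile(values, alpha):
    """alpha-quantile by the rank rule, for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(values)
    k = alpha * len(ordered)
    if math.isclose(k, round(k)):
        # alpha * n is an integer: average x_(m) and x_(m+1), ranks are 1-based
        m = round(k)
        return (ordered[m - 1] + ordered[m]) / 2
    # otherwise m is the next integer following alpha * n
    m = math.ceil(k)
    return ordered[m - 1]

data = [5, 2, 2, 6, 7, 2, -40, 2, 3, 2, 1, 1, 12, 3, 4, 0]
q1 = quantile(data, 0.25)
me = quantile(data, 0.50)
q3 = quantile(data, 0.75)
print(q1, me, q3, q3 - q1)
```

Note that the outlier -40 has no influence on the quartiles or on the interquartile range.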

Interquartile Range
The central 50% range is sometimes used to describe variability.
IQR = 3rd quartile - 1st quartile

Box-and-Whisker Plot (figure labels, from top to bottom): maximum, 3rd quartile, median, 1st quartile, minimum

Example Box-and-Whisker Plot (figure: one-second capacity in L by age group (5-8 yrs, 9-12 yrs, 13-16 yrs) and gender (female, male); N = 104, 100, 152, 170, 49, 51)

"In colourful pictures little clarity, much error and a little truth." (J. W. v. Goethe)

Presentation of Results: Numerical Presentation
A data summary should not consist of the mean (median) alone; some indication of variability should also be provided.
E.g.: "... the mean diastolic blood pressure was 102.3 mm Hg (SD 11.9)."
- mean: quote it to one extra decimal place compared with the raw data (depending on the amount of data)
- standard deviation: display it with the same precision as the mean or with one more decimal place
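The reporting rule can be automated with a small helper. This is only a sketch: the function name `format_mean_sd` and its parameter `raw_decimals` (number of decimal places in the raw data) are assumptions, and the cow counts from the earlier village example serve as test data.

```python
import statistics

def format_mean_sd(values, raw_decimals=0):
    """Format a sample as 'mean (SD sd)'.

    The mean gets one decimal place more than the raw data,
    and the standard deviation the same precision as the mean.
    """
    decimals = raw_decimals + 1
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    return f"{m:.{decimals}f} (SD {s:.{decimals}f})"

# Whole-number raw data, so the summary uses one decimal place.
print(format_mean_sd([3, 6, 7, 4]))
```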

Tables
                   Mean (SD)
Age                67.8 (10.8)
Total Cholesterol  213.3 (41.1)
Triglycerides      129.4 (72.0)

                   Frequency (%)
Gender      f      35 (46)
            m      41 (54)
PAVK-Grade  II b   38 (50)
            III    20 (26)
            IV     18 (24)