Medical statistics part I, autumn 2009 (KLMED 8004 / KLH3004)


Eirik Skogvoll
Consultant / assoc. professor
Faculty of Medicine, NTNU
Dept. of Anaesthesiology and Emergency Medicine, St. Olav University Hospital, Trondheim

Overview of the introductory courses

Part I (KLMED8004 / KLH3004): one variable
- Descriptive statistics
- Graphics
- Probability
- Random variable
- Probability distribution
- Hypothesis testing
- Confidence intervals
- One- and two-sample procedures
- Non-parametric methods

Part II (KLMED8005): multiple variables
- Table analysis
- Correlation
- Linear regression
- Analysis of variance (ANOVA)
- Survival analysis
- Logistic regression

What we do not cover
- Data management in general
- Registry and database design
- Questionnaire design and analysis
- Epidemiology

Day 1
- Practical and organizational details
- What is statistics?
- Statistical terminology and some basic mathematical notation
- Descriptive statistics
- Graphical display

Practical ...
- Home page of the course: http://folk.ntnu.no/slyderse/medstat/medstati_h09.html
- Lectures on Wednesdays, 1215-1400, plus block weeks (KLH3004). Where? Check the home page!
- Additional lectures: Introduction to R (24.08), Stata (9.08), SPSS (25.08 and 3.08). These are not that important for the course.
- Check the home page for details and supplementary literature.
- Course manager: Professor Stian Lydersen (stian.lydersen@ntnu.no), phone 725 75428
- Course administrator: Mariann Hansen (mariann.hansen@ntnu.no), phone 725 98660

Practical ...
- 3 mandatory exercises, each with 8 parts
  - Practical application of theory (for future reference)
  - Ensures continuous work throughout the course
- Group-wise collaboration and hand-in
  - Groups of (maximum) 8 persons
  - Each group works independently (minimum two meetings), and then receives 2 hours of mandatory guidance with an instructor to clarify unsolved issues
  - Guidance approx. every 3rd week (KLMED8004: week no. 39, 44 and 47; KLH3004: week no. 38, 4 and 46)
- Joint guidance for two groups:
  - Groups 1, 2: Monday 1415-1600
  - Groups 3, 4: Wednesday 1215-1400 (in English)
  - Groups 5, 6: Wednesday 1415-1600
  - Groups 7, 8: Wednesday 1515-1700
  - Groups 9, 10: Wednesday 1315-1500
  - Groups 11, 12, 13, 14: Thursday 0815-1000

Practical ...
- The instructor will
  - record attendance,
  - receive the written assignment (i.e. exercise),
  - correct and comment on it,
  - eventually approve it, and
  - return it to the group contact
- Every course participant shall possess an approved copy of every assignment

Course assessment
- Course approval requires completed and approved assignments
- The examination used to consist of presenting the exercises (5-20 min):
  - The complete set of approved exercises must be brought and displayed
  - One exercise is drawn at random for detailed discussion
  - Supplementary questions in relation to this
- This term: possibly a written examination (4 hours)

What is statistics?
Statistics means:
- Numerical summaries of information, as commonly understood (tables, census, graphs etc.)
- A branch of mathematics with its own terminology and methods
- Numerical quantities calculated from a sample (mean, median, sample variance, range etc.) (there is no really good Norwegian word for this; "observator")

What is statistics?
«The science of collecting, displaying and analysing data»
Oxford Dictionary of Statistics, 2004

What is statistics?
«Data have no meaning in themselves; they are meaningful only in relation to a conceptual model of the phenomenon studied.»
Box, Hunter & Hunter: Statistics for experimenters, Wiley 1978

What is statistics?
«I am dismayed by how often my clients ask whether a particular approach would be "legal" or "against the rules" rather than "accurate" or "misleading." This misunderstanding of statistics as a body of seemingly arbitrary dogma leads many (...) to perceive violations even when the research has not actually been harmed.»
Professor Peter Bacchetti, Dept. of Epidemiology and Biostatistics, University of California (BMJ, 2002)

Statistical terminology
- Population
- Sample
- Variable
- Probability distribution
- Parameter

Population and sample
[Diagram with the elements: Population, Phenomenon, Sample, Observations (data recording), Measurement, Model, Statistical analysis, Inference]

Variable
Definition: An observation which is not constant, but may take on different values

Numerical variable
- Continuous: may take on all possible values on the real line
  - Examples: weight (g), height (cm), income (kr)
- Discrete: a countable set
  - Examples: cells per microscopy field, number of deaths over a time period, the result of a die roll

Variable
Numerical variable: other aspects
- Natural zero point?
  - Example: In the physical sense, 200 K is twice 100 K, but 30 °C is not twice as hot as 15 °C
- Censored observation? We only know whether the observation exceeds or is less than some limit.
  - Examples: CRP < 5 mg/l (limit of detection), survival time > 20 days (last appointment)

Variable
Categorical variable
- Ordinal: ranked, but distance is undefined ("3 is more than 2, but how much more?")
  - Examples: APGAR score (0-10), NYHA classification (I-IV)
- Nominal: no ranking
  - Examples: philosophy / religion (Jew, Muslim, Christian, Atheist, Buddhist), gender (boy, girl), place of residence (city, town, village)

Variable
- Stochastic (random) variable: may take on different values. Mathematically described by a probability distribution (Normal, $\chi^2$, etc.)
- Dependent variable, outcome variable: stochastic, the one we measure and focus on
- Independent variable: known, deterministic, (usually) not stochastic, e.g. a grouping variable (then known as a "factor")

Probability distribution
- Describes which values the stochastic variable may take on, and the theoretical probability of each value
- A histogram describes the observed frequency
[Figure: density curve of a probability distribution, horizontal axis Z from -4 to 4; histogram of observed values, horizontal axis from 2000 to 4000]

Parameter
- A numerical constant
- Defines the properties (location and shape) of a probability distribution
- Example: the variable height is Normally distributed with two parameters: expected value ($\mu$) and variance ($\sigma^2$)
- Usually needs to be estimated (calculated) from data
- Parameter ≠ variable!

Variable vs. parameter
Equation of a straight line: $Y = \alpha + \beta x$
- Dependent and independent variable?
- Parameter(s)?
[Figure: plot of a straight line, Y against X]

Descriptive statistics
Summarizing data using a few useful measures (numbers):
- Central tendency (often the mean value)
- Measures of spread (often the standard deviation)
[Figure: histogram of birth weight (g), count on the vertical axis, annotated with "Central tendency (mean, centre of gravity, position, location ...)" and "Measures of spread (variation, form, shape ...)"]

Measures of central tendency
- Arithmetic mean
- (Trimmed) mean
- Median
- Mode
- (Midrange)
- (Geometric mean)
[Figure: histogram of birth weight (g), count on the vertical axis, annotated with "Central tendency (mean, centre of gravity, position, location ...)"]

Arithmetic mean
Synonyms
- Average
- Mean value
- Sample mean
Definition (Rosner def. 2.1, p. 9)
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Arithmetic mean
Properties
- All observations must be known
- Observations need not be ranked by value
- Mathematically favourable
- Sensitive to "outliers": extreme, untypical observations
Definition (Rosner def. 2.1, p. 9)
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Example (Rosner expl. 2.4, p. 11) (from Rosner table 2.1, p. 10)
$\bar{x} = \frac{3265\,\text{g} + 3260\,\text{g} + \dots + 2834\,\text{g}}{20} = 3166.9\,\text{g}$
If $x_1 = 500$ g rather than 3265 g ...
$\bar{x} = \frac{500\,\text{g} + 3260\,\text{g} + \dots + 2834\,\text{g}}{20} = 3028.7\,\text{g}$
... a relatively large change due to a single observation
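The sensitivity of the mean to a single observation can be checked numerically. The sketch below is only illustrative: the slides quote just a few of the 20 birth weights, so the full list `birth_weights_g` is an assumed reconstruction of Rosner table 2.1, chosen to agree with the mean and the ordered values quoted in these slides. The helper name `arithmetic_mean` is likewise hypothetical.

```python
# Assumed reconstruction of the 20 birth weights (g); only some of these
# values appear on the slides, the rest are an assumption.
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


original_mean = arithmetic_mean(birth_weights_g)

# Replace the first observation (3265 g) by an extreme value of 500 g.
with_outlier = [500] + birth_weights_g[1:]
outlier_mean = arithmetic_mean(with_outlier)

print(f"mean of the sample:      {original_mean:.2f} g")
print(f"mean with x_1 = 500 g:   {outlier_mean:.2f} g")
print(f"shift from one outlier:  {original_mean - outlier_mean:.2f} g")
```

A single changed value moves the mean by (3265 - 500) / 20 g, which is why the slides call the mean sensitive to outliers.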

Arithmetic mean
Properties
- Not defined for censored observations
- Sample means may be combined:
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{j=1}^{k} n_j \bar{x}_j}{\sum_{j=1}^{k} n_j}$
- In the long run (i.e. as $n \to \infty$) the sample mean will converge towards the expected value of the stochastic variable:
$E(X) = \mu = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i$

Example. Combining sample means:
$\bar{x}_1 = 18.4,\ n_1 = 22$
$\bar{x}_2 = 17.9,\ n_2 = 11$
$\bar{x}_3 = 9.5,\ n_3 = 16$
$\bar{x}_4 = 15.1,\ n_4 = 5$ (i.e. $k = 4$)
$\bar{x} = \frac{\sum_{j=1}^{k} n_j \bar{x}_j}{\sum_{j=1}^{k} n_j} = \frac{22 \cdot 18.4 + 11 \cdot 17.9 + 16 \cdot 9.5 + 5 \cdot 15.1}{22 + 11 + 16 + 5} = \frac{404.8 + 196.9 + 152 + 75.5}{54} = \frac{829.2}{54} = 15.36$
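The combination rule is a weighted mean with the group sizes as weights. A minimal sketch, using the four groups from the example above; the function name `combined_mean` is a hypothetical helper, not something from the course material.

```python
def combined_mean(group_sizes, group_means):
    """Combine sample means, weighting each mean by its group size."""
    if len(group_sizes) != len(group_means):
        raise ValueError("need one mean per group")
    total_n = sum(group_sizes)
    if total_n == 0:
        raise ValueError("at least one observation is required")
    weighted_sum = sum(n * m for n, m in zip(group_sizes, group_means))
    return weighted_sum / total_n


# Group sizes and means from the slide example (k = 4 groups).
sizes = [22, 11, 16, 5]
means = [18.4, 17.9, 9.5, 15.1]

overall_mean = combined_mean(sizes, means)
unweighted = sum(means) / len(means)  # ignores group sizes, for contrast

print(f"combined mean:           {overall_mean:.2f}")
print(f"plain average of means:  {unweighted:.2f}")
```

The plain average of the four means differs from the combined mean because the groups are of unequal size.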

Trimmed mean
- Arithmetic mean based on the central 90-95 % of observations; the tails (5-10 %) have been "trimmed"
- Less sensitive to outliers

Median
Synonyms
- 50th percentile
- Sample median
Properties
- Observations are ranked by value, in increasing order
- The median is the value that cuts the data into two halves
- Insensitive to outliers
Transformation: $x_i \to x_{(i)}$ (ordered)
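Both the trimmed mean and the ordering step $x_i \to x_{(i)}$ start by sorting the sample. The sketch below assumes one particular trimming convention (drop a fixed number of observations from each tail); the slides do not fix a convention, and the data list is an assumed reconstruction of the birth weights in Rosner table 2.1, since only some values are quoted in the slides.

```python
# Assumed reconstruction of the 20 birth weights (g); only some of these
# values appear on the slides.
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]


def trimmed_mean(values, drop_per_tail):
    """Mean of the sample after dropping `drop_per_tail` values from each end.

    This is one possible convention (an assumption made for this sketch).
    """
    if drop_per_tail < 0 or 2 * drop_per_tail >= len(values):
        raise ValueError("cannot trim that many observations")
    ordered = sorted(values)  # x_(1) <= x_(2) <= ... <= x_(n)
    kept = ordered[drop_per_tail:len(ordered) - drop_per_tail]
    return sum(kept) / len(kept)


# With n = 20, dropping one value per tail trims 5 % from each end,
# i.e. the mean is based on the central 90 % of the observations.
trimmed = trimmed_mean(birth_weights_g, drop_per_tail=1)
untrimmed = trimmed_mean(birth_weights_g, drop_per_tail=0)

print(f"untrimmed mean: {untrimmed:.1f} g")
print(f"trimmed mean:   {trimmed:.1f} g")
```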

Example (Rosner expl. 2.5, p. 11) (from Rosner table 2.1, p. 10)
$x_1 = 3265,\ x_2 = 3260,\ x_3 = 3245,\ x_4 = 3484,\ \dots$
Observations ranked in increasing order:
$x_{(1)} = 2069,\ x_{(2)} = 2581,\ x_{(3)} = 2759,\ x_{(4)} = 2834,\ \dots$

Median
Properties
- Up to half of the observations may be censored
- Mathematically less favourable
- Sample medians cannot be combined
Definition (Rosner def. 2.2, p. 11)
(1) Odd number of observations ($n = 3, 5, 7, \dots$):
$\operatorname{median} = x_{\left(\frac{n+1}{2}\right)}$
(2) Even number of observations ($n = 2, 4, 6, \dots$):
$\operatorname{median} = \frac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right)$
(other definitions may occur)

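Median
Example (Rosner expl. 2.6, p. 11) (from Rosner table 2.2, p. 12)
$n = 9$, we use definition (1):
$\operatorname{median} = x_{\left(\frac{n+1}{2}\right)} = x_{\left(\frac{9+1}{2}\right)} = x_{\left(\frac{10}{2}\right)} = x_{(5)} = 8$

Median
Example (Rosner expl. 2.5, p. 11) (from Rosner table 2.1, p. 10)
$n = 20$, we use definition (2):
$\operatorname{median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} = \frac{x_{\left(\frac{20}{2}\right)} + x_{\left(\frac{20}{2}+1\right)}}{2} = \frac{x_{(10)} + x_{(11)}}{2} = \frac{3245\,\text{g} + 3248\,\text{g}}{2} = 3246.5\,\text{g}$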
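The two cases of the median definition translate directly into code. In this sketch the five cholesterol values are the auto-analyzer readings used in the later variance example, while the 20 birth weights are an assumed reconstruction of Rosner table 2.1 (the slides quote only some of them). `sample_median` is a hypothetical helper; Python's standard `statistics.median` follows the same definition and is used as a cross-check.

```python
import statistics


def sample_median(values):
    """Median by the two-case definition: odd n and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: x_((n+1)/2) in 1-based notation is index n // 2 here.
        return ordered[middle]
    # Even n: average of x_(n/2) and x_(n/2 + 1).
    return (ordered[middle - 1] + ordered[middle]) / 2


# Odd number of observations (n = 5): auto-analyzer cholesterol, mg/dl.
auto_analyzer = [177, 193, 195, 209, 226]

# Even number of observations (n = 20): assumed birth weights, g.
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]

median_odd = sample_median(auto_analyzer)
median_even = sample_median(birth_weights_g)

print(median_odd, statistics.median(auto_analyzer))
print(median_even, statistics.median(birth_weights_g))
```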

Mode
Rosner def. 2.3: The most frequently observed value
Rosner expl. 2.7, p. 13: Mode = 28

Mode
- Not in much use. Mathematically unfavourable.
- More often used in an abstract sense: unimodal distribution (single peak), bimodal (two peaks) etc.
[Figure: examples of a bimodal probability distribution (Zar 1999, fig. 3.2 b)]

Measures of central tendency: summary
[Figure with four panels: unimodal, symmetrical; bimodal, symmetrical; unimodal, right (pos.) skewed; unimodal, left (neg.) skewed]

Measures of spread
Measure of spread (variation, form, shape ...)
- Range
- Quantiles, percentiles: interquartile range
- Variance
- Standard deviation
- Coefficient of variation
[Figure: histogram of birth weight (g), count on the vertical axis]

Range
Synonym
- Variasjonsbredde (Norw.)
Properties
- Same unit as the observations
- Employs only two observations. Inefficient.
- Sensitive to outliers
- Increases with sample size
Definition (BR def. 2.5, p. 18)
$\text{Range} = x_{(n)} - x_{(1)}$
Example (BR expl. 2.4, p. 18) (from BR table 2.1, p. 10)
$\text{Range} = x_{(20)} - x_{(1)} = 4146\,\text{g} - 2069\,\text{g} = 2077\,\text{g}$

Range
Example (Rosner expl. 2.5, p. 18) (from Rosner fig. 2.4, p. 18)
$\text{Range (auto-)} = x_{(5)} - x_{(1)} = 226\,\text{mg/dl} - 177\,\text{mg/dl} = 49\,\text{mg/dl}$
$\text{Range (micro-)} = x_{(5)} - x_{(1)} = 209\,\text{mg/dl} - 192\,\text{mg/dl} = 17\,\text{mg/dl}$
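The range needs only the smallest and the largest observation. A short sketch using the two sets of five cholesterol readings (mg/dl) that appear in the variance example of these slides; the function name `sample_range` is hypothetical.

```python
def sample_range(values):
    """Largest minus smallest observation: x_(n) - x_(1)."""
    if not values:
        raise ValueError("the range of an empty sample is undefined")
    return max(values) - min(values)


# Cholesterol readings (mg/dl) from the two measurement methods.
auto_analyzer = [177, 193, 195, 209, 226]
micro_enzymatic = [192, 197, 200, 202, 209]

range_auto = sample_range(auto_analyzer)
range_micro = sample_range(micro_enzymatic)

print(f"range (auto-analyzer):   {range_auto} mg/dl")
print(f"range (micro-enzymatic): {range_micro} mg/dl")
```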

Quantiles and percentiles
Definition (Rosner def. 2.6)
The sample value $V_p$ which exceeds a given proportion ($p$) of the ordered observations, $0 < p < 1$.
(One observation corresponds to $1/n$ of the data. If $n = 100$, then one observation equals 0.01 or 1 %. Thus, observation no. 20 (ordered) corresponds to the 0.2 quantile or the 20th percentile.)
Formally, with $n$ ordered observations $x_{(1)}, x_{(2)}, x_{(3)}, \dots, x_{(k)}, x_{(k+1)}, \dots, x_{(n)}$:
(1) $V_p = x_{(k+1)}$ if $np$ is not an integer, where $k < np < k+1$
(2) $V_p = \frac{x_{(np)} + x_{(np+1)}}{2}$ if $np$ is an integer

Quantiles and percentiles
Example (Rosner expl. 2.6)
Find $V_{0.1}$ and $V_{0.9}$ (10th and 90th percentile) of the birth weight data in table 2.1.
$p = 0.1$ and $np = 2$ (an integer), so we employ (2):
$V_{0.1} = \frac{x_{(2)} + x_{(3)}}{2} = \frac{2581 + 2759}{2} = 2670\,\text{g}$
$p = 0.9$ and $np = 18$ (also an integer), so we use (2):
$V_{0.9} = \frac{x_{(18)} + x_{(19)}}{2} = \frac{3609 + 3649}{2} = 3629\,\text{g}$
The central 80 % of children thus weigh between 2670 g and 3629 g, i.e. a spread of about 1000 g.
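The two-case quantile definition can be sketched as a small function. The name `quantile` and the tolerance check for "np is an integer" are choices made for this illustration, and the data list is an assumed reconstruction of the 20 birth weights in Rosner table 2.1 (only some values are quoted in the slides). Statistical packages often use other quantile definitions, so their results may differ slightly.

```python
import math

# Assumed reconstruction of the 20 birth weights (g).
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]


def quantile(values, p):
    """Sample quantile V_p following the two-case definition above."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty sample")
    np_value = n * p
    nearest = round(np_value)
    # Case (2): np is an integer (checked with a floating-point tolerance).
    if 1 <= nearest < n and math.isclose(np_value, nearest):
        # Average of x_(np) and x_(np+1); list indices are 0-based.
        return (ordered[nearest - 1] + ordered[nearest]) / 2
    # Case (1): k < np < k + 1, take x_(k+1), i.e. 0-based index k.
    k = min(math.floor(np_value), n - 1)
    return ordered[k]


v10 = quantile(birth_weights_g, 0.1)
v90 = quantile(birth_weights_g, 0.9)
v33 = quantile(birth_weights_g, 0.33)  # np = 6.6, not an integer

print(f"V_0.1  = {v10} g")
print(f"V_0.9  = {v90} g")
print(f"V_0.33 = {v33} g")
```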

Quantiles and percentiles
$V_{0.5}$ = median = 50th percentile. The definitions must agree.
Example (from Rosner table 2.1, p. 10)
$np = 10$, so definition 2.6 (2) is used:
$V_{0.5} = \frac{x_{(10)} + x_{(11)}}{2} = \frac{3245\,\text{g} + 3248\,\text{g}}{2} = 3246.5\,\text{g}$

Interquartile range (IQR)
Properties
- Same unit as the observations
- Contains the central 50 % of the (ordered) observations
- Less sensitive to outliers
- Corresponds to the "box" of a box plot
Definition
$\text{IQR} = V_{0.75} - V_{0.25}$
$V_{0.75}$: upper, or 3rd quartile
$V_{0.25}$: lower, or 1st quartile
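The interquartile range follows from two quantiles. The sketch below repeats a compact version of the quantile rule (averaging two neighbours when np is an integer) and applies it to an assumed reconstruction of the 20 birth weights in Rosner table 2.1; the helper names are hypothetical.

```python
import math

# Assumed reconstruction of the 20 birth weights (g).
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]


def quantile(values, p):
    """V_p: average of x_(np), x_(np+1) if np is an integer, else x_(k+1)."""
    ordered = sorted(values)
    n = len(ordered)
    np_value = n * p
    nearest = round(np_value)
    if 1 <= nearest < n and math.isclose(np_value, nearest):
        return (ordered[nearest - 1] + ordered[nearest]) / 2
    return ordered[min(math.floor(np_value), n - 1)]


def interquartile_range(values):
    """IQR = V_0.75 - V_0.25."""
    return quantile(values, 0.75) - quantile(values, 0.25)


lower_quartile = quantile(birth_weights_g, 0.25)
median = quantile(birth_weights_g, 0.5)
upper_quartile = quantile(birth_weights_g, 0.75)
iqr = interquartile_range(birth_weights_g)

print(f"1st quartile: {lower_quartile} g")
print(f"median:       {median} g")
print(f"3rd quartile: {upper_quartile} g")
print(f"IQR:          {iqr} g")
```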

Interquartile range (IQR)
Example (from Rosner table 2.1, p. 9)
Find the IQR for birth weight.
$\text{IQR} = V_{0.75} - V_{0.25}$ ($np = 15$ and $5$, respectively)
$V_{0.75} = \frac{x_{(15)} + x_{(16)}}{2} = \frac{3323\,\text{g} + 3484\,\text{g}}{2} = 3403.5\,\text{g}$
$V_{0.25} = \frac{x_{(5)} + x_{(6)}}{2} = \frac{2838\,\text{g} + 2841\,\text{g}}{2} = 2839.5\,\text{g}$
$\text{IQR} = 3403.5 - 2839.5 = 564\,\text{g}$

Other possible measures of spread ...
Observations: $x_1, x_2, \dots, x_n$
- Mean deviation? $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$. Not useful: it always sums to 0 (zero).
- Mean absolute deviation (MAD)? $\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$. OK, but mathematically less favourable.
- Mean squared deviation? $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$. Best!
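The three candidate measures can be compared on a small sample. This sketch uses the five auto-analyzer cholesterol readings (mg/dl) from the variance example in these slides; the function names are hypothetical, and all three divide by n, as in the list above.

```python
def mean_deviation(values):
    """Average signed deviation from the mean (always zero)."""
    m = sum(values) / len(values)
    return sum(x - m for x in values) / len(values)


def mean_absolute_deviation(values):
    """Average absolute deviation from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def mean_squared_deviation(values):
    """Average squared deviation from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


auto_analyzer = [177, 193, 195, 209, 226]  # mg/dl, mean = 200

md = mean_deviation(auto_analyzer)
mad = mean_absolute_deviation(auto_analyzer)
msd = mean_squared_deviation(auto_analyzer)

print(f"mean deviation:          {md}")
print(f"mean absolute deviation: {mad} mg/dl")
print(f"mean squared deviation:  {msd} (mg/dl)^2")
```

Positive and negative deviations cancel exactly, which is why the first candidate carries no information about spread.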

Sample variance
Definition (Rosner def. 2.7)
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Comment
$(n - 1)$ is used rather than $n$ for the denominator, as we have already made use of the sample mean value. Thus one piece of information (one degree of freedom, DF) from the sample has been expended. This yields a slightly larger estimate of $S^2$. For large $n$, the difference is trivial.

Sample variance
Example (Rosner expl. 2.9, p. 21-22)
Auto-analyzer, $\bar{x} = 200$ mg/dl:
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{4} \left[ (177-200)^2 + (193-200)^2 + (195-200)^2 + (209-200)^2 + (226-200)^2 \right] = \frac{1}{4} (529 + 49 + 25 + 81 + 676) = \frac{1360}{4} = 340$
Micro-enzymatic (same $\bar{x}$):
$S^2 = \frac{1}{4} \left[ (192-200)^2 + (197-200)^2 + (200-200)^2 + (202-200)^2 + (209-200)^2 \right] = \frac{1}{4} (64 + 9 + 0 + 4 + 81) = \frac{158}{4} = 39.5$

Sample variance
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Properties
- Always positive
- Extreme observations contribute more (squaring)
- Favourable mathematical properties
- But cannot be interpreted on the original scale (kg → kg², mg/dl → mg²/dl², cm → cm², etc.)

Sample standard deviation (SD)
Definition (Rosner def. 2.8)
$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{S^2}$
Example (Rosner expl. 2.9, p. 21)
Auto-analyzer: $S = \sqrt{S^2} = \sqrt{340} = 18.4$
Micro-enzymatic: $S = \sqrt{S^2} = \sqrt{39.5} = 6.3$
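The variance and standard deviation of the two cholesterol samples can be recomputed in a few lines. The helper names below are hypothetical; Python's standard `statistics.variance` and `statistics.stdev` also divide by n - 1 and serve as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """S^2 with the (n - 1) denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """S, the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Cholesterol readings (mg/dl); both samples have mean 200.
auto_analyzer = [177, 193, 195, 209, 226]
micro_enzymatic = [192, 197, 200, 202, 209]

var_auto = sample_variance(auto_analyzer)
var_micro = sample_variance(micro_enzymatic)
sd_auto = sample_sd(auto_analyzer)
sd_micro = sample_sd(micro_enzymatic)

print(f"auto-analyzer:   S^2 = {var_auto}, S = {sd_auto:.1f} mg/dl")
print(f"micro-enzymatic: S^2 = {var_micro}, S = {sd_micro:.1f} mg/dl")
```

Unlike the variance, the standard deviation is back on the original scale (mg/dl), which makes the two methods directly comparable.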

Sample standard deviation (SD)
- Approximately 95 % of the population are found within $\bar{x} \pm 2\,SD$, which approximately corresponds to the 2.5 and 97.5 percentiles
- Follows from the Normal distribution (Gauss)
- Requires reasonable symmetry and a unimodal distribution
- A quick and dirty calculation: $SD \approx \text{Range} / 4$
- May prove useful if you have to make a rough estimate of the SD, for example in sample size determination

Sample standard deviation (SD)
$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Properties
- Always positive
- Extreme observations contribute more (squaring)
- Favourable mathematical properties
- May be interpreted on the original scale!
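Both rules of thumb can be tried on the birth weight sample. The data list is an assumed reconstruction of the 20 values in Rosner table 2.1 (the slides quote only some of them), so the numbers printed are illustrative only; with a sample this small, the share of observations inside $\bar{x} \pm 2\,SD$ will only roughly match 95 %.

```python
import statistics

# Assumed reconstruction of the 20 birth weights (g).
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]

mean = statistics.mean(birth_weights_g)
sd = statistics.stdev(birth_weights_g)  # (n - 1) denominator

# Interval mean +/- 2 SD and the number of observations inside it.
lower = mean - 2 * sd
upper = mean + 2 * sd
n_within = sum(lower <= x <= upper for x in birth_weights_g)

# Quick and dirty estimate: SD is roughly Range / 4.
rough_sd = (max(birth_weights_g) - min(birth_weights_g)) / 4

print(f"mean = {mean:.1f} g, SD = {sd:.1f} g")
print(f"mean +/- 2 SD: {lower:.0f} g to {upper:.0f} g")
print(f"observations inside: {n_within} of {len(birth_weights_g)}")
print(f"Range / 4 = {rough_sd:.1f} g")
```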

Graphics: dotplot
A good figure says more than 1000 t-tests!
- Plot individual observations. A view of everything.
- Group comparison is easy
- Useful with a low number of observations

Graphics: box-and-whisker plot (boxplot)
Properties
- Quick view of central tendency and spread
- Key elements:
  - Median (2nd quartile)
  - 1st and 3rd quartile
  - Non-extreme observations
- Useful for moderately many observations
- Easy to compare groups
[Figure: boxplot with the labels "Median", "Largest non-extreme observation", "3rd quartile" and "1st quartile"]

Histogram
Properties
- An empirical probability distribution
- Shows the general shape
- Difficult to compare groups
- Useful with a large number of observations
[Figure: histogram of birth weight (g), number of observations on the vertical axis]
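A histogram is simply a count of observations per interval. The sketch below prints a text version with 500 g wide bins from 2000 g to 4500 g. The bin edges are an assumption read off the axis of the slide figure, and the data list is an assumed reconstruction of the 20 birth weights in Rosner table 2.1; `histogram_counts` is a hypothetical helper.

```python
# Assumed reconstruction of the 20 birth weights (g).
birth_weights_g = [
    3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
    2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834,
]


def histogram_counts(values, start, width, n_bins):
    """Count values in half-open bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts


start, width, n_bins = 2000, 500, 5  # assumed bin layout
counts = histogram_counts(birth_weights_g, start, width, n_bins)

# Text histogram: one '*' per observation in the bin.
for i, count in enumerate(counts):
    low = start + i * width
    print(f"{low}-{low + width} g | {'*' * count} ({count})")
```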