Statistics in medicine

Lecture 1, part 1: Describing variation, and graphical presentation
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health

Outline
Sources of variation
Types of variables

Readings and resources
Chapter 9 of Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health by David L. Katz et al. (4th edition).

Almost every characteristic that is measured on a patient varies. That is why it is called a variable.
Examples: blood glucose level, blood pressure, diet, electrolytes, etc.

There are different sources of variation
Let us consider blood pressure as an example.
Biologic differences: age, race, and diet affect blood pressure. Older patients, patients of African descent, and those who consume a high-salt diet tend to have higher blood pressure.
Measurement conditions: time of day, anxiety, fatigue, etc. Higher blood pressure is observed following exercise and with anxiety.
Measurement error:
Systematic error distorts the data in one direction, leading to bias that obscures the truth. Example: a defective BP cuff that tends to give high readings.
Random error consists of slight, inevitable inaccuracies. It is not systematic, because it makes some readings too high and some too low.
Statistics can adjust for random error, but cannot fix systematic error.

To understand variation, you have to describe it
Descriptive statistics, definition: statistics, such as the mean, the standard deviation, the proportion, and the rate, used to describe attributes of a set of data.
A variable can be quantitative or qualitative.
Qualitative: skin color, jaundice, heart murmurs.
Quantitative: blood pressure, electrolyte levels.

There are different types of variables
Nominal; dichotomous (binary); ordinal (ranked); continuous (interval); continuous (ratio); risks and proportions; counts and units of observation; combining data.

Nominal variables (qualitative)
Nominal variables are naming variables, the simplest scale of measurement. They are used for characteristics that have no numerical values, no measurement scale, and no rank order. This is also called a categorical or qualitative scale.
Example: skin color. A different number can be assigned to each color, e.g. 1: purple, 2: black, 3: white, 4: blue, 5: tan. It makes no difference to the statistical analysis which number is assigned to which color, because the number is merely a numerical name for the color.
Percentages and proportions are commonly used to summarize the data.

Dichotomous variables (qualitative)
Dichotomous comes from the Greek for "cut into two". Examples: normal/abnormal skin color, living/dead.
Sometimes it is not enough to describe the data as two categories (living/dead); it is also important to know how long the patient survived (survival analysis).

Ordinal (ranked) variables
Used for characteristics whose values have an underlying order, with a clearly implied direction from better to worse. These are categorical (qualitative) scales with three or more levels. Although the categories are ordered, the difference between two adjacent categories is not necessarily the same throughout the scale.

Ordinal (ranked) variables, examples
Pitting edema grading scale: from 0 (no edema) to severe edema.
Pain scale: from 0 (no pain) to worst imaginable pain.
Percentages and proportions are commonly used to summarize the data; medians are sometimes used to describe the whole data set.

Numerical scales (quantitative)
The highest level of measurement. It is used for characteristics that can be given numerical values, where the differences between numbers have meaning, e.g. BMI, height.
Types: interval, ratio, discrete.
Measures of central tendency (means, medians) are usually used to summarize the data.

Numerical scales (continuous)
Values lie on a continuum.
Interval: arbitrary zero point. Example: the centigrade (Celsius) temperature scale.
Ratio: absolute zero point. Example: the Kelvin temperature scale.

Numerical scales (discrete)
Values are integers.
Units of observation: person, animal, thing, etc.
Presented in frequency tables: one characteristic in the columns, one characteristic in the rows, and counts in the cells.

Frequency table of gender by whether serum total cholesterol was checked

Gender   Checked    Not checked   Total
Female   17 (63%)   10 (37%)      27 (100%)
Male     25 (57%)   19 (43%)      44 (100%)
Total    42 (59%)   29 (41%)      71 (100%)

Source: Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health by David L. Katz et al. (4th edition).
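The row percentages in the cholesterol table can be recomputed from the raw counts. The sketch below uses plain Python dictionaries; the helper names `row_percentages` and `column_totals` are hypothetical, chosen for illustration, and each percentage is taken over its own row total (as in the table above).

```python
# Counts from the frequency table of gender by whether serum total
# cholesterol was checked. Row percentages use each row's own total.
counts = {
    "Female": {"Checked": 17, "Not checked": 10},
    "Male": {"Checked": 25, "Not checked": 19},
}


def row_percentages(table):
    """Return {row: {column: percent of that row's total}}."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


def column_totals(table):
    """Sum each column over all rows (the 'Total' row of the table)."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals


pct = row_percentages(counts)
totals = column_totals(counts)
grand_total = sum(totals.values())

for row, cells in counts.items():
    parts = [f"{n} ({pct[row][col]:.0f}%)" for col, n in cells.items()]
    print(row.ljust(7), "  ".join(parts), sum(cells.values()))
total_parts = [f"{n} ({100 * n / grand_total:.0f}%)" for n in totals.values()]
print("Total".ljust(7), "  ".join(total_parts), grand_total)
```

Rounded to whole percentages, the printed values reproduce the table: 63%/37% for females, 57%/43% for males, and 59%/41% overall.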

Risks and proportions
Risk is the conditional probability of an event (e.g. death) in a defined population over a defined period.
Risks share some characteristics of discrete variables and some of continuous variables. Example: a discrete event (e.g. death) occurs in a fraction of the population.
Risk is calculated as the ratio of counts in the numerator to counts in the denominator.

Combining data
A continuous variable can be converted to an ordinal variable.
When data are converted to categories, individual information is lost. The fewer the categories, the greater the amount of information lost.
[Figure: Histogram of neonatal mortality rate per 1000 live births, by birth weight group (g), United States, 1980. Source: Buehler W et al. Public Health Rep 102, 1987.]

Statistics in medicine
Lecture 1, part 2: Describing variation, and graphical presentation
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health
Fatma.shebl@yale.edu

Outline
Frequency distributions
Frequency distribution of continuous data
Frequency distribution of binary data

A frequency distribution is:
a TABLE of data displaying the VALUE of each data point (or range of data points) in one column and the FREQUENCY with which that value occurs in the other column, or
a PLOT of data displaying the VALUE of each data point (or range of data points) on one axis and the FREQUENCY with which that value occurs on the other axis.

Frequency tables
Definition: a table showing the number and/or the percentage of observations occurring at different values (or ranges of values) of a variable.
Steps of creating a frequency table:
Decide on the number of non-overlapping intervals. Equal-width intervals are preferable, and 6 to 14 intervals are usually adequate to show the shape of the distribution. Creating intervals converts a continuous variable to an ordinal variable, so information on the individual level is lost.
Count the number of observations in each interval.
Percentages can be calculated as well: $\text{percentage} = \frac{\text{number of observations in the interval}}{\text{total number of observations}} \times 100$.
A frequency table is presented graphically by a histogram.
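The steps for building a frequency table can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the lecture: the glucose values are hypothetical, the intervals are assumed to be half-open `[lower, upper)`, and `frequency_table` is a name made up for this example.

```python
def frequency_table(values, start, width, n_intervals):
    """Build a frequency table with equal-width, non-overlapping intervals.

    Interval k covers [start + k*width, start + (k+1)*width).
    Returns a list of (lower, upper, count, percent) tuples.
    """
    counts = [0] * n_intervals
    for v in values:
        k = int((v - start) // width)  # index of the interval holding v
        if not 0 <= k < n_intervals:
            raise ValueError(f"value {v} lies outside the chosen intervals")
        counts[k] += 1

    total = len(values)
    table = []
    for k, count in enumerate(counts):
        lower = start + k * width
        # percentage = count in interval / total observations * 100
        table.append((lower, lower + width, count, 100 * count / total))
    return table


# Hypothetical fasting glucose values (mg/dl), for illustration only.
glucose = [72, 78, 85, 88, 90, 95, 98, 98, 104, 108, 130, 165]

rows = frequency_table(glucose, start=60, width=20, n_intervals=6)
for lower, upper, count, percent in rows:
    print(f"{lower:>3} to <{upper:<3}  {count:>2}  {percent:5.1f}%")
```

Once the values are placed in intervals, only the counts remain. This is the loss of individual-level information the steps warn about.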

Frequency tables, example
[Table: Categories of glucose level of 180 participants (Category, Count, %)]
[Table: Glucose level of 180 participants (Glucose level, Count)]

There are REAL and THEORETICAL frequency distributions
Real: obtained from the actual data.
Theoretical: calculated using certain assumptions. The most commonly used is the NORMAL (GAUSSIAN) DISTRIBUTION.
Many statistical methods assume that the data are normally distributed, yet real data are seldom perfectly normal. By the central limit theorem, if the sample size is large, the sampling distribution of the mean is approximately normal even if the data are skewed, so the normality assumption usually holds for inference about means.

Normal (Gaussian) distribution
A continuous, symmetric, bell-shaped probability distribution whose shape is determined by the mean (µ) and standard deviation (σ).
[Figure panels: same µ, different σ; different µ, same σ]
The normal (z) distribution is used when the population standard deviation (σ) is known.
Properties: bell shape; depends on the mean (µ) and standard deviation (σ); symmetric about the mean (µ); mean = median = mode.
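The central limit theorem claim can be checked with a small simulation. The sketch below assumes an exponential distribution with mean 1 and SD 1 as a stand-in for skewed data. It draws many samples of size 50 and checks that the sample means centre on 1, have an SD close to σ/√n, and fall within 1.96 standard errors about 95% of the time, as a normal distribution would predict.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible


def sample_means(sample_size, n_samples):
    """Means of repeated samples drawn from a right-skewed exponential
    distribution with population mean 1 and population SD 1."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


n = 50
means = sample_means(sample_size=n, n_samples=2000)
expected_se = 1 / n ** 0.5  # SD of the sample mean = sigma / sqrt(n)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("expected SD (SE):    ", round(expected_se, 3))

# Share of sample means within 1.96 SE of the true mean; close to 95%
# if the sampling distribution of the mean is approximately normal.
inside = sum(abs(m - 1) < 1.96 * expected_se for m in means) / len(means)
print("share within 1.96 SE:", round(inside, 3))
```

The individual observations stay skewed. Only the distribution of their means becomes approximately normal.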

Normal (Gaussian) distribution
The area under the curve is the probability (relative frequency) of the values comprising the normal distribution. The area under the whole curve = 1.
About 68% of values lie within µ ± 1σ.
About 95% lie within µ ± 2σ (exactly 95% within µ ± 1.96σ).
About 99.7% lie within µ ± 3σ (99% lie within µ ± 2.58σ).

Normal (Gaussian) distribution, example
If math test scores are normally distributed with a mean of 10 and a standard deviation of 3, in what range will 68% of the students' scores lie?
68% of the students will have a score within µ ± 1σ = 10 ± 3, i.e. between 7 and 13.

Standard normal distribution (z)
The normal distribution with mean 0 and standard deviation 1.
If the mean ≠ 0 and SD ≠ 1, a z transformation allows use of the standard normal table:
$z = \frac{x - \mu}{\sigma}$, where x is the value of the variable, µ is the mean, and σ is the SD.
A positive z means the value is above the mean; a negative z means the value is below the mean.
If z is known, x can be recovered: $x = \mu + z\sigma$.
Properties: bell shape; symmetric about the mean; mean = median = mode = 0; standard deviation = 1; area under the curve = 1; about 68% within ±1, 95% within ±2 (exactly ±1.96), and 99.7% within ±3.
[Graphs generated by R]
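Python's standard library includes `statistics.NormalDist`, which is enough to check both the coverage percentages and the math-score example. This is a minimal sketch. The areas come from the normal CDF rather than from a printed table.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of the area within mu +/- k*sigma (same for any normal distribution).
coverage = {}
for k in (1, 1.96, 2, 2.58, 3):
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within mu +/- {k} sigma: {coverage[k]:.4f}")

# Slide example: math test scores ~ Normal(mean 10, SD 3).
scores = NormalDist(mu=10, sigma=3)
low = scores.mean - scores.stdev   # 10 - 3
high = scores.mean + scores.stdev  # 10 + 3
share = scores.cdf(high) - scores.cdf(low)
print(f"about {share:.0%} of scores lie between {low:g} and {high:g}")
```

The last line prints "about 68% of scores lie between 7 and 13", which matches the worked answer.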

Standard normal distribution (z) tables
Areas under the standard normal curve (z scores) can be used to find the proportion above, below, or between any z scores.
The first column contains the stem of the z value (units and first decimal, e.g. 1.6); the top row contains the second decimal (e.g. 0.04).
Each entry is the area under the curve to the left of z (i.e. below z). Separate tables are given for negative and positive z.

Standard normal distribution (z), example
If the mean of students' test scores is 80 and the standard deviation is 10, what test score divides off the highest 5% of scores (i.e. the students at or above the 95th percentile)?
Solution: find the z score that marks the upper 5%: z = 1.645.
The test score is $x = \mu + z\sigma = 80 + 1.645 \times 10 = 96.45$.
Conclusion: the upper 5% have a test score > 96.45.

Standard normal distribution (z), example
If the mean HDL cholesterol is 45 mg/dl and the standard deviation is 5, what proportion of the population has HDL values > 40 mg/dl?
Solution: find the z score equivalent to 40 mg/dl: $z = \frac{x - \mu}{\sigma} = \frac{40 - 45}{5} = -1$.
$P(\text{HDL} > 40) = P(z > -1) = 1 - P(z \le -1)$.
From the negative z table, the area (probability) below z = -1 (HDL = 40) is 0.1587.
$P(\text{HDL} > 40) = 1 - 0.1587 = 0.8413$.
Conclusion: 84.13% of people in the population are expected to have an HDL value > 40 mg/dl.

T-distribution
A symmetric distribution with mean 0 and, for small sample sizes, a standard deviation larger than that of the standard normal distribution.
Used when the population standard deviation is unknown; needed especially when the sample size is small.
The t and z distributions are very similar when n > 30.
Properties: symmetric; bell shape; shape changes with the degrees of freedom k; mean = median = mode = 0; standard deviation > 1; t and z are almost identical when the sample size is about 30.
[Graph generated by R]
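Both worked z examples can be reproduced with `statistics.NormalDist`. In the sketch below, `inv_cdf` takes the place of reading the table in reverse (area to z), and `cdf` takes the place of reading it forward (z to area).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Example 1: test scores with mean 80 and SD 10; cutoff for the top 5%.
z_95 = std_normal.inv_cdf(0.95)       # z with 95% of the area below it
cutoff = 80 + z_95 * 10               # x = mu + z * sigma
print(f"z = {z_95:.3f}, cutoff = {cutoff:.2f}")

# Example 2: HDL with mean 45 mg/dl and SD 5; proportion above 40 mg/dl.
z_hdl = (40 - 45) / 5                 # z = (x - mu) / sigma = -1
below = std_normal.cdf(z_hdl)         # table value for z = -1
p_above = 1 - below
print(f"P(z <= -1) = {below:.4f}, P(HDL > 40) = {p_above:.4f}")

# The same answer without standardizing by hand.
p_direct = 1 - NormalDist(mu=45, sigma=5).cdf(40)
```

The printed values (z = 1.645, cutoff = 96.45; 0.1587 and 0.8413) agree with the table-based solutions.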

T-distribution: degrees of freedom (df)
The degrees of freedom are the number of observations that are free to vary. When a mean is calculated, the sum of the observations is fixed. Therefore, when adding up the N observations, every observation can vary except the last one, because the total has to stay fixed. Hence only N - 1 observations can vary if one mean is estimated (one-sample), and (N1 + N2) - 2 observations can vary if two means are estimated (two-sample).
df = total sample size - number of means that are calculated.
[Table of critical values of the t distribution: df by levels of significance for one-tailed and two-tailed tests]

Binomial distribution is used to describe the frequency distribution of dichotomous data
The probability distribution of the number of successes X observed in n independent trials, each with the same probability of occurrence π. It is used for binary variables and is defined by n and π. If the sample is large (the approximation works best when π is near 0.5), the z distribution can be used as an approximation.

Chi-square distribution (χ²) is used for analysis of counts
The distribution used to analyze counts in frequency tables. A nonsymmetrical distribution whose mean (µ) and variance (σ²) are determined by its degrees of freedom. Used for categorical (nominal) data.
Properties: degrees of freedom = ν; µ = ν; σ² = 2ν. It approaches the normal distribution as the df increases.
[Graphs generated by R]
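The binomial and chi-square properties above can be checked numerically with the standard library. This is a sketch under stated assumptions. The values n = 20 and π = 0.5 are illustrative. The continuity correction (using 12.5 instead of 12) is a common refinement that is not mentioned in the slides. The chi-square variable is simulated through its standard definition as a sum of ν squared standard normal draws.

```python
import math
import random
import statistics


def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n independent trials,
    each with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


# Binomial with n = 20 trials and pi = 0.5 (illustrative values).
n, p = 20, 0.5
exact = sum(binomial_pmf(x, n, p) for x in range(13))  # P(X <= 12)

# Normal (z) approximation with mean n*p and SD sqrt(n*p*(1-p)).
# The +0.5 is a continuity correction, a common refinement.
approx = statistics.NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(12.5)
print(f"exact P(X <= 12) = {exact:.4f}, normal approximation = {approx:.4f}")

# Chi-square with df = nu: sum of nu squared standard normal draws.
random.seed(2)
nu = 4
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(20000)]
print("simulated mean:", round(statistics.fmean(draws), 2), "theory:", nu)
print("simulated variance:", round(statistics.variance(draws), 2), "theory:", 2 * nu)
```

With π = 0.5, the exact and approximate probabilities agree to about three decimal places. The simulated mean and variance land close to ν and 2ν.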

Statistics in medicine
Lecture 1, part 3: Describing variation, and graphical presentation
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health

Summarizing numerical data
Continuous variables: measures of central tendency; measures of dispersion.
Nominal data: proportions; percentages; ratios; rates.

Measures of central tendency
Index or summary numbers that describe the middle of a distribution.
Types: mean, median, mode.

The mean
Types: arithmetic; geometric.

The arithmetic mean
The most commonly used statistic: the arithmetic average of the observations, denoted by µ in the population and by $\bar{x}$ in the sample. In a sample, the mean is the sum of the X values divided by the number n of observations:
$\bar{x} = \frac{\sum X}{n}$
Sensitive to extreme values.
Can be used with numerical scales; should NOT be used with ordinal scales.

Example of arithmetic mean calculation
Arithmetic mean = $\frac{\sum X}{n}$, where the sum of the glucose values is 1775.
[SAS 9.4 output: Basic Statistical Measures for subject glucose values; Location: mean, median, mode; Variability: standard deviation, variance, range, interquartile range]

The geometric mean
Less commonly used than the arithmetic mean. It is the nth root of the product of n observations:
$GM = \sqrt[n]{X_1 X_2 \cdots X_n}$
Calculation: $\log GM = \frac{\sum \log X}{n}$, i.e. the mean of the log values; then exponentiate to obtain GM.
Used with skewed distributions or logarithmic scales.
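The two means, and the log route to the geometric mean, can be compared directly. This sketch uses hypothetical right-skewed values, not the lecture's glucose data. `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
import statistics

# Hypothetical right-skewed values, for illustration only.
values = [70, 80, 90, 100, 400]

arithmetic = statistics.fmean(values)  # sum(X) / n

# Geometric mean via logs: mean of the log values, then exponentiate.
log_mean = statistics.fmean([math.log(x) for x in values])
geometric_manual = math.exp(log_mean)
geometric_builtin = statistics.geometric_mean(values)  # Python 3.8+

# Sensitivity to the extreme value: drop 400 and recompute.
arithmetic_without_400 = statistics.fmean(values[:-1])

print("arithmetic mean:", arithmetic)
print("geometric mean: ", round(geometric_manual, 2))
print("arithmetic mean without 400:", arithmetic_without_400)
```

One extreme value moves the arithmetic mean from 85 to 148. The geometric mean is pulled up much less, which is why it suits skewed, multiplicative data.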

Example of geometric mean calculation
[Table: subject, glucose, log glucose, with sum, arithmetic mean, and geometric mean]

The median
A measure of central tendency: the middle observation, i.e. the one that divides the distribution of values into halves. It is also equal to the 50th percentile.
Calculation: arrange the observations in ascending or descending order, then count in to find the middle. With an odd number of observations, the median is the middle value; with an even number, it is the mean of the two middle values.
Less sensitive to extreme values than the mean.
Can be used with numerical scales and with ordinal scales.

Example of median calculation
Median = (88 + 90)/2 = 89
[SAS 9.4 output: Basic Statistical Measures for subject glucose values]

The mode
The value of a numerical variable that occurs most frequently.
Calculation: count the number of times each value occurs; the mode is the most frequent value.
Some data might not have a mode; some might have two modes (bimodal) or more than two modes (multimodal).
A modal class can be estimated, which is the interval that has the largest number of observations.
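The median steps translate directly into code. `median_by_steps` is a hypothetical helper written to mirror the slide's procedure, and it is checked against `statistics.median`. The ten glucose values are made up so that their median (89) and modes (98 and 108) echo the slide examples. They are not the lecture's actual data.

```python
import statistics


def median_by_steps(values):
    """Median following the slide's steps: sort, then take the middle
    value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical glucose values (mg/dl); n = 10, so the median is the mean
# of the 5th and 6th ordered values.
glucose = [72, 78, 80, 85, 88, 90, 98, 98, 108, 108]

print("median:", median_by_steps(glucose))       # (88 + 90) / 2
print("modes: ", statistics.multimode(glucose))  # every most frequent value

# The median does not react to an extreme value here; the mean does.
with_outlier = glucose[:-1] + [400]
print("median with outlier:", statistics.median(with_outlier))
print("mean before/after:", statistics.fmean(glucose), statistics.fmean(with_outlier))
```

`statistics.multimode` returns every value tied for the highest count, so bimodal data give two modes rather than an error.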

Example of mode calculation
Modes: 98 and 108
[Table: subject, glucose]

Use of measures of central tendency
What is the best measure for a particular dataset? The choice depends on:
Type of scale: for numerical scales, the arithmetic mean or median; for ordinal scales, the median; for logarithmic scales, the geometric mean.
Distribution: for symmetrical distributions (the same shape on both sides of the mean), the arithmetic mean or median; for skewed distributions (outliers in one direction), the median; for bimodal distributions, the mode.

Measures of spread (dispersion)
Index or summary numbers that describe the spread of observations about the middle value.
Types: range, standard deviation, coefficient of variation, percentiles, interquartile range.

The range
The difference between the largest and the smallest observation.
Calculation: rank the data; range = largest value - smallest value.
Sometimes the minimum and maximum values are displayed instead of the range.

Example of range calculation
Range = 108 - 72 = 36, or present the lower and upper values (72, 108).
[SAS 9.4 output: Basic Statistical Measures for subject glucose values]

The standard deviation
The most common measure of spread, denoted by σ in the population and by SD or s in the sample. It can be used with the mean to describe the distribution of observations. It is the square root of the average of the squared deviations of the observations from their mean; in a sample, the sum of squared deviations is divided by n - 1:
$s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$
Other computational formulas exist.

The SD is used in many statistical tests, and can be used with the mean to describe the distribution of observations.
If the interval mean ± 2SD includes zero for a variable that cannot be negative, the observations are likely skewed.
Characteristics of the SD:
If the distribution is bell-shaped, about 68% of observations lie within mean ± 1SD, 95% within mean ± 2SD, and 99.7% within mean ± 3SD.
Regardless of the shape, at least 75% of observations lie within mean ± 2SD.

Example of SD calculation
[Table: subject, glucose, with sum, mean, and SD; SAS 9.4 output: Basic Statistical Measures]
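The range, the SD formula, and the "at least 75% within mean ± 2SD" rule can each be checked on a small data set. The values below are hypothetical. `statistics.stdev` uses the n - 1 denominator, the same as the sample formula above.

```python
import math
import statistics

# Hypothetical glucose values (mg/dl), for illustration only.
glucose = [72, 78, 80, 85, 88, 90, 98, 98, 108, 108]

# Range: largest value minus smallest value.
data_range = max(glucose) - min(glucose)

# Sample SD from its definition: squared deviations from the mean,
# summed, divided by n - 1, then square-rooted.
mean = statistics.fmean(glucose)
squared_dev = [(x - mean) ** 2 for x in glucose]
sd_manual = math.sqrt(sum(squared_dev) / (len(glucose) - 1))
sd_builtin = statistics.stdev(glucose)  # also uses n - 1

# Share of observations within mean +/- 2 SD; at least 75% for any shape.
within_2sd = sum(abs(x - mean) <= 2 * sd_manual for x in glucose) / len(glucose)

print("range:", data_range, (min(glucose), max(glucose)))
print("SD:", round(sd_manual, 2), round(sd_builtin, 2))
print("within mean +/- 2 SD:", within_2sd)
```

The 75% lower bound holds for any distribution shape (Chebyshev's inequality). Bell-shaped data usually exceed it comfortably, as here.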

The coefficient of variation
The standard deviation divided by the mean, often expressed as a percentage: $CV = \frac{SD}{\bar{x}} \times 100\%$. It is a measure of relative variation, i.e. variation relative to the size of the mean.
Commonly used in quality control.

Percentiles
A percentile is a number that indicates the percentage of a distribution that is less than or equal to that number.
Commonly used to compare individual values to norms (e.g. growth charts) and to determine normal laboratory ranges: the interval between the 2.5th and 97.5th percentiles contains the central 95% of the distribution.
[SAS 9.4 output: Quantiles table, levels from 100% (Max) through 75% (Q3), 50% (Median), and 25% (Q1) to 0% (Min); Min = 72]

Interquartile range
The difference between the 75th percentile (third quartile) and the 25th percentile (first quartile). It contains the central 50% of the distribution.
Some authors present the first and third quartile values instead of the difference.
Example: interquartile range = 97.5 - 78.5 = 19, or present the first and third quartiles (78.5, 97.5).
[SAS 9.4 output: Basic Statistical Measures for subject glucose values]
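The coefficient of variation, a percentile rank, and the interquartile range can all be computed with the standard library. This sketch uses hypothetical data. `statistics.quantiles` offers two interpolation methods, and SAS and Minitab each have their own default percentile definitions, so quartiles on small samples can differ slightly between programs. `percent_at_or_below` is a hypothetical helper name.

```python
import statistics

# Hypothetical glucose values (mg/dl), for illustration only.
glucose = [72, 78, 80, 85, 88, 90, 98, 98, 108, 108]

# Coefficient of variation: SD relative to the mean, as a percentage.
cv = statistics.stdev(glucose) / statistics.fmean(glucose) * 100


def percent_at_or_below(values, x):
    """Percentile rank: percentage of observations less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


# Quartiles. Python's default 'exclusive' method is one of several
# percentile definitions; SAS or Minitab may give slightly different values.
q1, q2, q3 = statistics.quantiles(glucose, n=4)
iqr = q3 - q1

print(f"CV = {cv:.1f}%")
print("percent at or below 89:", percent_at_or_below(glucose, 89))
print("Q1, median, Q3:", q1, q2, q3, "IQR:", iqr)
```

Half of these ten values are at or below the median (89), which is consistent with the median being the 50th percentile.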

Use of measures of spread
What is the best measure for a particular dataset? The choice depends on:
Type of measure of central tendency: use the standard deviation with the mean, and the interquartile range with the median.
Distribution: for symmetrical distributions (the same shape on both sides of the mean), the standard deviation or interquartile range; for skewed distributions (outliers in one direction), the interquartile range.
Purpose: to compare to norms, percentiles; to compare distributions measured on different scales, the coefficient of variation; to describe the central 50% of the distribution, the interquartile range; to emphasize the extreme values, the range.

Error bar plots
A graph that displays the mean and a measure of spread for one or more groups.
Deciphering the error bar plot: the circle is the mean; the bars are the standard deviation (some authors present the standard error instead).

The proportions and percentages
Proportion, definition: the number of observations with the characteristic of interest divided by the total number of observations.
If the data contain two groups a and b, the proportion of a is $\frac{a}{a + b}$.
Can be used with nominal, ordinal, and numerical scales.
Percentage: the proportion multiplied by 100%.

The ratios
A part divided by another part: the number of observations WITH the characteristic of interest divided by the number of observations WITHOUT the characteristic of interest.
If the data contain two groups a and b, the ratio of a to b is $\frac{a}{b}$.
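Applied to the totals of the cholesterol table (42 checked, 29 not checked), the proportion, percentage, and ratio definitions look like this. The sketch simply restates the formulas a/(a + b) and a/b in code.

```python
# Two groups from the cholesterol table: a = checked, b = not checked.
a, b = 42, 29

proportion = a / (a + b)   # part / whole
percentage = proportion * 100
ratio = a / b              # part / other part

print(f"proportion checked = {proportion:.2f} ({percentage:.0f}%)")
print(f"ratio checked : not checked = {ratio:.2f}")
```

The proportion (0.59, i.e. 59%) matches the table's total row. The ratio (about 1.45 checked for every person not checked) answers a different question, because its denominator excludes the numerator.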

The rates
A proportion associated with a multiplier, called the base (e.g. 1000, 100,000), and computed over a specified period.
If the data contain two groups a and b, the rate of a is $\frac{a}{a + b} \times \text{base}$, computed over the specified period.
Use of rates in epidemiology and medicine: mortality rates, cause-specific mortality rates, morbidity rates.
Adjusting rates: why might a crude rate not be suitable? Because it can mislead when comparing populations with dissimilar characteristics, such as age, gender, or race.
Types: direct adjustment; indirect adjustment. Details of the calculations will be covered in the epidemiology and public health thread class.

One of the problems in the analysis of frequency distributions is SKEWNESS
Skewness is horizontal stretching of the distribution: the right and left sides of the distribution are not mirror images, i.e. one tail is longer than the other.
The tail indicates the direction and type of skew:
Tail pointing to the right: skewed to the right (positively skewed); mean > median > mode.
Tail pointing to the left: skewed to the left (negatively skewed); mean < median < mode.
The mean follows the tail regardless of the direction of skew. The sequence from the tail to the apex is mean, median, mode (note that it is alphabetical order).

Statistics in medicine
Lecture 1, part 4: Describing variation, and graphical presentation
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health
Fatma.shebl@yale.edu

There are several ways to depict the frequency distribution of a continuous variable: histograms, frequency polygons, line graphs, stem-and-leaf diagrams, quantiles, and box plots.

Frequency distribution is usually presented with a histogram
A histogram is a bar graph of a frequency distribution of numerical observations.
Steps of creating a histogram:
Decide on the number of non-overlapping intervals (statistical software might determine this automatically).
Put the intervals on the x-axis.
Put the numbers or percentages on the y-axis. Percentages are used to compare two histograms based on different sample sizes.
Present the frequencies or percentages with bars; the area of each bar is proportional to the percentage of individuals in that interval.
Combining observations into intervals gives a smoother curve than a histogram of individual values.

[Figure: Frequency histogram of glucose level (x-axis: glucose level; y-axis: frequency); Minitab 17 output]
Interpreting the graph: most participants had a fasting blood glucose level of 65 to 125 mg/dl. Only two participants had a blood glucose level below 60 mg/dl. Additionally, the distribution is skewed to the right (positively skewed): several participants had a fasting blood glucose level much higher than the target of ≤ 125 mg/dl.
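The histogram steps suggest percentages rather than counts when two groups differ in size. This sketch uses two hypothetical groups: group B is group A repeated three times, so it has the same shape but three times the counts. It prints a rough text histogram (one `#` per 5 percentage points) and then checks that the right-skewed group has its mean above its median. The helper `percent_histogram` is a name invented for this example.

```python
import statistics


def percent_histogram(values, start, width, n_bins):
    """Percentage of observations in each equal-width bin
    [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if not 0 <= k < n_bins:
            raise ValueError(f"value {v} lies outside the histogram range")
        counts[k] += 1
    return [100 * c / len(values) for c in counts]


# Hypothetical groups of different sizes with the same (right-skewed) shape.
group_a = [65, 72, 85, 88, 95, 110, 130, 150]  # n = 8
group_b = group_a * 3                          # n = 24

pct_a = percent_histogram(group_a, start=60, width=20, n_bins=5)
pct_b = percent_histogram(group_b, start=60, width=20, n_bins=5)

# Text histogram: one '#' per 5 percentage points.
for k, (a, b) in enumerate(zip(pct_a, pct_b)):
    lower = 60 + 20 * k
    print(f"{lower:>3}-{lower + 20:<3} A {'#' * round(a / 5):<8} B {'#' * round(b / 5)}")

# In right-skewed data the mean is pulled toward the long right tail.
print("mean vs median:", statistics.fmean(group_a), statistics.median(group_a))
```

The raw counts of the two groups differ threefold, but their percentage bars are identical. That is why percentages make histograms from different sample sizes comparable.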

Frequency polygons are another presentation of the frequency distribution
Frequency polygon, definition: a line graph connecting the midpoints of the tops of the columns of a histogram. It is useful in comparing two frequency distributions.
Steps of creating a frequency polygon: create a histogram; connect the midpoints of the tops of the columns.
[Figure: Frequency polygon of glucose level (x-axis: glucose level; y-axis: frequency); Minitab 17 output]

Percentage polygons
Percentage polygon, definition: a line graph connecting the midpoints of the tops of the columns of a histogram based on percentages instead of counts. It is useful in comparing two or more frequency distributions when the numbers of observations are not equal.
Steps of creating a percentage polygon: create a histogram based on percentages; connect the midpoints of the tops of the columns; extend the line from the midpoints of the first and last columns to the x-axis.
[Figure: Percentage polygon of glucose level (x-axis: glucose level; y-axis: percent); Minitab 17 output]

Stem-and-leaf plots
A graphical display for numerical data, similar to both a frequency table and a histogram; useful for tallying observations.
Steps of creating a stem-and-leaf plot:
Decide on the number of non-overlapping intervals.
Draw a vertical line.
Put the first digit(s) of each interval on the left side of the vertical line (the stem).
For each individual, put the last digit on the right side of the vertical line (the leaves). If the observation is one digit, that digit is the leaf.
Reorder the leaves from lowest to highest within each interval.
Count from either end to locate the median.
[Minitab 17 output: Stem-and-leaf display of glucose level, N = 180, leaf unit = 1.0, columns n, stem, leaf. The row marked (24) contains the median, 91. The vertical line was added manually.]
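The stem-and-leaf steps can be sketched as a small function, assuming a leaf unit of 1 (stem = every digit except the last, leaf = last digit), whole-number values, and hypothetical glucose values. Empty stems are kept so that gaps in the data remain visible, as in a histogram. The function name `stem_and_leaf` is chosen for this example and is not a library API.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf lines with leaf unit 1: the stem is every digit
    except the last (value // 10) and the leaf is the last digit."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(int(v), 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems
        row = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>3} | {row}")
    return lines


# Hypothetical glucose values (mg/dl), for illustration only.
glucose = [52, 67, 72, 78, 85, 88, 90, 91, 91, 98, 104, 172]
for line in stem_and_leaf(glucose):
    print(line)
```

Because the leaves are sorted within each stem, the median can be found by counting leaves in from either end, as the last step describes.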

Box plots (box-and-whisker plots)
A graph that summarizes the data by displaying the minimum, first quartile, median, third quartile, and maximum. It can be created from the information displayed in a stem-and-leaf plot or a frequency table.
Deciphering the box-and-whisker plot:
The box: the top of the box is the third quartile and the bottom is the first quartile, so the length of the box is the interquartile range. The median is shown as a horizontal line in the box, and the mean as a plus sign in the box (in some programs).
The whiskers: depict the minimum and the maximum values.

[Figure: Boxplot of glucose level]
Interpreting the results. The boxplot shows that:
the range (whiskers) is 52 to 172;
the longer upper whisker and the larger box area above the median indicate that the data are right (positively) skewed;
the median is 91;
the mean is …;
the interquartile range is 79 to …;
one outlier is present.

Tabular and graphical presentation of nominal and ordinal data
Contingency (frequency) tables: tables used to display counts and/or frequencies for two or more nominal or ordinal variables.
[Table: gender (male, female) by education (post graduate, college, high school)]
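The five numbers behind a box plot can be computed directly. This sketch makes two assumptions: it uses hypothetical glucose values, and it flags outliers with the common 1.5 × IQR rule (Tukey's fences). The slides do not say which rule the software applied to mark its one outlier. Quartiles use the `"inclusive"` method of `statistics.quantiles`, and other programs may interpolate differently.

```python
import statistics

# Hypothetical glucose values (mg/dl) with one unusually high value.
glucose = [52, 70, 75, 79, 85, 88, 90, 91, 95, 100, 106, 172]

q1, median, q3 = statistics.quantiles(glucose, n=4, method="inclusive")
five_numbers = (min(glucose), q1, median, q3, max(glucose))
iqr = q3 - q1

# Assumption: the common 1.5 * IQR rule (Tukey's fences) for flagging
# outliers; the slides do not say which rule the software used.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in glucose if x < lower_fence or x > upper_fence]

print("min, Q1, median, Q3, max:", five_numbers)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("outliers:", outliers)
```

When a point lies beyond the fences, many programs draw it separately and end the whisker at the most extreme non-outlying value instead of the maximum.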

Tabular and graphical presentation of nominal and ordinal data
Bar charts: a graph used with nominal characteristics to display the numbers or percentages of observations with the characteristic of interest. The categories are placed on the x-axis and the numbers or percentages on the y-axis.
Dot plots: a graphical presentation using dots.

Graphs for two characteristics
Two nominal characteristics: bar charts, dot plots.
One nominal and one numerical characteristic: box plots, error plots. [SAS 9.4 output: error plots; box plots]
Two numerical characteristics: scatterplots (bivariate plots).

Scatterplots
A two-dimensional graph displaying the relationship between two numerical characteristics or variables.
Creating a scatterplot:
If the data do not have an outcome and a predictor, the choice of x- and y-axis does not matter.
If the data have an outcome and a predictor, put the explanatory variable (risk factor, predictor) on the x-axis and the outcome on the y-axis.
Put a circle for each observation at the intersection of its x and y values.
[SAS 9.4 output: scatter plots]

Quiz
A pharmaceutical company tested the effect of sofosbuvir (a new HCV drug) on sustained viral response (SVR) in four HCV genotypes. In genotypes 1, 2, 3, and 4, the drug was shown to cause SVR in 90%, 93%, 84%, and 96% of the patients, respectively. What type of graphical depiction is best suited to show the data?
A. Pie chart
B. Venn diagram
C. Bar diagram
D. Histogram

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information
