BIOS 2041: Introduction to Statistical Methods
Abdus S Wahed*

*Some of the materials in this chapter have been adapted from Dr. John Wilson's lecture notes for the same course.
Chapter 1 Introduction to Statistical Methods

1.1 What is Statistics?

Statistics: the science of making inferences about specific random phenomena based on limited sample material. The discipline provides methods for answering questions such as:
- What effect does air pollution have on the residents of Pittsburgh?
- What proportion of Pittsburgh residents invest in stocks or bonds?
- Is drug A better than drug B in relieving certain asthma symptoms?
- Does vitamin A prevent cancer?
- Based on this quarter's performance of stock returns, what strategy will optimize the expected return in the next quarter?

A central task of statistical analysis is to draw a conclusion ("make an inference") about a population of interest based on evidence in a sample from that population.

Population: the set of all subjects or individuals who could be measured for some variable of interest. Another viewpoint is that the population is the group about which you wish to draw a conclusion.
Example: All women in Allegheny County.

A parameter is a numeric characteristic of a population.
Example: The proportion of women in Allegheny County having a female relative who has been treated for breast cancer.
A sample is a subset of the population selected for study. The idea is that the sample will provide the information used in drawing the conclusion about the population.
Example: 200 Allegheny County women selected by random-digit telephone dialing.

A statistic is a numeric characteristic of a sample.
Example: The observed proportion of women with a female relative who had been treated for breast cancer is 35%.

Inference is the conclusion drawn about the population on the basis of the sample.
Example: The proportion of Allegheny County women having a female relative who has had breast cancer is 35%.

Another example:
Population: All patients treated for acute myelocytic leukemia (AML) who are in first complete remission (CR1).
Parameter: Median duration of remission of treated AML patients in CR1.
Sample: 35 AML/CR1 patients treated at the University of Pittsburgh Cancer Institute during ….
Statistic: The median duration of CR1 in these 35 patients was 13 months.
Inference: The median duration of CR1 in patients treated for AML is 13 months.

1.2 What is Biostatistics?

Biostatistics: the branch of statistics that applies statistical methods to medical and biological problems. Biostatisticians help researchers (basic scientists, medical researchers, drug developers) from the inception of a study to its completion. The role of a biostatistician in the process is:
- To formulate the research question in concrete terms (a hypothesis).
- To plan the experiment/study that will answer the research question accurately and efficiently, e.g.:
  - How many subjects (mice, patients, machines) will be needed to answer the research question?
  - How will subjects be assigned to different groups?
  - What data should be collected on each subject?
  - How will the data be verified and processed?
- To identify issues with the data, e.g.:
  - How will missing data be handled?
  - Are there measurement errors in the data, and how will they be handled?
- To analyze the collected data and draw conclusions regarding the hypotheses.
Example: Drug development. XYZ Pharmaceuticals has been conducting research on developing drugs for hepatitis C (Hep C) treatment since …. Their basic science researchers have convinced the Food and Drug Administration (FDA), through phase I and II trials, that they have discovered a new molecule of the standard interferon that can be administered once weekly instead of once daily, and they claim that the drug provides a better response rate than standard interferon. The company is planning to test the drug on a large cohort of hepatitis C patients. The statistician assigned to this study will generally start by asking basic questions such as:
1. How would you quantify the response? (A simplified answer would usually be: absence of Hep C virus in the serum 24 weeks after the end of treatment.)
2. How much improvement in response rate do you expect among users of the new drug compared with standard interferon users? (The phase II trial would indicate a ballpark figure for this.)
Based on the answers, the statistician will:
- Formulate the hypothesis in quantitative terms:

$$ H_0: P_1 = P_2, \qquad (1.2.1) $$

  where $P_1$ is the response rate in the standard interferon group and $P_2$ is the response rate for the new treatment (weekly interferon).
- Determine the number of patients to be recruited in the (standard) daily interferon group and in the (new) weekly interferon group.
- Make sure that patient safety and privacy are ensured in the protocol, keeping in mind the objective of the study.
- Devise a randomization scheme (possibly double-blinded) to assign treatments to patients so that the two groups are comparable with respect to patient characteristics.
- Suggest a data collection, verification and management plan.
- How many sites will be used for patient recruitment?
- What data need to be collected?
- What system will be used to transfer the data?
- How will the data be processed?
- What information should be presented to the DSMB (Data Safety Monitoring Board), and how often?
- What criteria should be used to declare the new treatment significantly better?
- How many interim analyses should be planned?
- What criteria should be used for stopping the trial?

Finally, when the trial ends, the statistician will conduct/oversee the data analysis to arrive at a conclusion regarding the hypothesis.

In this course, we will mainly talk about:
- Statistical methods to analyze collected data so that specific questions of interest can be answered.
- Design issues, for example, sample size and power.

We will cover Chapters 1-8 (in full) and … (partial).
Chapter 2 Descriptive Statistics

In most cases data consist of many sample points. To interpret the data, the first task is to summarize them in some concise manner.

2.1 Types of data

Data collected, outcomes of experiments, etc. are often referred to as variables or outcomes, which come in several varieties. The type of outcome observed plays a role in determining which statistical procedures are appropriate.
- Categorical (discrete): data can be assigned to discrete categories.
  a) Unordered
     i) Gender
     ii) Political party to which one belongs
     iii) Exposed vs not exposed
     iv) Disease or no disease
  b) Ordered
     i) Good-Better-Best classification
     ii) Number of times a patient is admitted to hospital for illness during a given year
- Continuous variables
  a) Ordinary or uncensored
     i) Standard scale measurements: height, weight, optical density, pH
     ii) Survival times that are actually observed
  b) Censored data
     i) Survival time: it may be known only that the time is greater than some observed time.

Here are the first 10 records from a dataset:

Table 2.1: Several records from a dataset (columns: Obs, ID, AGE, SEX, LEADTYP, IQF)
Many numerical and graphical techniques are available for summarizing data. We will start with continuous variables.

2.2 Measures of Location

The first set of summary measures defines the center (or middle) of the sample data. Such measures are known as measures of location or measures of central tendency. We will start with the simplest of these measures, the arithmetic mean (or simply, the mean).

Arithmetic Mean

The arithmetic mean is the sum of the observations divided by the number of observations.

Formula: If X is what is measured (observed) and $x_1, x_2, \ldots, x_n$ are the values of n measurements, then the arithmetic mean is given by

$$ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i. \qquad (2.2.1) $$
Example (Table 2.1, Rosner)

Table 2.2: Sample of birthweights (g) of live-born infants born at a private hospital in San Diego, California, during a 1-week period (columns: New-born, Weight (g)).

Let X = birthweight (g) of live-born infants. Then

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \ldots \text{ g}. \qquad (2.2.2) $$

Facts about the mean
- The arithmetic mean is easy to compute.
- If the sample points change in scale by a factor of c, the mean changes by a factor of c.
- In some cases it fails to reflect the center of the sample, specifically in the presence of unusually high or low values (outliers).
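A minimal Python sketch of formula (2.2.1) and of the facts above. The birthweight values are not reproduced in these notes, so the sketch assumes the white blood cell counts (×1000) from the median example that follows; the function name `arithmetic_mean` is our own.

```python
def arithmetic_mean(xs):
    """Sum of the observations divided by their number, as in (2.2.1)."""
    if not xs:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(xs) / len(xs)

# White blood cell counts (x 1000) from the median example (Table 2.2, Rosner).
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]

print(round(arithmetic_mean(wbc), 2))                   # 10.78
# Rescaling every observation by c = 1000 rescales the mean by the same factor.
print(round(arithmetic_mean([1000 * x for x in wbc]), 1))  # 10777.8
# Dropping the single large value (35) moves the mean noticeably.
print(arithmetic_mean(wbc[:-1]))                        # 7.75
```

The jump from 7.75 to 10.78 caused by one value illustrates the third fact: a single outlier can pull the mean away from the bulk of the data.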
- It is the most widely used measure of location.

Median

Loosely speaking, the median is a number such that, in the ordered sample, half of the sample points lie below it and half above it.

Formula: If n is odd, the ((n+1)/2)th observation is the median. Otherwise, the median is defined as the average of the (n/2)th and (n/2 + 1)th smallest observations.

Example (Table 2.2, Rosner). White blood cell counts (×1000) for a sample of 9 patients entering a hospital. The ordered sample is:
3, 5, 7, 8, 8, 9, 10, 12, 35
Here n = 9, and hence (n+1)/2 = 5. The median white blood cell count for this sample is the 5th observation, which is 8 (that is, 8000).

Facts about the median:
- The median is not highly influenced by extreme observations, unless there are only one or two data points.
- The median depends only on the one or two middle observations and hence is less sensitive to the magnitude of the other observations in the sample.

Mode

The mode is the most frequently occurring value in the sample. In the above example, the modal white blood cell count is 8000, as it occurs more frequently than any other white blood cell count.

Facts about the mode:
- If all the data points occur exactly the same number of times, then there is no mode.
- A sample with one mode is called unimodal; two modes, bimodal; three modes, trimodal; and so on.

Geometric Mean

The geometric mean is often used for summarizing ratios, percentages, indices, or other data sets bounded below by zero. The geometric mean of n positive numbers $x_1, x_2, \ldots, x_n$ is defined as the nth root of their product.

Formula:

$$ GM = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}. \qquad (2.2.3) $$

For the white blood cell counts above, the geometric mean is

$$ (3 \times 5 \times 7 \times 8 \times 8 \times 9 \times 10 \times 12 \times 35)^{1/9} = 8.59. $$

Facts about the geometric mean:
- It is only defined for positive numbers.
- The logarithm of the GM is the arithmetic mean of $\log x_1, \ldots, \log x_n$. If a distribution on the positive axis is asymmetric, a log transformation is usually used to make it more symmetric; for such distributions the geometric mean is used.
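The three measures above can be sketched in Python as follows, applied to the white blood cell counts from the example. The function names are our own; the geometric mean is computed as exp(mean of logs), which equals (2.2.3) but avoids forming a very large product.

```python
import math
from collections import Counter

def median(xs):
    """((n+1)/2)th ordered value if n is odd, else the average of the
    (n/2)th and (n/2 + 1)th ordered values."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # 0-based index of the ((n+1)/2)th value
    return (s[mid - 1] + s[mid]) / 2

def modes(xs):
    """All most-frequent values; an empty list if every distinct value ties."""
    counts = Counter(xs)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                        # all values occur equally often: no mode
    return sorted(v for v, c in counts.items() if c == top)

def geometric_mean(xs):
    """n-th root of the product, computed on the log scale."""
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean needs positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]
print(median(wbc))                       # 8
print(modes(wbc))                        # [8]
print(round(geometric_mean(wbc), 2))     # 8.59
```

Python's standard library also offers `statistics.median`, `statistics.multimode` and `statistics.geometric_mean`; note that `multimode` returns every value when all of them tie, rather than "no mode" as in the convention used here.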
2.3 Measures of Spread/Variation/Dispersion

Refer to Figure 2.4 (FOB).

Range

The range is the difference between the largest and the smallest observations. For the birthweight data in Table 2.1, the range is (largest - smallest) = 2077 g. For the data in Figure 2.4 (FOB), the range for the autoanalyzer method is 49 mg/dl, whereas the same for the microenzymatic method is 17 mg/dl. Thus, one can claim that:
- The microenzymatic method measures cholesterol levels more consistently than the autoanalyzer method does. Or, equivalently,
- Measurements of cholesterol levels using the microenzymatic method are more precise than those using the autoanalyzer method. Or, equivalently,
- Microenzymatic cholesterol measurements have lower variability than autoanalyzer cholesterol measurements.

Facts about the range:
- Easy to compute.
- Depends heavily on the extreme values.

Percentiles/Quantiles and Interquartile Range

The 100p-th percentile ($0 < p < 1$) of a distribution is the value $V_p$ such that 100p% of the sample points are less than or equal to $V_p$. The median is the 50th percentile.
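A small sketch of the range. It assumes the autoanalyzer measurements are the five values listed as "Sample 2" under the variance facts later in this section; that assumption reproduces the range of 49 mg/dl, the deviations and the variance of 340 quoted in the text.

```python
def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

auto = [177, 193, 195, 209, 226]      # assumed autoanalyzer values (mg/dl)
print(sample_range(auto))             # 49
# A single extreme value is enough to change the range drastically.
print(sample_range(auto + [300]))     # 123
```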
For the birthweight data in Table 2.1 (n = 20), some of the percentiles are calculated as follows. If np is an integer k, the percentile is the average of the kth and (k+1)th ordered observations; otherwise it is the (⌊np⌋ + 1)th ordered observation.

Percentile   np                 How it is calculated from the ordered data
10th         20 × 0.10 = 2      Average of the 2nd and 3rd observations
25th         20 × 0.25 = 5      Average of the 5th and 6th observations
50th         20 × 0.50 = 10     Average of the 10th and 11th observations
75th         20 × 0.75 = 15     Average of the 15th and 16th observations
90th         20 × 0.90 = 18     Average of the 18th and 19th observations
99th         20 × 0.99 = 19.8   The 20th observation
Table 2.3: Percentiles for the birthweight data in Table 2.1 (Rosner)

Facts about percentiles
- Percentiles are also known as quantiles.
- Percentiles characterize the relative positioning of the observations in the sample.
- The spread of the distribution about the center can be characterized by specifying certain quantiles. For instance, the 25th and 75th percentiles tell us that the middle half of the sample points lies between these two values.
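The percentile rule in Table 2.3 can be written as a short Python function. Because the birthweights themselves are not reproduced here, the usage example assumes a stand-in sample of 20 values equal to their ranks, so each result shows which ordered positions the rule picks; the function name `percentile` is our own.

```python
import math

def percentile(xs, p):
    """100p-th percentile by the rule used in Table 2.3.

    With the n observations sorted in increasing order:
    - if n*p is an integer k, return the average of the kth and (k+1)th values;
    - otherwise return the (floor(n*p) + 1)th value.
    Positions are 1-based in the text and 0-based in Python lists.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    s = sorted(xs)
    pos = len(s) * p
    k = round(pos)
    if math.isclose(pos, k):
        return (s[k - 1] + s[k]) / 2   # kth and (k+1)th values
    return s[math.floor(pos)]          # (floor(np) + 1)th value

# Stand-in sample: 20 values equal to their ranks.
ranks = list(range(1, 21))
for p in (0.10, 0.25, 0.50, 0.75, 0.90, 0.99):
    print(f"{round(100 * p)}th percentile -> {percentile(ranks, p)}")
```

The printed values (2.5, 5.5, 10.5, 15.5, 18.5, 20) match the positions listed in Table 2.3. Other software uses other conventions; for example, NumPy's `numpy.percentile` interpolates linearly between order statistics by default, so it can return slightly different values.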
- The 25th and 75th percentiles of a distribution are commonly referred to as the 1st (lower) and 3rd (upper) quartiles.

Here are the percentiles for the cholesterol data in Figure 2.4 (FOB):

Method   N   Lower quartile   Median   Upper quartile   IQR
Auto     5   …                …        …                16
Micro    5   …                …        …                5
Table 2.4: Percentiles for the cholesterol data in Figure 2.4 (Rosner)

Interquartile range

The distance between the 1st quartile ($Q_1$) and the 3rd quartile ($Q_3$) is known as the interquartile range: $\mathrm{IQR} = Q_3 - Q_1$. The IQR is useful for comparing the spread of two distributions as well as for detecting outliers. The higher the IQR, the more variable the distribution. For the cholesterol data, the IQRs for the autoanalyzer and microenzymatic methods are 16 and 5, respectively, which supports our previous claim that the autoanalyzer method is not as precise as the microenzymatic method.
- For a positively skewed distribution, the distance between the median and the upper quartile is greater than the distance between the median and the lower quartile.
- For a negatively skewed distribution, the distance between the median and the upper quartile is smaller than the distance between the median and the lower quartile. [Birthweight data (Table 2.1, FOB)]
- For a symmetric distribution, the distance between the median and the upper quartile is approximately equal to the distance between the median and the lower quartile. [For the menstrual cycle data in Table 2.3 (FOB), $Q_1$ = 28 = median, $Q_3$ = 29.]

Outliers

Outliers are extremely high or low values that are isolated from the overall distribution. Outliers in a data set can be identified based on the lower and upper quartiles.

Formula: An observation x can be treated as an outlier if either
1. $x > Q_3 + 1.5 \times \mathrm{IQR}$, or
2. $x < Q_1 - 1.5 \times \mathrm{IQR}$.

Formula: An observation x is an extreme outlier if either
1. $x > Q_3 + 3 \times \mathrm{IQR}$, or
2. $x < Q_1 - 3 \times \mathrm{IQR}$.

Are there any outliers in the cholesterol data set?

Mean deviation

Let us look at the cholesterol data one more time. [INSERT CHOLESTEROL FIGURE]

Look at how each observation differs from the mean, i.e., $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$. One way to measure the spread is to look at how the sample points differ from the mean. However, the mean of these differences is zero for any data, because

$$ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0. $$

For the autoanalyzer method sample, the differences are $-23$, $-7$, $-5$, $9$, and $26$, and their mean is zero. The same is true for the microenzymatic method. Therefore the mean of the differences about the mean cannot be used to distinguish between samples based on their spread.

What if we just take the average of the distances instead of the differences, i.e., $|x_1 - \bar{x}|, |x_2 - \bar{x}|, \ldots, |x_n - \bar{x}|$? The average of the distances from the mean is known as the mean deviation:

$$ \text{Mean deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|. $$

For the autoanalyzer method sample, the distances are 23, 7, 5, 9, and 26, with an average of 14. On the other hand, the mean deviation for the microenzymatic method is 4.4.
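The outlier rule and the mean deviation can be checked together in a short Python sketch. It assumes, as before, that the autoanalyzer measurements are the five values listed as "Sample 2" under the variance facts below, and it computes quartiles with the percentile rule of Table 2.3; the helper names are our own.

```python
import math

def rosner_percentile(xs, p):
    """Percentile rule of Table 2.3: average the kth and (k+1)th ordered values
    if n*p = k is an integer, otherwise take the (floor(n*p) + 1)th value."""
    s = sorted(xs)
    pos = len(s) * p
    k = round(pos)
    if math.isclose(pos, k):
        return (s[k - 1] + s[k]) / 2
    return s[math.floor(pos)]

def outlier_fences(xs, multiplier=1.5):
    """(Q1 - m*IQR, Q3 + m*IQR); m = 3 gives the extreme-outlier fences."""
    q1, q3 = rosner_percentile(xs, 0.25), rosner_percentile(xs, 0.75)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def mean_deviation(xs):
    """Average absolute distance of the observations from their mean."""
    xbar = sum(xs) / len(xs)
    return sum(abs(x - xbar) for x in xs) / len(xs)

auto = [177, 193, 195, 209, 226]    # assumed autoanalyzer values (mg/dl)
lo, hi = outlier_fences(auto)       # Q1 = 193, Q3 = 209, IQR = 16
print(lo, hi)                                   # 169.0 233.0
print([x for x in auto if x < lo or x > hi])    # [] -> no outliers
xbar = sum(auto) / len(auto)
print(sum(x - xbar for x in auto))              # 0.0: the differences cancel
print(mean_deviation(auto))                     # 14.0
```

Under these assumptions the quartiles reproduce the IQR of 16 in Table 2.4, no autoanalyzer value falls outside the fences, and the mean deviation matches the value of 14 given above.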
2.3.4 Variance and Standard Deviation

In the definition of the mean deviation, we used absolute values of the differences between the individual observations and the sample mean. Absolute values are sometimes difficult to deal with mathematically. Another measure of spread squares the deviations from the mean and averages them over the whole sample. This measure, known as the variance, is defined as

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}. \qquad (2.3.1) $$

The use of n - 1 instead of n in the denominator has a special justification, which we will discuss in Chapter 6. The standard deviation is defined as the positive square root of the variance:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}. \qquad (2.3.2) $$

For the autoanalyzer method, the variance is

$$ s^2 = \frac{(-23)^2 + (-7)^2 + (-5)^2 + 9^2 + 26^2}{5-1} = \frac{1360}{4} = 340. $$
For the microenzymatic method, the variance is

$$ s^2 = \frac{(-8)^2 + (-3)^2 + 0^2 + \cdots}{5-1} = 39.5. $$

The corresponding standard deviations are $s = \sqrt{340} = 18.4$ and $s = \sqrt{39.5} = 6.3$, respectively. Thus the spread of the autoanalyzer measurements, as measured by the standard deviation, is approximately three times as large as that of the microenzymatic measurements.

Facts about variance and standard deviation
- Variance and standard deviation remain unchanged when all the observations in the sample are shifted by the same constant. For example, the following two samples have the same variance (340) and standard deviation (18.4):
  Sample 1: 77, 93, 95, 109, 126
  Sample 2: 177, 193, 195, 209, 226
- The standard deviation has the same unit of measurement as the original sample.
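Formulas (2.3.1) and (2.3.2) translate directly into Python. This sketch, with function names of our own choosing, uses the two samples listed above to confirm the shift-invariance fact.

```python
import math

def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1), formula (2.3.1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Positive square root of the sample variance, formula (2.3.2)."""
    return math.sqrt(sample_variance(xs))

sample1 = [77, 93, 95, 109, 126]
sample2 = [177, 193, 195, 209, 226]   # every value of sample 1 shifted by +100
print(sample_variance(sample1), sample_variance(sample2))  # 340.0 340.0
print(round(sample_sd(sample2), 1))                         # 18.4
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 denominator.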
- If the sample points change in scale by a factor of c, the variance changes by a factor of $c^2$ and the standard deviation changes by a factor of c.
- The standard deviation is the most widely used measure of spread (dispersion).

Coefficient of Variation

Suppose you are comparing two distributions that have different means. How would you compare the variability of a sample with mean 10 and standard deviation 5 to that of a sample with mean 100 and standard deviation 5? The former is more variable, as its standard deviation is much larger relative to its mean than that of the latter. The coefficient of variation is designed to account for the magnitude of the mean when assessing the spread. It is defined as

$$ CV = \frac{s}{\bar{x}} \times 100\%. \qquad (2.3.3) $$
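A sketch of formula (2.3.3) using Python's `statistics` module. The first line reproduces the two hypothetical samples in the text; the cholesterol part again assumes the autoanalyzer values are those listed as "Sample 2" above. The function name `coefficient_of_variation` is our own.

```python
import statistics

def coefficient_of_variation(xs):
    """CV = 100 * s / xbar, in percent (formula 2.3.3); assumes a positive mean."""
    xbar = statistics.mean(xs)
    if xbar <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return 100 * statistics.stdev(xs) / xbar

# The two hypothetical samples in the text: same s = 5, different means.
print(100 * 5 / 10, 100 * 5 / 100)                  # 50.0 5.0

auto = [177, 193, 195, 209, 226]                    # assumed autoanalyzer values
print(round(coefficient_of_variation(auto), 1))     # 9.2

# Rescaling by c multiplies s by c (and s^2 by c^2).
c = 10
ratio = statistics.stdev([c * x for x in auto]) / statistics.stdev(auto)
print(round(ratio, 6))                              # 10.0
```

Because both s and the mean are multiplied by c under rescaling, the CV itself is unchanged by a change of units, which is what makes it suitable for comparing samples on different scales.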
For the cholesterol data in Table 2.4 (FOB), the coefficients of variation for the autoanalyzer and microenzymatic methods are 9.2% and 3.1%, respectively.

2.4 Graphical Representation

Histogram

A histogram is a useful way of presenting data graphically. It plots frequencies (or relative frequencies) on the Y-axis against the data values on the X-axis. The frequencies together with the values are usually referred to as the frequency distribution, or simply the distribution. When the number of unique observations is too large, the range of the variable is divided into contiguous intervals and the number of observations belonging to each interval is reported.

Distributions whose two tails are approximately similar are called symmetric distributions. For such distributions, mean ≈ median ≈ mode.
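The grouping step behind a histogram can be sketched as a frequency table. This is a minimal illustration, assuming intervals of equal width that are closed on the left and open on the right; the interval width of 5 and the use of the white blood cell counts are our own choices, not taken from the figures.

```python
from collections import Counter

def frequency_table(xs, width, start):
    """Counts and relative frequencies in contiguous intervals
    [start + k*width, start + (k+1)*width), including empty intervals in between."""
    counts = Counter((x - start) // width for x in xs)
    n = len(xs)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        left = start + k * width
        freq = counts.get(k, 0)
        rows.append((left, left + width, freq, freq / n))
    return rows

wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]
for left, right, freq, rel in frequency_table(wbc, width=5, start=0):
    print(f"[{left:>2}, {right:>2})  {freq}  {rel:.2f}")
```

The long run of empty intervals before the single count in [35, 40) is the kind of right tail that the next paragraphs describe as positive skew.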
Figure 2.1: Histogram of menstrual cycle lengths (relative frequency vs time in days). Distribution of time intervals between successive menstrual periods (days) of college women (Table 2.3; Rosner; page 13). Mean = 28.5; median = 28; mode = 28.

A distribution which has a longer tail on the right is called a positively skewed distribution. For such distributions, data points to the right of the median tend to be farther from the median in absolute value than points below the median, and mean > median > mode.

Figure 2.2: Example of a distribution which is neither skewed, nor symmetric.

Distributions with a longer tail on the left are known as negatively skewed distributions. For such distributions, mean < median < mode.

For more examples of symmetric, positively skewed and negatively skewed distributions, refer to page 12 of FOB.
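A quick check of the ordering of mean, median and mode on the white blood cell counts, whose single large value (35) gives a longer right tail. This only illustrates the rule of thumb; here the median and the mode coincide, so only the mean > median part is visible.

```python
import statistics

wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]    # one long right tail (the value 35)
xbar = statistics.mean(wbc)
med = statistics.median(wbc)
mo = statistics.mode(wbc)
print(round(xbar, 2), med, mo)          # 10.78 8 8
if xbar > med:
    print("mean > median: consistent with a positively skewed sample")
```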
2.4.2 Stem-and-leaf Plot

A stem-and-leaf plot is similar to a histogram, but it stays closer to the actual data by using the observations from the actual sample. Like a histogram, it shows the basic shape of the distribution.

Figure 2.3: Stem-and-leaf plot for the birthweight data in Table 2.1 (FOB). Multiply Stem.Leaf by 10**+3.

Figure 2.4: Stem-and-leaf plot for the variable IQF from the dataset Lead in the case study described in Section 2.9 (FOB). Multiply Stem.Leaf by 10**+1.

Box plot

Figure 2.5: Box plot for the variable IQF from the dataset Lead in the case study described in Section 2.9 (FOB), by exposure type (LEAD_TYP = 1, 2).
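A stem-and-leaf plot can be produced with a few lines of Python. This is a minimal sketch, assuming non-negative values and truncating (not rounding) to the leaf unit; the function name and the `leaf_unit` parameter are our own. With `leaf_unit=100`, birthweights in grams would get thousands as stems and hundreds as leaves, matching the "Multiply Stem.Leaf by 10**+3" convention of Figure 2.3.

```python
from collections import defaultdict

def stem_and_leaf(xs, leaf_unit=1):
    """Return plot lines: leaf = last digit of x // leaf_unit, stem = the digits before it.
    Assumes non-negative values; stems with no leaves are kept to show gaps."""
    scaled = sorted(int(x // leaf_unit) for x in xs)
    stems = defaultdict(list)
    for v in scaled:
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]    # white blood cell counts (x 1000)
print("\n".join(stem_and_leaf(wbc)))
```

For these counts the plot shows six leaves on stem 0, two on stem 1, an empty stem 2 and a single leaf on stem 3, so the isolated value 35 stands out just as it would in a histogram.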
More informationChapter2 Description of samples and populations. 2.1 Introduction.
Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
More informationDescriptive Statistics
Descriptive Statistics CHAPTER OUTLINE 6-1 Numerical Summaries of Data 6- Stem-and-Leaf Diagrams 6-3 Frequency Distributions and Histograms 6-4 Box Plots 6-5 Time Sequence Plots 6-6 Probability Plots Chapter
More informationLecture 2 and Lecture 3
Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.
More informationDEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008
DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008 Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe
More informationLecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:
Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots
More informationResistant Measure - A statistic that is not affected very much by extreme observations.
Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)
More informationDescriptive Univariate Statistics and Bivariate Correlation
ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to
More informationSUMMARIZING MEASURED DATA. Gaia Maselli
SUMMARIZING MEASURED DATA Gaia Maselli maselli@di.uniroma1.it Computer Network Performance 2 Overview Basic concepts Summarizing measured data Summarizing data by a single number Summarizing variability
More informationChapter 4. Displaying and Summarizing. Quantitative Data
STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range
More informationCHAPTER 2: Describing Distributions with Numbers
CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring
More informationF78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives
F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationSTT 315 This lecture is based on Chapter 2 of the textbook.
STT 315 This lecture is based on Chapter 2 of the textbook. Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their
More informationBiostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015
Biostatistics for biomedical profession BIMM34 Karin Källen & Linda Hartman November-December 2015 12015-11-02 Who needs a course in biostatistics? - Anyone who uses quntitative methods to interpret biological
More information(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)
3. Descriptive Statistics Describing data with tables and graphs (quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables) Bivariate descriptions
More informationChapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved
Chapter 3 Numerically Summarizing Data Section 3.1 Measures of Central Tendency Objectives 1. Determine the arithmetic mean of a variable from raw data 2. Determine the median of a variable from raw data
More informationChapter 1 - Lecture 3 Measures of Location
Chapter 1 - Lecture 3 of Location August 31st, 2009 Chapter 1 - Lecture 3 of Location General Types of measures Median Skewness Chapter 1 - Lecture 3 of Location Outline General Types of measures What
More informationAP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam.
AP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam. Name: Directions: The questions or incomplete statements below are each followed by
More informationVocabulary: Samples and Populations
Vocabulary: Samples and Populations Concept Different types of data Categorical data results when the question asked in a survey or sample can be answered with a nonnumerical answer. For example if we
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1
More informationQuantitative Tools for Research
Quantitative Tools for Research KASHIF QADRI Descriptive Analysis Lecture Week 4 1 Overview Measurement of Central Tendency / Location Mean, Median & Mode Quantiles (Quartiles, Deciles, Percentiles) Measurement
More informationADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes
We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures
More informationIntroduction to Statistics
Introduction to Statistics Data and Statistics Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing,
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationSTP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More informationChapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.
Chapter 3 Numerically Summarizing Data Section 3.1 Measures of Central Tendency Objectives 1. Determine the arithmetic mean of a variable from raw data 2. Determine the median of a variable from raw data
More informationDescription of Samples and Populations
Description of Samples and Populations Random Variables Data are generated by some underlying random process or phenomenon. Any datum (data point) represents the outcome of a random variable. We represent
More informationAnswer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)
Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text) 1. A quick and easy indicator of dispersion is a. Arithmetic mean b. Variance c. Standard deviation
More information1.3.1 Measuring Center: The Mean
1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar) of a set of observations, add their values and divide by the number of observations. If the n observations
More informationUnit 1 Summarizing Data
PubHtlth 540 Fall 2014 1. Summarizing Page 1 of 54 Unit 1 Summarizing It is difficult to understand why statisticians commonly limit their enquiries to averages, and do not revel in more comprehensive
More informationChapter 2 Class Notes Sample & Population Descriptions Classifying variables
Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is
More informationCHAPTER 2 Description of Samples and Populations
Chapter 2 27 CHAPTER 2 Description of Samples and Populations 2.1.1 (a) i) Molar width ii) Continuous variable iii) A molar iv) 36 (b) i) Birthweight, date of birth, and race ii) Birthweight is continuous,
More informationMath 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore
Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore Chapter 3 continued Describing distributions with numbers Measuring spread of data: Quartiles Definition 1: The interquartile
More informationThe Empirical Rule, z-scores, and the Rare Event Approach
Overview The Empirical Rule, z-scores, and the Rare Event Approach Look at Chebyshev s Rule and the Empirical Rule Explore some applications of the Empirical Rule How to calculate and use z-scores Introducing
More informationChapter 1 Descriptive Statistics
MICHIGAN STATE UNIVERSITY STT 351 SECTION 2 FALL 2008 LECTURE NOTES Chapter 1 Descriptive Statistics Nao Mimoto Contents 1 Overview 2 2 Pictorial Methods in Descriptive Statistics 3 2.1 Different Kinds
More informationCOMPLEMENTARY EXERCISES WITH DESCRIPTIVE STATISTICS
COMPLEMENTARY EXERCISES WITH DESCRIPTIVE STATISTICS EX 1 Given the following series of data on Gender and Height for 8 patients, fill in two frequency tables one for each Variable, according to the model
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More information3rd Quartile. 1st Quartile) Minimum
EXST7034 - Regression Techniques Page 1 Regression diagnostics dependent variable Y3 There are a number of graphic representations which will help with problem detection and which can be used to obtain
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More information3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability
3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability 3.1 Week 1 Review Creativity is more than just being different. Anybody can plan weird; that s easy. What s hard is to be
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationMATH 1150 Chapter 2 Notation and Terminology
MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the
More informationMeelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 03
Meelis Kull meelis.kull@ut.ee Autumn 2017 1 Demo: Data science mini-project CRISP-DM: cross-industrial standard process for data mining Data understanding: Types of data Data understanding: First look
More informationChapter 1: Introduction. Material from Devore s book (Ed 8), and Cengagebrain.com
1 Chapter 1: Introduction Material from Devore s book (Ed 8), and Cengagebrain.com Populations and Samples An investigation of some characteristic of a population of interest. Example: Say you want to
More informationBasics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations
Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent
More informationUnit 1 Summarizing Data
BIOSTATS 540 Fall 2018 1. Summarizing Page 1 of 47 Unit 1 Summarizing It is difficult to understand why statisticians commonly limit their enquiries to averages, and do not revel in more comprehensive
More informationChapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation
Chapter Four Numerical Descriptive Techniques 4.1 Numerical Descriptive Techniques Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient
More informationSection 3. Measures of Variation
Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The
More information