Descriptive Statistics and Visualizing Data in STATA

1 Descriptive Statistics and Visualizing Data in STATA
BIOS 514/517
R. Y. Coley
Week of October 7, 2013

2 Log Files, Getting Data in STATA
Log files save your commands.
cd /home/students/rycoley/bios
  To change directory
log using stata-section-oct7, replace text
  To name the log file (change stata-section-oct7)
capture log close
  To close the log file
insheet using
  To get the FEV data in

3 Defining, Labeling Variables
table smoke
  Currently coded as 1 and 2
  No missing data (would be coded as 9)
label define smokelabel 1 "smoker" 2 "non-smoker"
label values smoke smokelabel
label define sexlabel 1 "male" 2 "female"
label values sex sexlabel

4 Labeling Variables
label variable age "Age (years)"
label variable fev "FEV (L/s)"
label variable height "Height (in)"

5 Descriptive Statistics
Basic commands detailed in this week's lecture notes:
summarize
means
centile
tabstat
tabulate

6 Descriptive Stats by Group
bysort sex: tabstat fev, stat(n mean sd min p25 med p75 max) col(stat) format
bysort sex: tabulate smoke

7 Defining New Variables
A few ways:
gen age9over = age>=9
gen age9over = 0
replace age9over=1 if age>=9
gen age9over = age==9 | age==10 | age==11 | ... | age==19

8 Measures of Spread
Range: tabstat fev, stat(min max range)
Variance: tabstat fev, stat(var)
Standard Deviation: tabstat fev, stat(sd)
Interquartile Range: tabstat fev, stat(p25 p75 iqr)
IQR is the distance between the 25th and 75th percentiles of the data.

9 Visualizing Data- Histograms
histogram fev
To save: graph export hist-fev.png, replace
The height of each bar is proportional to the proportion of observations in that bin's range.

10 Visualizing Data- Histograms
histogram fev, kdensity by(sex)
kdensity adds a smooth line estimating the density.

11 Visualizing Data- Dotplots
dotplot fev
Each dot represents an observation.

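12 Visualizing Data- Box Plots
a.k.a. box-and-whisker plots
The box extends from the lower quartile (25th percentile of the data) to the upper quartile (75th percentile), with a line at the median (50th percentile).
Whiskers extend from the lower quartile to the lower adjacent value (LAV) and from the upper quartile to the upper adjacent value (UAV):
$\text{LAV} = \text{lower quartile} - \tfrac{3}{2} \times \text{IQR}$
$\text{UAV} = \text{upper quartile} + \tfrac{3}{2} \times \text{IQR}$
(More precisely, the adjacent values are the most extreme observations that still lie within these limits.)
Observations outside the UAV and LAV are plotted as points.
(Some box plots have whiskers that extend to the minimum and maximum observations.)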

13 Visualizing Data- Box Plots
graph box fev

14 Visualizing Data- Box Plots
graph box fev, over(sex)

15 Visualizing Data- Scatterplots
scatter fev height

16 Visualizing Data- Bar Charts
gen one=1
graph bar (count) one, over(smoke) ytitle("frequency")

17 Another Example
log using cause-of-death, text replace
set obs 10
input float deaths str30 cause
"Heart Disease"
"Cancer"
"Cerebrovascular Disease"
"Chronic respiratory disease"
"Accidental Death"
"Diabetes"
"Flu and pneumonia"
"Alzheimer's disease"
"Kidney disorder"
"Septicemia"

18 Visualizing Data- Bar Chart
gen dthou=deaths/1000
graph hbar dthou, over(cause) ytitle("annual deaths (thousands)")

19 Visualizing Data- Bar Charts
gen dthou=deaths/1000
graph hbar dthou, over(cause, sort(1) descending) ytitle("annual deaths (thousands)")

20 Visualizing Data- Pie Charts
graph pie deaths, over(cause) sort descending

24 Doing it all over again in R!
Look at the code I have posted on the discussion board. It is extensively commented (##)! Comments omitted here.
data<-read.csv("fevdata.csv",header=TRUE)
names(data)
dim(data)
n<-dim(data)[1]

25 (Re-)defining variables
Variables don't have labels like in Stata. But we can improve upon the current coding of smoke and sex.
data$smoke[data$smoke==2]<-0
data$female<-data$sex==2
Creating a new variable:
data$age9over<-data$age>=9
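Although R has no direct counterpart to Stata's value labels, a factor can carry descriptive level names. The following is a minimal sketch, assuming the original 1/2 coding of smoke and sex; the data frame `toy` and its values are made up for illustration and are not the FEV data.

```r
# Toy data using the original coding (1 = smoker, 2 = non-smoker; 1 = male, 2 = female).
# These values are illustrative only and are not taken from the FEV data set.
toy <- data.frame(smoke = c(1, 2, 2, 1, 2),
                  sex   = c(1, 1, 2, 2, 2))

# factor() attaches a text label to each numeric code
toy$smoke.f <- factor(toy$smoke, levels = c(1, 2), labels = c("smoker", "non-smoker"))
toy$sex.f   <- factor(toy$sex,   levels = c(1, 2), labels = c("male", "female"))

# Tables now print the labels instead of the codes
table(toy$smoke.f, toy$sex.f)
```

Keeping the labelled factor in a new column leaves the numeric coding available for later calculations.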

26 Descriptive Statistics
summary(data$fev) #min, 1Q, Med, Mean, 3Q, Max
mean(data$fev)
quantile(data$fev, probs=c(0.25, 0.5, 0.75))
table(data$smoke)
xtabs(~data$smoke+data$female) #to get cross tabulation
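The by-group summaries produced with bysort in the Stata section have a rough R analogue in tapply() and aggregate(). This is only a sketch on made-up values; the data frame `toy` is hypothetical and stands in for the FEV data.

```r
# Made-up values for illustration; not the FEV data
toy <- data.frame(fev    = c(2.1, 3.4, 2.8, 3.9, 1.7, 3.0),
                  female = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))

# Mean of fev within each level of female
group.means <- tapply(toy$fev, toy$female, mean)
group.means

# Several statistics at once, one row per group
aggregate(fev ~ female, data = toy,
          FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
```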

27 Measures of Spread
range(data$fev) #gives min and max
var(data$fev) #variance
sd(data$fev) #standard deviation
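range() returns the minimum and maximum rather than a single width, and the commands above have no counterpart to the iqr statistic used in the Stata section. A sketch of both on a made-up vector `x` (hypothetical values); note that R's default percentile definition may differ slightly from Stata's.

```r
# Made-up values for illustration; not the FEV data
x <- c(1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)

width <- diff(range(x))                     # maximum minus minimum
q     <- quantile(x, probs = c(0.25, 0.75)) # lower and upper quartiles
iqr   <- IQR(x)                             # upper quartile minus lower quartile

# The standard deviation is the square root of the variance
all.equal(sd(x), sqrt(var(x)))
```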

28 Histograms
hist(data$fev, xlab="FEV (L/s)", main="Histogram of FEV")
To save the graph:
pdf(file="fev-hist-r.pdf")
hist(data$fev, xlab="FEV (L/s)", main="Histogram of FEV")
graphics.off()
[Figure: Histogram of FEV; x-axis FEV (L/s), y-axis Frequency]

29 Histograms
hist(data$fev, xlab="FEV (L/s)", main="Histogram of FEV", prob=TRUE)
lines(density(data$fev))
[Figure: Histogram of FEV with density curve; x-axis FEV (L/s), y-axis Density]

30 Histogram
par(mfrow=c(1,2))
hist(data$fev[data$female==0], xlab="FEV (L/s)", main="Males", ylim=c(0,80))
hist(data$fev[data$female==1], xlab="FEV (L/s)", main="Females", xlim=c(0,6))
[Figure: side-by-side histograms, Males and Females; x-axis FEV (L/s), y-axis Frequency]

31 Boxplot
boxplot(data$fev, ylab="FEV (L/s)")
[Figure: box plot of FEV (L/s)]

32 Boxplot
boxplot(data$fev~data$female, ylab="FEV (L/s)", xaxt="n")
axis(1, at=c(1,2), labels=c("Male", "Female"))
[Figure: box plots of FEV (L/s) for Male and Female]
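The whisker rule from the box-plot slides (whiskers reach the most extreme observations within 1.5 × IQR of the quartiles) can be checked by hand. The sketch below uses a made-up vector `x`; the names `lower.fence`, `upper.fence`, `lav` and `uav` are chosen here for illustration. boxplot() applies the same 1.5 multiplier by default but computes the box from hinges, which can differ slightly from quantile().

```r
# Made-up values for illustration; 30 is a deliberate outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))   # lower quartile
q3  <- unname(quantile(x, 0.75))   # upper quartile
iqr <- q3 - q1

# Limits beyond which observations are plotted as individual points
lower.fence <- q1 - 1.5 * iqr
upper.fence <- q3 + 1.5 * iqr

# Adjacent values: the most extreme observations inside the limits
lav <- min(x[x >= lower.fence])
uav <- max(x[x <= upper.fence])

# Observations that would be drawn as points
outliers <- x[x < lower.fence | x > upper.fence]
```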

33 Scatter Plot
plot(data$fev~data$height, ylab="FEV (L/s)", xlab="Height (in)")
[Figure: scatter plot of FEV (L/s) against Height (in)]

34 Bar Plot
counts<-table(data$smoke)
barplot(counts, xlab="Smoker", xaxt="n")
axis(1, at=c(1,2), labels=c("No","Yes"))
[Figure: bar plot of counts by Smoker (No, Yes)]

35 Cause of Death Example in R
n.deaths<-c(700142, , , , , 71372, 62034, 53852, 39480, 32238)
cause<-c("Heart Disease", "Cancer", "Cerebrovascular Disease", "Chronic Respiratory Disease", "Accidental death", "Diabetes", "Flu and Pneumonia", "Alzheimer's Disease", "Kidney Disorder", "Septicemia")
n.deaths<-n.deaths/1000

36 Cause of Death Example
par(mar=c(4,6.5,1,1))
barplot(n.deaths, horiz=TRUE, yaxt="n", xlab="Number of Deaths (Thousands)", main="Cause of Death")
text(y=seq(1,11.35, 1.15), par("usr")[1], labels=cause, srt=45, pos=2, xpd=TRUE, cex=0.75)
[Figure: horizontal bar chart "Cause of Death", one bar per cause; x-axis Number of Deaths (Thousands)]
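The Stata option sort(1) descending has no direct barplot() argument; one way to get a similar ordering in R is to reorder the vectors with order() before plotting. The sketch below uses only the last five counts that appear in the n.deaths vector; the names `small.deaths`, `small.cause` and `ord` are hypothetical, and the margin setting is an assumption chosen to fit the labels.

```r
# The five counts that appear in full in the notes, in thousands
small.deaths <- c(71372, 62034, 53852, 39480, 32238) / 1000
small.cause  <- c("Diabetes", "Flu and Pneumonia", "Alzheimer's Disease",
                  "Kidney Disorder", "Septicemia")

# barplot(horiz=TRUE) draws the first element at the bottom,
# so sort ascending to put the largest bar on top
ord <- order(small.deaths)

par(mar = c(4, 9, 1, 1))
barplot(small.deaths[ord], names.arg = small.cause[ord], horiz = TRUE, las = 1,
        xlab = "Number of Deaths (Thousands)")
```

Passing the labels through names.arg with las=1 also avoids placing them by hand with text().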

37 Cause of Death Example
pie(n.deaths, cause, main="Cause of Death")
[Figure: pie chart "Cause of Death", one slice per cause]


READ YOUR SYLLABUS WHICH IS ALSO POSTED ON THE CLASS PAGE!!!!!!! Text Book : Moore, McCabe and Craig, Introduction to the Practice of Statistics, 6 th ed. Class Web Page: http://www.stt.msu.edu/academics/classpages/ Choose STT421. READ YOUR SYLLABUS WHICH IS ALSO POSTED

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

Describing Distributions With Numbers

Describing Distributions With Numbers Describing Distributions With Numbers October 24, 2012 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Do

More information

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1 Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by Mario F. Triola 3.1-1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview

More information

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014 Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of

More information

Section 2.4. Measuring Spread. How Can We Describe the Spread of Quantitative Data? Review: Central Measures

Section 2.4. Measuring Spread. How Can We Describe the Spread of Quantitative Data? Review: Central Measures mean median mode Review: entral Measures Mean, Median and Mode When do we use mean or median? If there is (are) outliers, use Median If there is no outlier, use Mean. Example: For a data 1, 1., 1.5, 1.7,

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Descriptive statistics Techniques to visualize

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

Visualizing and summarizing data

Visualizing and summarizing data Visualizing and summarizing data Ken Rice, Dept of Biostatistics HUBIO 530 January 2015 Q. What s your talk about? Today I will describe: How to visualize small datasets How to summarize small datasets

More information

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511 Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories

More information

Week 1: Intro to R and EDA

Week 1: Intro to R and EDA Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures

More information

Remember your SOCS! S: O: C: S:

Remember your SOCS! S: O: C: S: Remember your SOCS! S: O: C: S: 1.1: Displaying Distributions with Graphs Dotplot: Age of your fathers Low scale: 45 High scale: 75 Doesn t have to start at zero, just cover the range of the data Label

More information

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart ST2001 2. Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart Summary of Previous Lecture u A study often involves taking a sample from a population that

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

MAT Mathematics in Today's World

MAT Mathematics in Today's World MAT 1000 Mathematics in Today's World Last Time 1. Three keys to summarize a collection of data: shape, center, spread. 2. Can measure spread with the fivenumber summary. 3. The five-number summary can

More information

Describing Distributions

Describing Distributions Describing Distributions With Numbers April 18, 2012 Summary Statistics. Measures of Center. Percentiles. Measures of Spread. A Summary Statement. Choosing Numerical Summaries. 1.0 What Are Summary Statistics?

More information

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II) The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STOR 155 Introductory Statistics Lecture 4: Displaying Distributions with Numbers (II) 9/8/09 Lecture 4 1 Numerical Summary for Distributions Center Mean

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information