BNG 495 Capstone Design. Descriptive Statistics

Similar documents
Descriptive Univariate Statistics and Bivariate Correlation

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Chapter 1 - Lecture 3 Measures of Location

Describing distributions with numbers

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Unit 2. Describing Data: Numerical

Introduction to Statistics

Chapter 1 Descriptive Statistics

Chapter 3. Data Description

Units. Exploratory Data Analysis. Variables. Student Data

1. Exploratory Data Analysis

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

P8130: Biostatistical Methods I

2011 Pearson Education, Inc

Describing distributions with numbers

Lecture 1: Descriptive Statistics

Chapter 3. Measuring data

Unit 2: Numerical Descriptive Measures

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Chapter 4. Displaying and Summarizing. Quantitative Data

CHAPTER 2: Describing Distributions with Numbers

TOPIC: Descriptive Statistics Single Variable

Chapter 1. Looking at Data

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Chapter 1: Exploring Data

Chapter 2 Descriptive Statistics

Lecture 2 and Lecture 3

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Describing Distributions

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 2: Tools for Exploring Univariate Data

MATH 1150 Chapter 2 Notation and Terminology

Descriptive Statistics

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Exploratory data analysis

= n 1. n 1. Measures of Variability. Sample Variance. Range. Sample Standard Deviation ( ) 2. Chapter 2 Slides. Maurice Geraghty

Chapter 3 Statistics for Describing, Exploring, and Comparing Data. Section 3-1: Overview. 3-2 Measures of Center. Definition. Key Concept.

Performance of fourth-grade students on an agility test

Describing Distributions With Numbers

Describing Distributions With Numbers Chapter 12

Elementary Statistics

Biostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015

CIVL 7012/8012. Collection and Analysis of Information

200 participants [EUR] ( =60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR

Descriptive Statistics-I. Dr Mahmoud Alhussami

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

Chapter 5. Understanding and Comparing. Distributions

1.3: Describing Quantitative Data with Numbers

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

STT 315 This lecture is based on Chapter 2 of the textbook.

MATH 10 INTRODUCTORY STATISTICS

Describing Distributions with Numbers

Introduction to Statistics for Traffic Crash Reconstruction

Statistics I Chapter 2: Univariate data analysis

MgtOp 215 Chapter 3 Dr. Ahn

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Quantitative Tools for Research

STAT 200 Chapter 1 Looking at Data - Distributions

are the objects described by a set of data. They may be people, animals or things.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Lecture 1: Description of Data. Readings: Sections 1.2,

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

Statistics I Chapter 2: Univariate data analysis

A is one of the categories into which qualitative data can be classified.

Chapter 5: Exploring Data: Distributions Lesson Plan

Stat 101 Exam 1 Important Formulas and Concepts 1

Measures of the Location of the Data

2.1 Measures of Location (P.9-11)

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

Resistant Measure - A statistic that is not affected very much by extreme observations.

The Normal Distribution. Chapter 6

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Contents. Acknowledgments. xix

Instrumentation (cont.) Statistics vs. Parameters. Descriptive Statistics. Types of Numerical Data

STATISTICS 1 REVISION NOTES

Variables, distributions, and samples (cont.) Phil 12: Logic and Decision Making Fall 2010 UC San Diego 10/18/2010

Chapter 2 Solutions Page 15 of 28

a table or a graph or an equation.

AP Final Review II Exploring Data (20% 30%)

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1

Module 1. Identify parts of an expression using vocabulary such as term, equation, inequality

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Algebra I Learning Targets Chapter 1: Equations and Inequalities (one variable) Section Section Title Learning Target(s)

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

MATH 117 Statistical Methods for Management I Chapter Three

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

SUMMARIZING MEASURED DATA. Gaia Maselli

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Histograms allow a visual interpretation

Chapter 6 Group Activity - SOLUTIONS

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Chapter2 Description of samples and populations. 2.1 Introduction.

Learning Objectives for Stat 225

Determining the Spread of a Distribution

Transcription:

BNG 495 Capstone Design Descriptive Statistics

Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus on using MATLAB for implementation. The four modules are: Introduction and descriptive statistics Probability distributions Hypothesis testing Correlation and regression Each lecture will be supplemented with a MATLAB tutorial on the same topic You will need to walk through these yourselves after we have covered the corresponding topics in class

Statistics a (very brief) introduction 1663 Natural and Political Observations upon the Bills of Mortality by John Graunt is published Motivated by the desire to base policy on demographic data 1700s Laplace introduces the normal distribution and regression via his study of astronomy 1800s Quetelet applies statistical analysis to human biology The central purpose of statistics is to learn more about some population of interest (e.g., all humans in the world) However, we very rarely, if ever, have access to every individual in the population! sample a subset of the entire population??? compilation of data about the entire population With a sample in hand, we seek to either summarize that data (using descriptive statistics) or use the data to make some prediction or statement about the population (using inferential statistics)

The Central Dogma of Statistics used to summarize data; (this is the focus for today) used to make inferences about the population

Dimensionality of data sets Univariate: measurements made on one variable per subject. This will be the focus for modules 1-3 Bivariate: measurement made on two variables per subject. Multivariate: measurement made on many variables per subject.

Types of descriptive statistics Central tendency measures: computed to give a center around which the measurements in the data are distributed (also called measures of location ). (mean, median, mode, quartiles) Variation or variability measures: describe data spread or how far the measurements are away from the center. (variance, standard dev.) Relative standing measures: describe the relative position of specific measurements in the data. (percentiles)

Measures of central tendency mean The sample mean (a.k.a. average): To calculate the average x of a set of observations, add their values and divide by the number of observations: n x 1 +...+ x n 1 x = = Σ x n n i i = 1

Measures of central tendency - median Median the exact middle value Calculation If there are an odd number of observations, find the middle value If there are an even number of observations, find the mean of the middle two values Example: Median = ave(22,23) = 22.5 Age of students: 17 19 21 22 23 23 23 28

Which measure of central tendency is best? In other words, which measure better approximates the center of a data set? Mean is best for symmetric distributions w/o outliers Median is useful for skewed distributions or data with (one-directional) outliers mean = 3.125 median = 3 mean = 4.857 median = 4

Scale: variance The sample variance is the average of squared deviations of values from the mean n s 2 = Σ (x i x) 2 i = 1 n 1 Square the deviations to get rid of the negatives The result is that the contribution to the variance increases as you go farther from the mean in either direction

Scale: standard deviation s = i n Σ(x i x) 2 = 1 n 1 Procedure to obtain the sample standard deviation: Score/measure observations (in the units that are meaningful, let s say m/s) Find the mean of the observations (m/s) Find each score s deviation from the mean (m/s) Square all those deviations (m/s) 2 Divide by n 1 (m/s) 2 (note that this is the variance) square root (m/s) now we have the starting units! Let s do a simple example problem!

Measures of central tendency mode The mode is the observation that takes place most frequently in a data set Unlike the mean or median, the mode is not necessarily unique the same maximum frequency may occur at different values. Based on the previous slide, is the mode a parametric statistic? (hint: remember that it is a measure of central tendency)

Scale: quartiles and IQR Q 2 is the same as the median The first quartile (Q 1 ) and third quartile (Q 3 ) are the medians of the data sets that would be created if all of the values below and above Q 2, respectively, were chosen. The interquartile range (IQR) is Q 3 Q 1

Quartiles example problem Find the three quartiles and IQR of the following two datasets: 2 3 6 10 12 14 15 18 2 5 9 11 13 15 19 21 24 Q 1 = 4.5 Median = 11 Q 3 = 14.5 IQR = 10 Q 1 = 7 Median = 13 Q 3 = 20 IQR = 13 Note from this example that the 25% rule from the previous slide isn t precisely correct It is easiest to first insert the median the lower and upper halves from which to find Q 1 and Q 3 should then be obvious

Percentiles (a.k.a. quantiles) Generally, the n th percentile is a value such that n% of the observations fall at or below it: Q 1 = 25 th percentile Median = 50 th percentile Q 2 = 75 th percentile

Univariate data: histograms and bar plots What s the differences between a histogram and bar plot? Bar plot Used for categorical variables to show frequency or proportion in each category. Translate the data from frequency tables into a pictoral representation... Histogram Used to visualize distribution (shape, center, range, variation) of continuous variables bin size is important

Effect of bin size on histogram Simulated 1000 N(0,1) 1000 random numbers from the standard normal distribution with mean 0 and st. dev. 1

More on histograms What s the difference between a frequency histogram and a density histogram?

More on histograms What s the difference between a frequency histogram and a density histogram?

More on histograms So, for our roughly gaussian distribution from earlier, the density histogram looks like this: 0.4 0.35 0.3 mean = 3.125 relative frequency 0.25 0.2 0.15 0.1 mean = 4.857 median = 3 0.05 median = 4 0 1 2 3 4 5 observation

Stem and leaf plots

Box and whisker plots An outlier is a score either 1.5 IQR above the upper quartile or below the lower quartile

Example problem Two different classes take a quiz and gets the following scores. Class 1: 2, 4, 6, 8, 10, 12, 14 Class 2: 2, 2, 3, 8, 8, 10, 23 What the mean and median of each set? The same! Will making a box and whisker plot of each set of data give us a better picture of their distributions? (let s do the second one together)

Box plot procedure Steps to make our box plot: Find the median, Q1, Q3, and IQR Draw 3 horizontal lines, at Q1, median, and Q3 Draw the corresponding vertical lines to make the boxes Compute the lower inner fence (Q1 1.5*IQR) and the upper inner fence (Q3 + 1.5*IQR) Draw a whisker downward from Q1 to lower inner fence or minimum, whichever comes first Draw a whisker upward from Q3 to upper inner fence or maximum, whichever comes first Compute the lower outer fence (Q1 3*IQR) and the upper outer fence (Q3 + 3*IQR) Mild outliers fall between the inner and outer fences, mark with O Extreme outliers fall outside outer fences, mark with *

References Lecture 2 Descriptive Statistics and Exploratory Data Analysis University of Washington School of Medicine. http://www.webquest.hawaii.edu/ http://www.sonoma.edu/users/w/wilsonst/cour ses/math_300/final/p14/default.html