Statistical Concepts. Constructing a Trend Plot

Module 1: Review of Basic Statistical Concepts
1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation

Constructing a Trend Plot
A trend plot graphs the data against a variable of interest, often time or space. It is used to examine whether there is a relationship between the variable being examined and time or space. Examples: Is the level of contamination in a well increasing over time? Is there a relationship between the measured concentration of a contaminant in a series of wells and their distance from a suspected source?

[Figure (b): Trend plot of Level of Contamination (ppb) against Time, January 1997 to January 2001.]

Note that both axes are labeled. Sometimes it makes sense to connect the points, sometimes not; use your judgement. Do not add a trendline in Excel unless you have run a regression analysis and know that the slope of the line is significantly different from zero. We'll learn how to do that later in the course.

Constructing a Scatter Plot
A scatter plot graphs values of one variable against the values of another variable. It is used to see whether there is a relationship between the two variables.
[Figure: Scatter plot of Contaminant B (ppb) against Contaminant A (ppb).]

Constructing a Histogram
A histogram is a graph of a sample pdf (probability density function). It is constructed as follows:
1. Choose 5-10 non-overlapping intervals that cover the range of the data.
2. Count how many data points fall into each interval.
3. Divide the number of points in each interval by the total number of data points to get relative frequencies.
4. Plot the data range on the X axis and the relative frequency on the Y axis.
5. Draw a bar the width of each interval and the height of its relative frequency.
[Figure: Histogram of Contaminant C (ppb), with Relative Frequency on the vertical axis.]
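The histogram steps above can be sketched in Python. This is a minimal sketch assuming equal-width intervals over [min, max]; the function name `relative_frequency_histogram` is hypothetical, and because the Contaminant C data are not listed, it reuses the heights from the Measures of Central Tendency example.

```python
def relative_frequency_histogram(data, num_bins):
    """Return (lower, upper, relative_frequency) for num_bins equal-width,
    non-overlapping intervals covering the range of the data."""
    if not data:
        raise ValueError("data must be non-empty")
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(data), max(data)
    # Equal-width intervals; fall back to width 1.0 if all values are identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for x in data:
        # Intervals are [lower, upper); the maximum value goes in the last one.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(data)
    # Divide each count by the total number of points to get relative frequencies.
    return [(lo + i * width, lo + (i + 1) * width, c / n)
            for i, c in enumerate(counts)]


heights = [60, 64, 65, 67, 67, 67, 69, 70, 72, 72]
for lower, upper, freq in relative_frequency_histogram(heights, 5):
    print(f"[{lower:.1f}, {upper:.1f}): {freq:.2f}")
```

The printed relative frequencies are the bar heights; the interval bounds give the bar widths on the X axis.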

Measures of Central Tendency
Sample Mean: the same as the average,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Sample Median: the middle value of a data set sorted from largest to smallest. If there is an even number of data points, average the two middle values.
Sample Mode: the most commonly occurring value.

Example: Heights to the nearest inch
60 64 65 67 67 67 69 70 72 72
Mean = (60+64+65+67+67+67+69+70+72+72)/10 = 67.3
Median = (67 + 67)/2 = 67
Mode = 67
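The three definitions can be written out directly and checked against the heights example. A minimal sketch; the function names are hypothetical, and `sample_modes` returns a list because a data set can have more than one most common value (the definition above does not say how ties are handled). Python's standard `statistics` module also provides `mean`, `median`, and `multimode`.

```python
from collections import Counter


def sample_mean(xs):
    # Add up all values and divide by the number of values.
    return sum(xs) / len(xs)


def sample_median(xs):
    # Middle value of the sorted data; with an even count,
    # average the two middle values.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def sample_modes(xs):
    # Most commonly occurring value(s), returned as a sorted list.
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


heights = [60, 64, 65, 67, 67, 67, 69, 70, 72, 72]
print(sample_mean(heights), sample_median(heights), sample_modes(heights))
```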

Example: Salaries in a start-up dot-com company (in thousands)
27 27 33 35 85 150
Mean = 59.5K
Median = 34K
Mode = 27K
So, for symmetric distributions (like the normal) the mean is a good measure of central tendency, but for skewed distributions (like income or environmental contamination) it is heavily influenced by a few unusual points.

Measures of Dispersion
Measures of dispersion measure how spread out the data are.
Sample Range = largest value - smallest value
The problem with the range is that it tells you nothing about the rest of the data, and it is strongly affected by a single odd point.
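The salary example and the sample range can be reproduced with Python's standard `statistics` module. This sketch simply recomputes the figures stated above; note how the single 150K salary pulls the mean well above the median and dominates the range.

```python
import statistics

salaries = [27, 27, 33, 35, 85, 150]  # in thousands

mean = statistics.mean(salaries)      # pulled upward by the 150K salary
median = statistics.median(salaries)  # average of the two middle values, 33 and 35
mode = statistics.mode(salaries)      # most commonly occurring value
sample_range = max(salaries) - min(salaries)  # largest value - smallest value

print(f"mean={mean}K median={median}K mode={mode}K range={sample_range}K")
# mean=59.5K median=34.0K mode=27K range=123K
```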

Measures of Dispersion
Intuitively, you can think of the sample standard deviation as the average difference between the data points and the mean. Unlike the range, it is a function of all of the data points.
A deviation is a difference between two values. We can easily calculate the deviation of each data point from the mean. If we summed these, we would get zero. So, we must either square them or take their absolute values. Absolute values are difficult to work with mathematically, so we'll square the deviations. Then we average them to get the variance. Since we squared the deviations, the units of the variance are the square of the units of the data, so we take the square root to get back to the original units.

Because of some mathematical properties of the statistic, we use n-1 rather than n in taking the average of the squared deviations (dividing by n-1 makes the sample variance an unbiased estimator of the population variance). The sample variance is s^2; take its square root to get the sample standard deviation s:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
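The reasoning above (deviations, squaring, averaging with n-1, then a square root) maps step by step onto a short function. A minimal sketch with hypothetical function names; Python's standard `statistics.variance` and `statistics.stdev` also divide by n-1.

```python
import math


def deviations(xs):
    """Deviation of each data point from the sample mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def sample_variance(xs):
    """Average of the squared deviations, dividing by n - 1 rather than n."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two data points")
    return sum(d * d for d in deviations(xs)) / (n - 1)


def sample_std(xs):
    """Square root of the sample variance, back in the original units."""
    return math.sqrt(sample_variance(xs))


heights = [60, 64, 65, 67, 67, 67, 69, 70, 72, 72]
print(sum(deviations(heights)))            # zero, up to floating-point rounding
print(round(sample_variance(heights), 2))  # 13.79
print(round(sample_std(heights), 2))       # 3.71
```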

Example: Heights to the nearest inch

X_i    X_i - X̄    (X_i - X̄)^2
60      -7.3        53.29
64      -3.3        10.89
65      -2.3         5.29
67      -0.3         0.09
67      -0.3         0.09
67      -0.3         0.09
69       1.7         2.89
70       2.7         7.29
72       4.7        22.09
72       4.7        22.09
Sum                124.1

Sample mean: X̄ = 67.3
Sample variance: s^2 = (1/(10-1)) * 124.1 = (1/9) * 124.1 = 13.79
Sample standard deviation: s = sqrt(13.79) = 3.71

Percentiles of a Distribution
The population median is the point that has 50% of the distribution above it and 50% below. The sample median has 50% of the data above it and 50% below. The percentiles of the distribution (or sample) are defined similarly: the Xth percentile has X percent of the distribution (or data) below it and 100-X percent above it. For example, the 95th percentile has 95% of the distribution below it and 5% above it.
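Sample percentiles can be computed in several slightly different ways. The sketch below assumes linear interpolation between the two closest ranks (the default method of NumPy's `percentile` and of Excel's `PERCENTILE.INC`); the function name `percentile` is hypothetical. With this method the 50th percentile equals the sample median.

```python
import math


def percentile(xs, p):
    """Xth sample percentile (0 <= p <= 100) by linear interpolation
    between the closest ranks of the sorted data."""
    if not xs:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    s = sorted(xs)
    h = (len(s) - 1) * p / 100  # fractional position in the sorted data
    lower = math.floor(h)
    if lower >= len(s) - 1:
        return float(s[-1])
    frac = h - lower
    # Move the fractional distance from s[lower] toward s[lower + 1].
    return s[lower] + frac * (s[lower + 1] - s[lower])


heights = [60, 64, 65, 67, 67, 67, 69, 70, 72, 72]
print(percentile(heights, 50), percentile(heights, 25), percentile(heights, 95))
```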

Correlation
The correlation coefficient measures the degree of linear association between two variables. It is denoted by r and ranges between -1 and 1.
A perfect linear association gives points that plot on a straight line. No association gives points that plot as a cloud.
A positive linear association means that high values of one variable are associated with high values of the other. A negative linear association means that high values of one variable are associated with low values of the other.

Examples of the Correlation Coefficient
[Figure: two scatter plots whose points lie exactly on a straight line, one falling (r = -1) and one rising (r = 1).]

[Figure: two scatter plots of points clustered closely around a straight line, one rising (r = 0.9) and one falling (r = -0.9).]
[Figure: a scatter plot of points forming a cloud with no linear trend (r close to 0).]
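The r in these examples is the usual (Pearson) correlation coefficient: the sum of cross-products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. A minimal sketch; the function name `pearson_r` is hypothetical, and Python 3.10+ also ships `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3, 5, 7, 9, 11]))  # points on a rising line
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # points on a falling line
```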

Correlation
Note that r measures the degree of linear, or straight-line, association. Variables can have a strong association and still have a very small correlation.
[Figure: a strong but nonlinear association.] This association is strong, but r = 0.
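One way to see this numerically is a sketch with hypothetical data: y = x^2 on x values symmetric about zero. Here y is completely determined by x, yet r is exactly 0 because the association is curved rather than straight-line. The helper `pearson_r` is hypothetical and repeated here so the example runs on its own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data xs, ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]  # hypothetical, symmetric about zero
ys = [x * x for x in xs]       # a perfect, but curved, relationship
print(pearson_r(xs, ys))       # 0.0
```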