Lecture 1: Description of Data. Readings: Sections 1.2, 2.1-2.3


1 Variable

Example 1
a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then describing what the term statistics means to you.
b. For each word in your response to part a, record the number of letters in the word:
c. Did every word in your two sentences contain the same number of letters?

Definition
A variable is any characteristic of a person or thing that can be assigned a number or a category. The person or thing to which the number or category is assigned, such as a student in your class, is called the observational unit. Data consist of the numbers or categories recorded for the observational units in a study. Variability refers to the phenomenon of a variable taking on different values or categories from observational unit to observational unit.
A quantitative variable measures a numerical characteristic such as height, whereas a categorical variable records a group designation such as gender.

Example 2
Now consider the students in your class as observational units. Classify each of the following variables as categorical or quantitative.
- How many hours you have slept in the past 24 hours
- Whether you have slept for at least 7 hours in the past 24 hours
- How many states you have visited
- Handedness (which hand you write with)
- Day of the week on which you were born
- Gender
- Average study time per week
- Score on the first exam in this course

Still considering yourself and your classmates as observational units, can the average height of students in the class legitimately be considered a variable? What about the percentage of students in the class who have used a cell phone today? Explain.

Example 3
Suppose that the observational units of interest are the fifty states. Identify which of the following are variables and which are not. Also classify the variables as categorical or quantitative.
- Gender of the state's current governor
- Number of states that have a female governor
- Percentage of the state's residents older than 65 years of age
- Highest speed limit in the state
- Whether the state's name contains one word
- Average income of the adult residents of the state
- How many states were settled before 1865

Example 4
For each of the following questions, identify the observational units and variables. Also classify each variable as quantitative or categorical.
a. An economist suspects that chief executive officers (CEOs) of American companies tend to be taller than the national average height of 69 inches, so she takes a random sample of 100 CEOs and records their heights.
   Observational units:
   Variable (Type):
b. A conservationist recorded the weather (clear, partly cloudy, cloudy, rainy) and the number of cars parked at noon at a trailhead on each of 18 days.
   Observational units:
   Variable (Type):

c. A psychologist shows a videotaped interview of a married couple to a sample of 150 marriage counsellors. Each counsellor is asked to predict whether the couple will still be married five years later. The psychologist wants to test whether marriage counsellors make the correct prediction more than half the time.
   Observational units:
   Variable (Type):
d. A psychologist gives an SAT-like exam to 200 African-American college students. Half of the students are randomly assigned to use a version of the exam that asks them to indicate their race, and the other half are randomly assigned to use a version of the exam that does not ask them to indicate their race. The psychologist suspects that those students who are not asked to indicate their race will score significantly higher on the exam than those who are asked to indicate their race.
   Observational units:
   Variable (Type):
e. An economist randomly assigns four actors to go to ten different car dealerships each and negotiate the best price they can for a particular model of car. The four people are all the same age, are dressed similarly, and tell the car salespeople that they have the same occupation and neighbourhood of residence. One of the actors is a white male, one is a black male, one is a white female, and one is a black female. The economist wants to test whether the average prices differ significantly among these four types of customers.
   Observational units:
   Variable (Type):

Wrap up...
You encountered the most fundamental concept of statistics: variability. This concept will be central throughout the course. Some useful definitions to remember and habits to develop from this topic include:
- Always consider data in context and anticipate reasonable values for the data collected and analyzed.
- A variable is a characteristic that varies from person to person or from thing to thing. The person or thing is called an observational unit.
- Variables can be classified as categorical or quantitative, depending on whether the characteristic is a categorical designation (such as gender) or a numerical value (such as height).

2 Visualizing Data

2.1 Frequency Table and Histogram

Example 5. (Binge Drinking in College).
Binge drinkers: five or more drinks in a row for males, four or more drinks in a row for females.
Population: undergraduate students
Sample: a sample of students in (a sample of) 30 colleges
Variable: percentage of undergraduate students who are binge drinkers in a college
Data: 46 5 51 35 58 60 59 46 33 57 55 1 48 36 13 7 58 64 46 67 4 53 6 41 9 6 18 66 41 6

Frequency distribution: a way to summarize data by displaying the number of times (frequency) or proportion of times (relative frequency) each value occurs in the data set.

Class Index   Class Interval   Frequency   Relative Frequency
1             [10, 20)         2           0.067
2             [20, 30)         5           0.167
3             [30, 40)         3           0.1
4             [40, 50)         7           0.233
5             [50, 60)         8           0.267
6             [60, 70)         5           0.167

[Figure: frequency histogram and relative frequency histogram of the data, with class boundaries from 10 to 70]

Three steps to create a histogram:
1. Group observations into classes and create the frequency table (classes are also called bins).
2. Mark the class boundaries on a horizontal measurement axis.
3. Above each class interval, draw a rectangle whose height is the frequency or relative frequency.

How many classes?

Not too many, not too few.

[Figure: two histograms of the same data, one with too many classes and one with too few classes]

Use 5 to 15 classes for a moderate sample size (n = 50); more classes may be used if the sample size is larger. A reasonable rule of thumb is

$$\text{number of classes} \approx \sqrt{\text{sample size}}$$

Histogram with unequal class widths:

$$\text{rectangle height} = \frac{\text{relative frequency}}{\text{class width}}$$

[Figure: frequency histogram and density histogram of the same data with unequal class widths]

Bar chart for categorical data: an analogue to the histogram

Example 6. Motorcycle Monthly was interested in the types of motorcycles its readers ride. 120 subscribers were randomly selected to be surveyed. Here are their responses:

Manufacturer      Frequency   Relative Frequency
Honda             41          0.34
Yamaha            27          0.23
Kawasaki          20          0.17
Harley-Davidson   18          0.15
BMW               3           0.03
Other             11          0.09

Pareto diagram: categories appear in order of decreasing frequency, except for the last miscellaneous class.
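The tabulations in this section can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the lecture's figures: the function names `frequency_table` and `pareto_order` are hypothetical, the `scores` list is made-up data, and only the motorcycle counts come from Example 6. The density column is the rectangle height for a histogram with unequal class widths, so the rectangle areas add up to 1.

```python
from math import sqrt


def frequency_table(data, edges):
    """Rows of (class interval, frequency, relative frequency, density).

    Classes are left-closed and right-open, [edges[i], edges[i + 1]),
    and may have unequal widths.
    """
    n = len(data)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        freq = sum(1 for x in data if lo <= x < hi)
        rel = freq / n
        density = rel / (hi - lo)  # rectangle height for unequal widths
        rows.append(((lo, hi), freq, rel, density))
    return rows


def pareto_order(counts, last="Other"):
    """Categories by decreasing frequency, with the miscellaneous class last."""
    total = sum(counts.values())
    main = sorted((c for c in counts if c != last),
                  key=lambda c: counts[c], reverse=True)
    ordered = main + ([last] if last in counts else [])
    return [(c, counts[c], counts[c] / total) for c in ordered]


# Hypothetical observations, not taken from the lecture.
scores = [12, 15, 21, 27, 33, 35, 36, 41, 46, 58]
table = frequency_table(scores, edges=[10, 20, 40, 60])
suggested_classes = round(sqrt(len(scores)))  # rule of thumb: about sqrt(n)

# Counts from Example 6 (120 subscribers in total).
motorcycles = {"Honda": 41, "Yamaha": 27, "Kawasaki": 20,
               "Harley-Davidson": 18, "BMW": 3, "Other": 11}
pareto = pareto_order(motorcycles)

for interval, freq, rel, density in table:
    print(interval, freq, round(rel, 3), round(density, 4))
for name, freq, rel in pareto:
    print(f"{name:16s} {freq:3d} {rel:.2f}")
```

Keeping "Other" out of the sort is what distinguishes the Pareto ordering from a plain sort by frequency: BMW has fewer responses than Other but is still listed before it.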

[Figure: Pareto diagram (bar chart) of the frequencies for Honda, Yamaha, Kawasaki, Harley-Davidson, BMW, and Other]

2.2 Shapes of Distributions
- Unimodal, bimodal, or multimodal?
- Symmetric or skewed?
- Positively/right skewed, or negatively/left skewed?

[Figure: four example histograms: Symmetric, Bimodal, Positively Skewed, Negatively Skewed]

3 Numerical Summary of Data

3.1 Measures of center

Sample mean

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: observations 6, 5, 7, 7, 6 (the sample mean is $\bar{x} = 31/5 = 6.2$).

Sample median $\tilde{x}$
- If n is odd, the sample median is the middle ordered value: $\tilde{x}$ = the $\left(\frac{n+1}{2}\right)$th ordered value.
- If n is even, the sample median is the average of the two middle ordered values: $\tilde{x}$ = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values.

Example: observations 7, 9, 10, 12, 14 (the sample median is 10).
Example: observations 3, 4, 9, 12, 14, 19 (the sample median is 10.5).

If the histogram is fairly symmetric, the sample mean and the sample median will be similar.
The sample mean is more sensitive to outliers (extreme values) than the sample median is:

Data             $\bar{x}$   $\tilde{x}$
1, 2, 3, 4, 5    3           3
1, 2, 3, 4, 90   20          3

Trimmed mean: a compromise between the mean and the median (semi-sensitive to extreme values).

Example 7. n = 20 observations of lifetime (in hours) of an incandescent lamp:
612 623 666 744 883 898 964 970 983 1003 1016 1022 1029 1058 1085 1088 1122 1135 1197 1201
10% trimmed mean: drop the smallest 10% and largest 10% of the observations and average the rest (the 10% trimmed mean is 979.125).
20% trimmed mean: drop the smallest 20% and largest 20% of the observations and average the rest (the 20% trimmed mean is 999.9167).

3.2 Measures of variability

Motivation: means and medians do not give a full picture.
Example: Midterm scores of students from two sections of a STAT course

[Figure: histograms of the midterm scores for the two sections, horizontal axes from 50 to 100]

Example 8: Three groups of data with 9 observations each

Group   1    2    3    4    5    6    7    8    9
A       30   35   40   45   50   55   60   65   70
B       30   44   46   48   50   52   54   56   70
C       46   47   48   49   50   51   52   53   54

The three groups have the same mean and median, but there is clearly a difference. Which group appears to be more variable? Which is less variable?

Sample range: the difference between the largest and smallest observations.

Group   Sample range
A       40
B       40
C       8

Sample variance and sample standard deviation:

1. Deviations from the mean: the difference between an observation $x_i$ and the mean $\bar{x}$

Group   Deviations from the mean
A       -20  -15  -10   -5    0    5   10   15   20
B       -20   -6   -4   -2    0    2    4    6   20
C        -4   -3   -2   -1    0    1    2    3    4

2. Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1}$$

3. Sample standard deviation: $s = \sqrt{s^2}$

Example 8 (cont'd):

Group   Squared deviations from the mean            $S_{xx}$   $s^2$    $s$
A       400  225  100   25    0   25  100  225  400   1500       187.5    13.693
B       400   36   16    4    0    4   16   36  400   912        114      10.677
C        16    9    4    1    0    1    4    9   16   60         7.5      2.739

4. An alternative formula for sample variance

Sum of squares:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}$$

Example 8 (cont'd), group C:

i         1      2      3      4      5      6      7      8      9
$x_i$     46     47     48     49     50     51     52     53     54
$x_i^2$   2116   2209   2304   2401   2500   2601   2704   2809   2916

$$\sum_{i=1}^{n} x_i = 450, \qquad \sum_{i=1}^{n} x_i^2 = 22560, \qquad s^2 = \frac{22560 - 450^2/9}{9-1} = 7.5$$

Interquartile Range

Quartiles:
- Lower quartile (LQ or Q1): the median of the lower half of the data values; 25% of the observations are smaller than this value.
- Upper quartile (UQ or Q3): the median of the upper half of the data values; 75% of the observations are smaller than this value.
- If the sample size n is an odd number, the median is included in both halves.
- Quartiles are defined differently in different books and software packages. You are expected to use the method given above!

Example: 1, 2, 3, 4, 5. Median = 3, lower quartile = 2, upper quartile = 4.
Example: 1, 2, 3, 4, 5, 6. Median = 3.5, lower quartile = 2, upper quartile = 5.

Interquartile range (IQR): the difference between the upper and lower quartiles (UQ - LQ).
Outliers: observations farther than 1.5 IQR from the closest quartile.
Extreme outliers: observations farther than 3 IQR from the closest quartile.

Example: 1, 2, 3, 4, 5, 6, 11. Median = 4, LQ = 2.5, UQ = 5.5, IQR = 3, [LQ - 1.5 IQR, UQ + 1.5 IQR] = [-2, 10], so 11 is an outlier.
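The two variance formulas and the quartile rule of these notes can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the groups are those of Example 8, and `quartiles` follows the convention stated above (for odd n the median belongs to both halves), which differs from the defaults of many software packages.

```python
from math import sqrt
from statistics import median


def sample_variance(xs):
    """Definition: S_xx / (n - 1), with S_xx the sum of squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    s_xx = sum((x - mean) ** 2 for x in xs)
    return s_xx / (n - 1)


def sample_variance_shortcut(xs):
    """Alternative formula: S_xx = sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return s_xx / (n - 1)


def quartiles(xs):
    """(LQ, UQ) as medians of the lower and upper halves of the sorted data.

    For odd n the middle value is included in both halves.
    """
    s = sorted(xs)
    n = len(s)
    half = (n + 1) // 2
    return median(s[:half]), median(s[n - half:])


def outliers(xs):
    """Observations farther than 1.5 IQR from the closest quartile."""
    lq, uq = quartiles(xs)
    iqr = uq - lq
    return [x for x in xs if x < lq - 1.5 * iqr or x > uq + 1.5 * iqr]


group_a = [30, 35, 40, 45, 50, 55, 60, 65, 70]
group_b = [30, 44, 46, 48, 50, 52, 54, 56, 70]
group_c = [46, 47, 48, 49, 50, 51, 52, 53, 54]

for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    var = sample_variance(g)
    print(name, var, sample_variance_shortcut(g), round(sqrt(var), 3))

print(quartiles([1, 2, 3, 4, 5, 6, 11]))  # (2.5, 5.5)
print(outliers([1, 2, 3, 4, 5, 6, 11]))   # [11]
```

The shortcut formula needs only the running sums of $x_i$ and $x_i^2$, which is why it is convenient for hand calculation; in floating-point arithmetic the definitional form is usually the more numerically stable of the two.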

4 Five-number summary and boxplot

Five-number summary: Min, Lower quartile, Median, Upper quartile, Max

Boxplot:
[Figure: boxplot labelled, from top to bottom: Max, Upper quartile, Median, Lower quartile, Min]

Boxplot that shows the outliers:
[Figure: boxplot labelled, from top to bottom: Max, Max non-outlier, Upper quartile, Median, Lower quartile, Min non-outlier, Min]

Example 7. (cont'd) n = 20 observations of lifetime (in hours) of an incandescent lamp
612 623 666 744 883 898 964 970 983 1003 1016 1022 1029 1058 1085 1088 1122 1135 1197 1201
Min = 612, Max = 1201

Median = 1009.5
Lower quartile = 890.5
Upper quartile = 1086.5
IQR = 196
Outliers? [LQ - 1.5 IQR, UQ + 1.5 IQR] = [596.5, 1380.5] (no outliers)

[Figure: boxplots of the lamp lifetime data and of the lamp lifetime data with one added observation]

Side-by-side boxplot: helpful for comparing the distributions of data from multiple groups.

[Figure: side-by-side boxplots for Group 1 and Group 2]
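The numerical summaries quoted for the lamp lifetimes of Example 7 can be reproduced with a short Python sketch. The helper names are hypothetical, and the data values are an assumption in one respect: digits lost in the listing were restored so that they agree with the stated trimmed means and quartiles, and the last digits of the two smallest values (612 and 623) are not pinned down by those summaries.

```python
from statistics import mean, median

# Lamp lifetimes in hours (Example 7); the restored digits are an assumption.
lamp_hours = [612, 623, 666, 744, 883, 898, 964, 970, 983, 1003,
              1016, 1022, 1029, 1058, 1085, 1088, 1122, 1135, 1197, 1201]


def trimmed_mean(xs, proportion):
    """Drop the given proportion of observations from each end, then average."""
    s = sorted(xs)
    k = round(proportion * len(s))  # number of values dropped at each end
    return mean(s[k:len(s) - k])


def five_number_summary(xs):
    """(min, LQ, median, UQ, max); for odd n the median is in both halves."""
    s = sorted(xs)
    n = len(s)
    half = (n + 1) // 2
    return s[0], median(s[:half]), median(s), median(s[n - half:]), s[-1]


def outlier_fences(xs):
    """[LQ - 1.5 IQR, UQ + 1.5 IQR]; values outside are flagged as outliers."""
    _, lq, _, uq, _ = five_number_summary(xs)
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


print(trimmed_mean(lamp_hours, 0.10))   # 979.125
print(trimmed_mean(lamp_hours, 0.20))
print(five_number_summary(lamp_hours))
low, high = outlier_fences(lamp_hours)
print([x for x in lamp_hours if x < low or x > high])  # [] (no outliers)
```

Because the trimmed means discard the smallest and largest observations, they do not depend on the two assumed values, which is the same property that makes them less sensitive to extreme values than the ordinary mean.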