Lecture Notes 2: Variables and graphics


Highlights:
- Quantitative vs. qualitative variables
- Continuous vs. discrete and ordinal vs. nominal variables
- Frequency distributions
- Pie charts
- Bar charts
- Histograms and distribution shape
- Box plots

Variable (Data) Types
Variables can be either qualitative or quantitative.
Quantitative: Numeric values, such as height, weight, number of customers, or blood alcohol level. Quantitative variables have values that we can do sensible math with. Numbers which do not represent quantities are not quantitative.
Qualitative: Names or categories, such as eye color, type of car, political affiliation, or breed of dog. Qualitative variables are sometimes also referred to as categorical. Non-quantitative numbers are categorical.

Variable Levels
Quantitative variables come in two levels, continuous and discrete.

Quantitative Continuous: Numeric variables which can be given to an arbitrary number of decimal places. Typically, continuous variables are measured.
Examples:

Quantitative Discrete: Numeric variables where only integer responses make sense. Typically, discrete variables are counted.
Examples:
Note that when a continuous variable is rounded to the nearest integer, it is still considered continuous. For instance, temperature is commonly rounded to the nearest integer, but it is still considered continuous.

Qualitative variables also come in two levels, ordinal and nominal.

Qualitative Ordinal: These are qualitative variables that are typically placed in a set order. If placing the values of a qualitative variable out of order would be confusing, then it should probably be treated as ordinal.
Examples:

Qualitative Nominal: These are qualitative variables in which order does not matter. Most qualitative variables are nominal.
Examples:

Data Graphics
We will look at pie charts, bar charts, histograms, and box plots. All of the graphs we will look at show frequency distributions of data. Often this is shortened to just "distribution". A distribution tells you the values a variable takes on, and the frequency with which those values are taken on. So, if we are interested in the distribution of blood types from a bank of donors, I could first show you the data like this:

Blood Types from a Group of 77 Donors
B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB B B B AB B B B O A B A B A A B A A AB
(This is raw data, not a distribution.)

or like this:

Blood Type    # of Donors
A             18
AB            18
B             30
O             11

This is a frequency distribution, because it tells you the different values that the variable Blood Type takes on, as well as how often it takes on each value.
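Going from the raw data to the frequency distribution is just a tally. A minimal sketch in Python, assuming the 77 raw observations are typed in as a whitespace-separated string (the variable names are made up for this illustration):

```python
from collections import Counter

# Raw data: blood types of the 77 donors, copied from the raw listing.
raw = (
    "B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB "
    "A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB "
    "B B B AB B B B O A B A B A A B A A AB"
)

# One entry per donor.
observations = raw.split()

# The frequency distribution: each distinct value and how often it occurs.
frequency = Counter(observations)

# Print one row per blood type, in alphabetical order.
for blood_type in sorted(frequency):
    print(f"{blood_type:<4}{frequency[blood_type]:>4}")
```

The tally reproduces the counts in the frequency table: the same information as the raw data, but organized by value.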

I could show it to you like this:
[Figure: chart of the blood type frequency distribution]
This is also a frequency distribution. Does the visual aspect help give meaning to the distribution?

Relative Frequency Distribution
Sometimes it is useful to show relative frequency rather than just frequency. Relative frequency shows the different values a variable takes on, and how often it takes on each value as a proportion of the total. Proportions are often denoted as p, and given as:

relative freq. = p = (# of observations of interest) / (total # of observations)

Relative frequency example:

Blood Type    # of Donors    Relative frequency (p)
A             18
AB            18
B             30
O             11
Total
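The relative frequency column can be filled in by dividing each count by the total. A short sketch in Python, using the donor counts from the frequency table (the dictionary and variable names are assumptions made for this example):

```python
# Counts from the blood type frequency table.
counts = {"A": 18, "AB": 18, "B": 30, "O": 11}

# Total number of observations.
total = sum(counts.values())

# p = (# of observations of interest) / (total # of observations)
relative_frequency = {
    blood_type: count / total for blood_type, count in counts.items()
}

# Print each blood type with its count and proportion.
for blood_type, count in counts.items():
    p = relative_frequency[blood_type]
    print(f"{blood_type:<4}{count:>4}{p:>8.3f}")
print(f"{'Total':<4}{total:>3}{sum(relative_frequency.values()):>8.3f}")
```

Relative frequencies always add up to 1 (apart from rounding), which makes a handy check on the arithmetic.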

Pie Charts
Pie charts can be used to summarize one qualitative variable. Pie slices represent the proportion of observations in a class. Sometimes frequency results are also included. The more categories you have, the more difficult the pie chart will be to read.
[Figure: pie chart "ST301 Student Attitudes" with slices Hate stats (20), Like stats (43), Open mind (198). Caption: Survey results for students' attitudes towards statistics]
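To make "slices represent proportions" concrete, here is a rough sketch in Python that turns the survey counts from the pie chart into proportions and slice angles. Scaling by 360 degrees is simply the usual way a full circle is divided; the names are invented for this example.

```python
# Survey counts shown on the ST301 Student Attitudes pie chart.
attitudes = {"Hate stats": 20, "Like stats": 43, "Open mind": 198}

total = sum(attitudes.values())

# Each slice's share of the circle equals its share of the observations.
proportions = {label: count / total for label, count in attitudes.items()}
angles = {label: 360 * p for label, p in proportions.items()}

for label in attitudes:
    print(f"{label:<12}{proportions[label]:>7.3f}{angles[label]:>8.1f} degrees")
```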

Two pie charts
Multiple pie charts can be used to compare two different groups. Here, the pie charts compare attitudes of females and males toward the appropriate punishment for murder. Often it is tough to make direct comparisons using pie charts.
[Figure: two pie charts, Females and Males, each with slices for Death, Life in prison, Depends, Neither, and No opinion]

Bar charts
Bar charts can be used wherever pie charts are used. Like pie charts, they are used to show the distribution of a qualitative variable. Each bar in a bar chart shows you the frequency (or count) for the group it is associated with.
[Figure: bar chart of Number Caught by Species: Brown, Brook, Rainbow, Cutthroat, Lrg. Mth., Small Mth., Walleye, Salmon, Sunfish, Bluegill, Perch]
The chart above shows the frequency of catches for different species of fish.

Bar charts vs. pie charts
Here is the intro stats grade distribution bar chart from before, alongside a pie chart of the same data. Bar charts make comparing categories easier. For example, it isn't immediately obvious that the A slice is the same size as the AB slice. But it is obvious that the A bar is the same height as the AB bar.

The Histogram
[Figure: Histogram of Stat311 Heights (Inches); vertical axis: Frequency]

A histogram displays the distribution of a quantitative variable. The difference between a histogram and a bar chart is that bar charts are for qualitative data and histograms are for quantitative data. With bar charts, each bar represents a different distinct group. With histograms, each bar represents the number of observations which fall into an interval, also known as a bin.

Note that the number of bins on a histogram is arbitrary. The larger the bin size, the fewer bins there will be. Changing the number of bins can produce different looking histograms, even if the underlying data is exactly the same. The following four histograms represent the exact same data:

[Figure: four histograms of the same data, each drawn with a different number of bins; vertical axes: Frequency; horizontal axes: 0 to 60]
Note: If an observation falls directly on a bin endpoint, it is typical to place that observation in the bar to the left of its value. But this is not a hard and fast rule.

Histogram example
Let's briefly construct two different histograms that represent the same simple dataset below:
Heights of 10 randomly selected statistics students
65 67 66 69 69 66 64 64 63 72
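One possible way to build the two histograms is sketched below in Python. The bin edges (width 2 starting at 62, and width 5 starting at 60) are assumptions chosen for illustration, not the bins used in class. An observation that lands exactly on a bin endpoint is placed in the bar to its left, following the convention noted earlier.

```python
def bin_counts(values, edges):
    """Count the values falling in each bin (left edge, right edge].

    A value exactly on an endpoint goes into the bin to its left.
    Values at or below the first edge, or above the last, are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] < v <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


def print_histogram(values, edges):
    """Print a sideways text histogram, one row of stars per bin."""
    for i, count in enumerate(bin_counts(values, edges)):
        label = f"({edges[i]}, {edges[i + 1]}]"
        print(f"{label:<10}{'*' * count}")


# Heights of the 10 randomly selected statistics students.
heights = [65, 67, 66, 69, 69, 66, 64, 64, 63, 72]

# Two hypothetical choices of bins for the same data.
narrow_edges = [62, 64, 66, 68, 70, 72]  # bin width 2
wide_edges = [60, 65, 70, 75]            # bin width 5

print_histogram(heights, narrow_edges)
print()
print_histogram(heights, wide_edges)
```

Both pictures account for all 10 students, yet the narrow bins show more detail while the wide bins give a smoother outline.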

Distribution Shape
Looking at a histogram allows us to discern a distribution's shape. When there are lots of low values and just a few high values, the distribution is said to be skewed to the right, or positively skewed.

When there are lots of high values and just a few low values, the distribution is said to be skewed to the left, or negatively skewed. The skewness of this histogram is not as dramatic as that of the previous histogram.

When the two halves of the histogram look approximately like mirror images, the distribution is said to be (almost) symmetrical. We say almost symmetrical because it is unlikely that a histogram of data will be perfectly symmetrical.

When there are two peaks in a histogram, we say that the data is bimodal. The mode is the most common value in a distribution. Bimodality may indicate that there are two distinct groups being combined into one dataset.
[Figure: Histogram of Heights (Inches); vertical axis: Frequency]

Boxplots
Like histograms, boxplots are used to display the distribution of a quantitative variable. The shape of the distribution as well as the presence of any possible outliers is easily discerned from the boxplot. These outliers are drawn as dots. Boxplots are also useful for comparing multiple groups of data side by side.

Boxplot graphics
The boxplot is sometimes called the box and whiskers plot. In a boxplot, half the data lies above the thick black line and half lies below it. Also, half the data lies inside the box, and half lies outside. The dots are outliers. We will discuss boxplots in detail in the next set of notes.

Graphic Summaries
Graphs are tools that allow us to give meaning to a set of data. They should give the reader a better understanding of what is going on than can be achieved by just looking at the raw data. A picture is worth a thousand words. In the case of statistics, a picture is also worth a whole lot of numbers. In the next set of notes, we will discuss some common statistics that can be used to summarize a set of data.