Analytical Graphing. lets start with the best graph ever made

Similar documents
Analytical Graphing. lets start with the best graph ever made

Introduction to hypothesis testing

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

AP Final Review II Exploring Data (20% 30%)

Chapter2 Description of samples and populations. 2.1 Introduction.

Elementary Statistics

Stat 101 Exam 1 Important Formulas and Concepts 1

Talking feet: Scatterplots and lines of best fit

Descriptive Statistics-I. Dr Mahmoud Alhussami

Chapter 2: Tools for Exploring Univariate Data

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Units. Exploratory Data Analysis. Variables. Student Data

MATH 1150 Chapter 2 Notation and Terminology

Lecture Notes 2: Variables and graphics

Chapter 1. Looking at Data

STT 315 This lecture is based on Chapter 2 of the textbook.

Comparing Measures of Central Tendency *

Vocabulary: Data About Us

Descriptive statistics

Descriptive Data Summarization

Statistics in medicine

Worksheet 2 - Basic statistics

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Performance of fourth-grade students on an agility test

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Data Analysis and Statistical Methods Statistics 651

Resistant Measure - A statistic that is not affected very much by extreme observations.

Descriptive Statistics C H A P T E R 5 P P

MODULE 9 NORMAL DISTRIBUTION

Chapter 1 Statistical Inference

Glossary for the Triola Statistics Series

Math Review Sheet, Fall 2008

3.1 Measure of Center

Turning a research question into a statistical question.

1 Measures of the Center of a Distribution

In this investigation you will use the statistics skills that you learned the to display and analyze a cup of peanut M&Ms.

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

- measures the center of our distribution. In the case of a sample, it s given by: y i. y = where n = sample size.

Chapter 1:Descriptive statistics

Chapter 7: Statistics Describing Data. Chapter 7: Statistics Describing Data 1 / 27

Visualizing Data: Basic Plot Types

SESSION 5 Descriptive Statistics

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Vocabulary: Samples and Populations

OPEN GEODA WORKSHOP / CRASH COURSE FACILITATED BY M. KOLAK

Essentials of Statistics and Probability

Business Statistics. Lecture 10: Course Review

MATH 10 INTRODUCTORY STATISTICS

1. Exploratory Data Analysis

The science of learning from data.

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

Chapter 4. Displaying and Summarizing. Quantitative Data

Introduction to Linear regression analysis. Part 2. Model comparisons

A SHORT INTRODUCTION TO PROBABILITY

Graphing Data. Example:

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

CS 361: Probability & Statistics

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

Visual Display of Information

GRACEY/STATISTICS CH. 3. CHAPTER PROBLEM Do women really talk more than men? Science, Vol. 317, No. 5834). The study

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

PS5: Two Variable Statistics LT3: Linear regression LT4: The test of independence.

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Psych Jan. 5, 2005

For instance, we want to know whether freshmen with parents of BA degree are predicted to get higher GPA than those with parents without BA degree.

Bio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems

Sampling, Frequency Distributions, and Graphs (12.1)

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

STAT 200 Chapter 1 Looking at Data - Distributions

Math 140 Introductory Statistics

Math 140 Introductory Statistics

Statistics Handbook. All statistical tables were computed by the author.

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

A is one of the categories into which qualitative data can be classified.

Statistics lecture 3. Bell-Shaped Curves and Other Shapes

Bag RED ORANGE GREEN YELLOW PURPLE Candies per Bag

Statistics, continued

PS2.1 & 2.2: Linear Correlations PS2: Bivariate Statistics

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474

TOPIC: Descriptive Statistics Single Variable

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago

16.400/453J Human Factors Engineering. Design of Experiments II

Survey on Population Mean

This is particularly true if you see long tails in your data. What are you testing? That the two distributions are the same!

Describing Data 247. Color Frequency Blue 25 Green 52 Red 41 White 36 Black 39 Grey 23

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Chapters 1 & 2 Exam Review

SPSS and its usage 2073/06/07 06/12. Dr. Bijay Lal Pradhan Dr Bijay Lal Pradhan

A C E. Answers Investigation 4. Applications

CIVL 7012/8012. Collection and Analysis of Information

NON-PARAMETRIC STATISTICS * (

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Sociology 6Z03 Review I

Transcription:

Analytical Graphing lets start with the best graph ever made Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales. The graph illustrates an amazing point how an army of 4, can dwindle to, without losing a single major battle. 1

When is a graph appropriate? Always for data exploration Often for data analysis and to develop predictions (models) and experimental designs Sometimes for presentations Less often for publications Data Exploration Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for: Checking data for unusual values Making sure the data meet the assumptions of the chosen form of analysis Eg normality, homogeneity of variances, linearity (in regression approaches) deciding (sometimes) what sort of analysis to do. This hopefully will have been done prior to initiating a study To look for patterns that may not be expected or apparent this is indeed snooping but it is an essential part of hypothesis formation 2

Count Count Data Exploration Checking data for unusual values Making sure the data meet the assumptions of the chosen form of analysis See ourworld - pop_86 Determining distributions and outliers Will a transformation help?? 5 15 5.8 4 3.7.6.5.4.3.2.1 5. 15 Population of countries (1986) Proportion per Bar 15 5 1....3.2.1. 1... Population of countries (1986) Proportion per Bar 3

Data Exploration Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for: Checking data for unusual values Making sure the data meet the assumptions of the chosen form of analysis Eg normality, homogeneity of variances, linearity (in regression approaches) The relationship between birth and death rates (ourworld) Is it linear, or is there perhaps a more appropriate model 3 DEATH_82 3 4 5 BIRTH_82 4

Clearly not linear using LOESS procedure (locally weighted scatterplot smoothing): a non-parametric regression method that combines multiple regression models in a k-nearest-neighbor-based meta-mode 3 DEATH_82 3 4 5 BIRTH_82 When is a graph appropriate? Often for data analysis (e.g.) To understand the nature of interaction terms (more later) To understand the power of a test. Say we wanted to determine sample size for an experiment where we thought the response would be around (alternate Hypothesis =) the standard deviation about 8 and we were willing to relax alpha (from.5 to.) 5

For example the effect of relaxing alpha on power Pop. Mean = Alternative = SD = 8 Alpha=.5,. Power 1..9.8.7.6.5.4.3 Power Curve (Alpha =.5).2 5 15 25 3 35 Sample Size (per cell) Power 1..9.8.7.6.5.4 Power Curve (Alpha =.).3 5 15 Sample Size (per cell) When is a graph appropriate? Sometimes for presentations Idea is to communicate information quickly Be sure you know why you are presenting the graph (is it to convey stats or some other information (we will talk about this more later) Graphs should be simple and not contain too much information never have a graph that is not interpretable So many factors involved that no one could figure it out, or worse 6

I know you can t really see this but. GN P _ 8 6 GD P _ C A P L OG_ GD P E D U C _ 8 4 E DUC HE A L T H8 4 HE A L T H P OP _ 1 9 8 3P OP _ 1 9 8 6P OP _ 1 9 9 P OP _ 2 2 B I R T H _ 8 2 B I R T H _ R T D E A T H _ 8 2D E A T H _ R TB A B Y M T 8 B 2 A B Y M OR TL I F E _ E X P GN P _ 8 2 GN P _ 8 6 GD P _ C A P L OG_ GD P E D U C _ 8 4 E DUC HE A L T H8 4 HE A L T H P OP _1983 P OP _1983 P OP _ 1 9 8 3P OP _ 1 9 8 6P OP _ 1 9 9 P OP _ 2 2 B I R T H _ 8 2 B I R T H _ R T D E A T H _ 8 2D E A T H _ R TB A B Y M T 8 B 2 A B Y M OR TL I F E _ E X P GN P _ 8 2 P OP _1986 P OP _1986 P OP _199 P OP _199 D E A TH _82 B A B Y MOR T B A B Y MT82 D E A TH _R T D E A TH _82 B I R TH _R T B I R TH _R T B I R TH _82 B I R TH _82 P OP _ P OP _ H E A LTH H E A LTH H E A LTH 84 H E A LTH 84 E DUC E DUC E D U C _84 E D U C _84 LOG_GD P LOG_GD P GD P _C A P GD P _C A P GN P _86 GN P _86 GN P _82 GN P _82 LI FE _E X P D E A TH _R T B A B Y MT82 B A B Y MOR T LI FE _E X P These are usually presented to demonstrate how much work the researcher has done really conveys that he or she has not adequately prepared the presentation When is a graph appropriate? Less often for publications Idea is to communicate information that is too complex to leave in tables or text They typically depict rather than present information (you have to read across to axes to get numbers). Hence if precise bits of information are important to the argument being made use tables. If a graph is presented it must be important to the argument being made in the text (no fluff graphs) Information cannot be presented twice (eg table and figure, text and figure) If a graph is presented it must be interpretable You should be able to understand the purpose and content of the figure directly from the legend. 7

Basics of analytical graph theory Graph types imply a basis of logic and are not always interchangeable Even interchangeable graph types are not always equivalent (some are just non-informative) Be very clear about what you are trying to convey: models, stats or data structure Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make Graph trickery is usually just that and typically subtracts from the depiction Graph types imply a basis of logic and are not always interchangeable Summary Charts Density Charts Scatterplots, quantile plots and probability plots 8

Summary Charts There are a series of general graphical displays useful for characterizing the relationship between independent variables (usually categorical) and summary statistics of dependent variables (usually continuous). An example would be a bar graph of the relationship between education and income (see survey2 data). Some types of summary charts: Examples of continuous and categorical variables Categorical Gender (male, female) Nationality (French, Italian) Species (Human, Chimp) Color (red, green, blue) Age Group (Young, Old) Height Group (Short, Medium, Tall) Weight Group (Thin, Obese) Speed (Fast, Slow) Continuous Hormone level Location (Latitude, Longitude) Phylogenetic distance Color (wavelength) Age (years, days) Height (cm, inches) Weight (grams, pounds) Speed (cm/sec) Temperature (Cold, Warm) Temperature (degrees C) 9

7 Bar Dot Line 7 7 6 6 6 5 5 5 4 3 4 3 4 3 7 Profile Pyramid Pie 7 6 5 6 5 4 3 4 3 Which conveys the information most clearly how about the comparisons of interest 7 6 5 4 3 SEX Female Male Female SEX Male 7 6 5 4 3 7 6 5 4 3 3 4 5 6 7 SEX Female Male

Density Charts The density of a sample is the relative concentration of data points in intervals across the range of the distribution. A histogram is one way to display the density of a quantitative variable; box plots, dot or symmetric dot density, frequency polygons, fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths are others. Histogram Length (mm) 11

Features of a BOXPLOT Rather than comparing sample values to the normal distribution (mean, standard deviation, and so on), box plots show robust (what does this mean) statistics (median, quartiles, and so on). confidence interval hinge median hinge outliers mean 25% 25% 25% 25% Smallest 5% Statistical Range Y Raw Data Plots:e.g. Scatterplots, Scatterplots are probably the most common form of graphical display. The key feature of scatterplots is that raw data are plotted (in contrast to summary data as in summary charts). Regression lines with confidence bands or smoothers (e.g. linear, non-linear) can be added to help explain relationships among variables. An example is the relationship between mussel height, and length and mussel height and mussel mass. How to estimate length and mass of mussels? 12

Height Length Non-linear and linear smoothing Each point is a mussel 13

Scatterplots, quantile plots and probability plots Quantile plots and probability plots are useful for studying the distribution of a variable. Quantile Plots produces quantile plots, or Q plots. Unlike probability plots, which compare a sample to a theoretical probability distribution, a quantile plot compares a sample to its own quantiles (a one-sample plot) or to another sample (a two-sample, or Q-Q, plot). The quantile of a sample is the data point corresponding to a given fraction of the data. See ourworld (pop_1986 ) Features of a Quantile Plot Distribution of data Fraction of Data 1..9.8.7.6.5.4.3.2.1 86% of countries had populations less or equal to 5 million people. 5 15 POP_1986 Distribution of quantiles (should be uniform but is subject to sample size) 14

Scatterplots, quantile plots and probability plots A Probability Plot plots the values of a variable against the corresponding percentage points of a theoretical distribution--normal, chi-square, t, F, uniform, binomial, logistic, exponential, gamma, Weibull, or Studentized range. Graphs like this are called probability plots, or P plots. You can also plot the expected values of one variable against those of another (P-P plot). These graphs are very important for determining if data are in need of transformation. See ourworld (pop_1986 ) Features of a Probability Plot No transformation Log (base ) transformation 15

Extra slides Basics of analytical graph theory Graph types imply a basis of logic and are not always interchangeable Even interchangeable graph types are not always equivalent (some are just non-informative) Be very clear about what you are trying to convey: models, stats or data structure Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make Graph trickery is usually just that and typically subtracts from the depiction 16

The underlying basis of the graph There are two general bases for any data graph that will be presented or published. To display data (hopefully in the most efficient way) To convey information about statistics associated with the data Both of the above Although these may not appear to present a conflict often times there is here is an example Error Bars Be very Careful - error bars convey meaning - at least two sorts Estimate of variability for subjects in that category, irrespective of strata or statistical assumptions Of use for showing spread in sampled data Of no use for conveying inferential statistics Estimate of variability for subjects in that category, with respect to strata and statistical assumptions Of no use for showing spread in sampled data Of use for conveying inferential statistics See typing 17

How and why are these two graphs different? 8 without respect to strata and statistical assumptions 78 Least Squares Means with respect to strata and statistical assumptions 7 68 SPEED 6 SPEED 58 5 48 4 electric plain old EQUIPMNT word process 38 electric plain old EQUIPMNT word process Basics of analytical graph theory Graph types imply a basis of logic and are not always interchangeable Even interchangeable graph types are not always equivalent (some are just non-informative) Be very clear about what you are trying to convey: models, stats or data structure Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make Graph trickery is usually just that and typically subtracts from the depiction 18

Which is best? 5 4 3 SEX Female Male 19