Analytical Graphing: let's start with the best graph ever made


Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales. The graph illustrates an amazing point: how an army can dwindle to a small fraction of its starting size without losing a single major battle.

When is a graph appropriate?
- Always for data exploration
- Often for data analysis, and to develop predictions (models) and experimental designs
- Sometimes for presentations
- Less often for publications

Data Exploration
Data exploration is not snooping in the pejorative sense. It is a necessary and desired operation for:
- Checking data for unusual values
- Making sure the data meet the assumptions of the chosen form of analysis, e.g. normality, homogeneity of variances, linearity (in regression approaches)
- Deciding (sometimes) what sort of analysis to do. Ideally this will have been done before initiating a study.
- Looking for patterns that may not be expected or apparent. This is indeed snooping, but it is an essential part of hypothesis formation.

Data Exploration
- Checking data for unusual values
- Making sure the data meet the assumptions of the chosen form of analysis
See ourworld (pop_86):
- Determining distributions and outliers
- Will a transformation help?
[Figure: two histograms of the population of countries (1986); axes: Count and Proportion per Bar]

Data Exploration: checking linearity
The relationship between birth and death rates (ourworld): is it linear, or is there perhaps a more appropriate model?
[Figure: scatterplot of DEATH_82 and BIRTH_82]

Clearly not linear. The smoother shown uses the LOESS procedure (locally weighted scatterplot smoothing), a non-parametric regression method that combines multiple regression models in a k-nearest-neighbor-based meta-model.
[Figure: scatterplot of DEATH_82 and BIRTH_82 with a LOESS smoother]

When is a graph appropriate?
Often for data analysis, e.g.:
- To understand the nature of interaction terms (more later)
- To understand the power of a test. Say we wanted to determine the sample size for an experiment where we had a specific response in mind under the alternative hypothesis, expected a standard deviation of about 8, and were willing to relax alpha.
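The sample-size question above can be explored numerically before any curve is drawn. The sketch below is only an approximation under stated assumptions: it uses a two-sided, two-sample z-test (normal approximation) rather than whatever procedure produced the slides, and the difference `DELTA`, the two alpha levels, and the sample sizes are hypothetical values chosen for illustration. Only the standard deviation of 8 comes from the text.

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    se = sd * (2.0 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = delta / se                     # true difference in standard-error units
    # Probability of falling in either rejection region under the alternative.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

SD = 8.0                # standard deviation stated in the text
DELTA = 5.0             # hypothetical difference between group means
ALPHAS = (0.05, 0.10)   # hypothetical strict and relaxed alpha levels

for n in (5, 15, 25, 35):  # illustrative sample sizes per group
    powers = [power_two_sample(DELTA, SD, n, a) for a in ALPHAS]
    print(n, " ".join(f"{p:.2f}" for p in powers))
```

Each printed row gives the sample size followed by the power at the strict and the relaxed alpha; plotting those columns against sample size gives curves of the kind the slides describe.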

For example, the effect of relaxing alpha on power:
[Figure: two power curves, one per alpha level, plotting Power against Sample Size (per cell); SD = 8]

When is a graph appropriate?
Sometimes for presentations
- The idea is to communicate information quickly.
- Be sure you know why you are presenting the graph: is it to convey stats or some other information? (We will talk about this more later.)
- Graphs should be simple and not contain too much information.
- Never have a graph that is not interpretable, with so many factors involved that no one could figure it out, or worse:

I know you can't really see this, but:
[Figure: scatterplot matrix of the ourworld variables, including POP_1983, POP_1986, BIRTH_82, BIRTH_RT, DEATH_82, DEATH_RT, BABYMT82, BABYMORT, LIFE_EXP, GNP_82, GNP_86, GDP_CAP, LOG_GDP, EDUC_84, EDUC, HEALTH84 and HEALTH]
Graphs like this are usually presented to demonstrate how much work the researcher has done. What they really convey is that he or she has not adequately prepared the presentation.

When is a graph appropriate?
Less often for publications
- The idea is to communicate information that is too complex to leave in tables or text.
- Graphs typically depict rather than present information (you have to read across to the axes to get numbers). Hence, if precise bits of information are important to the argument being made, use tables.
- If a graph is presented, it must be important to the argument being made in the text (no fluff graphs).
- Information cannot be presented twice (e.g. table and figure, text and figure).
- If a graph is presented, it must be interpretable. You should be able to understand the purpose and content of the figure directly from the legend.

Basics of analytical graph theory
- Graph types imply a basis of logic and are not always interchangeable
- Even interchangeable graph types are not always equivalent (some are just non-informative)
- Be very clear about what you are trying to convey: models, stats or data structure
- Graph construction (axes, scales etc.) may obscure or make clear the points you are trying to make
- Graph trickery is usually just that and typically subtracts from the depiction

Graph types imply a basis of logic and are not always interchangeable
- Summary charts
- Density charts
- Scatterplots, quantile plots and probability plots

Summary Charts
There are a series of general graphical displays useful for characterizing the relationship between independent variables (usually categorical) and summary statistics of dependent variables (usually continuous). An example would be a bar graph of the relationship between education and income (see survey2 data).

Examples of continuous and categorical variables

Categorical                               Continuous
Gender (male, female)                     Hormone level
Nationality (French, Italian)             Location (Latitude, Longitude)
Species (Human, Chimp)                    Phylogenetic distance
Color (red, green, blue)                  Color (wavelength)
Age Group (Young, Old)                    Age (years, days)
Height Group (Short, Medium, Tall)        Height (cm, inches)
Weight Group (Thin, Obese)                Weight (grams, pounds)
Speed (Fast, Slow)                        Speed (cm/sec)
Temperature (Cold, Warm)                  Temperature (degrees C)

Some types of summary charts:

[Figure: six summary charts of INCOME by EDUC (no grad hs, hs grad, some college, college grad): Bar, Dot, Line, Profile, Pyramid, Pie]

Which conveys the information most clearly? How about the comparisons of interest?
[Figure: summary charts of INCOME by EDUC, split by SEX (Female, Male)]
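Whatever chart type is chosen, a summary chart plots one summary statistic of the continuous variable per category. As a minimal sketch, the code below computes the mean of INCOME within each EDUC category and prints a crude text bar chart. The records are hypothetical and are not the survey2 data; `group_means` is an illustrative helper name.

```python
from collections import defaultdict
from statistics import mean

def group_means(records):
    """Mean of the continuous variable within each category, in first-seen order."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}

# Hypothetical (EDUC, INCOME) records; not the survey2 data.
records = [
    ("no grad hs", 14), ("no grad hs", 18),
    ("hs grad", 22), ("hs grad", 26), ("hs grad", 24),
    ("some college", 27), ("some college", 31),
    ("college grad", 38), ("college grad", 44),
]

# One text bar per category: bar length is proportional to the group mean.
for category, m in group_means(records).items():
    print(f"{category:<13} {'#' * round(m / 2)} {m:.1f}")
```

The same dictionary of means could feed a bar, dot or line chart; the choice among them changes what comparison the reader sees first, not the numbers.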

Density Charts
The density of a sample is the relative concentration of data points in intervals across the range of the distribution. A histogram is one way to display the density of a quantitative variable; box plots, dot or symmetric dot density plots, frequency polygons, fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths are others.
[Figure: histogram; x-axis: Length (mm)]
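To make "relative concentration of data points in intervals" concrete, the sketch below bins a sample into equal-width intervals and reports both the count and the density (the proportion per unit length) for each bin. The lengths are hypothetical, and the bin width and starting edge are arbitrary choices for illustration.

```python
def histogram_density(values, bin_width, start):
    """Return (left_edge, count, density) for each bin; density * bin_width sums to 1.

    Assumes every value is >= start.
    """
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    n = len(values)
    return [(start + i * bin_width, c, c / (n * bin_width)) for i, c in enumerate(counts)]

# Hypothetical lengths in mm.
lengths_mm = [12, 14, 15, 15, 17, 18, 18, 19, 21, 22, 22, 23, 26, 31]

for left, count, density in histogram_density(lengths_mm, bin_width=5, start=10):
    print(f"{left:>3}-{left + 5:<3} {'*' * count:<8} density={density:.3f}")
```

Changing `bin_width` changes the apparent shape of the distribution, which is one reason to try more than one width during exploration.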

Features of a BOXPLOT
Rather than comparing sample values to the normal distribution (mean, standard deviation, and so on), box plots show robust (what does this mean?) statistics (median, quartiles, and so on).
[Figure: annotated box plot of Y showing the median, hinges, confidence interval, mean, outliers and statistical range, with 25% of the data in each of the four sections]

Raw Data Plots: e.g. Scatterplots
Scatterplots are probably the most common form of graphical display. The key feature of scatterplots is that raw data are plotted (in contrast to summary data, as in summary charts). Regression lines with confidence bands, or smoothers (e.g. linear, non-linear), can be added to help explain relationships among variables. An example is the relationship between mussel height and length, and between mussel height and mass. How can we estimate the length and mass of mussels?
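As an aside on the box plot features listed above, the following sketch computes the robust statistics a box plot displays. It assumes Tukey-style hinges (the medians of the lower and upper halves of the sorted data) and the common rule that points more than 1.5 hinge-spreads beyond a hinge are outliers; the software behind the slides may use slightly different conventions. The sample is hypothetical.

```python
from statistics import mean, median

def boxplot_stats(values):
    """Median, hinges and outliers of a sample, using Tukey-style hinges."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2                      # each half includes the median when n is odd
    lower = median(xs[:half])                # lower hinge
    upper = median(xs[n - half:])            # upper hinge
    spread = upper - lower
    lo_fence, hi_fence = lower - 1.5 * spread, upper + 1.5 * spread
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {"median": median(xs), "lower_hinge": lower,
            "upper_hinge": upper, "outliers": outliers}

sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]       # hypothetical, with one extreme value
print(boxplot_stats(sample))
print("mean:", mean(sample), "median:", median(sample))
```

The single extreme value pulls the mean far from the bulk of the data while the median and hinges barely move, which is what "robust" refers to.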

[Figure: scatterplot of mussel Height and Length with non-linear and linear smoothing; each point is a mussel]
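A non-linear smoother of the kind shown for the mussel data can be sketched as a simplified LOESS: at each target x, fit a weighted straight line to the k nearest points, with tricube weights that fall to zero at the k-th neighbor. This is an illustration only. It omits the robustness iterations of full LOESS, the mussel measurements are hypothetical, and `loess_point` is an illustrative name rather than a library function.

```python
def loess_point(xs, ys, x0, k):
    """Locally weighted linear fit at x0 using the k nearest points (tricube weights)."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    h = max(abs(x - x0) for x, _ in nearest) or 1.0   # bandwidth: distance to k-th neighbor
    w = [(1 - (abs(x - x0) / h) ** 3) ** 3 for x, _ in nearest]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw   # weighted mean of x
    my = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw   # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, nearest))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Hypothetical mussel length (x) and height (y) measurements.
lengths = [10, 14, 18, 22, 26, 30, 34, 38, 42, 46]
heights = [5.1, 6.8, 8.9, 10.2, 12.5, 13.4, 14.8, 15.5, 16.1, 16.4]

for x0 in (12, 28, 44):
    print(x0, round(loess_point(lengths, heights, x0, k=5), 2))
```

Evaluating `loess_point` over a grid of x values traces the smooth curve; a larger `k` gives a stiffer curve that approaches the ordinary linear fit.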

Scatterplots, quantile plots and probability plots
Quantile plots and probability plots are useful for studying the distribution of a variable.

Quantile plots, or Q plots: unlike probability plots, which compare a sample to a theoretical probability distribution, a quantile plot compares a sample to its own quantiles (a one-sample plot) or to another sample (a two-sample, or Q-Q, plot). The quantile of a sample is the data point corresponding to a given fraction of the data. See ourworld (pop_1986).

Features of a Quantile Plot
[Figure: quantile plot of POP_1986 (Fraction of Data against POP_1986), annotated with the distribution of the data, the population size at or below which 86% of countries fall, and the distribution of quantiles (should be uniform but is subject to sample size)]
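The coordinates of a one-sample quantile plot are easy to compute by hand. The sketch below pairs each sorted value with the fraction of the data it corresponds to, assuming the plotting position (i - 0.5)/n, which is one common convention and not necessarily the one used for the slides. The populations are hypothetical, not the ourworld data.

```python
def quantile_points(values):
    """(value, fraction of data) pairs for a one-sample quantile plot."""
    xs = sorted(values)
    n = len(xs)
    # Plotting position (i - 0.5) / n for the i-th smallest value (i = 1..n).
    return [(x, (i + 0.5) / n) for i, x in enumerate(xs)]

def fraction_at_or_below(values, threshold):
    """Fraction of the sample less than or equal to the threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical country populations in millions.
populations = [0.3, 0.9, 2, 4, 7, 11, 18, 35, 60, 120, 250, 1000]

for value, fraction in quantile_points(populations):
    print(f"{value:>7} {fraction:.3f}")
print(fraction_at_or_below(populations, 60))
```

The fractions are evenly spaced by construction, so any bunching visible in the plot comes from the data values themselves.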

Scatterplots, quantile plots and probability plots
A probability plot plots the values of a variable against the corresponding percentage points of a theoretical distribution: normal, chi-square, t, F, uniform, binomial, logistic, exponential, gamma, Weibull, or Studentized range. Graphs like this are called probability plots, or P plots. You can also plot the expected values of one variable against those of another (P-P plot). These graphs are very important for determining whether data are in need of transformation. See ourworld (pop_1986).

Features of a Probability Plot
[Figure: two normal probability plots of POP_1986: no transformation, and log transformation]
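One way to put a number on what a normal probability plot shows is the correlation between the sorted data and the expected normal scores: the closer to 1, the straighter the plot. The sketch below compares that correlation before and after a log transformation. The data are hypothetical, the base-10 logarithm and the (i - 0.5)/n plotting position are assumptions, and the helper names are illustrative.

```python
from math import log10
from statistics import NormalDist, mean

def normal_scores(n):
    """Expected standard-normal quantiles for a sample of size n."""
    z = NormalDist()
    return [z.inv_cdf((i + 0.5) / n) for i in range(n)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def probability_plot_r(values):
    """Straightness of the normal probability plot: r of sorted data vs normal scores."""
    xs = sorted(values)
    return pearson_r(xs, normal_scores(len(xs)))

# Hypothetical, strongly right-skewed populations in millions.
populations = [0.3, 0.9, 2, 4, 7, 11, 18, 35, 60, 120, 250, 1000]

print("raw:", round(probability_plot_r(populations), 3))
print("log:", round(probability_plot_r([log10(p) for p in populations]), 3))
```

A markedly higher correlation after the transformation is the numerical counterpart of the plot straightening out, which is the visual cue that a transformation helps.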

Let's Play
Activity 1: Graph construction
Draw the most appropriate graph, given the data set and type provided.
- Think about the nature of the information and how best to depict it.
- Label both the x and y axes.
- Use appropriate scales for both axes.
- Think about the number of ticks on each axis and the labeling of tick marks.
- Make sure the elements (points, bars, lines etc.) are crafted in a way that simplifies interpretation (think about color, pattern, shape of elements, and whether or not to depict a trend).
- Provide a figure legend that is descriptive: the reader should be able to interpret the figure based on the graph and legend.

Average size of Seastars (Pisaster) over time

Age (years)   Diameter (mm)
1             35
2             65
3             92
4             116
5             138
6             158
7             176
8             192
9 6 219

Total commercial abalone landings (pounds) over time in California

Year   Abalone Landings
1973   3,187,76
1974   2,587,8
1975   2,128,545
1976   1,7,111
1977   1,434,5
1978   1,292,517
1979   989,124
1980   1,238,495
1981   1,9,463
1982   1,2,443
1983   8,25
1984   826,514
1985   823,931
1986   614,962
1987   762,951
1988   568,716
1989   741,1
1990   523,942
1991   38,593
1992   514,8
1993   461,3
1994   2,596
1995   262,314
1996   229,379
1997   112,323

The relationship between time to run a mile and maximum oxygen consumption
Columns: VO2 max (oxygen consumption, ml/(kg min)), Runtime (minutes per mile)
59.57 8.17 6.6 8.63 54.3 8.65 54.63 8.92 49.16 8.95 49.87 9.22 48.67 9.4 45.44 9.63.55 9.93 46.67 45.31.7.39.8.54.13 46.77.25 51.86.33 45.79.47 47.47.5 47.27.6 49.9.85.84.95 45.12 11.8 44.75 11.12 46.8 11.17 44.61 11.37 47.92 11.5 44.81 11.63 45.68 11.95 39.41 12.63 39.2 12.88 39.44 13.8 37.39 14.3

Size distribution of Limpets
Limpet size (mm): 6 7

Two variables: number of Blue whales as a function of period and location

             Southern Hemisphere   North Pacific   North Atlantic
Prewhaling   ~175,                 4,9             1,
Current      ~2,                   ~2,             ~

Extra slides

Basics of analytical graph theory: be very clear about what you are trying to convey (models, stats or data structure)

The underlying basis of the graph
There are two general bases for any data graph that will be presented or published:
- To display data (hopefully in the most efficient way)
- To convey information about statistics associated with the data
- Both of the above
Although these may not appear to conflict, they often do. Here is an example.

Error Bars
Be very careful: error bars convey meaning, and there are at least two sorts.
- An estimate of variability for subjects in that category, irrespective of strata or statistical assumptions
  - Of use for showing spread in sampled data
  - Of no use for conveying inferential statistics
- An estimate of variability for subjects in that category, with respect to strata and statistical assumptions
  - Of no use for showing spread in sampled data
  - Of use for conveying inferential statistics
See typing. How and why are these two graphs different?
[Figure: two plots of SPEED by EQUIPMNT (electric, plain old, word process): one without respect to strata and statistical assumptions, and one of Least Squares Means with respect to strata and statistical assumptions]
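The two kinds of error bar can be contrasted numerically. The sketch below is one possible reading of the distinction, not a reconstruction of the figure: the first function gives each group's own standard deviation (spread in the sampled data, no model), and the second gives the standard error of each group mean from the pooled within-group mean square of a one-way ANOVA (a model-based, inferential quantity). The typing speeds are hypothetical.

```python
from statistics import mean, stdev

def per_group_sd(groups):
    """Standard deviation of each group on its own: describes spread in the sample."""
    return {name: stdev(vals) for name, vals in groups.items()}

def pooled_standard_errors(groups):
    """Standard error of each group mean from the pooled one-way ANOVA mean square."""
    ss_within = sum(sum((v - mean(vals)) ** 2 for v in vals) for vals in groups.values())
    df_within = sum(len(vals) - 1 for vals in groups.values())
    mse = ss_within / df_within              # assumes equal variance across groups
    return {name: (mse / len(vals)) ** 0.5 for name, vals in groups.items()}

# Hypothetical typing speeds by equipment type; not the typing data set.
speeds = {
    "electric": [62, 70, 75, 81],
    "plain old": [40, 47, 55, 58],
    "word process": [66, 72, 74, 76],
}

print(per_group_sd(speeds))
print(pooled_standard_errors(speeds))
```

With equal group sizes the pooled standard errors are identical across groups, while the per-group standard deviations differ, so the two sets of error bars look different even though they are drawn from the same data.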

Basics of analytical graph theory: which is best?
[Figure: alternative charts of INCOME by EDUC (no grad hs, hs grad, some college, college grad) and SEX (Female, Male)]