Background to Statistics

Similar documents
Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Bio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.

Glossary for the Triola Statistics Series

Intuitive Biostatistics: Choosing a statistical test

Review of Statistics 101

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

Contents. Acknowledgments. xix

Probability and Statistics

Transition Passage to Descriptive Statistics 28

How spread out is the data? Are all the numbers fairly close to General Education Statistics

Rank-Based Methods. Lukas Meier

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Statistics: revision

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

General Certificate of Education Advanced Level Examination June 2013

CENTRAL LIMIT THEOREM (CLT)

Variables, distributions, and samples (cont.) Phil 12: Logic and Decision Making Fall 2010 UC San Diego 10/18/2010

psychological statistics

Introduction to Statistical Inference

20 Hypothesis Testing, Part I

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Statistical Methods: Introduction, Applications, Histograms, Ch

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

SESSION 5 Descriptive Statistics

Nonparametric Statistics

Big Data Analysis with Apache Spark UC#BERKELEY

Parameter Estimation, Sampling Distributions & Hypothesis Testing

Experimental Design, Data, and Data Summary

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Non-parametric tests, part A:

Preview from Notesale.co.uk Page 3 of 63

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

CH.9 Tests of Hypotheses for a Single Sample

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Exam details. Final Review Session. Things to Review

LOOKING FOR RELATIONSHIPS

Resistant Measure - A statistic that is not affected very much by extreme observations.

INTERVAL ESTIMATION AND HYPOTHESES TESTING

QUANTITATIVE TECHNIQUES

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

ESR#1 Research Model

Distribution-Free Procedures (Devore Chapter Fifteen)

Course Review. Kin 304W Week 14: April 9, 2013

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

Data analysis and Geostatistics - lecture VII

Introductory Econometrics. Review of statistics (Part II: Inference)

Using SPSS for One Way Analysis of Variance

Chapter 1 Statistical Inference

Lecture on Null Hypothesis Testing & Temporal Correlation

Statistics Primer. A Brief Overview of Basic Statistical and Probability Principles. Essential Statistics for Data Analysts Using Excel

Lecture 11. Data Description Estimation

Ch. 7. One sample hypothesis tests for µ and σ

Chapter 18 Sampling Distribution Models

CONTINUOUS RANDOM VARIABLES

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

AMS 5 NUMERICAL DESCRIPTIVE METHODS

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Statistics Introductory Correlation

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

Power of a hypothesis test

1.3.1 Measuring Center: The Mean

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

Data Analysis: Agonistic Display in Betta splendens I. Betta splendens Research: Parametric or Non-parametric Data?

Section 3.4 Normal Distribution MDM4U Jensen

General Certificate of Education Advanced Level Examination June 2014

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Let the x-axis have the following intervals:

Essentials of Statistics and Probability

MAT Mathematics in Today's World

Statistics and parameters

An inferential procedure to use sample data to understand a population Procedures

Chapter 3 Statistics for Describing, Exploring, and Comparing Data. Section 3-1: Overview. 3-2 Measures of Center. Definition. Key Concept.

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

Spearman Rho Correlation

Introduction to Design of Experiments

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data?

Chi Square Analysis M&M Statistics. Name Period Date

Statistics Handbook. All statistical tables were computed by the author.

TOPIC 12: RANDOM VARIABLES AND THEIR DISTRIBUTIONS

Practice Problems Section Problems

Data Analysis and Statistical Methods Statistics 651

Nonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

Everything is not normal

Elementary Statistics Triola, Elementary Statistics 11/e Unit 17 The Basics of Hypotheses Testing

Turning a research question into a statistical question.

Transcription:

FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient in the use of statistics: they must be able to recognise and communicate what their data show. Advanced level coursework therefore requires you to demonstrate the ability to use the statistical analysis of data effectively. You must make calculations and draw graphs or charts to show and identify patterns and trends, including central tendencies and spread. You need to carry out tests to confirm relationships. Applied science presents many opportunities for the presentation and analysis of data. The ability to use graphs and descriptive statistics is of particular significance in AQA Unit 7 Planning and carrying out a scientific investigation, AQA Unit 16 Ecology, conservation and recycling, OCR Unit 8 Investigating the scientist s work and OCR Unit 9 Sampling, testing and processing. OCR Unit 14 Ecology and managing the environment additionally requires the use of inferential statistics. You can also use these elsewhere, to make data analysis and the formation of valid conclusions more effective. Inferential statistics help you to form conclusions about populations of data from samples you don t need to obtain every single possible result to be able to form a valid conclusion. You can use your analysis to support or reject a hypothesis (testable statement) and establish how confident you can be in forming your conclusions. If you use statistics well, you can collect the minimum amount of data and reduce the amount of work that you have to do! Statistical techniques include: graphs and descriptive statistics inferential statistics scatter graphs association statistics bar charts o correlation coefficients histograms o χ 2 test (chi-squared) for association error bars comparative statistics mean, mode and median o t-tests for paired or unpaired data variance and standard deviation o Mann-Whitney U-test standard error of the mean o analysis of variance (ANOVA) confidence intervals frequency statistics χ 2 test (chi-squared) Background to statistics: page 1 of 5

Graphs and descriptive statistics Average Average is not a precise mathematical term. It can be: mean ( x ) = sum of values divided by number of values mode = most frequent value median = middle value for results listed in order The mean is the arithmetic average, found by adding up all your values and dividing by the total number of separate values. It is the most often used average and can be used in further statistical analysis. However, it can be distorted by a single extreme value. The mode is the most frequently occurring value or event. It is not useful for further statistical analysis but, unlike the mean, it is not affected by extreme values. The median is the central value (or average of the two central values in an even number of values) when all the values are placed in order of magnitude (ranked). It is not useful for further statistical analysis but, unlike the mean, it is not affected by extreme values. Choose by deciding what you are going to use the statistic for: does the value give you the information you need? Sampling A population contains all the possible results, so usually a representative sample is obtained, i.e. representative of the population as a whole. The aim is to use the smallest sample that will reliably give the same statistics as the population. Variance, standard deviation and normal distribution The mean, mode or median may be the same for sets of data that spread out in different ways. Finding the range, the difference between the highest and lowest values, is a useful measurement for comparing spread. However, this does not indicate the pattern of variation in the values. Many measurements, especially biological measurements, such as adult human heights, show a normal distribution. They cluster around the mean with fewer and fewer values as you move further away on either side, giving a characteristic bell curve : The spread of the values is symmetrical about the mean, so the mean, mode and median all coincide at the same value. Background to statistics: page 2 of 5

Standard deviation is a measure of the spread of the values. For a sample it is usually given the symbol s. Sometimes the population symbol σ (Greek letter sigma) is used. Standard deviation is the square root of the variance. The variance is the average of the squared differences from the mean of each of the values. For example, if you have two values, 3 and 9: x mean, x = = (3 + 9) 2 = 6 n (Σx = the sum of all the different values, n is the number of values) differences (deviations) from the mean, x - x = 3 and -3 deviations squared, ( x - x ) 2 = 9 and 9 so average deviation or variance = Σ( x - x ) 2 / n = 18 2 = 9 and standard deviation (square root of the variance), s = [Σ( x - x ) 2 / n] = 9 = 3. If you have a statistical calculator, you may be able to use a standard deviation key to obtain its value. There are usually two keys labelled σ n and σ n-1 (σ is used, even though the keys are only for sample calculations). The n -1 refers to finding an unbiased value for the sample standard deviation: calculations based on small samples (<25 values or so) tend to slightly underestimate the true value of the standard deviation. Should standard deviation be calculated using n -1? You can use [Σ( x - x ) 2 / n-1] if you have fewer than about 25-30 values in your sample, but this is not essential. Some statisticians think that if you need to do this, it is because your data are not good enough in the first place! Variance is measured in units squared, but standard deviation restores the measurement to the same units as the original values, making it much more useful. It is the root mean square (RMS) of the deviations of the values from their mean. The standard deviation is the most commonly used measure of how widely spread values are. If many data points are close to the mean, then the standard deviation is small. Modified from an original file created by Petter Strandmark, based on an original graph by Jeremy Kemp http://en.wikipedia.org/wiki/imag e:standard_deviation_diagram.sv g http://creativecommons.org/licen ses/by/2.5/ If many data points are far from the mean, then the standard deviation is large. If all the data values are equal, then the standard deviation is zero. Background to statistics: page 3 of 5

The 68-95-99.7 or empirical rule for a normal distribution: 68% or roughly 2/3 of the observations fall within 1 standard deviation of the mean 95% of the observations fall within 2 standard deviations of the mean 99.7% of the observations fall within 3 standard deviations of the mean Inferential statistics Statistical tests, probability and significance If you measure limpet shells at two different rocky shores, you may find that your mean values are different. Statistical tests allow you to decide if the difference is big enough to be significant. This is based on the idea that results can be influenced by chance as well as caused by an effect. Even if you measured two samples of shells in the same area, you would expect a difference between the means of your samples, due to chance alone. So you need to decide if the observed difference is a real difference. You must test the hypothesis that there is a significant difference in shell diameters between the limpets, not due to chance alone. For statistical tests, a null hypothesis must often be used (usually given the symbol H 0 ). This is a hypothesis that you try to reject so that your original hypothesis (the alternative hypothesis, H A or H 1 ) is supported. In this case, you could test the null hypothesis: There is no significant difference between the diameters of the limpet shells found in the two places, any apparent difference is due solely to chance. If you can reject this hypothesis, the converse alternative is supported: There is a significant difference between the diameters of limpet shells in the two places. To make the decision, you use statistical tests to find the probability (p) of getting a particular set of results if a hypothesis is true. p = 1 represents certainty, p = 0.5 is 1 in 2 or 50% and p = 0.1 is 1 in 10 or 10%. Probability can be worked out by considering how many times an event can occur and dividing by the total number of all possible events. Hence, the probability of obtaining a head when tossing a coin is ½, i.e. p = 0.5. For being dealt an ace at random from a pack of cards, p = 4/52 = 0.077. Significance in biological investigations is usually set at p 0.05 (probability equal to or less than 0.05). Looking at the data that you would expect if the hypothesis is true, at p = 0.05 you would expect to get that type of data1 in 20, or 5% of the time. This probability is chosen because more than 1 in 20 is thought to be reasonably likely or better. At a probability of 1 in 20 or less, events become more and more unlikely. Background to statistics: page 4 of 5

If p 0.05 the hypothesis can be rejected with confidence. The data are unlikely and don t support the hypothesis. If p > 0.05 the hypothesis cannot be rejected with confidence. The data are as expected; it is reasonably likely that the hypothesis is true. Confidence levels You can use the p 0.05 cut-off level or, if a probability value can be found, the smaller the probability becomes the greater your confidence: p 0.05 reject with confidence p 0.01 reject with a high level of confidence p 0.001 reject with a very high level of confidence. Is p 0.01 better than p 0.05? When you reject a hypothesis at p = 0.05, on average you will be wrong once every 20 times. Why not set a stricter significance level in the first place, say p 0.01? The trouble with biological data is that it can vary quite a bit and give inconsistent results. If you make significance too difficult, it becomes more likely that you will fail to support hypotheses for effects that are really there. p 0.05 has been found by experience to be a good rule of thumb that seems to work well for biology investigations where results may vary quite a lot. In physics or chemistry investigations data may be much more precise. With more highly controlled variables and accurate measuring instruments, p 0.01, p 0.001 or even more exacting confidence levels are more appropriate. It is important to choose your minimum confidence level before you have analysed your data! If your data show strong support for a hypothesis, you can report that you have exceeded the confidence level after the choice has been made and justified. Parametric and non-parametric tests You should choose a parametric test for your data if you can show (or reasonably assume) that your data are measured using a continuous interval scale and are more or less normally distributed. The mean, mode and median should coincide or a histogram should give a bell curve. These tests include Pearson s correlation coefficient and t-tests. Parametric tests are more powerful than non-parametric tests because they require more precise data and make use of the extra assumptions that can be associated with the normal distribution of data. You should choose non-parametric tests if data are not reasonably normally distributed or not measured on a continuous interval scale (divisible into smaller equal sized units). These include Spearman s correlation coefficient, χ 2 (chi-squared) tests and the Mann-Whitney U-test. These tests will work for data where ranks or arbitrary scores have be awarded rather than using accurate measurements using small scale divisions. Background to statistics: page 5 of 5