Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis


The Philosophy of Science: the Scientific Method, from a Popperian perspective

Philosophy - how we understand the world.
Science - how we expand that understanding.
Design - how we implement science.

Arguments over how we understand, and expand, our understanding are the basis of debates over how science has been, is, and should be done.

The Philosophy of Science: the Scientific Method, from a Popperian perspective

Terms:
1. Science - a method for understanding rules of assembly or organization.
   Problem: how do we (should we) make progress in science?
2. Theory - a set of ideas formulated to explain something.
3. Hypothesis - a supposition or conjecture (prediction) put forward to account for certain facts, used as a basis for further investigation.
4. Induction, or inductive reasoning - reasoning that general (universal) laws exist because particular cases that seem to be examples of those laws also exist.
5. Deduction, or deductive reasoning - reasoning that something must be true because it is a particular case of a general (universal) law.

Induction runs from the particular to the general; deduction runs from the general to the particular.

The Scientific Method, from a Popperian perspective

An extreme example:
1. Induction - every swan I have seen is white; therefore all swans are white.
2. Deduction - all swans are white; therefore the next one I see will be white.

Compare these statements:
1. Which can be put into the form of a testable hypothesis (e.g. a prediction, an if-then statement)?
2. Which is closer to how we operate in the world?
3. Which type of reasoning is most repeatable?

Is there a difference between ordinary understanding and scientific understanding (should there be)?

The Scientific Method, from a Popperian perspective: the hypothetico-deductive method

1. Conception (inductive reasoning), drawing on:
   a. Previous observations
   b. Existing theory
   c. A perceived problem
   d. Regulation
   e. Belief
2. This leads to an Insight and a general hypothesis.
3. Assessment is done by:
   a. Formulating specific hypotheses (and predictions)
   b. Comparison with new observations (deductive reasoning)
4. Which leads to either:
   a. Falsification - Ha rejected, Ho supported (accepted) - and rejection of the insight and of the specific and general hypotheses, or
   b. Confirmation - Ha supported (accepted), Ho rejected - followed by retesting of alternative hypotheses.

The hypothetico-deductive method: questions and notes

1. Is there any provision for accepting the insight or working hypothesis?
2. Propositions not subject to rejection by contrary observations are not scientific.
3. Confirmation does not end hypothesis testing - new hypotheses should always be put forth for a particular observation, theory, or belief.
4. In practice, but rarely reported, alternatives are tested until only one (or a few) are left unrejected. Then we say things like: "suggests", "indicates", "is evidence for".
5. Why is there no provision for accepting a theory or working hypothesis?
   a) Because it is easy to find confirmatory observations for almost any hypothesis, but one negative result refutes it absolutely (this assumes the test was adequate - the quality of falsification is important).

[Flowchart, annotated with the swan example: general hypothesis "All swans are white"; specific hypothesis "The next swan I see will be white".]

The hypothetico-deductive method: considerations and problems with the Popperian approach

1) This type of "normal science" may rarely lead to revolutions in science (Kuhn):
   A) Falsification science leads to paradigms - essentially a way of doing and understanding science that has followers.
   B) Paradigms have momentum - driven mainly by tradition, infrastructure, and psychology.
   C) Evidence against accepted theory is treated as exceptions (that "prove the rule").
   D) Only major crises lead to scientific revolutions: paradigms collapse from the weight of exceptions - normal science, crisis, revolution, normal science.

1. The paradigm: the Earth must be the center of the universe (c. 350 BC).
2. Exceptions are explained - the Ptolemaic universe:
   a) All motion in the heavens is uniform circular motion.
   b) The objects in the heavens are made from perfect material, and cannot change their intrinsic properties (e.g., their brightness).
   c) The Earth is at the center of the Universe.
3. The paradigm nears scientific collapse.
4. Religion intervenes (Middle Ages).

The Copernican Revolution 1543 AD

[Figure: the Aristotelian/Ptolemaic geocentric universe vs. the Copernican heliocentric universe.]

The hypothetico-deductive method: choice of method for doing science

Platt (1964) reviewed scientific discoveries and concluded that the most efficient way of doing science consists of a method of formal hypothesis testing he called Strong Inference.

A) Apply the following steps to every problem in science - formally, explicitly, and regularly:
   1) Devise alternative hypotheses.
   2) Devise critical experiments with alternative possible outcomes, each of which will exclude one or more of the hypotheses (rejection).
   3) Carry out the procedure so as to get a clean result.
   1') Recycle the procedure, making subhypotheses or sequential ones to define the possibilities that remain.

The hypothetico-deductive method: philosophical opposition (e.g. Roughgarden 1983)

A) Establishment of an empirical fact is by building a convincing case for that fact.
B) We don't use formal rules in everyday life; instead we use native abilities and common sense in building and evaluating claims of fact.
C) Even if we say we are using the hypothetico-deductive approach, we are not; instead we use intuition and make it appear to be deduction.

The hypothetico-deductive method: practical opposition (e.g. Quinn and Dunham 1983)

A) In practice, ecology and evolution differ from Popperian science:
   1) They are largely inductive.
   2) Although falsification works well in physical and some experimental areas of biology, it is difficult to apply in complex systems of multiple causality - e.g. ecology and evolution.
   3) Hypothetico-deductive reasoning works well if a potential cause is shown not to work at all (falsified), but this rarely occurs in ecology or evolution - usually effects are matters of degree.

This may be a potent criticism, and it leads to the use of inferential statistics.

Absolute vs. measured differences

A) The philosophical underpinning of the Popperian method is based on absolute differences:
   1) E.g. "All swans are white, therefore the next swan I see will be white." If the next swan is not white, the hypothesis is refuted absolutely.
B) Instead, most results are based on comparisons of measured variables:
   1) Not really true vs. false, but the degree to which an effect exists.

Example - specific hypothesis: the number of oak seedlings is higher in areas outside impact sites than inside impact sites.

Observation 1:
  Inside / outside counts: 1 15 18 12 13; mean 13.
What counts as a difference? Are these different?

Observation 2:
  Inside:  3, 5, 2, 8, 7    (mean 5)
  Outside: 10, 7, 9, 12, 8  (mean 9.2)

A brief digression into resampling theory

  Inside:  3, 5, 2, 8, 7    (mean 5)
  Outside: 10, 7, 9, 12, 8  (mean 9.2)

Traditional evaluation would probably involve a t-test; another approach is resampling.

Resampling

  Treatment  Number
  Inside      3
  Inside      5
  Inside      2
  Inside      8
  Inside      7
  Outside    10
  Outside     7
  Outside     9
  Outside    12
  Outside     8

1) Assume both treatments come from the same distribution - that is, if sampled sufficiently we would find no difference between the values inside vs. outside.
   a. Usually we compare the means.
2) Resample groups of 5 observations (why 5?), with replacement, but irrespective of treatment.


3) Calculate the mean for each group of 5 (e.g. 7.6 for one resampled group).

4) Repeat many times.
5) Calculate the differences between pairs of means (remember, the null hypothesis is that there is no effect of treatment). This generates a distribution of differences.

Resampled pairs of means and their differences, e.g.:

  Mean 1   Mean 2   Difference
  8.0      7.8       0.2
  5.6      8.2      -2.6
  6.0      9.0      -3.0
  8.0      5.0       3.0
  ...

[Histogram: the distribution of resampled differences in means, roughly symmetric about 0 and spanning about -10 to 10.]

OK, now what?

Compare the distribution of differences to the real difference

  Inside:  3, 5, 2, 8, 7    (mean 5)
  Outside: 10, 7, 9, 12, 8  (mean 9.2)

Real difference = 4.2

Estimate the likelihood that the real difference comes from two similar distributions

  Mean 1   Mean 2   Difference   Proportion of differences less than current
  10.2     3.6      6.6          1.000
  10.0     3.8      6.2           .999
  10.2     4.4      5.8           .998
   9.2     3.6      5.6           .997
   9.8     4.8      5.0           .996
   8.8     4.2      4.6           .995
   9.6     5.2      4.4           .994
   9.8     5.6      4.2           .993
   9.8     5.8      4.0           .992
   9.4     5.4      4.0           .991
  ... and on through 1,000 differences.

The likelihood is 0.007 that the distributions are the same.

What are the constraints of this sort of approach?
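The resampling recipe above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original analysis: the seed, the 10,000 iterations, and the one-tailed comparison are choices made here, and the first "outside" count is taken as 10, consistent with the reported outside mean of 9.2.

```python
import random

# Seedling counts from the example.
inside = [3, 5, 2, 8, 7]      # mean 5.0
outside = [10, 7, 9, 12, 8]   # mean 9.2

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(outside) - mean(inside)   # 4.2

# Pool all observations, then repeatedly draw two groups of 5 with
# replacement, irrespective of treatment. Under the null hypothesis
# ("same distribution"), this builds the distribution of differences
# in means expected by chance alone.
random.seed(1)
pooled = inside + outside
n_iter = 10_000
diffs = [mean([random.choice(pooled) for _ in range(5)])
         - mean([random.choice(pooled) for _ in range(5)])
         for _ in range(n_iter)]

# One-tailed: proportion of chance differences at least as large as
# the real difference of 4.2.
p = sum(d >= observed_diff for d in diffs) / n_iter
print(round(observed_diff, 1), p)
```

With this many iterations the estimated proportion lands in the same small-tail neighborhood as the slide's value, though the exact number varies with the seed and the number of resamples.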

t-test vs. resampling

Two-sample t-test on Number by Treatment:

  Treatment   N   Mean   Standard deviation
  Inside      5   5.0    2.54951
  Outside     5   9.2    1.923538

  Pooled variance: 5.1
  Mean difference (Inside - Outside): -4.2
  95% confidence interval: -7.49363 to -0.90637
  t = -2.94059, df = 8, p-value = 0.018693

[Bar charts: counts of Number for the Inside and Outside treatments.]

  Test         P-value
  Resampling   0.007
  t-test       0.019

Why the difference?
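The t-test column of the table can be checked by hand with nothing but the standard library. This sketch uses the standard pooled-variance two-sample t construction (the same quantities reported in the table above); the first "outside" value is again taken as 10.

```python
import math
import statistics

inside = [3, 5, 2, 8, 7]      # mean 5.0, s ~ 2.5495
outside = [10, 7, 9, 12, 8]   # mean 9.2, s ~ 1.9235

n1, n2 = len(inside), len(outside)
m1, m2 = statistics.mean(inside), statistics.mean(outside)
v1, v2 = statistics.variance(inside), statistics.variance(outside)

# Pooled variance: the two sample variances weighted by their
# degrees of freedom.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# Two-sample t: the difference in means in units of its standard error.
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(sp2, 2), round(t, 3))   # 5.1 -2.941
```

With df = n1 + n2 - 2 = 8, a t table gives a two-tailed p of about 0.019, matching the table.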

Statistical analysis - cause, probability, and effect

I) What counts as a difference? This is a statistical-philosophical question with two parts:
   A) How shall we compare measurements? - statistical methodology.
   B) What will count as a significant difference? - a philosophical question, and as such subject to convention.
II) Statistical methodology - general
   A) Null hypotheses - Ho
      1) In most sciences, we are faced with:
         a) NOT whether something is true or false (a Popperian decision),
         b) BUT rather the degree to which an effect exists, if at all - a statistical decision.
      2) Therefore two competing statistical hypotheses are posed:
         a) Ha: there is a difference in effect between ... (usually posed as < or >)
         b) Ho: there is no difference in effect between ...

Example - specific hypothesis Ha: the number of oak seedlings is greater in areas outside impact sites than inside impact sites.

Statistical analysis - cause, probability, and effect

Statistical tests - a set of rules whereby a decision about hypotheses is reached (accept, reject).
1) Associated with the rules is some indication of the accuracy of the decisions - that measure is a probability statement, or p-value.
2) Statistical hypotheses:
   a) do not become false when a critical p-value is exceeded;
   b) do not become true if the bounds are not exceeded;
   c) instead, p-values indicate a level of acceptable uncertainty;
   d) critical p-values are set by convention - what counts as acceptable uncertainty;
   e) Example - if the critical p-value = 0.05, this means that we are unwilling to accept the posed alternative hypothesis unless:
      1) we are 95% sure that it is correct, or equivalently,
      2) we are willing to accept an error rate of 5% or less of being wrong when we accept the hypothesis.

Statistical analysis - cause, probability, and effect

The logic of statistical tests - how they are performed:
1) Assume the null hypothesis (Ho) is true (e.g. no difference in the number of oak seedlings between impact and non-impact sites).
2) Compare measurements - generally this means comparing two sample distributions (determined from the numbers from the experiment or survey).
   A) Comparison of distributions - generally by comparing means and the estimate of error associated with the sampling of the means. The simplest case is the standard error of the mean:
      SE (or SEM) = sqrt(s²/n) = s/sqrt(n) = standard deviation / square root of the level of replication.
3) Determine the probability that the distributions are similar/different.
4) Compare with a critical p-value to assign significance.

Calculation of statistical distributions

[Histogram: distribution of oak seedlings across sites, pre-impact.]

Sites = 100; mean per site = 25; total seedlings = 2500.

Evaluate the effect of sample size on the calculation of, and confidence in, the mean

Example: sample 10 sites to determine the mean; compare fifty iterations for sample sizes of 10 cells.

[Figure: two example samples of 10 sites drawn from the grid, with means of 21.5 and 25.8.]

[Histogram: number of cases vs. estimate of the mean; true mean = 25. Two example samples of 10 sites give means of 21.5 and 25.8.]

Selected means from the fifty iterations:
  21.5, 22.3, 23.0, 23.9, 24.9, 25.1, 25.8, 26.5, 27.8, 29.9

Do we ever do this - that is, take repeated samples to generate a distribution of means?

One approach that approximates this is resampling, which uses the measured observations to build a distribution of means (limited by ...).

The other, more traditional approach is to approximate the distribution of means using a statistical distribution. What is needed: the mean and the standard deviation.

[Two histograms: the true distribution of means can be approximated by the mean and standard error of a single sample.]

There is also an effect of the number of observations on the estimate of the population mean.

[Figure: frequency distributions of sample means for means based on 5, 10, 20, 50, and 99 observations - the distributions narrow as the number of observations grows.]
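The narrowing of the sampling distribution with sample size is easy to verify by simulation. The population below (normal, mean 25, sd 8, 10,000 sites) is purely an illustrative assumption; the point is that the spread of sample means shrinks roughly as s/sqrt(n).

```python
import random
import statistics

random.seed(0)

# Hypothetical population of site counts with true mean 25.
population = [random.gauss(25, 8) for _ in range(10_000)]

# For increasing sample sizes, draw many samples and measure the
# spread (standard deviation) of the resulting sample means.
spread = {}
for n in (5, 10, 20, 50):
    means = [statistics.mean(random.sample(population, n))
             for _ in range(1000)]
    spread[n] = statistics.stdev(means)
    print(n, round(statistics.mean(means), 1), round(spread[n], 2))
```

Each printed row shows that the average of the sample means stays near 25 while their spread falls with n, which is exactly what the frequency-distribution figure illustrates.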

Types of statistical error: Type I and Type II

1) By convention, the hypothesis tested is the null hypothesis (no difference between ...).
   a) In statistics, the assumption is made that a hypothesis is true (assume Ho true = assume Ha false).
   b) Accepting Ho (saying it is likely to be true) is the same as rejecting Ha (falsification).
   c) The scientific method is to falsify competing alternative hypotheses (alternative Ha's).
2) Errors in decision making:

                     Decision
  Truth        Accept Ho               Reject Ho
  Ho true      no error (1 - alpha)    Type I error (alpha)
  Ho false     Type II error (beta)    no error (1 - beta)

Type I error - the probability (alpha) that we mistakenly reject a true null hypothesis (Ho).
Type II error - the probability (beta) that we mistakenly fail to reject (accept) a false null hypothesis.
Power of a test - the probability (1 - beta) of not committing a Type II error. The more powerful the test, the more likely you are to correctly conclude that an effect exists when it really does (reject Ho when Ho is false = accept Ha when Ha is true).

Scientific method and statistical errors

The hypothetico-deductive flowchart (conception, insight, general and specific hypotheses, comparison with new observations, confirmation or falsification) maps onto the decision table:

                     Decision
  Truth        Accept Ho               Reject Ho
  Ho true      no error (1 - alpha)    Type I error (alpha)
  Ho false     Type II error (beta)    no error (1 - beta)

Scientific method and statistical errors - a case example

Conception: oil leaking on a number of sites with oaks.
Insight: oil affects oak seedlings.
Specific hypothesis: seedling numbers will be higher in control sites than on impact sites.
Assessment: compare seedling numbers on impact and control sites.
Confirmation (more seedlings in control sites): Ha supported (accepted), Ho rejected.
Falsification (no difference): Ha rejected, Ho supported (accepted).

Error types and implications in basic and environmental science

                           Biological truth
  Monitoring conclusion    No impact            Impact
  No impact                Correct decision:    Type II error: failure to detect
                           no impact detected   a real impact; a false sense of security
  Impact                   Type I error:        Correct decision:
                           false alarm          impact detected

What type of error should we guard against?
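One way to make the table concrete is to simulate both situations and count the errors. The sketch below is illustrative only: normal populations, n = 10 per group, alpha = 0.05, and the standard two-tailed critical t of 2.101 for df = 18; the "real impact" of 1.5 standard deviations is an arbitrary choice.

```python
import math
import random
import statistics

random.seed(2)

def pooled_t(a, b):
    # Two-sample t statistic with pooled variance.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1/na + 1/nb))

T_CRIT = 2.101   # two-tailed critical t for alpha = 0.05, df = 18

def rejection_rate(true_diff, n=10, trials=2000):
    """Fraction of simulated experiments in which Ho is rejected."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(true_diff, 1) for _ in range(n)]
        if abs(pooled_t(a, b)) > T_CRIT:
            hits += 1
    return hits / trials

type_1 = rejection_rate(0.0)   # Ho true: rejections are false alarms
power = rejection_rate(1.5)    # Ho false: rejections are correct detections
type_2 = 1 - power             # failures to detect the real effect
print(round(type_1, 2), round(type_2, 2))
```

The simulated Type I rate sits near the chosen alpha of 0.05, while the Type II rate depends entirely on how big the real effect is - which is the subject of the power discussion that follows.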

Statistical power, effect size, replication, and alpha

[Figure: statistical comparison of two distributions, with means ȳ1 and ȳ2.]

Sampling objectives:
- To obtain an unbiased estimate of a population mean.
- To assess the precision of the estimate (i.e. calculate the standard error of the mean).
- To obtain as precise an estimate of the parameters as possible for the time, effort, and money spent.

Measures of location

Population mean (μ) - the average value.
Sample mean (ȳ) - estimates μ (the true mean).
Population median - the middle value.
Sample median - estimates the population median.

In a normal distribution the mean = median (also the mode); this is not ensured in other distributions.

[Figure: a symmetric distribution in which mean and median coincide, and a skewed distribution in which they differ.]

Measures of dispersion

Sample variance (s²) estimates the population variance:

  s² = Σ(xi - x̄)² / (n - 1)

Standard deviation (s) - the square root of the variance; same units as the original variable.

Measures (statistics) of dispersion

Sample sum of squares:       SS = Σ(xi - x̄)²
Population variance:         σ² = Σ(xi - μ)² / n
Sample variance:             s² = Σ(xi - x̄)² / (n - 1)   (note: units are squared; the denominator is n - 1)
Sample standard deviation:   s = sqrt( Σ(xi - x̄)² / (n - 1) )   (note: units are not squared)
Standard error of the mean:  SE = sqrt(s² / n) = s / sqrt(n)
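These formulas translate directly into code. The sketch below applies them to the "inside" seedling counts from the earlier example, then cross-checks against the standard library's own implementation.

```python
import math
import statistics

x = [3, 5, 2, 8, 7]   # the "inside" seedling counts
n = len(x)
xbar = sum(x) / n                         # sample mean: 5.0

ss = sum((xi - xbar) ** 2 for xi in x)    # sum of squares: 26.0
s2 = ss / (n - 1)                         # sample variance: 6.5
s = math.sqrt(s2)                         # standard deviation
se = s / math.sqrt(n)                     # standard error of the mean

# The statistics module computes the same sample variance directly.
print(s2 == statistics.variance(x), round(s, 4), round(se, 4))
```

Note the standard deviation (~2.5495) matches the value reported for the inside treatment in the t-test table earlier.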

The t statistic

Assuming the null hypothesis is true, t transforms differences in means into a value from a distribution having a mean of 0 and a standard deviation of 1:

  t = (ȳ1 - ȳ2) / (s / sqrt(n))

[Figure: the null distribution of t under Ho: ȳ1 = ȳ2 - a bell curve centered on 0, spanning roughly -5 to 5, with total area under the curve = 1.0. The x-axis is (ȳ1 - ȳ2) / (s / sqrt(n)); the y-axis is probability.]

t statistic: interpretation and units

The deviation between means is expressed in terms of standard errors (i.e. standard deviations of the sampling distribution). Hence t values are in standard errors. For example, t = 2 indicates that the deviation (ȳ1 - ȳ2) is equal to 2 × the standard error.

Ho true: the distributions of means are truly the same, so the estimate of ȳ1 equals the estimate of ȳ2 (in expectation).

  t = (ȳ1 - ȳ2) / ( sp · sqrt(1/n1 + 1/n2) )

[Figure: two overlapping sampling distributions of means drawn from the same population, and the resulting central t distribution of calculated t values.]

Ho true: the distributions of means are truly the same

  t = (ȳ1 - ȳ2) / ( sp · sqrt(1/n1 + 1/n2) )

                     Decision
  Truth        Accept Ho               Reject Ho
  Ho true      no error (1 - alpha)    Type I error (alpha)
  Ho false     Type II error (beta)    no error (1 - beta)

[Figure: the central t distribution with df degrees of freedom.]

Ho true: assign a critical p (assume alpha = 0.05)

If the distributions are truly the same, then the area to the right of the critical value tc represents the Type I error rate: any calculated t > tc will cause the incorrect conclusion that the distributions are different.

[Figure: the central t distribution with the Type I error region shaded beyond tc.]

Ho false: the distributions of means are truly different

  t = (ȳ1 - ȳ2) / ( sp · sqrt(1/n1 + 1/n2) )

[Figure: two separated sampling distributions of means (ȳ1, ȳ2) and the resulting distribution of calculated t values, which is shifted away from 0.]

Ho false: the distributions of means are truly different

The calculated t values now follow a non-central t distribution, shifted away from the central t distribution that holds under Ho.

[Figure: the central and non-central t distributions.]

Ho false: assign a critical p (assume alpha = 0.05)

If the distributions are truly different, then the area of the non-central t distribution to the left of the critical value tc represents the Type II error rate: any calculated t < tc will cause the incorrect conclusion that the distributions are the same.

[Figure: central and non-central t distributions, with the Type II error region shaded below tc.]

How to control Type II error (when distributions are truly different). Controlling Type II error maximizes statistical power to detect real impacts. Three options:
1) Vary the critical p-value
2) Vary the magnitude of effect
3) Vary replication

[Figure: central and non-central t distributions with t_c marked; the Type I error (blue) and Type II error (red) areas are shaded.]

How to control Type II error: 1) Vary the critical p-value (change the blue area).
A) Make the critical p more stringent (smaller): Type II error increases and power decreases.
B) Relax the critical p (larger values): Type II error decreases and power increases.

[Figure: reference t distributions with t_c marked; moving t_c shifts the balance between the Type I (blue) and Type II (red) areas.]
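The trade-off in option 1 can be seen by rerunning the H0-false simulation at several critical p-values. A sketch (the critical t values are the standard two-tailed quantiles for df = 8; the population parameters are illustrative assumptions):

```python
import math
import random

def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Two-tailed critical t values for df = 8 at three alpha levels.
T_CRITS = {0.01: 3.355, 0.05: 2.306, 0.10: 1.860}

random.seed(3)
REPS = 10_000
powers = {}
for alpha, t_crit in T_CRITS.items():
    rejections = 0
    for _ in range(REPS):
        a = [random.gauss(10, 2) for _ in range(5)]
        b = [random.gauss(13, 2) for _ in range(5)]   # H0 is false
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    powers[alpha] = rejections / REPS

print(powers)   # power rises as the critical alpha is relaxed
```

Making alpha more stringent (0.01) trades a lower Type I error rate for a higher Type II error rate, exactly as the slide describes.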

How to control Type II error: 2) Vary the magnitude of effect (the distance between ȳ1 and ȳ2, which determines the non-central t distribution).

  t = (ȳ1 - ȳ2) / (s_p · sqrt(1/n1 + 1/n2))

[Figure: histograms of mean values from samples of three populations, A (ȳ = 10), B (ȳ = 10.5), and C (ȳ = 12), with the resulting t-value distributions for B vs A and C vs A; the larger effect (C vs A) pushes the non-central t distribution farther from zero.]

How to control Type II error: 2) Vary the magnitude of effect.
A) Make the distance smaller: Type II error increases and power decreases.
B) Make the distance greater: Type II error decreases and power increases.

[Figure: reference t distributions with t_c marked; shifting the non-central distribution toward or away from zero changes the Type II error (red) area.]
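Option 2 can also be illustrated with the usual normal-approximation power formula for a two-sample comparison, power ≈ Φ(δ / (σ·sqrt(2/n)) − z), evaluated at several effect sizes δ. A sketch; σ, n, and the δ values are illustrative assumptions, and the approximation slightly overstates power at such small n:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

SIGMA = 2.0       # assumed common SD (illustrative)
N = 5             # replicates per group
Z_ALPHA = 1.96    # two-tailed z for alpha = 0.05

# Approximate power at several effect sizes (distances between means).
powers = {}
for delta in [0.5, 1.0, 2.0, 4.0]:
    z_effect = delta / (SIGMA * math.sqrt(2.0 / N))
    powers[delta] = normal_cdf(z_effect - Z_ALPHA)

print(powers)   # power rises with the magnitude of effect
```

At the same alpha and replication, a small effect leaves power near the noise floor while a large effect makes detection nearly certain.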

How to control Type II error: 3) Vary replication (which controls the error estimates). Note that the Type I error rate is held constant while t_c is allowed to vary.
A) Decrease replication: Type II error increases and power decreases.
B) Increase replication: Type II error decreases and power increases.

[Figure: reference t distributions with t_c marked; narrower sampling distributions at higher replication shrink the Type II error (red) area.]
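Option 3 can be sketched by varying n in the simulation; note that the critical t shrinks as df = 2n − 2 grows, matching the slide's point that t_c is allowed to vary. The critical t values below are standard two-tailed quantiles for alpha = 0.05; the population parameters are illustrative assumptions:

```python
import math
import random

def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Two-tailed critical t (alpha = 0.05) for df = 2n - 2 at each n.
T_CRITS = {4: 2.447, 8: 2.145, 16: 2.042, 32: 1.999}

random.seed(4)
REPS = 10_000
powers = {}
for n, t_crit in T_CRITS.items():
    rejections = 0
    for _ in range(REPS):
        a = [random.gauss(10, 2) for _ in range(n)]
        b = [random.gauss(12, 2) for _ in range(n)]   # H0 is false
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    powers[n] = rejections / REPS

print(powers)   # power rises with replication
```

More replication both tightens the estimate of s_p and lowers t_c, so the Type II error rate falls on both counts.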

T-test vs resampling

Two-sample t-test:
  Variable  Treatment  N  Mean  Standard deviation
  Number    Inside     5  5.0   2.54951
  Number    Outside    5  9.2   1.923538
  (pooled variance estimate)

  Variable  Mean difference  95% CI lower  95% CI upper  t         df  p-value
  Number    -4.2             -7.49363      -0.90637      -2.94059  8   0.018693

  Test        p-value
  Resampling  0.007
  T-test      0.019

Why the difference?

[Figure: histograms of counts for the Inside and Outside treatments.]
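The resampling p-value here can be reproduced in spirit with a permutation test: pool the observations, enumerate every way of relabelling them into two groups of 5, and count how often the absolute mean difference is at least as large as the observed 4.2. The counts below are a hypothetical reconstruction chosen to match the slide's summary statistics (n = 5, means 5.0 and 9.2, SDs 2.54951 and 1.923538), not the original raw data, so the exact p-value may differ from the slide's:

```python
from itertools import combinations

# Hypothetical counts reproducing the slide's summary statistics.
inside = [2, 3, 5, 7, 8]       # mean 5.0, SD 2.54951
outside = [7, 8, 9, 10, 12]    # mean 9.2, SD 1.923538

pooled = inside + outside
observed = abs(sum(inside) / 5 - sum(outside) / 5)   # 4.2

# Exact permutation test: every split of the 10 values into groups of 5.
extreme = 0
total = 0
for idx in combinations(range(10), 5):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(10) if i not in idx]
    diff = abs(sum(group_a) / 5 - sum(group_b) / 5)
    total += 1
    if diff >= observed - 1e-9:
        extreme += 1

p_value = extreme / total
print(total, extreme, p_value)
```

Unlike the t-test, the permutation test makes no normality assumption; with small samples of count data the two procedures can disagree, which is the point of the slide's closing question.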

POWER summary. [Figure: three panels showing that power increases as replication increases (low to high), as the magnitude of effect increases (small to large), and as the critical alpha increases (low to high).]

Type I error, power, and experimental design. Assume you want to know whether predation affects Triplefin density. You want to be able to detect a 20% difference in density, with power = 80% (0.8) and critical alpha (Type I error) = 0.05. You conduct a preliminary survey before you set up the experiment to estimate density. Now determine the sample size necessary for your experiment. Can it be done?

Preliminary survey

  Replicate  Sample 1  Sample 2
  1          3         6
  2          9         9
  3          18        13
  4          15        8
  5          10        12
  6          8         15
  7          15        9
  8          7         10
  9          6         13
  10         11        7

  Mean       10.2      10.2
  SD         4.6       2.9
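With the preliminary survey in hand, the required sample size can be approximated with the standard normal-approximation formula for a two-sample comparison, n ≈ 2·(z_alpha + z_power)²·σ²/δ². A sketch, assuming the pooled SD of the two preliminary samples for σ and a detectable difference δ of 20% of the mean density; the z quantiles are the standard values for alpha = 0.05 (two-tailed) and power = 0.80:

```python
import math
from statistics import mean, stdev

sample1 = [3, 9, 18, 15, 10, 8, 15, 7, 6, 11]
sample2 = [6, 9, 13, 8, 12, 15, 9, 10, 13, 7]

Z_ALPHA = 1.95996   # z for two-tailed alpha = 0.05
Z_POWER = 0.84162   # z for power = 0.80

# Pooled variance of the two preliminary samples (equal n).
pooled_var = (stdev(sample1) ** 2 + stdev(sample2) ** 2) / 2

# Effect to detect: a 20% change from the observed mean density (10.2).
delta = 0.2 * mean(sample1 + sample2)

# Normal-approximation sample size per group.
n_per_group = math.ceil(2 * (Z_ALPHA + Z_POWER) ** 2 * pooled_var / delta ** 2)
print(n_per_group)
```

Under these assumptions the preliminary data imply on the order of 50-60 replicates per treatment, which is exactly the feasibility question ("Can it be done?") the previous slide poses.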

Replication and independence. [Figure: maps of control sites and impact sites illustrating pseudoreplication in space, pseudoreplication in time, and pseudoreplication in both time and space.]