Statistical concepts in climate research

Similar documents
Ingredients of statistical hypothesis testing and their significance

LECTURE 11. Introduction to Econometrics. Autocorrelation

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

Chapter 16. Simple Linear Regression and Correlation

Lecture 10: F -Tests, ANOVA and R 2

FinQuiz Notes

White Noise Processes (Section 6.2)

MS&E 226: Small Data

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 1 Statistical Inference

E 4160 Autumn term Lecture 9: Deterministic trends vs integrated series; Spurious regression; Dickey-Fuller distribution and test

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science

Chapter 9 Inferences from Two Samples

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Lecture 4: Statistical Hypothesis Testing

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Lecture 10: Generalized likelihood ratio test

Seasonal drought predictability in Portugal using statistical-dynamical techniques

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Chapter 23: Inferences About Means

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):

Iris Wang.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Introduction to the Analysis of Variance (ANOVA)

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Physics 509: Bootstrap and Robust Parameter Estimation

Psychology 282 Lecture #4 Outline Inferences in SLR

UNIT 4 RANK CORRELATION (Rho AND KENDALL RANK CORRELATION

Confidence Intervals. - simply, an interval for which we have a certain confidence.

1 Random walks and data

13: Additional ANOVA Topics. Post hoc Comparisons

Outline. Nature of the Problem. Nature of the Problem. Basic Econometrics in Transportation. Autocorrelation

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Journeys of an Accidental Statistician

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Multiple t Tests. Introduction to Analysis of Variance. Experiments with More than 2 Conditions

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Rockefeller College University at Albany

Wavelet Methods for Time Series Analysis. Motivating Question

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

Inferences for Correlation

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Wavelet Methods for Time Series Analysis. Part IX: Wavelet-Based Bootstrapping

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41202, Spring Quarter 2003, Mr. Ruey S. Tsay

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

Applied Regression Analysis

NOTES AND CORRESPONDENCE. A Quantitative Estimate of the Effect of Aliasing in Climatological Time Series

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

One-Sample and Two-Sample Means Tests

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Chapter 12: Inference about One Population

AUTOCORRELATION. Phung Thanh Binh

Harvard University. Rigorous Research in Engineering Education

Applying the ergodic assumption to ensembles of non-stationary simulations by using the CMIP3 multi-model dataset

Student s t-distribution. The t-distribution, t-tests, & Measures of Effect Size

R 2 and F -Tests and ANOVA

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Global Climate Models and Extremes

Introduction to Statistical Inference Lecture 8: Linear regression, Tests and confidence intervals

Comparison of Bayesian and Frequentist Inference

Changes in Weather and Climate Extremes and Their Causes. Xuebin Zhang (CRD/ASTD) Francis Zwiers (PCIC)

Volatility. Gerald P. Dwyer. February Clemson University

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Statistics Introductory Correlation

Making sense of Econometrics: Basics

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Introductory Econometrics

Lecture 3: Statistical sampling uncertainty

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis.

E 4101/5101 Lecture 9: Non-stationarity

Swarthmore Honors Exam 2012: Statistics

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Computational Data Analysis!

2008 Winton. Statistical Testing of RNGs

Lecture 6: Linear Regression

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?

Statistical Data Analysis Stat 3: p-values, parameter estimation

Introduction to Estimation. Martina Litschmannová K210

Chapter 8: Model Diagnostics

Regression of Time Series

Inference in Normal Regression Model. Dr. Frank Wood

Statistics: Error (Chpt. 5)

Solutions to the Spring 2015 CAS Exam ST

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Review of Statistics 101

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

CS-E3210 Machine Learning: Basic Principles

ANOVA TESTING 4STEPS. 1. State the hypothesis. : H 0 : µ 1 =

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Transcription:

Statistical concepts in climate research (some misuses of statistics in climatology) Slava Kharin Canadian Centre for Climate Modelling and Analysis, Environment Canada, Victoria, British Columbia, Canada Banff Summer School, May 2008

Statistical concepts in climate research Lecture I - Misuses of statistics in climate research testing hypotheses suggested by the data; serial correlation; using statistical recipes as black-box tools; Lecture II - Hypothesis testing Type I and Type II errors, significance, power, etc historical developments and controversy around classical statistical significance test and its interpretation. Lecture III - Basics of Bayesian statistics. Introduction to Bayesian statistics. Bayesian climate change assessment.

References Analysis of Climate Variability: Applications of Statistical Techniques, by H. von Storch and A. Navarra (Eds.) (proceedings of an Autumn School on Elba, 1993) Statistical Analysis in Climate Research, H. von Storch and F. W. Zwiers, 1999, Cambridge.

Misuses of Statistical Analysis in Climate Research (von Storch 1993) Obsession with statistical recipes, in particular, hypothesis testing Pavlov s dog reaction to any hypothesis derived from data, demanding statistical significance test; Use of statistical techniques as a black-box, or cook-book recipe (standard example is disregard of serial correlation). Misunderstanding or misinterpreting the names (e.g. decorrelation time, p-values as probability of hypotheses) Use of sophisticated techniques...there is sometimes unwarranted expectation of miracle-like results from very advanced techniques.

Statistical analysis in climate research There is basically one observational record in climate research. It is hardly possible to perform real independent experiments. Classical hypothesis testing is frequentist in nature (repeated sampling). Dynamical models can create independent data, subject to model deficiencies. Data are inter-related, both in space and time. This is useful for reconstruction of the space-time state of the climate system from a limited set of observations. But the basic premise of many standard statistical tests is violated.

Two types of statistical data analysis (Tukey 1977) Exploratory analysis (EA) is the art of extracting information from a data set. The objectives of EA are: to suggest and formulate hypotheses about the workings of the climate system; to assess assumptions on which statistical inference (i.e., estimation and hypothesis testing) will be based; to support selection of statistical tools and techniques; provide a basis for further data collection (e.g., through numerical experiments). Confirmatory analysis (CA) is then performed to confirm independently the hypotheses.

Climatology as a one-experiment science Climatology is a one-experiment science. There is basically one observational record in climate. Climate researchers essentially look at the same record to build and test their hypotheses. The processes of building and testing hypotheses are hardly separable. Only dynamical models can provide independent datasets but subject to model limitations to describe the real world. But even dynamical models (GCMs,CGCMs,ESMs) are tuned to the available observational record. This leads to testing hypotheses suggested by the data using the same data.

Mexican Hat Mexican Hat is a unique combination of vertically arranged stones in the desert at the border of Utah and Arizona (google images).

Is the Mexican Hat man-made? (von Storch & Zwiers 1999) The null hypothesis: The Mexican Hat is of natural origin Form a statistic: { 1 if x forms a Mexican Hat t(x) = 0 otherwise where x is any pile of rocks. We need the probability distribution of t(x) under null hypothesis: collect samples of x, say, n = 10 6. Mexican Hat is famous for good reasons there is only one t(p) = 1. Estimate the probability distribution of t(p): { 10 6 for k = 1 prob(t(x) = k) = 1 10 6 for k = 0 We reject the null hypothesis at a small significance level.

Where is the logic flaw in the Mexican Hat paradox? Fundamental problem: the null hypothesis is not independent of the data which are used to conduct the test. We know a priori that the Mexican Hat is a very rare event. The fact that we didn t find any other combination like that cannot be used as independent evidence against its natural origin. The same trick can be used to prove that any rare event is non-natural (and therefore we are tricked into searching for external causes)

Labitzke and van Loon, 1988 K.Labitzke studied the relationship between solar activity and the stratospheric relationship at the North Pole. there was no obvious relationship if all years were considered. She saw that there was positive correlation during WQBO and negative correlation during EQBO.

Testing hypotheses suggested by the data The limitation of a single observational record has a severe consequence: Many people screen the observational record for rare and unusual events. Typically, only unusual results are eventually published in literature (publication bias). Presumably, some of these unusual events are nevertheless usual. Although we can compare these unusual events with all other events in the record, they cannot be contested with a statistical test since independent data are not available.

Testing hypotheses suggested by the data No statistical test, regardless of its power, can overcome this problem! Possible solutions: extend record backwards; use suitably designed experiments with GCMs; postpone testing until new data become available.

Testing hypotheses suggested by the data No statistical test, regardless of its power, can overcome this problem! Possible solutions: extend record backwards; use suitably designed experiments with GCMs; postpone testing until new data become available.

Neglecting Serial Correlation Problem: Most standard statistical techniques are derived with explicit need for statistically independent data. However, almost all climatic data are somehow correlated in time. Consequence: Statistical tests becomes too liberal, that is, the no-signal hypothesis is incorrectly rejected more often than it is implied by the significance level when it is true. The results of a statistical tests depends strongly on the serial correlation. Even if serial correlation is not neglected, its effects are not always correctly accounted for in statistical tests.

Comparison of means test (Zwiers & von Storch 1994) There are two main assumptions in the standard t-test: sampling assumption observations are statistically independent; distributional assumption observations are normally distributed; t-statistic for one sample X: t = X µ 0 S X / n = D S D where S X is the sample standard deviation. The t-statistic has Student s t distribution with n 1 d.f.

Comparison of means test: violation of assumptions How sensitive is the t-test to departures from the sampling and distributional assumptions? the t-test is moderately robust against departures from the normal distribution, especially when large samples are available, but it depends on the nature of deviations from the normality. the t-test is strongly affected by serial correlation, both for small or large samples.

Comparison of means test Type I errors: departures from the normality (Pearson distribution family) Rejection rate (%) 25 20 15 10 5 0 skew=0 skew=0.5 skew=1 skew=1.5 skew=2 20 40 60 80 100 Sample size The t test is slightly too liberal when distributions are skewed but not excessively so.

Comparison of means test Power: departures from the normality (Pearson distribution family) 100 50 L=30 L=50 L=100 Power (%) 20 10 5 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 σ (%)

Comparison of means test Type I errors: departures from the normality (bimodal) 15 Rejection rate (%) 10 5 0 20 40 60 80 100 Sample size

Comparison of means test Power: departures from the normality (bimodal) 100 50 L=30 L=50 L=100 Power (%) 20 10 5 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 σ (%)

Comparison of means test: departures from the independence We assume that a climate process can be approximately represented by a auto-regressive first order (AR1) process: X t µ X = ρ 1 (X t 1 µ X ) + ǫ t where ǫ t is a Gaussian white noise process, ρ 1 is the lag-1 correlation coefficient. (ENSO, QBO, or MJO are suspect)

Comparison of means test: departures from the independence Rejection rate 0.5 0.4 0.3 0.2 0.1 0.0 rho=0 rho=0.3 rho=0.45 rho=0.6 rho=0.75 20 40 60 80 100

Subsampling the data If a data record is long, one can try to form subsamples, separated by some time interval. t = x µ S X / n = D S D if one is interested in the differences of means in a particular season (DJF), it is often more prudent to use seasonal means, rather than monthly means. Choice of sampling interval is not trivial (either from physically considerations, or from inspecting autocorrelation function). throwing away much of data.

The equivalent sample size, ESS The main reason why the t test is too liberal for serially correlated data is that the denominator underestimate the sampling standard errors of the numerator: The concept of effective or equivalent sample size is used to correct this problem (Thiebaux and Zwiers (1984) discuss interpretation and estimation of ESS). If {X 1,...,X n } are n i.i.d. random variables then Var(X) = 1 n Var(X) If {X 1,...,X n } are serially correlated then Var(X) = 1 n e Var(X) n n e = 1 + 2 n 1 τ=1 (1 τ n )ρ(τ) n1 ρ 1, n 1, 1 + ρ 1 ρ(τ) is the autocorrelation function.

The equivalent sample size Interpretation: samples of independent observations of size n e contain as much information about the difference of means as samples of serially correlated observations of size n. Important: ESS is not uniquely defined. T = D/S D The expression for n e depends on the definition of D. If our interest is something else (e.g., variance) the definition of information, and therefore of the ESS, changes. Therefore, it is incorrect to interpret n e as the number of effectively independent observations in the absolute sense.

t-test with the equivalent sample size When samples are sufficiently large then t = X µ 0 S X / n e has approximately a standard normal distribution N(0, 1). When samples are small it is often assumed that t behaves as Student s t statistic with (2n e 2) d.f. Correct? Wrong! This assumption is not correct for small samples.

t-test with the equivalent sample size When samples are sufficiently large then t = X µ 0 S X / n e has approximately a standard normal distribution N(0, 1). When samples are small it is often assumed that t behaves as Student s t statistic with (2n e 2) d.f. Correct? Wrong! This assumption is not correct for small samples.

t-test with the known equivalent sample size The actual rejection rate of the t test with known n e (adapted after Zwiers & von Storch 1994) 0.06 0.05 Rejection rate 0.04 0.03 0.02 0.01 0.00 rho=0 rho=0.3 rho=0.45 rho=0.6 rho=0.75 20 40 60 80 100 The t test is too conservative for small sample sizes, and therefore, wrong.

t-test with the estimated equivalent sample size The actual rejection rate of the t test with n e estimated from the data Rejection rate 0.25 0.20 0.15 0.10 0.05 0.00 rho=0 rho=0.3 rho=0.45 rho=0.6 rho=0.75 20 40 60 80 100 The t test is too liberal for small sample sizes, and therefore, wrong.

t-test with the table look-up Zwiers and Storch (1994) recommend: t test should not be used with equivalent sample sizes smaller than 30 (note that 30 is for the ESS, not sample size). If ESS<30, table look-up test should be used (2 sample test): compute standard t = X Y (S 2 X +SY 2 )/n compute the pooled lag-1 correlation coefficient n i=2 ρ 1 = x i x i 1 + n i=2 y i y i 1 n i=1 y 2 + n i=1 y i 2 Use the look-up tables for n and ρ 1 to determine critical values.

Example of using the ESS in literature

Example of using the ESS in literature Use ERA-40 reanalyses to study ENSO/QBO effects on polar temperature, e.g. in the NDFJ season ( 200 months). composites for different ENSO/QBO phases are based on 10-50 months. Use monthly data to increase sample size. ESS is estimated as N e = N 1 r 1+r (an approximation for N is large, bad small sample behaviour). normality is evaluated in MC tests assuming that monthly means are independent. perform a great number of statistical significance tests. This study highlight several possible misuses of statistics: obsession with hypothesis testing (perhaps providing confidence intervals would be more preferable) the use of statistical techniques like a cook-book recipe (equivalent degrees of freedom)

Example of using the ESS in literature Use ERA-40 reanalyses to study ENSO/QBO effects on polar temperature, e.g. in the NDFJ season ( 200 months). composites for different ENSO/QBO phases are based on 10-50 months. Use monthly data to increase sample size. ESS is estimated as N e = N 1 r 1+r (an approximation for N is large, bad small sample behaviour). normality is evaluated in MC tests assuming that monthly means are independent. perform a great number of statistical significance tests. This study highlight several possible misuses of statistics: obsession with hypothesis testing (perhaps providing confidence intervals would be more preferable) the use of statistical techniques like a cook-book recipe (equivalent degrees of freedom)

Conclusions Don t be obsessed with hypothesis testing when examining observational data statistical hypothesis testing is problematic and cannot be viewed as objective and unbiased judge of the null hypothesis under these circumstances; Be careful about the effects of serial correlation on the results of statistical inference (such as estimation and hypothesis testing) some off-the-shelf methods and tools behave badly for small samples.

References Analysis of climate variability Application of Statistical Techniques. Edited by H. von Storch and A. Navarra. Springer. 1995. Thiébaux, H. J., and F. W. Zwiers, 1984: The Interpretation and Estimation of Effective Sample Size. J. Appl. Meteor., 5, 800 811 Francis W. Zwiers, and H. von Storch, 1994: Taking Serial Correlation into Account in Tests of the Mean. J. Climate, 8, 336 351. Zhang, X., and F.W. Zwiers: 2004: Comment on Applicability of prewhitening to eliminate the influence of serial correlation on the Mann-Kendall test. Water Resources Research. von Storch, H., and F. W. Zwiers, 1999: Statistical Analysis in Climate Research. Cambridge.