TIME SERIES (Chapter 8 of Wilks)


In meteorology, the order of a time series matters! We will assume stationarity of the statistics of the time series. If there is non-stationarity (e.g., a diurnal cycle or an annual cycle), we subtract the climatological mean and standardize to obtain anomalies: $z(t) = \frac{x(t) - \mu(t)}{\sigma(t)}$. Otherwise, we can stratify the data (e.g., consider all winters together as a single time series). Time series can be analyzed in the time domain or in the frequency domain.

Time series models

Example of a simple time series model:

$x_{t+1} - \mu = \phi_1 (x_t - \mu) + \epsilon_{t+1}$

This is an autoregressive model of order 1 (AR(1)). Such a model can be used to:

a) Fit the time series and derive some of its properties. This is similar to fitting a theoretical probability distribution to a sample.

b) Make a forecast: $\hat{x}_{t+1} - \mu = \phi_1 (x_t - \mu)$
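Both uses can be illustrated with a minimal Python sketch (the parameter values, series length, and variable names below are invented for illustration, not taken from the notes):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration only
mu, phi1, sigma_eps, n = 10.0, 0.6, 1.0, 1000

# Simulate x_{t+1} - mu = phi1 * (x_t - mu) + eps_{t+1}
x = np.empty(n)
x[0] = mu
for t in range(n - 1):
    x[t + 1] = mu + phi1 * (x[t] - mu) + sigma_eps * rng.standard_normal()

# (a) Fit: for an AR(1) process, phi1 equals the lag-1 autocorrelation of the anomalies
z = x - x.mean()
phi1_hat = np.sum(z[1:] * z[:-1]) / np.sum(z**2)

# (b) Forecast the next value from the last observation
x_next = x.mean() + phi1_hat * (x[-1] - x.mean())
print(round(phi1_hat, 3), round(x_next, 3))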

For discrete time series, the equivalent of autoregressive models are Markov chains.

Discrete time series (Markov chains)

We have a discrete number of states (e.g., 2 or 3 states). For example, rain: 1, no rain: 0. At a given time, the chain can remain in the same state or change state; the states are MECE (mutually exclusive and collectively exhaustive).

Example: 2-state first-order Markov chain

[Diagram: two states, 0 and 1, connected by the transition probabilities $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$.]

For example, a periodic time series: 0 0 0 1 1 1 0 0 0 0

Transition probabilities: $p_{01} = P(x_{t+1} = 1 \mid x_t = 0)$, etc. In the example above (treating the series as periodic, so the last value transitions back to the first):

$p_{01} = \frac{1}{7}$;  $p_{00} = \frac{6}{7}$;  $p_{10} = \frac{1}{3}$;  $p_{11} = \frac{2}{3}$

Note that $p_{01} + p_{00} = 1$ and $p_{10} + p_{11} = 1$: both pairs are MECE.

We can also define unconditional probabilities $\pi_0 = P(x_t = 0)$ and $\pi_1 = P(x_t = 1)$:

$\pi_0 = \frac{7}{10} = \frac{p_{10}}{p_{01} + p_{10}} = \frac{1/3}{1/7 + 1/3}$,  $\pi_1 = \frac{3}{10} = \frac{p_{01}}{p_{01} + p_{10}} = \frac{1/7}{1/7 + 1/3}$

These represent the probability of changing to 0 (or 1) divided by the total probability of changing state, which equals the probability of being in state 0 (or 1). (See the demonstration later.)

We can also define persistence as the lag-1 autocorrelation $r_1 = \mathrm{corr}(x_{t+1}, x_t)$:

$r_1 = p_{11} - p_{01}$: the probability of being in state 1 coming from 1, minus the probability of being in state 1 coming from 0.

$r_1 = \frac{2}{3} - \frac{1}{7} = \frac{14 - 3}{21} = \frac{11}{21}$

We can also compute it as $r_1 = p_{00} - p_{10}$: $r_1 = \frac{6}{7} - \frac{1}{3} = \frac{18 - 7}{21} = \frac{11}{21}$
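A minimal Python sketch (variable names are mine) that reproduces these estimates from the example series, treating it as periodic so the last value is paired with the first:

import numpy as np

x = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])

# Pair each value with its successor, wrapping around the end of the series
pairs = list(zip(x, np.roll(x, -1)))

n0 = np.sum(x == 0)                                   # 7
n1 = np.sum(x == 1)                                   # 3
n01 = sum(1 for a, b in pairs if a == 0 and b == 1)   # 1 transition 0 -> 1
n10 = sum(1 for a, b in pairs if a == 1 and b == 0)   # 1 transition 1 -> 0

p01, p00 = n01 / n0, 1 - n01 / n0                     # 1/7, 6/7
p10, p11 = n10 / n1, 1 - n10 / n1                     # 1/3, 2/3

pi0 = p10 / (p01 + p10)                               # 7/10
pi1 = p01 / (p01 + p10)                               # 3/10

r1 = p11 - p01                                        # 11/21, also equals p00 - p10
print(p01, p00, p10, p11, pi0, pi1, r1)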

Note: persistence implies that $r_1 > 0$, i.e., $p_{00} + p_{11} > 1$: there is a stronger tendency to remain in a state than to change states. Then, since $p_{01} + p_{10} = 2 - (p_{00} + p_{11}) < 1$, from $\pi_1 = \frac{p_{01}}{p_{01} + p_{10}}$ we get that $p_{01} \le \pi_1$, and similarly $p_{10} \le \pi_0$ (i.e., if there is persistence, the probability of transitioning into a state from the other is smaller than the unconditional probability of being in that state). Furthermore, $p_{10} = 1 - p_{11} \le \pi_0 = 1 - \pi_1$, so $\pi_1 \le p_{11}$. In summary, if there is persistence, $p_{01} \le \pi_1 \le p_{11}$ and $p_{10} \le \pi_0 \le p_{00}$.

Exercise: show that $\pi_1 = \frac{p_{01}}{p_{01} + p_{10}}$.

Proof of this:

$\frac{p_{01}}{p_{01} + p_{10}} = \frac{n_{01}/n_0}{n_{01}/n_0 + n_{10}/n_1} = \frac{n_{01}/n_0}{n_{01}/n_0 + n_{01}/n_1} = \frac{n_1}{n_0 + n_1} = \pi_1$

since $n_{01} = n_{10}$, because in the long run there have to be as many changes from 0 to 1 as the other way around. Actually $\pi_1 = \frac{n_1}{n_0 + n_1}$ is the more natural definition of the unconditional probability, so the proof really shows that $\pi_1 = \frac{p_{01}}{p_{01} + p_{10}}$ as defined before.

Exercise: compute $r_1 = p_{11} - p_{01}$ directly from the definition of the lag-1 autocorrelation, $r_1 = \mathrm{corr}(x_{t+1}, x_t)$.
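For the example series, the first exercise can be checked numerically (a worked check added here, not part of the original notes): $n_{01} = n_{10} = 1$ for the periodic series, and

$\pi_1 = \frac{p_{01}}{p_{01} + p_{10}} = \frac{1/7}{1/7 + 1/3} = \frac{3/21}{10/21} = \frac{3}{10} = \frac{n_1}{n_0 + n_1}$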

Hypothesis testing for the presence of persistence in the time series:

Actual counts from the series 0 0 0 1 1 1 0 0 0 0 (contingency table):

                x_{t+1} = 0    x_{t+1} = 1    Marginal
  x_t = 0            6              1             7
  x_t = 1            1              2             3
  Marginal           7              3            10

Now the null hypothesis is that there is no persistence, so we create the table corresponding to no persistence:

                x_{t+1} = 0    x_{t+1} = 1    Marginal
  x_t = 0           4.9            2.1            7
  x_t = 1           2.1            0.9            3
  Marginal           7              3            10

For example, $4.9 = \pi_0 \cdot \pi_0 \cdot n = \frac{7}{10} \cdot \frac{7}{10} \cdot 10$.

So, to test whether a series really has $r_1 \neq 0$ (and it is not just sampling), we can use the $\chi^2$ distribution with 1 degree of freedom, since the marginals are given from the sample, so that given a single value in the table, the others are determined. The null hypothesis is $r_1 = 0$. The hypothesis of independence (the columns are independent of the rows) is tested with

$\chi^2 = \sum_{\text{classes}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

(see the note below), so that

$\chi^2 = \frac{(6 - 4.9)^2}{4.9} + \frac{(1 - 2.1)^2}{2.1} + \frac{(1 - 2.1)^2}{2.1} + \frac{(2 - 0.9)^2}{0.9} = 2.74$

Since the 5% $\chi^2$ value for 1 d.o.f. is 3.84, the persistence of the time series we created is not significant at the 95% level: with probability greater than 5%, we could have obtained a sample of size 10 with such apparent persistence even though the population has no persistence.
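A short Python check of this computation (assuming scipy is available for the critical value; otherwise compare directly with 3.84):

import numpy as np
from scipy.stats import chi2

observed = np.array([[6, 1],
                     [1, 2]])

# Expected counts under "no persistence": product of the marginals divided by the total
row = observed.sum(axis=1)            # [7, 3]
col = observed.sum(axis=0)            # [7, 3]
n = observed.sum()                    # 10
expected = np.outer(row, col) / n     # [[4.9, 2.1], [2.1, 0.9]]

chi2_stat = np.sum((observed - expected)**2 / expected)   # about 2.74
critical = chi2.ppf(0.95, df=1)                           # about 3.84
print(round(chi2_stat, 2), round(critical, 2), chi2_stat > critical)   # not significant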

Note: Reminder from class 3:

An important use of the chi-square is to test goodness of fit. If you have a histogram with n bins, with a number of observations $O_i$ and an expected number of observations $E_i$ (e.g., from a parametric distribution) in each bin, then the goodness of fit of the pdf to the data can be estimated using a chi-square test:

$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$ with $n - 1$ degrees of freedom

The null hypothesis (that it is a good fit) is rejected at a 5% level of significance if $X^2 > \chi^2_{(0.05,\, n-1\ \text{d.o.f.})}$.

The table above is known as a contingency table:

                x_{t+1} = 0    x_{t+1} = 1    Marginal
  x_t = 0            6              1             7
  x_t = 1            1              2             3
  Marginal           7              3            10

The table above has 4 bins but only one degree of freedom, because the marginals are fixed. We were checking whether the Markov chain has persistence, i.e., whether the value at t+1 depends on the value at t.

Example of a test of independence in a contingency table:

              democrat    republican    independent    Marginal
  Women          68           56             32          156
  Men            52           72             20          144
  Marginal      120          128             52          300

A test of independence checks whether the null hypothesis that political affiliation is independent of gender is valid. The null hypothesis would generate a table like:

              democrat    republican    independent    Marginal
  Women        62.40        66.56          27.04         156
  Men          57.60        61.44          24.96         144
  Marginal      120          128             52          300

where, for example, the expected number of democratic women is obtained as (prob. of being a woman) × (prob. of being a democrat, gender independent) × (number of people surveyed) = (156/300) × (120/300) × 300 = 62.4.
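This expected table can be reproduced with a short sketch (the same outer-product-of-marginals construction as before):

import numpy as np

observed = np.array([[68, 56, 32],
                     [52, 72, 20]])

row = observed.sum(axis=1)    # [156, 144]
col = observed.sum(axis=0)    # [120, 128, 52]
n = observed.sum()            # 300

# E_ij = (row marginal / n) * (column marginal / n) * n
expected = np.outer(row, col) / n
print(expected)               # [[62.40, 66.56, 27.04], [57.60, 61.44, 24.96]]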

Since the marginals are fixed, the number of d.o.f. for each row is (c - 1) = (3 - 1) = 2, and the number of d.o.f. for each column is (r - 1) = (2 - 1) = 1, so that the total number of d.o.f. in this contingency table is (c - 1)(r - 1) = 2 × 1 = 2. Here r is the number of rows and c the number of columns.

Consider the first column, democrats. If we accept the null hypothesis that p = 62.4/120 = 156/300 = 0.52 is the probability of being a woman, independent of political affiliation, then the number of women among this group of n = 120 democrats follows a binomial distribution with expected value np = 62.40. (The expected value for men is n(1 - p) = 57.6.)

Suggestion of a demonstration of the test of independence: consider only one column, i.e., a binomial distribution, e.g., the probability of being a woman or a man given that you are a democrat. Consider the test statistic for this binomial distribution:

$T = \frac{(68 - 62.4)^2}{62.4} + \frac{(52 - 57.6)^2}{57.6} = \frac{(X_1 - np)^2}{np} + \frac{(X_2 - n(1 - p))^2}{n(1 - p)} = \frac{(X_1 - np)^2}{np(1 - p)}$

where we have used $X_2 = n - X_1$ and $\frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}$.

The variance of the binomial distribution is $np(1 - p)$. Therefore,

$T = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2} = \frac{(X_1 - np_1)^2}{np_1(1 - p_1)}$

In other words, T is the square of a random (binomial) anomaly divided by its variance, and for large n, when the binomial approaches a normal distribution, this ratio has approximately a $\chi^2$ distribution with one degree of freedom.
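A quick numeric check of this identity with the numbers above (illustration only):

X1, n, p = 68, 120, 156 / 300          # women among democrats; p = P(woman) under the null
np_ = n * p                            # 62.4
nq = n * (1 - p)                       # 57.6

T_two_terms = (X1 - np_)**2 / np_ + ((n - X1) - nq)**2 / nq
T_one_term = (X1 - np_)**2 / (n * p * (1 - p))
print(round(T_two_terms, 3), round(T_one_term, 3))   # both about 1.05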

In this case T = 1.05, and $\chi^2_{0.05,1} = 3.841$, so just knowing that 68 of a group of 120 democrats were women does not show that women tend to vote democrat.

For several columns, this generalizes to

$T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$

which is the test that we have used above. In this case, T = 6.43, whereas $\chi^2_{0.05,2} = 5.99$. This shows that the null hypothesis of independence between rows and columns can be rejected with 95% confidence, and political affiliation is gender dependent.
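As a cross-check, the full table test can be reproduced with scipy (assuming scipy.stats.chi2_contingency is available; it carries out the same computation as the sum above):

import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[68, 56, 32],
                     [52, 72, 20]])

stat, pvalue, dof, expected = chi2_contingency(observed)
print(round(stat, 2), dof, round(pvalue, 3))          # about 6.43 with 2 d.o.f.
print(stat > chi2.ppf(0.95, df=dof))                  # True: reject independence at 95%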

End of note.

Uses of the Markov chain

We can use a 2-state, first-order Markov chain to:

a) Create an artificial time series of, for example, yes/no precipitation. From a precipitation series, we can estimate $p_{00}$ and $p_{11}$, and therefore $p_{01}$ and $p_{10}$. Then, if we are in state 0, we draw a random number x between zero and one; if $x \le p_{00}$, we stay in state 0, otherwise we go to state 1 (a sketch of this is given below).

b) Make a forecast: for example, given $x_t = 0$, we can predict $P(x_{t+1} = 0) = p_{00}$ and $P(x_{t+1} = 1) = p_{01}$, and so on.

We could also check the goodness of fit (Wilks, p. 104), comparing histograms of observed data with data simulated with Markov chains.
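A minimal sketch of use (a), generating a synthetic rain/no-rain series (the transition probabilities and series length here are made up):

import numpy as np

rng = np.random.default_rng(1)
p00, p11 = 0.8, 0.6          # hypothetical transition probabilities
n_days = 30

state = 0                    # start, say, in the dry state
series = [state]
for _ in range(n_days - 1):
    u = rng.uniform()
    if state == 0:
        state = 0 if u <= p00 else 1    # stay dry with probability p00
    else:
        state = 1 if u <= p11 else 0    # stay wet with probability p11
    series.append(state)
print(series)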

Multistate first-order Markov chain

[Diagram: three states, 1, 2, and 3, connected by the transition probabilities $p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23}, p_{31}, p_{32}, p_{33}$.]

Again, the transition probabilities can be derived from the sample, and similar rules are valid, e.g., $p_{11} + p_{12} + p_{13} = 1$, etc.

Second-order Markov chains: $p_{ijk} = P(x_{t+1} = k \mid x_t = j, x_{t-1} = i)$. This becomes complicated!...
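For a multistate chain, the transition probabilities can likewise be estimated by counting transitions; a rough sketch (the 3-state series below is invented):

import numpy as np

x = np.array([0, 1, 2, 2, 1, 0, 0, 2, 1, 1, 0, 2])   # hypothetical 3-state series
K = 3

counts = np.zeros((K, K))
for a, b in zip(x[:-1], x[1:]):
    counts[a, b] += 1                                 # count transitions a -> b

P = counts / counts.sum(axis=1, keepdims=True)        # each row of P sums to 1
print(P)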