Fin285a: Computer Simulations and Risk Assessment, Section 2.3.2: Hypothesis Testing and Confidence Intervals


Brittany Christine Melton

2 Overview
- Hypothesis testing terms
- Testing a die
- Testing issues
- Estimating means
- Confidence intervals
Fall 2017: LeBaron, Fin285a

3 Preview
- Correct models/parameters?
- Data similar?
- Has something changed in the data?
- Use one series to predict another
- Confidence intervals
  - How good are parameter estimates?
  - Closely related to hypothesis testing

4 Simulation methodology
- Get statistics from real data
- Simulate the model
- Compare model simulations to real data
- Could the statistics (random variables) be draws from the model?

5 Hypothesis testing terms

6 Null hypothesis
- Null hypothesis: an assumption about how the world works
- Assume this is true
- Could the data have come from this machine/theory/conjecture?
- Do you need more/other data?

8 More terms
- Test statistic: the observed statistic (a random variable)
- p-value: the probability, under the null hypothesis, of a test statistic at least as extreme as the one observed
- Example
  - Trading profits = 100 (test statistic)
  - Given a random walk for prices (null hypothesis)
  - Probability(profits >= 100) = 0.25 (p-value)

9 Three interrelated concepts
- p-values
- Hypothesis tests/critical values
- Confidence intervals
Start with a histogram from a null hypothesis and a test statistic (next slide).

11 Test statistic and null distribution
[Figure: histogram of the null distribution, X on the horizontal axis and frequency on the vertical axis, with the test statistic marked "Unusual??"]
Pr(X > 1.3) = 0.10

14 Probability questions you can ask
- Pr(X > t): probability that the null hypothesis gives a value larger than the test statistic
- Pr(X < t): probability that the null hypothesis gives a value smaller than the test statistic
- Pr(|X - k| > |t - k|): probability that the null hypothesis gives a value farther from k than the test statistic
These are all forms of p-value.
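The three probabilities above can be estimated directly from simulated draws of the null distribution. The following is a minimal sketch, not the course code: the function name `empirical_p_values` is hypothetical, and the standard normal null is an assumption chosen only so the example runs.

```python
import random

def empirical_p_values(null_draws, t, k=0.0):
    """Return estimates of Pr(X > t), Pr(X < t), and Pr(|X - k| > |t - k|)."""
    n = len(null_draws)
    upper = sum(x > t for x in null_draws) / n
    lower = sum(x < t for x in null_draws) / n
    two_sided = sum(abs(x - k) > abs(t - k) for x in null_draws) / n
    return upper, lower, two_sided

# Assumption: the null distribution is N(0, 1); any simulated null could be used.
rng = random.Random(0)
null_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

upper, lower, two_sided = empirical_p_values(null_draws, t=1.3)
print(f"Pr(X > t) = {upper:.3f}, Pr(X < t) = {lower:.3f}, two-sided = {two_sided:.3f}")
```

Which of the three numbers to report depends on the alternative you have in mind, as discussed under one versus two tailed tests.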

16 Hypothesis test
- Test whether the test statistic could have come from the null data generator
- Answer: reject or cannot reject the null hypothesis
- The test usually involves some critical value, C
- Reject the null hypothesis when
  - t > C (one-tailed test), or
  - t < C (one-tailed test), or
  - |t - k| > C (two-tailed test)

17 Critical value (one tail)
[Figure: histogram of the null distribution, X against frequency, with the critical value C marked]
Example: reject the null if t > C (here, t > 1.3). The probability of rejecting when the null is true is Pr(X > C) = 0.1.

18 Critical value (two tail)
[Figure: histogram of the null distribution, X against frequency, with critical values marked in both tails]
Reject the null if |t| > C. The probability of rejecting when the null is true is Pr(|X| > C) = 0.05 (the area in each tail is 0.025).
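A critical value can be read off a simulated null distribution as a quantile. This is an illustrative sketch under the assumption of a standard normal null; `critical_value` is a hypothetical helper, not a function from the course files.

```python
import random

def critical_value(null_draws, size, two_tailed=False, k=0.0):
    """Empirical critical value C with Pr(reject | null true) close to `size`."""
    if two_tailed:
        values = sorted(abs(x - k) for x in null_draws)  # distances from k
    else:
        values = sorted(null_draws)                       # upper tail of X itself
    index = min(int((1.0 - size) * len(values)), len(values) - 1)
    return values[index]

# Assumption: the null distribution is N(0, 1).
rng = random.Random(0)
null_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

c_one = critical_value(null_draws, size=0.10)                   # Pr(X > C) = 0.10
c_two = critical_value(null_draws, size=0.05, two_tailed=True)  # Pr(|X| > C) = 0.05
print(f"one-tailed C = {c_one:.2f}, two-tailed C = {c_two:.2f}")
```

For a standard normal null these should land near the familiar values of roughly 1.28 and 1.96, up to simulation noise.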

19 One versus two tailed tests
- Depends on the spirit of the question and the alternative models you are thinking about
- Think about a sample mean as an example
- You have two samples, and the estimated mean has changed from t to s
  - If you are asking whether it could have increased by as much as you saw in the data, then a one-tailed test, s - t > C, is probably in order
  - If you are asking whether it could have decreased by as much as you saw in the data, then a one-tailed test, s - t < C, is probably in order
  - If you are asking whether it could have changed by as much as you saw in the data, then a two-tailed test, |s - t| > C, is probably in order

20 Testing a die

21 Testing a die
- You've observed the following rolls of a die out of 6000 rolls: [table of counts missing in source]
- Could this have come from a fair die with probability 1/6 on each side?

22 Dietest.m
1. Think up a test statistic
2. Roll 6000 dice with sample
3. Check how the value of the test statistic from the original data compares with the distribution from the simulations
4. Python: dietest.py
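The steps above can be sketched as follows. This is not the course's dietest.py: the chi-square-style statistic is one possible choice of test statistic, the function names are hypothetical, and the counts are made-up illustration values because the observed counts are not reproduced here.

```python
import random

def chi_square_stat(counts):
    """Sum of squared deviations from equal expected counts, scaled by the expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def simulate_fair_counts(n_rolls, rng):
    """Roll a fair six-sided die n_rolls times and count each face."""
    counts = [0] * 6
    for face in rng.choices(range(6), k=n_rolls):
        counts[face] += 1
    return counts

def die_test_p_value(observed_counts, n_sims=300, seed=0):
    """Share of fair-die simulations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    n_rolls = sum(observed_counts)
    observed_stat = chi_square_stat(observed_counts)
    exceed = sum(
        chi_square_stat(simulate_fair_counts(n_rolls, rng)) >= observed_stat
        for _ in range(n_sims)
    )
    return exceed / n_sims

# Hypothetical counts for faces 1..6 (NOT the data from the slides), 6000 rolls total.
hypothetical_counts = [1020, 980, 1005, 995, 1040, 960]
p_value = die_test_p_value(hypothetical_counts)
print(f"statistic = {chi_square_stat(hypothetical_counts):.2f}, p-value = {p_value:.3f}")
```

A small p-value would say that a fair die rarely produces counts this uneven; a large one means the fair-die null cannot be rejected.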

23 A Bayesian moment
- What if you want to assess the probability of different types of biased dice given some data, Pr(die | data)?
- For this you will need other tools
- Most likely Bayesian statistical methods
- Classical statistics often involves precise testing of a somewhat narrow null hypothesis

24 Testing issues

25 Size and power
- Prob(reject | null is true) = size of test = Type I error
- Prob(reject | null is false) = power of test
- Prob(not reject | null is false) = Type II error = 1 - power

29 Mushrooms and toadstools
- Test for toadstool (poisonous)
- Null = mushroom
- Reject (don't eat) if the test statistic rejects
- Goal: eat mushrooms, throw out toadstools
- Type I error: probability of throwing out good mushrooms, Prob(reject | null is true)
- Power: probability of throwing out toadstools, Prob(reject | null is false)
- Type II error: probability of accepting (eating) a toadstool, Prob(not reject | null is false)
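Size and power can both be estimated by simulation: generate data with the null true and count rejections (size), then generate data with the null false and count rejections (power). The sketch below is illustrative only. It assumes a two-tailed z-test on a mean with known sigma, and the sample size and alternative mean are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, null_mean=0.0, sigma=1.0, T=25, size=0.05,
                   n_sims=4000, seed=0):
    """Fraction of simulated samples in which a two-tailed z-test rejects the null."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - size / 2.0)
    std_err = sigma / math.sqrt(T)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(T)]
        z = (fmean(sample) - null_mean) / std_err
        if abs(z) > critical:
            rejections += 1
    return rejections / n_sims

size_estimate = rejection_rate(true_mean=0.0)    # null is true: Type I error rate
power_estimate = rejection_rate(true_mean=0.5)   # null is false: power
type_ii_estimate = 1.0 - power_estimate          # Type II error rate

print(f"size = {size_estimate:.3f}, power = {power_estimate:.3f}, "
      f"Type II = {type_ii_estimate:.3f}")
```

Power depends on how far the truth is from the null: moving `true_mean` closer to `null_mean` should lower it toward the size of the test.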

30 Estimating means

31 Estimate mean for long run stock returns
- Long-range U.S. data ( )
- Annual returns (with dividends)
- Real returns (inflation adjusted)

32 Goals of this example
- Move from p-values and critical values to confidence intervals
- Compare analytic, Monte Carlo, and bootstrap approaches

33 Sample statistics
$$\hat{\theta} = \frac{1}{T}\sum_{t=1}^{T} R_t$$
$$\hat{\sigma}^2 = \frac{1}{T-1}\sum_{t=1}^{T} (R_t - \hat{\theta})^2$$
$$\theta = E(R_t), \qquad \sigma^2 = E(R_t - \theta)^2$$

34 Basic analytic tests (t-test)
$$Z = \frac{\hat{\theta} - \theta}{\hat{\sigma}/\sqrt{T}}$$
- Assume that $\hat{\theta}$ is normal. Then Z follows a Student-t distribution with T - 1 degrees of freedom ($t_{T-1}$).
- Null, $\theta = 0.06$: Z = 1.82 and $\Pr(t_{T-1} > Z) = 0.035$; this is the p-value
- A value this large is unlikely under this distribution ($\theta = 0.06$), but not impossible
- What about $\theta = 0.05$? Z = 2.60 and $\Pr(t_{T-1} > Z) =$ [value missing in source]
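As a rough illustration of the statistic above, the sketch below computes Z for a series of returns. The data are synthetic (an assumption, not the course's return series), and because the Python standard library has no Student-t distribution, the tail probability uses a normal approximation that is only reasonable for large T.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def t_statistic(returns, theta_null):
    """Z = (theta_hat - theta_null) / (sigma_hat / sqrt(T))."""
    T = len(returns)
    theta_hat = fmean(returns)
    sigma_hat = stdev(returns)  # square root of the 1/(T-1) variance estimate
    return (theta_hat - theta_null) / (sigma_hat / math.sqrt(T))

# Synthetic stand-in for the annual return series (hypothetical parameters).
rng = random.Random(1)
returns = [rng.gauss(0.08, 0.17) for _ in range(150)]

z = t_statistic(returns, theta_null=0.06)
# Normal approximation to Pr(t_{T-1} > Z); an approximation, not the exact t tail.
p_upper = 1.0 - NormalDist().cdf(z)
print(f"Z = {z:.2f}, approximate one-tailed p-value = {p_upper:.3f}")
```

With a statistics package that provides the Student-t CDF, the last step would use that distribution with T - 1 degrees of freedom instead.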

35 Monte Carlo test
- Assume $R_t$ is normal
- Assume the null of $\theta = 0.06$ (population)
- Set $\sigma^2$ to the sample estimate, $\hat{\sigma}^2$
- Generate Monte Carlo means, $\hat{\theta}_{mc}$, for many draws of samples of length T
- Compare $\hat{\theta}$ to this computer-generated distribution: $\Pr(\hat{\theta}_{mc} > \hat{\theta})$
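A minimal sketch of this Monte Carlo test follows. It is not the course's annualmean.py; the function name is hypothetical and the returns are synthetic stand-ins for the real series.

```python
import random
from statistics import fmean, stdev

def monte_carlo_p_value(returns, theta_null, n_sims=2000, seed=0):
    """Estimate Pr(theta_mc > theta_hat) when R_t ~ N(theta_null, sigma_hat^2)."""
    rng = random.Random(seed)
    T = len(returns)
    theta_hat = fmean(returns)
    sigma_hat = stdev(returns)  # sigma is set to the sample estimate
    exceed = 0
    for _ in range(n_sims):
        simulated = [rng.gauss(theta_null, sigma_hat) for _ in range(T)]
        if fmean(simulated) > theta_hat:
            exceed += 1
    return exceed / n_sims

# Synthetic stand-in for the annual return series (hypothetical parameters).
data_rng = random.Random(1)
returns = [data_rng.gauss(0.08, 0.17) for _ in range(150)]

p_value = monte_carlo_p_value(returns, theta_null=0.06)
print(f"Monte Carlo p-value = {p_value:.3f}")
```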

36 Bootstrap test
- Assume $R_t$ is the population
- Readjust the data (need to shift $R_t$ to the null): $R^a_t = R_t - \hat{\theta} + 0.06$
- The adjusted series has $\theta = E(R^a_t) = 0.06$ (population)
- That is, remove the population mean for the bootstrap, $\hat{\theta}$, and add the null hypothesis value, 0.06
- Redraw new samples of length T, with probability 1/T on each $R^a_t$ (with replacement, many (B) times)
- Store each estimated mean, $\hat{\theta}_b$
- Compare $\hat{\theta}$ to this computer-generated distribution: $\Pr(\hat{\theta}_b > \hat{\theta})$
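The shift-then-resample procedure can be sketched as below. Again this is an illustration rather than the course code, with a hypothetical function name and synthetic returns.

```python
import random
from statistics import fmean

def bootstrap_p_value(returns, theta_null, n_boot=2000, seed=0):
    """Estimate Pr(theta_b > theta_hat) from resamples of the null-adjusted series."""
    rng = random.Random(seed)
    T = len(returns)
    theta_hat = fmean(returns)
    # Shift the data so that the bootstrap population has mean theta_null.
    adjusted = [r - theta_hat + theta_null for r in returns]
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(adjusted, k=T)  # with replacement, probability 1/T each
        if fmean(resample) > theta_hat:
            exceed += 1
    return exceed / n_boot

# Synthetic stand-in for the annual return series (hypothetical parameters).
data_rng = random.Random(1)
returns = [data_rng.gauss(0.08, 0.17) for _ in range(150)]

p_value = bootstrap_p_value(returns, theta_null=0.06)
print(f"bootstrap p-value = {p_value:.3f}")
```

Unlike the Monte Carlo version, nothing here assumes the returns are normal; the shape of the distribution comes from the data.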

37 Bootstrap t-test (1)
- Redraw new samples of length T, with probability 1/T on each $R_t$ (with replacement, many (B) times)
- Estimate the Student-t test statistic, $Z_b$, for each bootstrap sample
$$Z_b = \frac{\hat{\theta}_b - \hat{\theta}}{\hat{\sigma}_b/\sqrt{T}}$$
- $\hat{\theta}_b$ and $\hat{\sigma}_b$ are both estimated for each bootstrap sample drawn from the original set of returns (which represents the population)

38 Bootstrap t-test (2)
$$Z_b = \frac{\hat{\theta}_b - \hat{\theta}}{\hat{\sigma}_b/\sqrt{T}}, \qquad Z = \frac{\hat{\theta} - \theta}{\hat{\sigma}/\sqrt{T}}$$
- Store each value $Z_b$
- Compare Z to this computer-generated distribution: $\Pr(Z_b > Z)$
- Remember that in bootstrapping, the population (urn) is $R_t$ (the sample), and $E(R_t) = \hat{\theta} = \frac{1}{T}\sum_{t=1}^{T} R_t$
- Bootstrap "mantra": population = sample (probability 1/T on each observation)
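The bootstrap t-test can be sketched in the same style. Note that the resamples are drawn from the original, unshifted returns, and each $Z_b$ is centered on the sample mean. The function name and data are hypothetical.

```python
import math
import random
from statistics import fmean, stdev

def bootstrap_t_p_value(returns, theta_null, n_boot=2000, seed=0):
    """Return (Z, Pr(Z_b > Z)) using studentized bootstrap resamples."""
    rng = random.Random(seed)
    T = len(returns)
    root_T = math.sqrt(T)
    theta_hat = fmean(returns)
    z = (theta_hat - theta_null) / (stdev(returns) / root_T)
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(returns, k=T)  # population = sample
        # Center on theta_hat, the mean of the bootstrap population.
        z_b = (fmean(resample) - theta_hat) / (stdev(resample) / root_T)
        if z_b > z:
            exceed += 1
    return z, exceed / n_boot

# Synthetic stand-in for the annual return series (hypothetical parameters).
data_rng = random.Random(1)
returns = [data_rng.gauss(0.08, 0.17) for _ in range(150)]

z, p_value = bootstrap_t_p_value(returns, theta_null=0.06)
print(f"Z = {z:.2f}, bootstrap-t p-value = {p_value:.3f}")
```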

39 Python code
- annualmean.py
- Performs all 3 tests

40 Mean difference
- Estimate mean annual real returns across two periods of history [periods and estimated means missing in source]
- This is an increase of [value missing in source]
- Is this significant/interesting?
- Use the bootstrap to simulate the equal expected return ($\theta$) null hypothesis

41 Mean difference: bootstrap
- Python: annualmeandiff.py
- $\hat{\theta}_i$ is the estimated mean for each part, i = (1, 2)
- Estimate the mean difference, $\hat{d} = \hat{\theta}_2 - \hat{\theta}_1$
- Bootstrap technique:
  - Assume the entire set of returns is the population (same over time)
  - Draw fake samples i = 1 and i = 2 of appropriate length from the entire sample
  - Estimate the mean difference, $\hat{d}_b = \hat{\theta}^b_2 - \hat{\theta}^b_1$
  - Compare $\hat{d}$ to $\hat{d}_b$: $\Pr(\hat{d}_b > \hat{d})$

42 Confidence intervals

43 Confidence intervals
- Regions which contain the true parameters
- Show uncertainty about our estimates
- First experiment (Monte Carlo):
  - Assume annual returns are normal at the estimated mean and standard deviation from the data, $\theta = \hat{\theta}$, $\sigma = \hat{\sigma}$
  - Simulate a sample of T normal returns
  - Estimate the mean, $\hat{\theta}$, in each sample, and plot the distribution
  - How much does it vary around the true value?

44 Normal Monte Carlo
[Figure: histogram of the estimated mean stock return (horizontal axis) against frequency (vertical axis), with two quantiles marked; the quantile levels are missing in the source]

45 Statistics reminder
- This simulation is useful, but you should remember that
$$E(\hat{\theta}) = \theta, \qquad \mathrm{Std}(\hat{\theta}) = \sigma_\theta = \frac{\sigma}{\sqrt{T}}$$
- $\hat{\theta}$ follows a normal distribution, $N(\theta, \sigma_\theta^2)$
- 95% of the distribution for $\hat{\theta}$ lies within
$$[\theta - 1.96\sigma_\theta,\ \theta + 1.96\sigma_\theta] = [\theta + \Phi^{-1}(0.025)\sigma_\theta,\ \theta + \Phi^{-1}(0.975)\sigma_\theta]$$
- $\Phi(x)$ is the cumulative distribution function for a standard normal, N(0, 1)
- $\Phi^{-1}(p)$ is the p-quantile for N(0, 1)

46 Location for estimates
- If the true value of $\theta$ = [value missing in source], then the estimated mean from various samples of length T will lie within [0.05, 0.10] with probability 0.95
- This is nice, but not quite what we want
- We do know that with probability 0.95 the true expected value, $\theta$, will be within $1.96\sigma_\theta$ of $\hat{\theta}$, our estimated mean
$$\Pr(|\hat{\theta} - \theta| < 1.96\sigma_\theta) = 0.95$$

47 What we really want
$$\Pr(|\theta - \hat{\theta}| < 1.96\sigma_\theta) = 0.95$$
- Define the region $A = [\hat{\theta} - 1.96\sigma_\theta,\ \hat{\theta} + 1.96\sigma_\theta]$
- The probability that A covers $\theta$ is $\alpha = 0.95$
- Replace $\sigma_\theta$ with the sample estimate, $\hat{\sigma}_\theta = \hat{\sigma}/\sqrt{T} = 0.17/\sqrt{T}$
- This is your typical confidence band around the estimate, $\hat{\theta}$: A = [ , ]
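The region A is easy to compute, and the claim that it covers the true mean with probability 0.95 can be checked by simulation. The sketch below does both, under the assumption of normal returns with hypothetical parameters; the function names are not from the course files.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def normal_confidence_interval(returns, coverage=0.95):
    """[theta_hat - h, theta_hat + h] with h = Phi^{-1}(0.5 + coverage/2) * sigma_hat / sqrt(T)."""
    T = len(returns)
    theta_hat = fmean(returns)
    sigma_theta_hat = stdev(returns) / math.sqrt(T)
    h = NormalDist().inv_cdf(0.5 + coverage / 2.0) * sigma_theta_hat
    return theta_hat - h, theta_hat + h

def coverage_rate(theta=0.08, sigma=0.17, T=100, n_sims=1000, seed=0):
    """Share of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(theta, sigma) for _ in range(T)]
        low, high = normal_confidence_interval(sample)
        if low <= theta <= high:
            hits += 1
    return hits / n_sims

print(f"estimated coverage = {coverage_rate():.3f}")
```

The interval moves from sample to sample while the true mean stays fixed, which is why the statement is about A covering the true mean rather than about the true mean moving.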

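48 Moving confidence region
[Figure: number line showing an interval of half-width h around $\theta$ and the same interval moved to $\hat{\theta}$]
- The region is about the distribution around $\theta$
- Since we are interested in distances from $\hat{\theta}$, we simply pick $[\theta - h, \theta + h]$ up and move it to $\hat{\theta}$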

49 Connection to hypothesis tests
- Remember the two-sided test: $\Pr(|\hat{\theta} - \theta| > C_{0.05}) = 0.05$
- Estimate the critical value $C_{0.05}$ for this (usually don't need $\theta$)
- Then find all values of $\theta$ where $|\hat{\theta} - \theta| \le C_{0.05}$
- These would also be the 0.95 confidence region, or all values of $\theta$ where the null hypothesis $E(R_t) = \theta$ is not rejected at the 0.05 level
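To make the connection concrete, the sketch below scans a grid of candidate values of the mean and keeps those that a two-sided test does not reject. The critical value is taken from the normal approximation (an assumption), and `non_rejected_thetas` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean, stdev

def non_rejected_thetas(returns, grid, size=0.05):
    """Candidate means theta with |theta_hat - theta| <= C at the given test size."""
    T = len(returns)
    theta_hat = fmean(returns)
    # Normal-approximation critical value for the distance |theta_hat - theta|.
    c = NormalDist().inv_cdf(1.0 - size / 2.0) * stdev(returns) / math.sqrt(T)
    return [theta for theta in grid if abs(theta_hat - theta) <= c]

# Tiny made-up sample, used only to show the mechanics.
sample = [1, 2, 3, 4, 5]
grid = [i / 100 for i in range(0, 601)]
kept = non_rejected_thetas(sample, grid)
print(f"not rejected for theta in [{kept[0]:.2f}, {kept[-1]:.2f}]")
```

Up to the grid spacing, the end points of the kept values coincide with the 0.95 confidence band around the sample mean.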

50 Bootstrap confidence intervals
- Get the bootstrap distribution of the statistic, $\hat{\theta}_b$, b = 1, ..., B
- Two methods:
1. Normal bands
  - Use the bootstrap to estimate $\tilde{\sigma}_\theta = \mathrm{std}(\hat{\theta}_b)$
  - Then use standard normal distribution bands, $[\hat{\theta} + \Phi^{-1}(\alpha)\tilde{\sigma}_\theta,\ \hat{\theta} + \Phi^{-1}(1-\alpha)\tilde{\sigma}_\theta]$
  - $\Phi(x)$ is the cumulative distribution function for a standard normal
2. Percentile method
  - Use the bootstrap values, $\hat{\theta}_b$, to estimate the distribution for $\hat{\theta}$
  - Then get quantiles for this, $[q_\alpha(\hat{\theta}_b),\ q_{1-\alpha}(\hat{\theta}_b)]$
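Both methods can be sketched from the same set of bootstrap means. This is an illustration, not annualmeanconf.py: the helper names are hypothetical, the returns are synthetic, and the quantile rule is a simple order-statistic lookup rather than an interpolated quantile.

```python
import random
from statistics import NormalDist, fmean, stdev

def bootstrap_means(data, n_boot=4000, seed=0):
    """Means of n_boot resamples of the data, drawn with replacement."""
    rng = random.Random(seed)
    T = len(data)
    return [fmean(rng.choices(data, k=T)) for _ in range(n_boot)]

def normal_band(data, boot, alpha=0.025):
    """Method 1: theta_hat plus normal quantiles times the bootstrap std."""
    theta_hat = fmean(data)
    sigma_theta = stdev(boot)
    nd = NormalDist()
    return (theta_hat + nd.inv_cdf(alpha) * sigma_theta,
            theta_hat + nd.inv_cdf(1.0 - alpha) * sigma_theta)

def percentile_band(boot, alpha=0.025):
    """Method 2: empirical alpha and 1 - alpha quantiles of the bootstrap means."""
    ordered = sorted(boot)
    low = ordered[int(alpha * len(ordered))]
    high = ordered[min(int((1.0 - alpha) * len(ordered)), len(ordered) - 1)]
    return low, high

# Synthetic stand-in for the annual return series (hypothetical parameters).
data_rng = random.Random(1)
returns = [data_rng.gauss(0.08, 0.17) for _ in range(150)]

boot = bootstrap_means(returns)
band_normal = normal_band(returns, boot)
band_percentile = percentile_band(boot)
print("normal band:    ", band_normal)
print("percentile band:", band_percentile)
```

For a well-behaved statistic such as the mean the two bands should be close; they can differ noticeably when the bootstrap distribution is skewed.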

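51 Bootstrap mean
- Each draw: $(x_1, x_2, \ldots, x_T) \rightarrow (x^b_1, x^b_2, \ldots, x^b_T)$
- Probability: $(\frac{1}{T}, \frac{1}{T}, \ldots, \frac{1}{T})$
- Do this b = 1, 2, ..., B times
$$\hat{\theta}_b = \frac{1}{T}\sum_{t=1}^{T} x^b_t, \qquad b = 1, \ldots, B$$
$$E(x^b_t) = \frac{1}{T}x_1 + \frac{1}{T}x_2 + \cdots + \frac{1}{T}x_T = \hat{\theta}$$
$$E(\hat{\theta}_b) = \frac{1}{T}\sum_{t=1}^{T} E(x^b_t) = \frac{1}{T}\sum_{t=1}^{T} \hat{\theta} = \hat{\theta}$$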

52 Bootstrap mean
$$E(\hat{\theta}_b) = \frac{1}{T}\sum_{t=1}^{T} E(x^b_t) = \frac{1}{T}\sum_{t=1}^{T} \hat{\theta} = \hat{\theta}$$
- What does this say?
- The bootstrap for the mean is centered around the sample mean
- It doesn't help get a better point estimate
- This is true for many (not all) statistics

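53 Bootstrap variance
$$\tilde{\sigma}^2_\theta = \mathrm{var}(\hat{\theta}_b) = \frac{1}{B-1}\sum_{b=1}^{B} (\hat{\theta}_b - \bar{\theta})^2$$
$$\bar{\theta} = \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}_b$$
$$\tilde{\sigma}_\theta = \sqrt{\tilde{\sigma}^2_\theta} = \mathrm{std}(\hat{\theta}_b)$$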
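A quick numerical check of the bootstrap mean and variance formulas: the average of the bootstrap means should sit close to the sample mean, and their standard deviation should be close to the analytic standard error of the mean. The data and names below are hypothetical.

```python
import math
import random
from statistics import fmean, stdev

def bootstrap_means(data, n_boot=4000, seed=0):
    """Means of n_boot resamples of the data, drawn with replacement."""
    rng = random.Random(seed)
    T = len(data)
    return [fmean(rng.choices(data, k=T)) for _ in range(n_boot)]

# Synthetic stand-in for the annual return series (hypothetical parameters).
data_rng = random.Random(1)
returns = [data_rng.gauss(0.08, 0.17) for _ in range(150)]

boot = bootstrap_means(returns)
theta_hat = fmean(returns)
theta_bar = fmean(boot)                                   # mean of the bootstrap means
sigma_boot = stdev(boot)                                  # uses the 1/(B-1) formula
sigma_analytic = stdev(returns) / math.sqrt(len(returns)) # sigma_hat / sqrt(T)

print(f"theta_hat = {theta_hat:.4f}, mean of bootstrap means = {theta_bar:.4f}")
print(f"bootstrap std = {sigma_boot:.4f}, analytic std = {sigma_analytic:.4f}")
```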

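54 Bootstrap mean confidence interval
[Figure: histogram of the bootstrap distribution of the estimated mean stock return, $\hat{\theta}_b$ (horizontal axis), against frequency (vertical axis), with two quantiles marked; the quantile levels are missing in the source]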

55 More on the percentile method
- The bootstrap distribution is centered at the sample value. Why?
  - Population = sample, $E(\hat{\theta}_b) = E(R^b_t) = \hat{\theta}$ for the bootstrap
- What is going on?
  - Assume the bootstrap distribution of $\hat{\theta}_b$ centered at $\hat{\theta}$ is the same as that of $\hat{\theta}$ centered around $\theta$
  - We pick it up and move it to $\hat{\theta}$ as we did in the analytic case
  - The percentile bootstrap does this automatically
- Assumptions
  - Need a symmetric distribution for $\hat{\theta}$
  - Do NOT need a normal distribution for anything
- Python code: annualmeanconf.py

56 More on bootstraps, and B
- B usually needs to be pretty large
- Depends a little on what you need
- For the standard deviation, B around [value missing in source] can be OK
- For small quantiles, $\alpha$ = [value missing in source], you need a large B = 100,000
- This might be why you prefer normal (method 1) confidence bands

57 What about asymmetric distributions for $\hat{\theta}$?
- Gets complicated
- Many methods (no dominant method)
- Most statistics we will look at will be symmetric

58 Why bother with the bootstrap?
- This example was designed to be familiar (mean)
- Normal approximations look good
- Why bootstrap?
  - Deviations from normality
  - Statistics have no analytics
  - Analytics might be difficult or time consuming

59 Overview
- Hypothesis testing terms
- Testing a die
- Testing issues
- Estimating means
- Confidence intervals
