Outline. Confidence intervals; more parametric tests; more bootstrap and randomization tests. Cohen, Empirical Methods, CS650

Parameter Estimation. Collect a sample to estimate the value of a population parameter. Example: estimate the mean age of CS graduate students from the mean age of students in the class. How good is this estimate? What affects the confidence in the estimate?

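How to think about confidence intervals. A sample mean $\bar{x}$ lies $k$ standard error units away from the population mean $\mu$. That is, $\bar{x} = \mu + k\,\sigma_{\bar{x}}$. How big should $k$ be to ensure that $\mu$ falls within $k$ standard error units of the sample mean 95% of the time? [Figure: sampling distribution of $\bar{x}$ centered on $\mu$, with a distance of $k\,\sigma_{\bar{x}}$ marked on each side.]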

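How to think about confidence intervals. How big should $k$ be to ensure that $\mu$ falls within $k$ standard error units of the sample mean 95% of the time? Suppose for now that the sampling distribution of the sample mean is normal. For 95% of possible values of $\bar{x}$, the population mean falls within 1.96 standard error units of $\bar{x}$: $\bar{x} - 1.96\,\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\,\sigma_{\bar{x}}$. [Figure: normal sampling distribution with $1.96\,\sigma_{\bar{x}}$ marked on each side, and intervals around several values of $\bar{x}$.]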
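Where the 1.96 comes from can be checked numerically. The sketch below assumes a standard normal sampling distribution and uses Python's standard library; the function name `critical_k` is only illustrative.

```python
from statistics import NormalDist


def critical_k(confidence: float = 0.95) -> float:
    """Two-sided critical value k for a standard normal sampling distribution."""
    # Split the leftover probability equally between the two tails.
    tail = (1.0 - confidence) / 2.0
    # k is the quantile that leaves `tail` probability above it.
    return NormalDist().inv_cdf(1.0 - tail)


print(round(critical_k(0.95), 2))  # approximately 1.96
```

Other confidence levels give other values of k, for example a larger k for 99% confidence.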

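Another way to think about confidence intervals (Efron and Tibshirani, Introduction to the Bootstrap, p.157). We decide that values of $\mu$ less than $\bar{x} - 1.96\,\hat{\sigma}_{\bar{x}}$ or greater than $\bar{x} + 1.96\,\hat{\sigma}_{\bar{x}}$ are implausible, because they give a probability less than $\alpha = .05$ of observing an estimate of $\mu$ as large or as small, respectively, as the one we have already seen. [Figure: normal sampling distribution with $1.96\,\sigma_{\bar{x}}$ marked on each side of $\bar{x}$, showing the bounds $\bar{x} - 1.96\,\sigma_{\bar{x}}$ and $\bar{x} + 1.96\,\sigma_{\bar{x}}$.]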

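Confidence interval for mean parallelization factor for KOSO and KOSO*; normal sampling distribution. Recall that F = SIZE / (NUM-PROC * RUN-TIME) is a parallelization factor for the KOSO/KOSO* experiment. KOSO*: $\bar{x}_{\text{koso*}} = .82$, $s_{\text{koso*}} = .23$, $\hat{\sigma}_{\text{koso*}} = s_{\text{koso*}} / \sqrt{150} = .019$. 95% confidence interval: $\mu_{\text{koso*}} = \bar{x}_{\text{koso*}} \pm 1.96\,\hat{\sigma}_{\text{koso*}} = [0.783, 0.857]$. KOSO: $\bar{x}_{\text{koso}} = .74$, $s_{\text{koso}} = .25$, $\hat{\sigma}_{\text{koso}} = s_{\text{koso}} / \sqrt{160} = .0198$. 95% confidence interval: $\mu_{\text{koso}} = \bar{x}_{\text{koso}} \pm 1.96\,\hat{\sigma}_{\text{koso}} = [0.701, 0.778]$.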
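The arithmetic on this slide can be reproduced from the summary statistics alone. This is a minimal sketch assuming a normal sampling distribution; `normal_ci` is an illustrative helper name, and the inputs are the means, standard deviations, and sample sizes quoted above.

```python
import math


def normal_ci(mean: float, sd: float, n: int, k: float = 1.96) -> tuple[float, float]:
    """Interval mean +/- k standard errors, with standard error sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - k * se, mean + k * se


# Summary statistics taken from the slide.
koso_star = normal_ci(0.82, 0.23, 150)
koso = normal_ci(0.74, 0.25, 160)

print("KOSO*:", koso_star)
print("KOSO: ", koso)
```

Computed without intermediate rounding, the bounds agree with the slide's intervals to within about 0.001.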

A few more words about confidence intervals. The interpretation of a 95% confidence interval is not "the population mean is $\mu_{\text{koso}} = \bar{x}_{\text{koso}} \pm 1.96\,\hat{\sigma}_{\text{koso}}$ with probability 0.95." The population mean is a constant, so it is meaningless to speak of the probability that it takes some value. The correct interpretation is: with probability .95, the population mean falls within 1.96 standard error units of the sample mean. The probability refers to the sample mean, which varies from sample to sample. Note that if the sample is small and the population standard deviation is unknown, one uses t variates, not Z variates. See book.
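The repeated-sampling interpretation can be illustrated by simulation: draw many samples from a population with a known mean, build an interval from each, and count how often the interval covers the fixed mean. The sketch below assumes a normal population; the population values (mean 0.74, standard deviation 0.25, n = 160) are hypothetical choices borrowed from the KOSO summary statistics, and `coverage` is an illustrative name.

```python
import math
import random


def coverage(mu=0.74, sigma=0.25, n=160, trials=2000, k=1.96, seed=0):
    """Fraction of simulated samples whose interval xbar +/- k*se contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation (n - 1 in the denominator).
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        se = s / math.sqrt(n)
        # mu is fixed; the interval moves from sample to sample.
        if xbar - k * se <= mu <= xbar + k * se:
            hits += 1
    return hits / trials


print(coverage())  # expected to be close to 0.95
```

With a much smaller n, the coverage obtained with k = 1.96 would tend to fall below .95, which is the reason for switching to t variates.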

How to think about bootstrap confidence intervals. Let $\Pi$ be a parameter and $\rho$ be a sample statistic. We want $k$ such that the distance $\delta = |\Pi - \rho| \le k$ with high probability. For instance, $k$ such that $\Pr(\delta \le k) \ge .95$. Since $\Pi$ is a constant, $\Pr(\delta \le k)$ depends only on $\rho$. It depends on the probability distribution of $\rho$, that is, the sampling distribution of $\rho$. [Figure: bootstrap sampling distribution of $\rho$, with $\delta$ and the point $\rho + k$ marked. The point $\rho + k$ that cuts off .05 of the distribution is the upper bound on $\delta$: 95% of the time, $\delta$ will have a smaller value than $\rho + k$.]

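Bootstrap confidence interval. Logic. Start with a sample. Resample from it with replacement to obtain G bootstrap samples and G values of the statistic R*. These values form the empirical bootstrap sampling distribution of R*. The upper bound of the 95% confidence interval is the .975 G quantile (the value at position .975 × G in the sorted R* values) and the lower bound is the .025 G quantile. [Figure: histogram of the empirical bootstrap sampling distribution of R*.]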
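The resampling logic can be sketched in a few lines. This is a minimal percentile bootstrap under stated assumptions: the function name, the index convention for the quantiles (positions .025 G and .975 G in the sorted values), and the example data are all illustrative, and other quantile conventions exist.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(sample, statistic, G=1000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` from G resamples."""
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap sample draws n items from the sample with replacement.
    reps = sorted(
        statistic([rng.choice(sample) for _ in range(n)]) for _ in range(G)
    )
    alpha = 1.0 - confidence
    lo_idx = round((alpha / 2.0) * G)                     # .025 G for 95%
    hi_idx = min(G - 1, round((1.0 - alpha / 2.0) * G))   # .975 G for 95%
    return reps[lo_idx], reps[hi_idx]


# Hypothetical data, used only to exercise the function.
data = [float(v) for v in range(1, 51)]
print(percentile_bootstrap_ci(data, mean))
```

Passing a different `statistic` (a median, a trimmed mean) gives an interval for that statistic with no change to the resampling code.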

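1000 bootstrap replications of the mean of runtime for KOSO (recoded as size / (num-proc * run-time)). Bootstrap confidence interval for mean KOSO runtime: [0.7, 0.78]. Compare with the parametric confidence interval: $\bar{x}_{\text{koso}} = .74$, $s_{\text{koso}} = .25$, $\hat{\sigma}_{\bar{x}_{\text{koso}}} = s_{\text{koso}} / \sqrt{160} = .0198$, so $\bar{x}_{\text{koso}} - 1.96\,\hat{\sigma}_{\bar{x}_{\text{koso}}} = .701$ and $\bar{x}_{\text{koso}} + 1.96\,\hat{\sigma}_{\bar{x}_{\text{koso}}} = .778$. [Figure: histogram of the 1000 bootstrap means, x-axis from 0.68 to 0.8.]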

Statistics other than means; Fisher's r-to-z transform for the correlation. The sampling distribution of the correlation is a bit wonky, but the sampling distribution of $z(r)$ is approximately normal with mean $z(\rho)$ and standard deviation $\sigma_{z(r)}$, where $z(\rho) = .5 \ln\frac{1+\rho}{1-\rho}$, $z(r) = .5 \ln\frac{1+r}{1-r}$, and $\sigma_{z(r)} = \frac{1}{\sqrt{n-3}}$. This means we can test hypotheses about correlations in a parametric way.

Are run-time and tree height uncorrelated for alpha = .96? Partition the data and find the correlation between run-time and height for the alpha = .96 partition: r = .412. Use Fisher's r-to-z to get a z score corresponding to r = .412: $z(r) = .5 \ln\frac{1+r}{1-r} = .5 \ln\frac{1+.412}{1-.412} = .438$, and $Z = \frac{z(r) - z(\rho)}{1/\sqrt{n-3}} = \frac{.438 - 0}{1/\sqrt{77}} = 3.84$. This is a significant result (two-tailed). [Figure: scatterplot of run-time against HEIGHT.]
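The worked example can be checked with a short script. This sketch assumes n = 80, which is inferred from the $\sqrt{77}$ in the denominator (n - 3 = 77) and is not stated on the slide. The function names are illustrative, and the two-tailed p-value uses the standard normal distribution.

```python
import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transform: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_test(r: float, n: int, rho0: float = 0.0) -> tuple[float, float]:
    """Z statistic and two-tailed p-value for H0: rho = rho0."""
    se = 1.0 / math.sqrt(n - 3)  # standard deviation of z(r)
    Z = (fisher_z(r) - fisher_z(rho0)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(Z)))
    return Z, p


# r from the slide; n = 80 is an inference from sqrt(77), not a stated value.
Z, p = fisher_z_test(0.412, 80)
print(round(fisher_z(0.412), 3), round(Z, 2), p)
```

The Z statistic is well beyond 1.96, which is why the slide calls the result significant in a two-tailed test at the .05 level.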

How can we bootstrap the sampling distribution of the correlation? Compute r = corr(x,y) = .412. Repeat k times: resample n points with replacement into (x*,y*) and compute r* = corr(x*,y*). Shift the distribution of r* by its mean, 0.41, to get the H0: ρ = 0 distribution. Is r = corr(x,y) = .412 significant? [Figures: scatterplot of run-time against HEIGHT; histogram of the shifted r* distribution.]
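The shift method can be sketched as follows. The data here are synthetic and purely hypothetical (the KOSO run-time and height data are not reproduced), the function names are illustrative, and the two-tailed p-value is computed as the fraction of shifted r* values at least as extreme as the observed r, which is one reasonable reading of the procedure.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r(xs, ys, k=1000, seed=0):
    """k bootstrap values r*, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # keep pairs together
        reps.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
    return reps


def shifted_p_value(r_obs, reps):
    """Two-tailed p-value after shifting r* by its mean to mimic H0: rho = 0."""
    center = sum(reps) / len(reps)
    null = [r - center for r in reps]
    return sum(1 for v in null if abs(v) >= abs(r_obs)) / len(null)


# Synthetic, hypothetical data standing in for height and run-time.
rng = random.Random(1)
heights = [rng.uniform(5, 25) for _ in range(80)]
runtimes = [15 * h + rng.gauss(0, 40) for h in heights]
r_obs = pearson_r(heights, runtimes)
print(r_obs, shifted_p_value(r_obs, bootstrap_r(heights, runtimes)))
```

Resampling indices rather than x and y separately matters: it preserves the pairing, so each r* reflects the same dependence structure as the original sample.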

Adding confidence intervals to significant results. Is r = corr(x,y) = .412 significant? Yes, but we suspect outliers. Use the unshifted sampling distribution of r* to find the upper .975 and lower .025 quantiles. The confidence interval on ρ is [.225, .529], which is very wide. [Figures: scatterplot of run-time against HEIGHT; histogram of the shifted r* distribution.]