Kernel density estimation in R


1 Kernel density estimation in R

Kernel density estimation can be done in R using the density() function. The default is a Gaussian kernel, but others are possible as well. The function uses its own algorithm to determine the bandwidth, but you can override this and choose your own. If you rely on the density() function, you are limited to the built-in kernels; if you want to try a different one, you have to write the code yourself.
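For example, here is a minimal hand-rolled estimator using a tricube kernel, which is not among density()'s built-ins. This is a sketch of my own, not code from the slides; the data x and bandwidth b are placeholders.

tricube <- function(u) (70/81) * (1 - abs(u)^3)^3 * (abs(u) <= 1)

# KDE at each grid point: average of scaled kernels centered at the data
kde <- function(xgrid, x, b) {
  sapply(xgrid, function(t) mean(tricube((t - x) / b)) / b)
}

x <- rnorm(100)
xgrid <- seq(-4, 4, by = 0.05)
plot(xgrid, kde(xgrid, x, b = 0.5), type = "l")
lines(density(x), col = "red")   # compare with the built-in Gaussian KDE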

2 Kernel density estimation in R: effect of bandwidth for rectangular kernel (figure)

3 Kernel density estimation in R

Note that exponential densities are a bit tricky to estimate using kernel methods. Here is the default behavior estimating the density for exponential data.

> x <- rexp(100)
> plot(density(x))

4 Kernel density estimation in R: exponential data with Gaussian kernel (figure)

5 Violin plots: a nice application of kernel density estimation

Violin plots are an alternative to boxplots that show nonparametric density estimates of the distribution in addition to the median and interquartile range. The densities are rotated sideways to have a similar orientation to a box plot.

> x <- rexp(100)
> install.packages("vioplot")
> library(vioplot)
> vioplot(x)

6 Kernel density estimation in R: violin plot (figure)

7 Kernel density estimation in R: violin plot

The violin plot uses the function sm.density() rather than density() for the nonparametric density estimate, and this leads to smoother density estimates. If you want to modify the behavior of the violin plot, you can copy the original code to your own function and change how the nonparametric density estimate is done (e.g., replacing sm.density with density, or changing the kernel used).

8 Kernel density estimation in R: violin plot (figure)

9 Kernel density estimation in R: violin plot (figure)

10 Kernel density estimation in R: violin plot

> vioplot
function (x, ..., range = 1.5, h = NULL, ylim = NULL, names = NULL,
    horizontal = FALSE, col = "magenta", border = "black", lty = 1,
    lwd = 1, rectcol = "black", colmed = "white", pchmed = 19,
    at, add = FALSE, wex = 1, drawrect = TRUE)
{
    datas <- list(x, ...)
    n <- length(datas)
    if (missing(at))
        at <- 1:n
    upper <- vector(mode = "numeric", length = n)
    lower <- vector(mode = "numeric", length = n)
    q1 <- vector(mode = "numeric", length = n)
    q3 <- vector(mode = "numeric", length = n)
    ...
    args <- list(display = "none")
    if (!(is.null(h)))
        args <- c(args, h = h)
    for (i in 1:n) {
        ...
        smout <- do.call("sm.density", c(list(data, xlim = est.xlim),
            args))

11 Kernel density estimation

There are lots of popular kernel density estimators, and statisticians have put a lot of work into establishing their properties, showing when some kernels work better than others (for example, using mean integrated squared error as a criterion), determining how to choose bandwidths, and so on. In addition to the Gaussian, common choices for the kernel include

- uniform, $K(u) = \frac{1}{2} I(-1 \le u \le 1)$
- Epanechnikov, $K(u) = 0.75\,(1 - u^2)\, I(-1 \le u \le 1)$
- biweight, $K(u) = \frac{15}{16}\,(1 - u^2)^2\, I(-1 \le u \le 1)$

12 Kernel-smoothed hazard estimation

To estimate a smoothed version of the hazard function using a kernel method, first pick a kernel, then use

$$\hat{h}(t) = \frac{1}{b} \sum_{i=1}^{D} K\left(\frac{t - t_i}{b}\right) \Delta \hat{H}(t_i)$$

where $D$ is the number of death times and $b$ is the bandwidth. A common notation for the bandwidth is $h$, but we use $b$ because $h$ is used for the hazard function. Here $\Delta \hat{H}(t_i)$ is the jump in $\hat{H}(t)$, the Nelson-Aalen estimator of the cumulative hazard function:

$$\hat{H}(t) = \begin{cases} 0, & \text{if } t \le t_1 \\ \sum_{t_i \le t} d_i / Y_i, & \text{if } t > t_1 \end{cases}$$

so $\Delta \hat{H}(t_i) = d_i / Y_i$.
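Here is a minimal sketch of this estimator in R, written from scratch (not from the slides), using an Epanechnikov kernel. It assumes vectors ti (distinct death times), di (numbers of deaths), and Yi (numbers at risk) are available.

# Epanechnikov kernel
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Kernel-smoothed hazard at time t from Nelson-Aalen increments d_i / Y_i
smooth.hazard <- function(t, ti, di, Yi, b) {
  dH <- di / Yi
  sum(epan((t - ti) / b) * dH) / b
}

# Example with uncensored exponential data, where the true hazard is 1
ti <- sort(rexp(100)); di <- rep(1, 100); Yi <- 100:1
tgrid <- seq(0.2, 3, by = 0.05)
plot(tgrid, sapply(tgrid, smooth.hazard, ti = ti, di = di, Yi = Yi, b = 0.5),
     type = "l", ylab = "smoothed hazard")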

13 Kernel-smoothed hazard estimation

The variance of the smoothed hazard is

$$\sigma^2[\hat{h}(t)] = \frac{1}{b^2} \sum_{i=1}^{D} \left[ K\left(\frac{t - t_i}{b}\right) \right]^2 \Delta \hat{V}[\hat{H}(t_i)]$$

where $\Delta \hat{V}[\hat{H}(t_i)] = d_i / Y_i^2$ is the jump in the estimated variance of the Nelson-Aalen estimator.

14 Asymmetric kernels

A difficulty that we saw with the exponential data can also occur here: a symmetric kernel puts mass at negative times, biasing the estimate near $t = 0$. Consequently, you can use an asymmetric kernel instead for small $t$. For $t < b$, let $q = t/b$. A similar approach can be used for large $t$, when $t_D - b < t < t_D$. In this case, you can use $q = (t_D - t)/b$ and replace $x$ with $-x$ in the kernel density estimate for these larger times.

15 Asymmetric kernels (figure)

16 Asymmetric kernels (figure)

17 Confidence intervals

A pointwise confidence interval can be obtained with lower and upper limits

$$\left( \hat{h}(t) \exp\left[ -\frac{Z_{1-\alpha/2}\, \sigma(\hat{h}(t))}{\hat{h}(t)} \right],\; \hat{h}(t) \exp\left[ \frac{Z_{1-\alpha/2}\, \sigma(\hat{h}(t))}{\hat{h}(t)} \right] \right)$$

Note that this is really a confidence interval for the smoothed hazard function and not for the actual hazard function, making it difficult to interpret. In particular, the confidence interval will depend on both the kernel and the bandwidth. Coverage probabilities for smoothed hazard estimates (the proportion of times the confidence interval includes the true hazard rate) appear to be a topic of ongoing research.
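Continuing the earlier sketch (again my own illustration, reusing the assumed ti, di, Yi, and epan()), the interval can be computed directly from the variance increments $d_i/Y_i^2$:

# Pointwise log-transformed CI for the smoothed hazard at time t
hazard.ci <- function(t, ti, di, Yi, b, alpha = 0.05) {
  dH   <- di / Yi
  dV   <- di / Yi^2                       # Nelson-Aalen variance increments
  hhat <- sum(epan((t - ti) / b) * dH) / b
  sig  <- sqrt(sum(epan((t - ti) / b)^2 * dV)) / b
  z    <- qnorm(1 - alpha / 2)
  c(lower = hhat * exp(-z * sig / hhat),
    upper = hhat * exp(z * sig / hhat))
}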

18 Asymmetric kernels (figure)

19 Effect of bandwidth (figure)

20 Effect of bandwidth

Because the bandwidth has a big impact, we somehow want to pick the optimal bandwidth. An idea is to minimize the squared area between the true hazard function and the estimated hazard function. This squared area between the two functions is called the Mean Integrated Squared Error (MISE):

$$\text{MISE}(b) = E \int_{\tau_L}^{\tau_U} [\hat{h}(u) - h(u)]^2\, du = E \int_{\tau_L}^{\tau_U} \hat{h}^2(u)\, du - 2 E \int_{\tau_L}^{\tau_U} \hat{h}(u) h(u)\, du + E \int_{\tau_L}^{\tau_U} h^2(u)\, du$$

The last term doesn't depend on $b$, so it is sufficient to minimize the function ignoring the last term. The first term can be estimated by $\int_{\tau_L}^{\tau_U} \hat{h}^2(u)\, du$, which can be computed using the trapezoid rule from calculus.

21 Effect of bandwidth

The second term can be approximated by

$$\frac{1}{b} \sum_{i \ne j} K\left(\frac{t_i - t_j}{b}\right) \Delta \hat{H}(t_i)\, \Delta \hat{H}(t_j)$$

summing over event times between $\tau_L$ and $\tau_U$. Minimizing the MISE can therefore be done approximately by minimizing

$$g(b) = \sum_i \left(\frac{u_{i+1} - u_i}{2}\right) \left[\hat{h}^2(u_i) + \hat{h}^2(u_{i+1})\right] - \frac{2}{b} \sum_{i \ne j} K\left(\frac{t_i - t_j}{b}\right) \Delta \hat{H}(t_i)\, \Delta \hat{H}(t_j)$$

The minimization can be done numerically by plugging in different values of $b$ and evaluating.
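A sketch of this numerical search (my own illustration, reusing the assumed ti, di, Yi, epan(), and smooth.hazard() from the earlier sketches; the grids are placeholders):

# Approximate MISE criterion g(b)
g <- function(b, ugrid) {
  h2 <- sapply(ugrid, smooth.hazard, ti = ti, di = di, Yi = Yi, b = b)^2
  du <- diff(ugrid)
  term1 <- sum(du / 2 * (h2[-length(h2)] + h2[-1]))   # trapezoid rule
  dH <- di / Yi
  K <- epan(outer(ti, ti, "-") / b)
  diag(K) <- 0                                        # sum over i != j
  term1 - (2 / b) * sum(K * outer(dH, dH))
}

ugrid <- seq(0.2, 3, by = 0.05)
bs <- seq(0.1, 1, by = 0.05)
bs[which.min(sapply(bs, g, ugrid = ugrid))]   # approximate optimal bandwidth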

22 Effect of bandwidth (figure)

23 Effect of bandwidth

For this example, the minimum occurs around $b = 0.17$ to $b = 0.23$, depending on the kernel. Generally, there is a trade-off, with smaller bandwidths having smaller bias but higher variance, and larger bandwidths (more smoothing) having less variance but greater bias. Measuring the quality of bandwidths and kernels using MISE is standard in kernel density estimation (not just survival analysis). Bias here means that $E[\hat{h}(t)] \ne h(t)$.

24 Section 6.3: Estimation of Excess Mortality

The idea for this topic is to compare the survival curve or hazard rate for one group against a reference group, particularly if the non-reference group is thought to have higher risk. The reference group might come from a much larger sample, so that its survival curve can be considered known. An example is to compare the mortality for psychiatric patients against the general population. You could use census data to get the lifetable for the general population and determine the excess mortality for the psychiatric patients.

Two approaches are a multiplicative model and an additive model. In the multiplicative model, belonging to a particular group multiplies the hazard rate by a factor. In the additive model, belonging to a particular group adds a factor to the hazard rate.

25 Excess mortality

For the multiplicative model, if there is a reference hazard rate $\theta_j(t)$ for the $j$th individual in a study (based on sex, age, ethnicity, etc.), then due to other risk factors, the hazard rate for the $j$th individual is

$$h_j(t) = \beta(t)\, \theta_j(t)$$

where $\beta(t) \ge 1$ implies that the hazard rate is higher than the reference hazard. We define

$$B(t) = \int_0^t \beta(u)\, du$$

as the cumulative relative excess mortality.

26 Excess mortality

Note that $\frac{d}{dt} B(t) = \beta(t)$. To estimate $B(t)$, let $Y_j(t) = 1$ if the $j$th individual is at risk at time $t$; otherwise, let $Y_j(t) = 0$. Here $Y_j(t)$ is defined for left-truncated and right-censored data. Let

$$Q(t) = \sum_{j=1}^n \theta_j(t) Y_j(t)$$

where $n$ is the sample size. Then we estimate $B(t)$ by

$$\hat{B}(t) = \sum_{t_i \le t} \frac{d_i}{Q(t_i)}$$

This value compares the actual number of deaths that have occurred by time $t$ with the expected number of deaths based on the reference hazard rates and the number of patients available to have died.
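A minimal sketch of $\hat{B}(t)$ (my own illustration): it assumes hypothetical vectors times and status (1 = death) and a hypothetical function theta(j, t) returning the reference hazard for subject j at time t.

# Cumulative relative excess mortality at each death time
Bhat <- function(times, status, theta) {
  dt <- sort(unique(times[status == 1]))          # death times t_i
  jumps <- sapply(dt, function(t) {
    atrisk <- which(times >= t)                   # subjects with Y_j(t) = 1
    Q <- sum(sapply(atrisk, theta, t = t))        # Q(t_i)
    sum(times == t & status == 1) / Q             # d_i / Q(t_i)
  })
  cumsum(jumps)                                   # B(t_i)
}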

27 Excess mortality

The variance is estimated by

$$\hat{V}[\hat{B}(t)] = \sum_{t_i \le t} \frac{d_i}{Q(t_i)^2}$$

$\beta(t)$ can be estimated by the slope of $\hat{B}(t)$, and the estimate can be improved by using kernel-smoothing methods on $\hat{B}(t)$.

28 Excess mortality

For the additive model, the hazard is

$$h_j(t) = \alpha(t) + \theta_j(t)$$

Similarly to the multiplicative model, we estimate the cumulative excess mortality

$$A(t) = \int_0^t \alpha(u)\, du$$

In this case the expected cumulative hazard rate is

$$\Theta(t) = \sum_{j=1}^n \int_0^t \theta_j(u) \frac{Y_j(u)}{Y(u)}\, du$$

where

$$Y(u) = \sum_{j=1}^n Y_j(u)$$

is the number at risk at time $u$.

29 Excess mortality

The estimated excess mortality is

$$\hat{A}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i} - \Theta(t)$$

where the first term is the Nelson-Aalen estimator of the cumulative hazard. The variance is

$$\hat{V}[\hat{A}(t)] = \sum_{t_i \le t} \frac{d_i}{Y(t_i)^2}$$

30 Excess mortality

For a lifetable where times are every year, you can also compute

$$\Theta(t) = \Theta(t-1) + \sum_j \frac{\lambda(a_j + t - 1)}{Y(t)}$$

summing over individuals at risk at time $t$, where $a_j$ is the age at the beginning of the study for patient $j$ and $\lambda$ is the reference hazard. Note that $\Theta(t)$ is a smooth function of $t$, while $\hat{A}(t)$ has jumps.

31 Excess mortality

A more general model combines multiplicative and additive components, using

$$h_j(t) = \beta(t)\, \theta_j(t) + \alpha(t)$$

which is done in Chapter 10.

32 Example: Iowa psychiatric patients

As an example, starting with the multiplicative model, consider 26 psychiatric patients from Iowa, where we compare to census data.

33 Iowa psychiatric patients (figure)

34 Census data for Iowa (figure)

35 Excess mortality for Iowa psychiatric patients (figure)

36 Excess mortality for Iowa psychiatric patients (figure)

37 Excess mortality

The cumulative excess mortality is difficult to interpret; the slope of the curve is more meaningful. The curve is relatively linear. If we consider age 10 to age 30, the curve goes from roughly 50 to 100, suggesting a slope of $(100 - 50)/(30 - 10) = 2.5$, so that patients aged 10 to 30 had roughly 2.5 times the hazard of the reference population. This is a fairly low-risk age group, for which suicide is a high risk factor. Note that the census data might include psychiatric patients who have committed suicide, so we might be comparing psychiatric patients to a general population that includes psychiatric patients, rather than to people who have not been psychiatric patients, and this might bias results.

38 Survival curves

You can use the reference distribution to inform the survival curve instead of just relying on the data. This results in an adjusted or corrected survival curve. Let $S^*(t) = \exp[-\Theta(t)]$ (or use the cumulative hazard based on multiplying the reference hazard by the excess hazard) and let $\hat{S}(t)$ be the standard Kaplan-Meier survival curve (using only the data, not the reference survival data). Then

$$S_c(t) = \hat{S}(t)/S^*(t)$$

is the corrected survival function. The estimate can be greater than 1, in which case it can be set to 1. Typically, $S^*(t)$ is less than 1, so dividing by this quantity increases the estimated survival probabilities. This is somewhat similar to the use of a prior in Bayesian statistics, using the reference survival times as a prior for what the psychiatric patients are likely to experience. Consequently, the adjusted survival curve is in between the Kaplan-Meier (data only) estimate and the reference survival times.
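A short sketch of this correction (my own illustration): given hypothetical vectors times and status and a hypothetical reference cumulative hazard function Theta(), the corrected curve divides the Kaplan-Meier estimate by the reference survival.

library(survival)
fit   <- survfit(Surv(times, status) ~ 1)   # Kaplan-Meier from the data alone
Sstar <- exp(-Theta(fit$time))              # reference survival S*(t)
Sc    <- pmin(fit$surv / Sstar, 1)          # corrected curve, capped at 1
plot(fit$time, Sc, type = "s", xlab = "time", ylab = "corrected survival")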

39 Survival curves (figure)

40 Survival curves (figure)

41 Bayesian nonparametric survival analysis

The previous example leads naturally to Bayesian nonparametric survival analysis. Here we have prior information (or prior beliefs) about the shape of the survival curve (such as a reference survival function). The survival curve based on this prior information is combined with the likelihood of the survival data to produce a posterior estimate of the survival function. Reasons for using a prior are: (1) to take advantage of prior information or the expertise of someone familiar with the type of data, and (2) to get a reasonable estimate when the sample size is small.

42 Bayesian survival analysis

In frequentist statistical methods, parameters are treated as fixed but unknown, and an estimator is chosen to estimate the parameters based on the data and a model (including model assumptions). Parameters are unknown but are treated as not being random. Philosophically, the Bayesian approach is to model all uncertainty using random variables. Uncertainty exists both in the data that would arise from a probability model and in the parameters of the model itself, so both observations and parameters are treated as random. Typically, the observations have a distribution that depends on the parameters, and the parameters themselves come from some other distribution. Bayesian models are therefore often hierarchical, often with multiple levels in the hierarchy.

43 Bayesian survival analysis

For survival analysis, we think of the (unknown) survival curve as the parameter. From a frequentist point of view, survival probabilities determine the probabilities of observing different death times, but there are no probabilities for the survival function itself. From a Bayesian point of view, you can imagine that there was some stochastic process generating survival curves according to some distribution on the space of survival curves. One of those survival curves happened to occur for the population we are studying. Once that survival function was chosen, event times could occur according to that survival curve.

44 Bayesian survival analysis (figure)

45 Bayesian survival analysis

We imagine that there is a true survival curve $S(t)$ and an estimated survival curve $\hat{S}(t)$. We define a loss function as

$$L(S, \hat{S}) = \int_0^\infty [\hat{S}(t) - S(t)]^2\, dt$$

The function $\hat{S}$ that minimizes the expected value of the loss function is the posterior mean, which is used to estimate the survival function.

46 A prior for survival curves

A typical way to assign a prior on the survival function is to use a Dirichlet process prior. For a Dirichlet process, we partition the real line into intervals $A_1, \ldots, A_k$, so that $P(X \in A_i) = W_i$. The numbers $(W_1, \ldots, W_k)$ have a $k$-dimensional Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_k$. For this to be a Dirichlet distribution, we must have $Z_i$, $i = 1, \ldots, k$, independent gamma random variables with shape parameters $\alpha_i$, and

$$W_i = \frac{Z_i}{\sum_{i=1}^k Z_i}$$

By construction, the random numbers $W_i$ are between 0 and 1 and sum to 1, so when interpreted as probabilities, they form a discrete probability distribution. Essentially, we can think of a Dirichlet distribution as a distribution on unfair dice with $k$ sides. We want to make a die that has $k$ sides, and we want the probabilities of each side to be randomly determined. How fair or unfair the die is depends partly on the $\alpha$ parameters and partly on chance itself.
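The gamma construction translates directly into R; a small sketch of my own:

# One draw from a Dirichlet(alpha) distribution via independent gammas
rdirichlet1 <- function(alpha) {
  z <- rgamma(length(alpha), shape = alpha, rate = 1)
  z / sum(z)   # W_i = Z_i / sum(Z_i): nonnegative and sums to 1
}

rdirichlet1(c(5, 5, 5))        # a fairly fair three-sided "die"
rdirichlet1(c(0.2, 0.2, 0.2))  # small alphas give very unfair dice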

47 A prior for survival curves

We can also think of the Dirichlet distribution as generalizing the beta distribution. A beta random variable is a number between 0 and 1; this number partitions the interval [0,1] into two pieces, $[0, x)$ and $[x, 1]$. A Dirichlet random variable partitions the interval into $k$ regions, using $k-1$ values between 0 and 1. The joint density for these $k-1$ values is

$$f(w_1, \ldots, w_{k-1}) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \left[ \prod_{i=1}^{k-1} w_i^{\alpha_i - 1} \right] \left[ 1 - \sum_{i=1}^{k-1} w_i \right]^{\alpha_k - 1}$$

which reduces to a beta density with parameters $(\alpha_1, \alpha_2)$ when $k = 2$.

48 Assigning a prior

To assign a prior on the space of survival curves, first assume an average survival function, $S_0(t)$. The Dirichlet prior determines where the jumps occur, and the exponential curve gives the decay of the curve between jumps. Simulated survival curves when $S_0(t) = e^{-0.1t}$ and $\alpha = 5 S_0(t)$ are given below.

49 Bayesian survival analysis (figure)

50 Bayesian survival analysis

Other approaches are to put a prior on the cumulative hazard function and to use Gibbs sampling or Markov chain Monte Carlo. These topics would be more appropriate to cover after a class in Bayesian methods.

51 Chapter 7: Hypothesis testing

Hypothesis testing is typically done based on the cumulative hazard function. Here we'll use the Nelson-Aalen estimate of the cumulative hazard. The survival function is used to weight differences between the observed and expected cumulative hazard. Recall that the Nelson-Aalen estimate of the cumulative hazard is

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}$$

In a one-sample problem, you test whether the hazard rate $h(t)$ is equal to some reference hazard $h_0(t)$. The null hypothesis is $H_0: h(t) = h_0(t)$. Under the null hypothesis, the expected hazard rate at time $t_i$ is $h_0(t_i)$.

52 Hypothesis testing: one sample

The idea is then to compare observed minus expected cumulative hazard rates at the time $\tau$, the largest time in the study ($\tau = t_D$ if the largest time is a death time). The test statistic is

$$Z(\tau) = O(\tau) - E(\tau) = \sum_{i=1}^{D} W(t_i) \frac{d_i}{Y_i} - \int_0^\tau W(s)\, h_0(s)\, ds$$

where $W(\cdot)$ is a weight function. The variance is

$$V[Z(\tau)] = \int_0^\tau W^2(s) \frac{h_0(s)}{Y(s)}\, ds$$

53 Hypothesis testing

The expected value of $Z(\tau)$ is 0, so if we take a z-score of $Z(\tau)$ (subtracting the mean and dividing by the standard deviation), we get $Z(\tau)/\sqrt{V[Z(\tau)]}$, which has an approximate standard normal distribution. This can be used for either a two-sided or a one-sided test. For example, a one-sided test would be $H_1: h(t) > h_0(t)$, and you would reject only for large values of $Z(\tau)/\sqrt{V[Z(\tau)]}$.

54 Hypothesis testing

The most popular choice for a weight function is $W(t) = Y(t)$, which leads to

$$O(\tau) = \sum_{i=1}^{D} Y(t_i) \frac{d_i}{Y_i} = \sum_{i=1}^{D} d_i$$

This is also called the log-rank test (not sure why). Other weight functions are possible, for example

$$W(t) = Y(t)\, S_0(t)^p\, [1 - S_0(t)]^q$$

with $0 \le p, q \le 1$ (you don't necessarily need $q = 1 - p$ here). The choice of $p$ affects whether you care more about the hazard not matching the hypothesized hazard for small $t$ or large $t$. For example, if $p$ is large, then more emphasis is placed on the estimated hazard matching the null hazard for small values of $t$. $S_0(t)$ can be obtained from $S_0(t) = \exp[-H_0(t)]$.
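With $W(t) = Y(t)$ and $\tau$ taken as the largest time on study, $O(\tau)$ is the total number of deaths and $E(\tau) = V[Z(\tau)] = \sum_j H_0(X_j)$ summed over the subjects' follow-up times, so the test is easy to code. A minimal sketch of my own, with hypothetical inputs times, status, and a reference cumulative hazard function H0:

# One-sample log-rank test of H0: h(t) = h0(t), using W(t) = Y(t)
one.sample.logrank <- function(times, status, H0) {
  O <- sum(status)            # observed deaths
  E <- sum(H0(times))         # expected deaths; also the variance of O - E
  Z <- (O - E) / sqrt(E)
  c(Z = Z, p.two.sided = 2 * pnorm(-abs(Z)))
}

# Example: are the data consistent with a unit-rate exponential hazard?
one.sample.logrank(rexp(50), rep(1, 50), H0 = function(t) t)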

55 Hypothesis testing

An example where you would use the one-sided hypothesis test is in testing whether some population has a higher hazard than a reference population, such as the psychiatric patients from Iowa. Recall that for this example, we looked at excess mortality previously.

56 Hypothesis testing: two or more samples

If you have two or more samples (e.g., mortality for three different treatments or three different risk groups), then the null and alternative hypotheses are similar to those for ANOVA:

$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t), \text{ for all } t \le \tau$$
$$H_A: h_i(t) \ne h_j(t) \text{ for some } i \ne j \text{ and some } t \le \tau$$

where $\tau$ is the largest time at which all of the groups have at least one subject at risk.

57 Hypothesis testing: two or more samples

We now define $t_i$ as the unique death times for the pooled data (i.e., ignoring the group that each observation comes from), and again $t_D$ is the largest death time. We observe $d_{ij}$ deaths at time $t_i$ in sample $j$, and there are $Y_{ij}$ individuals at risk at time $t_i$ in sample $j$. We let $d_i = \sum_{j=1}^K d_{ij}$ be the total number of deaths at time $t_i$ and $Y_i = \sum_{j=1}^K Y_{ij}$ be the total number of individuals at risk (available for death?) at time $t_i$.

58 Hypothesis testing: two or more samples

The idea for testing the hypothesis is that under the null hypothesis, the estimate of the hazard (and cumulative hazard) should be the same (in expectation) using the pooled data (ignoring which group the samples are from) and for the individual samples. We can think of the pooled data as providing a more precise estimate of the hazard for the $j$th sample than the $j$th sample itself, so using the idea of observed minus expected, we can write

$$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i) \left( \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right), \quad j = 1, \ldots, K$$

If all of the $Z_j(\tau)$ terms are close to 0, then all of the sample estimated cumulative hazards are close to the pooled cumulative hazard, so they all must be close to each other, and this supports the null hypothesis.

59 Hypothesis testing: two or more samples

The typical weight function used is $W_j(t_i) = Y_{ij} W(t_i)$, where $W(t_i)$ is a common weight shared by each group. For this weighting scheme,

$$Z_j(\tau) = \sum_{i=1}^{D} W(t_i) \left( d_{ij} - Y_{ij} \frac{d_i}{Y_i} \right), \quad j = 1, \ldots, K$$

$$V[Z_j(\tau)] = \hat{\sigma}_{jj} = \sum_{i=1}^{D} W(t_i)^2 \frac{Y_{ij}}{Y_i} \left( 1 - \frac{Y_{ij}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad j = 1, \ldots, K$$

$$\text{cov}(Z_j(\tau), Z_k(\tau)) = \hat{\sigma}_{jk} = -\sum_{i=1}^{D} W(t_i)^2 \frac{Y_{ij}}{Y_i} \frac{Y_{ik}}{Y_i} \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad j \ne k$$

60 Hypothesis testing: two or more samples

Based on the second formula for $Z_j(\tau)$, the sum $\sum_{j=1}^K Z_j(\tau)$ is equal to 0, meaning that the $Z_j(\tau)$ are not independent of one another. In particular, $Z_K(\tau)$ is a linear combination of $Z_1(\tau), \ldots, Z_{K-1}(\tau)$. Consequently, we construct a test statistic based on just the first $K-1$ of the $Z_j(\tau)$ terms:

$$\chi^2 = (Z_1(\tau), \ldots, Z_{K-1}(\tau))\, \Sigma^{-1}\, (Z_1(\tau), \ldots, Z_{K-1}(\tau))'$$

where $(Z_1(\tau), \ldots, Z_{K-1}(\tau))$ is interpreted as a row vector and $\Sigma$ is the $(K-1) \times (K-1)$ covariance matrix (if you had made a $K \times K$ matrix using all the variables, it wouldn't be full rank, and therefore not invertible). The $\chi^2$ statistic has $K-1$ degrees of freedom, and you can base the test on this distribution.

61 Hypothesis testing: two samples

Several weight functions are possible. $W(t) = 1$ for all $t$ leads to the two-sample log-rank test. $W(t_i) = Y_i$ and $W(t_i) = \sqrt{Y_i}$ have also been used. In the case of $K = 2$ samples, the test statistic can be written as

$$Z = \frac{\sum_{i=1}^{D} W(t_i) \left[ d_{i1} - Y_{i1} \left( \frac{d_i}{Y_i} \right) \right]}{\sqrt{\sum_{i=1}^{D} W(t_i)^2 \frac{Y_{i1}}{Y_i} \left( 1 - \frac{Y_{i1}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i}}$$

Since we don't have to square in this case, we can do one-sided as well as two-sided hypothesis tests based on a standard normal distribution instead of a $\chi^2$, or you can square the statistic and use a $\chi^2_1$ distribution.

62 Hypothesis testing: two samples (figure)

63 Hypothesis testing: two samples

This example was kidney dialysis patients with surgically implanted catheters versus percutaneous (needle-puncture) placement of catheters. Even though the survival curves look fairly different after 1 year or so, the differences are not statistically significant. Note that there are also very few observations for the percutaneous sample. Actually, the number of observations is fairly small for both samples, so the confidence intervals would be fairly wide.

64 Hypothesis testing: two samples (figure)

65 Hypothesis testing: two samples (figure)

66 Hypothesis testing: two samples

Different choices for the weight function affect the p-value. It is reassuring if a lot of weighting schemes give the same conclusion. The cases where the p-value was low were those where the weighting scheme gave a lot of weight to differences in the hazard for large values of $t_i$, which of course is where the curves appear different. This can also be sensitive to differences in censoring patterns in the two samples, so it should be used cautiously. A problem with using lots of weighting schemes is that, when different weights conflict, you might report only the weighting schemes that give the results you want. This would be dishonest, so you should either pick a weighting scheme in advance and stick to it, or report the results of all the different weighting schemes that you used.

67 Hypothesis testing: weight functions (figure)

68 Hypothesis testing: weight functions

The most common weight functions are either flat, $W(t_i) = 1$, or decreasing, with $W(t_i) = Y_i$. A weight function that is increasing might be used to compare longer-term survival when early survival might be due to complications rather than the long-term effectiveness of a treatment. An example is comparing autologous transplants versus allogeneic transplants of bone marrow for leukemia. Allogeneic transplant patients (receiving bone marrow from a sibling) tend to have more complications early on, reducing early survival rates (and increasing early hazard rates), but if interest is in long-term survival, then a weight function could be used that emphasizes later times.

69 Hypothesis testing in R

To test the difference in survival curves in R, you can use survdiff() from the survival library. An example is with the allo- versus auto- patients in the leukemia data.

> library(survival)
> x <- read.table("leukemia2.txt")
> a <- survdiff(Surv(x$V1, x$V2) ~ factor(x$V3))
> a
Call:
survdiff(formula = Surv(x$V1, x$V2) ~ factor(x$V3))

                N Observed Expected (O-E)^2/E (O-E)^2/V
factor(x$V3)=1 ...
factor(x$V3)=2 ...

 Chisq= 0.4  on 1 degrees of freedom, p= ...

The results suggest that the two groups had survival experiences that were not statistically significantly different from each other.

70 Hypothesis testing in R

To plot the two survival curves together you can use

> x <- read.table("leukemia2.txt")
> a <- survfit(Surv(x$V1[x$V3==1], x$V2[x$V3==1]) ~ 1)
> b <- survfit(Surv(x$V1[x$V3==2], x$V2[x$V3==2]) ~ 1)
> plot(a, conf.int = FALSE)
> points(b$time, b$surv, type = "s", col = "red", lwd = 3)
> legend(20, 1, legend = c("auto", "allo"), col = c("black", "red"),
+        lty = c(1, 1), lwd = c(1, 3), cex = 1.3)

71 Hypothesis testing in R (figure)

72 Hypothesis testing in R

The survdiff() function in R has an optional parameter rho whose default is 0, which results in the log-rank test. Nonzero values of rho weight each event time by the estimated survival raised to the power rho (so rho = 1 puts more weight on earlier times, where the estimated survival is larger) and can have a big impact on the p-value.
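For instance, the earlier leukemia comparison could be rerun with the Peto-Peto style weighting (a hypothetical variation on the example above):

> survdiff(Surv(x$V1, x$V2) ~ factor(x$V3), rho = 1)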

73 Tests of trend

For multiple samples ($K > 2$), a different alternative hypothesis is the following:

$$H_A: h_1(t) \le h_2(t) \le \cdots \le h_K(t) \text{ for } t \le \tau, \text{ where at least one inequality is strict}$$

This is equivalent to

$$H_A: S_1(t) \ge S_2(t) \ge \cdots \ge S_K(t)$$

74 Tests of trend

We construct the $Z_j(\tau)$'s as before, using any weight functions $W_j(t_i)$. We also pick a new set of weights $a_j$, $j = 1, \ldots, K$, where $a_j = j$ is often used. The test statistic is now

$$Z = \frac{\sum_{j=1}^K a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^K \sum_{k=1}^K a_j a_k \hat{\sigma}_{jk}}}$$

where $\hat{\Sigma} = (\hat{\sigma}_{jk})$ is the $K \times K$ covariance matrix. (It isn't full rank, but we don't need the inverse.) The test statistic can be compared to a standard normal.

75 Tests of trend (figure)

76 Stratified tests

If different populations have different covariates (age, sex, etc.), then ideally you could use a regression approach to survival analysis to adjust for covariates before comparing survival curves or hazard rates. This is done in Chapter 8. If there is a small number of levels for a predictor, then you can use a stratified test instead. Let

$$H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad s = 1, \ldots, M, \; t < \tau$$

The idea is that for each level of the covariate (indexed by $s$), the hazard rate should be the same. Typically, $M$ is small.

77 Stratified tests

For the stratified test, let

$$Z_{j\cdot}(\tau) = \sum_{s=1}^M Z_{js}(\tau), \qquad \hat{\sigma}_{jk\cdot} = \sum_{s=1}^M \hat{\sigma}_{jks}$$

Then the test statistic is as before with multiple samples:

$$(Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))\, \Sigma^{-1}\, (Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))'$$

which is approximately $\chi^2$ with $K-1$ degrees of freedom. Here we have $K$ samples and $M$ strata within each sample.
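In R, a stratified test can be run with survdiff() by adding a strata() term; a small sketch with hypothetical column names:

> # compare groups while stratifying on sex
> survdiff(Surv(time, status) ~ group + strata(sex), data = dat)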

78 Renyi type tests

For a two-sample problem, if the hazard functions cross, then the previous tests might not detect much overall difference in the hazard rates. The overall survival experience might be similar, but it could be different in the short term and different in the long term. If one group is more at risk in the short term and the other in the long term, these changes of direction could cancel out, leading one not to reject the hypothesis that the hazards are the same. Renyi-type tests are based on the maximum absolute value of the differences between cumulative hazard rates rather than the summed differences. The idea is similar to the Kolmogorov-Smirnov test for comparing two distributions, which uses the largest absolute value of the difference between the two empirical CDFs, but Renyi tests allow for censoring.

79 Renyi type tests

To construct this test, let

$$Z(t_i) = \sum_{t_k \le t_i} W(t_k) \left[ d_{k1} - Y_{k1} \left( \frac{d_k}{Y_k} \right) \right], \quad i = 1, \ldots, D$$

where as usual $d_k = d_{k1} + d_{k2}$ and $Y_k = Y_{k1} + Y_{k2}$ (i.e., $d_k$ and $Y_k$ are the pooled number of deaths and number at risk at time $t_k$ over both samples). The variance of $Z(\tau)$ is

$$\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2 \left( \frac{Y_{k1}}{Y_k} \right) \left( \frac{Y_{k2}}{Y_k} \right) \left( \frac{Y_k - d_k}{Y_k - 1} \right) d_k$$

where $\tau$ is the largest death time $t_k$ with $Y_{k1}, Y_{k2} > 0$.

80 Renyi type tests

The test statistic is

$$Q = \sup\{ |Z(t)|, t \le \tau \} / \sigma(\tau)$$

You can think of the supremum here as just the maximum of the absolute values of the $Z(t_i)$ values. Critical values are given in the Appendix, Table C.5, and are based on the theory of Brownian motion.

81 Renyi type tests (figure)

82 Renyi type tests: finding the maximum $Z(t_j)$ (figure)


84 Renyi type tests

The maximum occurs at 315 days, with the maximum value being 9.8. The p-value (based on Table C.5) is 0.053, which is not significant at $\alpha = 0.05$, but this still gives more signal for the curves being different than the log-rank test, which gives a larger p-value.

85 Testing based on a fixed point in time

Instead of testing survival and hazard rates over all time points, you might be interested in, say, the 1-year survival rate. Note that the time being tested should be chosen before doing the test. If you look at two survival curves and say, "Wow, they look really different at year 3; is that significant?", then the p-value will be biased too low. It is similar to testing at many time points but not adjusting for multiple comparisons. In practice, this is what happens all the time, though. People look at a graph of the data, which is maybe meant to be descriptive, something jumps out at them as unusual, and they say, "Wow, is that significant?" It's extremely difficult to answer this type of question. A better approach in this type of case might be the Renyi type of test, because it accounts for the fact that you are looking at maximum differences over the entire time frame.

86 Testing based on a fixed point in time

Here we want to test

$$H_0: S_1(t_0) = S_2(t_0) \quad \text{versus} \quad H_A: S_1(t_0) \ne S_2(t_0)$$

for two survival curves. (The method can be generalized to more survival curves.) The test statistic is

$$Z = \frac{\hat{S}_1(t_0) - \hat{S}_2(t_0)}{\sqrt{\hat{V}[\hat{S}_1(t_0)] + \hat{V}[\hat{S}_2(t_0)]}}$$

which has an approximate standard normal distribution for large samples.
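A minimal sketch of this test in R (my own illustration, with hypothetical vectors time, status, and a two-level group): summary.survfit() returns the Kaplan-Meier estimates and their standard errors at the chosen time.

library(survival)
fixed.time.test <- function(time, status, group, t0) {
  fit <- survfit(Surv(time, status) ~ group)
  s <- summary(fit, times = t0)   # KM estimate and std. error per group at t0
  Z <- diff(s$surv) / sqrt(sum(s$std.err^2))
  c(Z = Z, p.two.sided = 2 * pnorm(-abs(Z)))
}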

87 Testing based on a fixed point in time

If you want to test multiple fixed time points, such as the 1-year and 5-year survival rates, then you should adjust for multiple comparisons. For testing two time points, a Bonferroni adjustment could be made, meaning that you reject each hypothesis only if its p-value is less than $\alpha/2$. The more time points you check, the less power you will have to find significant differences.

88 Bonferroni adjustments

Probably the most popular, and simplest, adjustment to make for multiple testing is the Bonferroni adjustment. The idea is that to have $k$ tests at overall level $\alpha$ (meaning that if the null hypotheses are true for all $k$ tests, there is only an $\alpha$ chance of making an error on any one of them), you use a level of $\alpha/k$ for each test. What is the rationale for doing this?

89 Bonferroni adjustments

There are several ways to justify Bonferroni adjustments. One is to look at the expected number of false positives under the null. Let $X_i = 1$ if you make an incorrect decision on test $i$ (a false positive), and otherwise $X_i = 0$. What type of variable is $X_i$? What is the probability that $X_i = 1$ if the null hypothesis (for experiment $i$) is true? What is the expected value of $X_i$?

90 Bonferroni adjustments

$X_i$ as defined previously is Bernoulli with $p = \alpha$ if testing at level $\alpha$. The expected value of a Bernoulli($p$) random variable is $p$ (why?), so the expected value of $X_i$ is $\alpha$. If you do $k$ experiments, the expected number of false positives is

$$E\left[ \sum_{i=1}^k X_i \right] = k\alpha$$

However, if you test at the $\alpha/k$ level, then the expected number of false positives is $\alpha$. Thus, the Bonferroni adjustment controls the expected number of false positives.
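A quick simulation check of this (my own illustration): under true nulls, p-values are uniform, so with $k = 20$ tests at level 0.05 the expected number of false positives is 1, while at level $0.05/20$ it drops to 0.05.

set.seed(1)
k <- 20; alpha <- 0.05; reps <- 10000
p <- matrix(runif(reps * k), reps, k)  # p-values are Uniform(0,1) under H0
mean(rowSums(p < alpha))               # close to k * alpha = 1
mean(rowSums(p < alpha / k))           # close to alpha = 0.05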

91 Bonferroni adjustments

Another approach is to use Bonferroni's inequality. Let $A_i$ be the event that you don't reject the null hypothesis on test $i$. Suppose we set $P(A_i) = 1 - \alpha/k$ when the null is true. From the inclusion-exclusion formula,

$$P(A_1 \cap A_2) = P(A_1) + P(A_2) - P(A_1 \cup A_2) \ge P(A_1) + P(A_2) - 1$$

If we apply the formula again, setting $B = A_1 \cap A_2$, we get

$$P(A_1 \cap A_2 \cap A_3) \ge [P(A_1) + P(A_2) - 1] + P(A_3) - 1 = P(A_1) + P(A_2) + P(A_3) - 2$$

In general, for $k$ events,

$$P(A_1 \cap \cdots \cap A_k) \ge \sum_{i=1}^k P(A_i) - (k-1)$$

92 Bonferroni adjustments

If $P(A_i) = 1 - \alpha/k$, then we get

$$P(A_1 \cap \cdots \cap A_k) \ge k \left( 1 - \frac{\alpha}{k} \right) - (k - 1) = 1 - \alpha$$

Thus, the probability of all decisions being correct is at least $1 - \alpha$, and the probability of making any wrong decision is at most $\alpha$.

93 Bonferroni adjustments

Bonferroni's inequality can be useful in other probabilistic arguments as well.


More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Lectures on Statistics. William G. Faris

Lectures on Statistics. William G. Faris Lectures on Statistics William G. Faris December 1, 2003 ii Contents 1 Expectation 1 1.1 Random variables and expectation................. 1 1.2 The sample mean........................... 3 1.3 The sample

More information

Checking for Prior-Data Conflict

Checking for Prior-Data Conflict Bayesian Analysis (2006) 1, Number 4, pp. 893 914 Checking for Prior-Data Conflict Michael Evans and Hadas Moshonov Abstract. Inference proceeds from ingredients chosen by the analyst and data. To validate

More information

TDA231. Logistic regression

TDA231. Logistic regression TDA231 Devdatt Dubhashi dubhashi@chalmers.se Dept. of Computer Science and Engg. Chalmers University February 19, 2016 Some data 5 x2 0 5 5 0 5 x 1 In the Bayes classifier, we built a model of each class

More information

The Design of a Survival Study

The Design of a Survival Study The Design of a Survival Study The design of survival studies are usually based on the logrank test, and sometimes assumes the exponential distribution. As in standard designs, the power depends on The

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence

More information

arxiv: v1 [physics.data-an] 3 Jun 2008

arxiv: v1 [physics.data-an] 3 Jun 2008 arxiv:0806.0530v [physics.data-an] 3 Jun 008 Averaging Results with Theory Uncertainties F. C. Porter Lauritsen Laboratory for High Energy Physics California Institute of Technology Pasadena, California

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

Gaussian Quiz. Preamble to The Humble Gaussian Distribution. David MacKay 1

Gaussian Quiz. Preamble to The Humble Gaussian Distribution. David MacKay 1 Preamble to The Humble Gaussian Distribution. David MacKay Gaussian Quiz H y y y 3. Assuming that the variables y, y, y 3 in this belief network have a joint Gaussian distribution, which of the following

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Introduction to Applied Bayesian Modeling. ICPSR Day 4

Introduction to Applied Bayesian Modeling. ICPSR Day 4 Introduction to Applied Bayesian Modeling ICPSR Day 4 Simple Priors Remember Bayes Law: Where P(A) is the prior probability of A Simple prior Recall the test for disease example where we specified the

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

Statistical Methods for Astronomy

Statistical Methods for Astronomy Statistical Methods for Astronomy If your experiment needs statistics, you ought to have done a better experiment. -Ernest Rutherford Lecture 1 Lecture 2 Why do we need statistics? Definitions Statistical

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

TMA 4275 Lifetime Analysis June 2004 Solution

TMA 4275 Lifetime Analysis June 2004 Solution TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,

More information

Fitting a Straight Line to Data

Fitting a Straight Line to Data Fitting a Straight Line to Data Thanks for your patience. Finally we ll take a shot at real data! The data set in question is baryonic Tully-Fisher data from http://astroweb.cwru.edu/sparc/btfr Lelli2016a.mrt,

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information