Chapter 7: Hypothesis testing
- Cora Fleming
1 Chapter 7: Hypothesis testing

Hypothesis testing is typically based on the cumulative hazard function. Here we'll use the Nelson-Aalen estimate of the cumulative hazard. The survival function can be used to weight differences between the observed and expected cumulative hazard. Recall that the Nelson-Aalen estimate of the cumulative hazard is

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}$$

where $d_i$ is the number of deaths and $Y_i$ the number at risk at death time $t_i$.

In a one-sample problem, you test whether the hazard rate $h(t)$ is equal to some reference hazard $h_0(t)$. The null hypothesis is $H_0: h(t) = h_0(t)$. Under the null hypothesis, the expected hazard rate at time $t_i$ is $h_0(t_i)$.

SAS Programming March 11, / 43
2 Hypothesis testing: one sample

The idea is then to compare observed minus expected cumulative hazard rates at time $\tau$, the largest time in the study ($\tau = t_D$ if the largest time is a death time). The test statistic is

$$Z(\tau) = O(\tau) - E(\tau) = \sum_{i=1}^{D} W(t_i)\,\frac{d_i}{Y_i} - \int_0^{\tau} W(s)\,h_0(s)\,ds$$

where $W(\cdot)$ is a weight function. The variance is

$$V[Z(\tau)] = \int_0^{\tau} W^2(s)\,\frac{h_0(s)}{Y(s)}\,ds$$
3 Hypothesis testing

Under the null hypothesis the expected value of $Z(\tau)$ is 0, so if we take a z-score of $Z(\tau)$ (subtracting the mean and dividing by the standard deviation), we get

$$\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}}$$

which has an approximate standard normal distribution. This can be used for either a two-sided or a one-sided test. For example, a one-sided alternative would be $H_1: h(t) > h_0(t)$, and you would reject only for large values of $Z(\tau)/\sqrt{V[Z(\tau)]}$.
4 Hypothesis testing

The most popular choice for a weighting function is $W(t) = Y(t)$, which leads to

$$O(\tau) = \sum_{i=1}^{D} Y(t_i)\,\frac{d_i}{Y_i} = \sum_{i=1}^{D} d_i$$

This is also called the log-rank test (not sure why). Other weight functions are possible. For example,

$$W(t) = Y(t)\,S_0(t)^p\,[1 - S_0(t)]^q$$

with $0 \le p, q \le 1$ (you don't necessarily need $q = 1 - p$ here). The choice of $p$ affects whether you care more about the hazard not matching the hypothesized hazard for small $t$ or for large $t$. For example, if $p$ is large, then more emphasis is placed on the estimated hazard matching the null hazard for small values of $t$, because $S_0(t)^p$ is largest when $t$ is small. $S_0(t)$ can be obtained from $S_0(t) = \exp[-H_0(t)]$.
5 Hypothesis testing

An example where you would use the one-sided hypothesis test is testing whether some population, such as the psychiatric patients from Iowa, has a higher hazard than a reference population. Recall that we looked at excess mortality for this example previously.
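
The one-sample statistic with $W(t) = Y(t)$ can be sketched in a few lines of base R. This is a minimal illustration, not the course's own code: it assumes a constant reference hazard $h_0(t) = \lambda_0$ and that every subject enters at time 0, in which case $E(\tau) = \int_0^\tau Y(s) h_0(s)\,ds$ reduces to the sum of $H_0(T_j) = \lambda_0 T_j$ over subjects, and $V[Z(\tau)] = E(\tau)$. The function name `one_sample_logrank` and the data are hypothetical.

```r
# One-sample log-rank test (weight W(t) = Y(t)) against a constant
# reference hazard h0(t) = lambda0. Assumes all subjects enter at time 0.
one_sample_logrank <- function(time, status, lambda0) {
  O <- sum(status)          # observed deaths: sum of d_i
  E <- sum(lambda0 * time)  # expected deaths: sum over subjects of H0(T_j)
  V <- E                    # with W = Y, V[Z(tau)] = integral of Y(s) h0(s) ds = E
  z <- (O - E) / sqrt(V)
  list(O = O, E = E, z = z,
       p_one_sided = pnorm(z, lower.tail = FALSE))  # H1: h(t) > h0(t)
}

time   <- c(2, 3, 5, 7, 11)  # follow-up times (hypothetical data)
status <- c(1, 0, 1, 1, 0)   # 1 = death, 0 = censored
res <- one_sample_logrank(time, status, lambda0 = 0.1)
res$z
```

A non-constant reference hazard would only change the line computing `E`, which would then sum the reference cumulative hazard evaluated at each subject's follow-up time.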
6 Hypothesis testing: two or more samples

If you have two or more samples (e.g., mortality for three different treatments or three different risk groups), then the null and alternative hypotheses are similar to those for ANOVA:

$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t) \quad \text{for all } t \le \tau$$

$$H_A: h_i(t) \ne h_j(t) \quad \text{for some } i \ne j \text{ and some } t \le \tau$$

where $\tau$ is the largest time at which all of the groups have at least one subject at risk.
7 Hypothesis testing: two or more samples

We now define $t_i$ as the unique death times for the pooled data (i.e., ignoring the group that each observation comes from), and again $t_D$ is the largest death time. We observe $d_{ij}$ deaths at time $t_i$ in sample $j$, and there are $Y_{ij}$ individuals at risk at time $t_i$ in sample $j$. We let $d_i = \sum_{j=1}^{K} d_{ij}$ be the total number of deaths at time $t_i$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$ be the total number of individuals at risk (that is, still alive and under observation) at time $t_i$.
8 Hypothesis testing: two or more samples

The idea for testing the hypothesis is that, under the null hypothesis, the estimate of the hazard (and cumulative hazard) should be the same in expectation whether we use the pooled data (ignoring the group the samples are from) or the individual samples. Under the null, we can think of the pooled data as providing a more precise estimate of the hazard for the $j$th sample than the $j$th sample itself, so using the idea of observed minus expected, we can write

$$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i)\left(\frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i}\right), \quad j = 1, \ldots, K$$

If all of the $Z_j(\tau)$ terms are close to 0, then all of the sample estimated cumulative hazards are close to the pooled cumulative hazard, so they must all be close to each other, and this supports the null hypothesis.
9 Hypothesis testing: two or more samples

The typical weight function used is $W_j(t_i) = Y_{ij}\,W(t_i)$, where $W(t_i)$ is a common weight shared by each group. For this weighting scheme,

$$Z_j(\tau) = \sum_{i=1}^{D} W(t_i)\left[d_{ij} - Y_{ij}\left(\frac{d_i}{Y_i}\right)\right], \quad j = 1, \ldots, K$$

$$V[Z_j(\tau)] = \sigma_{jj} = \sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{ij}}{Y_i}\left(1 - \frac{Y_{ij}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \quad j = 1, \ldots, K$$

$$\mathrm{cov}(Z_j(\tau), Z_k(\tau)) = \sigma_{jk} = -\sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{ij}}{Y_i}\,\frac{Y_{ik}}{Y_i}\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \quad j \ne k$$
10 Hypothesis testing: two or more samples

Based on the second formula for $Z_j(\tau)$ (the one with the common weight $W(t_i)$), the sum $\sum_{j=1}^{K} Z_j(\tau)$ is equal to 0, meaning that the $Z_j(\tau)$ are not independent of one another. In particular, $Z_K(\tau)$ is a linear combination of $Z_1(\tau), \ldots, Z_{K-1}(\tau)$. Consequently, we construct a test statistic based on just the first $K-1$ of the $Z_j(\tau)$ terms:

$$\chi^2 = (Z_1(\tau), \ldots, Z_{K-1}(\tau))\,\Sigma^{-1}\,(Z_1(\tau), \ldots, Z_{K-1}(\tau))^{T}$$

where $(Z_1(\tau), \ldots, Z_{K-1}(\tau))$ is interpreted as a row vector of length $K-1$ and $\Sigma$ is the $(K-1) \times (K-1)$ covariance matrix. (If you had made a $K \times K$ matrix using all the variables, it wouldn't be full rank, and therefore would not be invertible.) Under the null hypothesis the statistic has an approximate $\chi^2$ distribution with $K-1$ degrees of freedom, and you can base the test on this distribution.
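
The K-sample construction can be sketched directly in base R. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `k_sample_test`, the argument `W` (a function mapping the pooled number at risk $Y_i$ to the common weight $W(t_i)$), and the data are all hypothetical. Terms with $Y_i = 1$ are skipped in the covariance because their contribution is zero.

```r
# K-sample weighted log-rank statistic built from the Z_j and sigma_jk formulas.
# W = 1 (the default) gives the log-rank test.
k_sample_test <- function(time, status, group, W = function(Y) 1) {
  g <- factor(group)
  K <- nlevels(g)
  Z <- numeric(K)
  Sigma <- matrix(0, K, K)
  for (ti in sort(unique(time[status == 1]))) {
    Yj <- as.numeric(table(g[time >= ti]))                # Y_ij: at risk per sample
    dj <- as.numeric(table(g[time == ti & status == 1]))  # d_ij: deaths per sample
    Y <- sum(Yj)
    d <- sum(dj)
    w <- W(Y)
    Z <- Z + w * (dj - Yj * d / Y)
    if (Y > 1) {                                          # term is zero when Y_i = 1
      p <- Yj / Y
      # diag(p) - p p' has p_j (1 - p_j) on the diagonal and -p_j p_k elsewhere
      Sigma <- Sigma + w^2 * (diag(p) - outer(p, p)) * d * (Y - d) / (Y - 1)
    }
  }
  keep <- seq_len(K - 1)                                  # drop the redundant last sample
  S <- Sigma[keep, keep, drop = FALSE]
  chisq <- drop(crossprod(Z[keep], solve(S, Z[keep])))
  list(Z = Z, Sigma = Sigma, chisq = chisq, df = K - 1,
       p = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

# Hypothetical data: three groups of three subjects each
time   <- c(5, 8, 12, 3, 6, 9, 2, 4, 7)
status <- c(1, 1, 0, 1, 1, 1, 1, 0, 1)
group  <- rep(c("A", "B", "C"), each = 3)
res <- k_sample_test(time, status, group)
sum(res$Z)  # zero up to rounding, as the slide notes
```

Passing `W = function(Y) Y` or `W = function(Y) sqrt(Y)` would give the other common weightings without changing the rest of the function.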
11 Hypothesis testing: two samples

Several weight functions are possible. $W(t) = 1$ for all $t$ leads to the two-sample log-rank test. $W(t_i) = Y_i$ and $W(t_i) = \sqrt{Y_i}$ have also been used. In the case of $K = 2$ samples, the test statistic can be written as

$$Z = \frac{\sum_{i=1}^{D} W(t_i)\left[d_{i1} - Y_{i1}\left(\frac{d_i}{Y_i}\right)\right]}{\sqrt{\sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{i1}}{Y_i}\left(1 - \frac{Y_{i1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i}}$$

Since we don't have to square in this case, we can do one-sided as well as two-sided hypothesis tests based on a standard normal distribution instead of a $\chi^2$, or you can square the statistic and use a $\chi^2_1$ distribution.
12 Hypothesis testing: two samples [slide content not captured in the transcription]
13 Hypothesis testing: two samples

This example compares kidney dialysis patients with surgically implanted catheters against those with percutaneous (needle-puncture) placement of the catheter. Even though the survival curves look fairly different after 1 year or so, the differences are not statistically significant. Note that there are very few observations for the percutaneous sample. In fact, the number of observations is fairly small for both samples, so the confidence intervals would be fairly wide.
14 Hypothesis testing: two samples [slide content not captured in the transcription]
15 Hypothesis testing: two samples [slide content not captured in the transcription]
16 Hypothesis testing: two samples

Different choices for the weight function affect the p-value. It is reassuring if many weighting schemes give the same conclusion. The p-values were low for the weighting schemes that gave a lot of weight to differences in the hazard at large values of $t_i$, which of course is where the curves appear different. Such schemes can also be sensitive to differences in censoring patterns between the two samples, so they should be used cautiously.

A danger in trying many weighting schemes arises when different weights conflict and you report only the schemes that give the results you want. This would be dishonest, so you should either pick a weighting scheme in advance and stick to it, or report the results of all the weighting schemes that you used.
17 Hypothesis testing: weight functions [slide content not captured in the transcription]
18 Hypothesis testing: weight functions

The most common weight functions are either flat, $W(t_i) = 1$, or decreasing, with $W(t_i) = Y_i$. An increasing weight function might be used to compare longer-term survival when early deaths might be due to complications rather than to the long-term effectiveness of a treatment. An example is comparing autologous versus allogeneic bone marrow transplants for leukemia. Allogeneic transplant patients (receiving bone marrow from a sibling) tend to have more complications early on, reducing early survival rates (and increasing early hazard rates), but if interest is in long-term survival, then a weight function that emphasizes later times could be used.
19 Hypothesis testing in R

To test the difference in survival curves in R, you can use survdiff() from the survival library. An example is with the allo- versus auto-transplant patients in the leukemia data.

    > library(survival)
    > x <- read.table("leukemia2.txt")
    > a <- survdiff(Surv(x$V1,x$V2)~factor(x$V3))
    > a
    Call:
    survdiff(formula = Surv(x$V1, x$V2) ~ factor(x$V3))

    N Observed Expected (O-E)^2/E (O-E)^2/V
    factor(x$V3)=
    factor(x$V3)=

    Chisq= 0.4 on 1 degrees of freedom, p=

The results suggest that the survival experiences of the two groups were not statistically significantly different from each other.
20 Hypothesis testing in R

To plot the two survival curves together you can use

    > x <- read.table("leukemia2.txt")
    > a <- survfit(Surv(x$V1[x$V3==1],x$V2[x$V3==1])~1)
    > b <- survfit(Surv(x$V1[x$V3==2],x$V2[x$V3==2])~1)
    > plot(a,conf.int=FALSE)
    > points(b$time,b$surv,type="s",col="red",lwd=3)
    > legend(20,1,legend=c("auto","allo"),col=c("black","red"),
             lty=c(1,1),lwd=c(1,3),cex=1.3)
21 Hypothesis testing in R [slide content not captured in the transcription]
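22 Hypothesis testing in R

The survdiff() function in R has an optional parameter rho whose default is 0, which results in the log-rank test. Each death is weighted by $\hat{S}(t)^{\rho}$, where $\hat{S}$ is the Kaplan-Meier estimate of survival, so larger values of rho put more weight on earlier times (rho = 1 gives the Peto & Peto modification of the Gehan-Wilcoxon test). The choice of rho can have a big impact on the p-value.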
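
As a small sketch of how rho changes the result, the following compares rho = 0 and rho = 1 on the `aml` data set that ships with the survival package. The choice of `aml` is an assumption made for illustration because the `leukemia2.txt` file from the slides is not available here. The p-values are computed from the `chisq` component with one degree of freedom, since there are two groups.

```r
library(survival)  # provides Surv(), survdiff(), and the aml data set

# rho = 0: log-rank test; rho = 1: Peto & Peto modification of Gehan-Wilcoxon
fit_logrank <- survdiff(Surv(time, status) ~ x, data = aml, rho = 0)
fit_peto    <- survdiff(Surv(time, status) ~ x, data = aml, rho = 1)

chisq <- c(logrank = fit_logrank$chisq, peto = fit_peto$chisq)
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)  # two groups, so 1 df
round(cbind(chisq, pval), 3)
```

Reporting both rows side by side, rather than only the more favorable one, is in keeping with the earlier advice about weighting schemes.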
23 Hypothesis testing in SAS

You can use PROC LIFETEST in SAS to do hypothesis testing. We'll take a look at examples after the break.
24 Tests of trend

For multiple samples ($K > 2$), a different alternative hypothesis is the following:

$$H_A: h_1(t) \le h_2(t) \le \cdots \le h_K(t) \quad \text{for } t \le \tau$$

where at least one inequality is strict. This ordering of the hazards implies the ordering of the survival functions

$$S_1(t) \ge S_2(t) \ge \cdots \ge S_K(t)$$
25 Tests of trend

We construct the $Z_j(\tau)$s as before and use any weight functions $W_j(t_i)$. We also pick a new set of weights $a_j$, $j = 1, \ldots, K$, where $a_j = j$ is often used. The test statistic is now

$$Z = \frac{\sum_{j=1}^{K} a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^{K}\sum_{k=1}^{K} a_j a_k \sigma_{jk}}}$$

where $\Sigma = (\sigma_{jk})$ is the $K \times K$ covariance matrix. (It isn't full rank, but we don't need the inverse.) The test statistic can be compared to a standard normal.
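
Given the vector of $Z_j(\tau)$ and the full $K \times K$ covariance matrix, the trend statistic is a one-line computation. The sketch below uses made-up values for `Z` and `Sigma` (chosen only so that `Z` sums to 0 and the rows of `Sigma` sum to 0, as they must); the function name `trend_z` is hypothetical. The upper-tail p-value assumes the alternative of increasing hazards with scores $a_j = j$.

```r
# Trend test statistic: sum(a_j Z_j) / sqrt(a' Sigma a)
trend_z <- function(Z, Sigma, a = seq_along(Z)) {
  num <- sum(a * Z)
  den <- sqrt(drop(t(a) %*% Sigma %*% a))  # double sum of a_j a_k sigma_jk
  num / den
}

# Hypothetical values for K = 3 samples (not taken from any real data set)
Z <- c(-3, 0.5, 2.5)
Sigma <- matrix(c( 4, -2, -2,
                  -2,  5, -3,
                  -2, -3,  5), nrow = 3, byrow = TRUE)

z <- trend_z(Z, Sigma)                   # uses a_j = j by default
p_upper <- pnorm(z, lower.tail = FALSE)  # one-sided, large z supports the trend
c(z = z, p = p_upper)
```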
26 Tests of trend [slide content not captured in the transcription]
27 Stratified tests

If the populations differ in covariates (age, sex, etc.), then ideally you would use a regression approach to survival analysis to adjust for covariates before comparing survival curves or hazard rates. This is done in Chapter 8. If a predictor has a small number of levels, then you can use a stratified test instead. Let

$$H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad s = 1, \ldots, M, \quad t \le \tau$$

The idea is that within each level of the covariate (indexed by $s$), the hazard rate should be the same across samples. Typically, $M$ is small.
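28 Stratified tests

For the stratified test, let $Z_{js}(\tau)$ and $\sigma_{jks}$ be the statistic and covariance computed within stratum $s$, and let

$$Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau), \qquad \sigma_{jk\cdot} = \sum_{s=1}^{M} \sigma_{jks}$$

Then the test statistic is as before with multiple samples:

$$(Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))\,\Sigma^{-1}\,(Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))^{T}$$

which is approximately $\chi^2$ with $K-1$ degrees of freedom. Here $\Sigma$ is the $(K-1) \times (K-1)$ matrix of the summed covariances $\sigma_{jk\cdot}$, and we have $K$ samples and $M$ strata within each sample.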
29 Renyi type tests

For a two-sample problem, if the hazard functions cross, then the previous tests might not detect much overall difference in the hazard rates. The overall survival experience might be similar even though the groups differ in one direction in the short term and in the other direction in the long term. If one group is more at risk in the short term and the other in the long term, these changes of direction could cancel out, leading one to fail to reject the hypothesis that the hazards are the same.

Renyi-type tests are based on the maximum absolute value of the differences between cumulative hazard rates rather than the summed differences. The idea is similar to the Kolmogorov-Smirnov test for comparing two distributions, which uses the largest absolute value of the difference between the two empirical CDFs, but Renyi tests allow for censoring.
30 Renyi type tests

To construct this test, let

$$Z(t_i) = \sum_{t_k \le t_i} W(t_k)\left[d_{k1} - Y_{k1}\left(\frac{d_k}{Y_k}\right)\right], \quad i = 1, \ldots, D$$

where as usual $d_k = d_{k1} + d_{k2}$ and $Y_k = Y_{k1} + Y_{k2}$ (i.e., $d_k$ and $Y_k$ are the pooled number of deaths and number at risk at time $t_k$ over both samples). The standard error of $Z(\tau)$ is $\sigma(\tau)$, where

$$\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2\left(\frac{Y_{k1}}{Y_k}\right)\left(\frac{Y_{k2}}{Y_k}\right)\left(\frac{Y_k - d_k}{Y_k - 1}\right) d_k$$

and $\tau$ is the largest death time $t_k$ with $Y_{k1}, Y_{k2} > 0$.
31 Renyi type tests

The test statistic is

$$Q = \frac{\sup\{|Z(t)| : t \le \tau\}}{\sigma(\tau)}$$

You can think of the supremum here as just the maximum of the absolute values of the $Z(t_j)$ values. Critical values are given in the Appendix, table C.5, and are based on the theory of Brownian motion.
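
The running sums $Z(t_i)$ and the statistic $Q$ can be sketched in base R as follows. This is an illustration under stated assumptions: the function name `renyi_test`, the weight argument `W` (a function of the pooled number at risk), and the data are hypothetical, and subjects are assumed to enter at time 0 so that the loop can stop at the first death time where one group has nobody at risk. The function returns only $Q$ and its ingredients; the critical value still has to come from the table mentioned above.

```r
# Two-sample Renyi-type statistic Q = max |Z(t_i)| / sigma(tau)
renyi_test <- function(time, status, group, W = function(Y) 1) {
  g <- factor(group)
  stopifnot(nlevels(g) == 2)
  in1 <- g == levels(g)[1]
  Z <- 0
  v <- 0
  path <- numeric(0)
  for (tk in sort(unique(time[status == 1]))) {
    Y1 <- sum(time >= tk & in1)
    Y2 <- sum(time >= tk & !in1)
    if (Y1 == 0 || Y2 == 0) break          # past tau: one sample has nobody at risk
    Y <- Y1 + Y2
    d1 <- sum(time == tk & status == 1 & in1)
    d  <- sum(time == tk & status == 1)
    w <- W(Y)
    Z <- Z + w * (d1 - Y1 * d / Y)         # running sum Z(t_k)
    v <- v + w^2 * (Y1 / Y) * (Y2 / Y) * ((Y - d) / (Y - 1)) * d
    path <- c(path, Z)
  }
  list(Z = path, sigma = sqrt(v), Q = max(abs(path)) / sqrt(v))
}

# Hypothetical data in which group 1 has the early deaths and group 2 the later ones
time   <- c(1, 2, 3, 10, 11, 12, 5, 6, 7, 8, 9, 13)
status <- rep(1, 12)
group  <- rep(c(1, 2), each = 6)
out <- renyi_test(time, status, group)
out$Q
```

Inspecting `out$Z` shows where along the time axis the largest departure occurs, which is the point of the Renyi approach when hazards cross.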
32 Renyi type tests [slide content not captured in the transcription]
33 Renyi type tests: finding the maximum $Z(t_j)$ [slide content not captured in the transcription]
35 Testing based on a fixed point in time

Instead of testing survival and hazard rates over all time points, you might be interested in the 1-yr survival rate. Note that the time being tested should be chosen before doing the test. If you look at two survival curves and say, "Wow, they look really different at year 3, is that significant?" then the p-value will be biased too low. It is similar to testing at many time points but then not adjusting for multiple comparisons.

In practice, this is what happens all the time though. People look at a graph of the data, which is maybe meant to be descriptive, something jumps out at them as being unusual, and they say, "Wow, is that significant?" It's extremely difficult to answer this type of question. A better approach in this type of case might be the Renyi type of test, because it accounts for the fact that you are looking at maximum differences over the entire time frame.
36 Testing based on a fixed point in time

Here we want to test

$$H_0: S_1(t_0) = S_2(t_0) \quad \text{against} \quad H_A: S_1(t_0) \ne S_2(t_0)$$

for two survival curves. (The method can be generalized to more survival curves.) The test statistic is

$$Z = \frac{\hat{S}_1(t_0) - \hat{S}_2(t_0)}{\sqrt{\hat{V}[\hat{S}_1(t_0)] + \hat{V}[\hat{S}_2(t_0)]}}$$

which has an approximate standard normal distribution for large samples.
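
A minimal base-R sketch of the fixed-time comparison follows. It assumes that $\hat{S}$ is the Kaplan-Meier estimate and that the variance is estimated with Greenwood's formula; the slide does not say which variance estimator is intended, so that choice is an assumption. The function names `km_at` and `fixed_point_z` and the data are hypothetical.

```r
# Kaplan-Meier estimate and Greenwood variance at a fixed time t0
km_at <- function(time, status, t0) {
  S <- 1
  gw <- 0
  for (ti in sort(unique(time[status == 1 & time <= t0]))) {
    Y <- sum(time >= ti)                 # number at risk at t_i
    d <- sum(time == ti & status == 1)   # deaths at t_i
    S <- S * (1 - d / Y)
    gw <- gw + d / (Y * (Y - d))         # Greenwood sum (undefined if d == Y)
  }
  c(surv = S, var = S^2 * gw)
}

# Z statistic comparing two survival curves at t0
fixed_point_z <- function(time1, status1, time2, status2, t0) {
  a <- km_at(time1, status1, t0)
  b <- km_at(time2, status2, t0)
  z <- unname((a["surv"] - b["surv"]) / sqrt(a["var"] + b["var"]))
  c(z = z, p_two_sided = 2 * pnorm(-abs(z)))
}

# Hypothetical data, comparing survival at t0 = 2.5
r <- fixed_point_z(time1 = c(1, 2, 3, 4, 5), status1 = rep(1, 5),
                   time2 = c(1, 3, 4, 6, 7), status2 = rep(1, 5),
                   t0 = 2.5)
r
```

The time `t0` is passed in as an argument on purpose: it should be fixed before looking at the curves, as discussed on the previous slide.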
37 Testing based on a fixed point in time

If you want to test multiple fixed time points, such as the 1-yr and 5-yr survival rates, then you should adjust for multiple comparisons. For testing two time points, a Bonferroni adjustment could be made, meaning that you reject each hypothesis only if its p-value is less than $\alpha/2$. The more time points you check, the less power you will have to find significant differences.
38 Bonferroni adjustments

Probably the most popular and simplest adjustment for multiple testing is the Bonferroni adjustment. The idea is that to run $k$ tests at an overall level $\alpha$ (meaning that if the null hypotheses are true for all $k$ tests, the chance of making an error on at least one of them is at most $\alpha$, for example 5%), you use a level of $\alpha/k$ for each test. What is the rationale for doing this?
39 Bonferroni adjustments

There are several ways to justify Bonferroni adjustments. One is to look at the expected number of false positives under the null. Let $X_i = 1$ if you make a false positive decision on test $i$ (that is, you reject a true null hypothesis), and otherwise $X_i = 0$. What type of variable is $X_i$? What is the probability that $X_i = 1$ if the null hypothesis (for experiment $i$) is true? What is the expected value of $X_i$?
40 Bonferroni adjustments

$X_i$ as defined previously is Bernoulli with $p = \alpha$ if testing at level $\alpha$. The expected value of a Bernoulli($p$) random variable is $p$ (why?), so the expected value of $X_i$ is $\alpha$. If you do $k$ experiments, the expected number of false positives is

$$E\left[\sum_{i=1}^{k} X_i\right] = k\alpha$$

However, if you test at the $\alpha/k$ level, then the expected number of false positives is $k \cdot (\alpha/k) = \alpha$. Thus, the Bonferroni adjustment controls the expected number of false positives.
41 Bonferroni adjustments

Another approach is to use something called Bonferroni's inequality. Let $A_i$ be the event that you don't reject the null hypothesis on test $i$. Suppose we set $P(A_i) = 1 - \alpha/k$ when the null is true. From the inclusion-exclusion formula,

$$P(A_1 \cap A_2) = P(A_1) + P(A_2) - P(A_1 \cup A_2) \ge P(A_1) + P(A_2) - 1$$

If we apply the formula again, setting $B = A_1 \cap A_2$, we get

$$P(A_1 \cap A_2 \cap A_3) = P(B \cap A_3) \ge [P(A_1) + P(A_2) - 1] + P(A_3) - 1 = P(A_1) + P(A_2) + P(A_3) - 2$$

In general, for $k$ events,

$$P(A_1 \cap \cdots \cap A_k) \ge \sum_{i=1}^{k} P(A_i) - (k - 1)$$
42 Bonferroni adjustments

If $P(A_i) = 1 - \alpha/k$, then we get

$$P(A_1 \cap \cdots \cap A_k) \ge k\left(1 - \frac{\alpha}{k}\right) - k + 1 = 1 - \alpha$$

Thus, the probability of all decisions being correct is at least $1 - \alpha$, and the probability of making any wrong decision is at most $\alpha$.
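
The bound can be checked numerically with a short R sketch. The first part assumes the $k$ tests are independent, which the Bonferroni inequality itself does not require; under independence the exact family-wise error rate is $1 - (1 - \alpha/k)^k$, which stays at or below $\alpha$. The second part applies the adjustment to two made-up p-values (standing in for 1-yr and 5-yr comparisons) using `p.adjust()` from base R.

```r
alpha <- 0.05
k <- c(1, 2, 5, 10, 20)

# Family-wise error rate for k independent tests (independence is an assumption)
fwer_unadjusted <- 1 - (1 - alpha)^k      # each test at level alpha
fwer_bonferroni <- 1 - (1 - alpha / k)^k  # each test at level alpha / k
round(cbind(k, fwer_unadjusted, fwer_bonferroni), 4)

# Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1
p_raw <- c(one_year = 0.012, five_year = 0.030)  # hypothetical p-values
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj < alpha  # reject only where the adjusted p-value is below alpha
```

Comparing adjusted p-values with $\alpha$ is the same decision rule as comparing the raw p-values with $\alpha/k$.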
43 Bonferroni adjustments

Bonferroni's inequality can be useful in other probabilistic arguments as well.
More informationCox s proportional hazards model and Cox s partial likelihood
Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.
More informationCorrelation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?
Comment: notes are adapted from BIOL 214/312. I. Correlation. Correlation A) Correlation is used when we want to examine the relationship of two continuous variables. We are not interested in prediction.
More informationSurvival Analysis. Stat 526. April 13, 2018
Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined
More informationHypothesis testing. Data to decisions
Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the
More informationStatistics 262: Intermediate Biostatistics Regression & Survival Analysis
Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,
More informationSurvival analysis in R
Survival analysis in R Niels Richard Hansen This note describes a few elementary aspects of practical analysis of survival data in R. For further information we refer to the book Introductory Statistics
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationLinear Models: Comparing Variables. Stony Brook University CSE545, Fall 2017
Linear Models: Comparing Variables Stony Brook University CSE545, Fall 2017 Statistical Preliminaries Random Variables Random Variables X: A mapping from Ω to ℝ that describes the question we care about
More informationStatistics Primer. A Brief Overview of Basic Statistical and Probability Principles. Essential Statistics for Data Analysts Using Excel
Statistics Primer A Brief Overview of Basic Statistical and Probability Principles Liberty J. Munson, PhD 9/19/16 Essential Statistics for Data Analysts Using Excel Table of Contents What is a Variable?...
More informationDETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics
DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and
More informationTypical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction
Outline CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis Prof. Kevin E. Thorpe Defining Survival Data Mathematical Definitions Non-parametric Estimates of Survival Comparing
More informationLecture 21: October 19
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use
More informationLecture 4: Testing Stuff
Lecture 4: esting Stuff. esting Hypotheses usually has three steps a. First specify a Null Hypothesis, usually denoted, which describes a model of H 0 interest. Usually, we express H 0 as a restricted
More informationHYPOTHESIS TESTING. Hypothesis Testing
MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationSTAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis
STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis Rebecca Barter April 6, 2015 Multiple Testing Multiple Testing Recall that when we were doing two sample t-tests, we were testing the equality
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationSolutions to Final STAT 421, Fall 2008
Solutions to Final STAT 421, Fall 2008 Fritz Scholz 1. (8) Two treatments A and B were randomly assigned to 8 subjects (4 subjects to each treatment) with the following responses: 0, 1, 3, 6 and 5, 7,
More informationStat 5421 Lecture Notes Fuzzy P-Values and Confidence Intervals Charles J. Geyer March 12, Discreteness versus Hypothesis Tests
Stat 5421 Lecture Notes Fuzzy P-Values and Confidence Intervals Charles J. Geyer March 12, 2016 1 Discreteness versus Hypothesis Tests You cannot do an exact level α test for any α when the data are discrete.
More informationModule 6: Model Diagnostics
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 6: Model Diagnostics 6.1 Introduction............................... 1 6.2 Linear model diagnostics........................
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationInferences about a Mean Vector
Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationGaussian Quiz. Preamble to The Humble Gaussian Distribution. David MacKay 1
Preamble to The Humble Gaussian Distribution. David MacKay Gaussian Quiz H y y y 3. Assuming that the variables y, y, y 3 in this belief network have a joint Gaussian distribution, which of the following
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationAnalysis of Variance (ANOVA)
Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA
More informationName Solutions Linear Algebra; Test 3. Throughout the test simplify all answers except where stated otherwise.
Name Solutions Linear Algebra; Test 3 Throughout the test simplify all answers except where stated otherwise. 1) Find the following: (10 points) ( ) Or note that so the rows are linearly independent, so
More informationLecture 8 Stat D. Gillen
Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 8.1 Example of two ways to stratify Suppose a confounder C has 3 levels
More informationOne-sample categorical data: approximate inference
One-sample categorical data: approximate inference Patrick Breheny October 6 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction It is relatively easy to think about the distribution
More informationStatistics 262: Intermediate Biostatistics Non-parametric Survival Analysis
Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Overview of today s class Kaplan-Meier Curve
More informationRank-Based Methods. Lukas Meier
Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationProbability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur
Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 38 Goodness - of fit tests Hello and welcome to this
More informationDistribution-Free Procedures (Devore Chapter Fifteen)
Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal
More informationLecture 19 Multiple (Linear) Regression
Lecture 19 Multiple (Linear) Regression Thais Paiva STA 111 - Summer 2013 Term II August 1, 2013 1 / 30 Thais Paiva STA 111 - Summer 2013 Term II Lecture 19, 08/01/2013 Lecture Plan 1 Multiple regression
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationIntroduction to Nonparametric Statistics
Introduction to Nonparametric Statistics by James Bernhard Spring 2012 Parameters Parametric method Nonparametric method µ[x 2 X 1 ] paired t-test Wilcoxon signed rank test µ[x 1 ], µ[x 2 ] 2-sample t-test
More informationMuch of the material we will be covering for a while has to do with designing an experimental study that concerns some phenomenon of interest.
Experimental Design: Much of the material we will be covering for a while has to do with designing an experimental study that concerns some phenomenon of interest We wish to use our subjects in the best
More informationMultistate Modeling and Applications
Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)
More informationMAS3301 / MAS8311 Biostatistics Part II: Survival
MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the
More informationConfidence intervals
Confidence intervals We now want to take what we ve learned about sampling distributions and standard errors and construct confidence intervals. What are confidence intervals? Simply an interval for which
More information