Correlated data. Introduction. We expect students to... Aim of the course. Faculty of Health Sciences. NFA, May 19, 2014.



Faculty of Health Sciences

Correlated data: Introduction
NFA, May 19, 2014
Julie Lyng Forman & Lene Theil Skovgaard
Department of Biostatistics, University of Copenhagen

- The idea of the course
- Comparing two types of measurement
- Logarithmic transformation
- Linear regression
- The general linear model

Home page:

Aim of the course
To make the participants able to:
- understand and interpret advanced statistical analyses
- judge the assumptions behind the use of various methods of analysis
- perform their own analyses using SAS
- understand output from a statistical program package in general, i.e. other than SAS
- present results from a statistical analysis, numerically and graphically
To create a better platform for communication between users of statistics and statisticians, to benefit subsequent collaboration.

We expect students to...
- Be interested
- Be motivated, ideally by your own (future) research project
- Have basic knowledge of statistical concepts such as:
  - mean, average
  - variance, standard deviation, standard error
  - distribution
  - correlation, regression, anova
  - t-test, χ²-test, F-test

Topics for the course
Quantitative data (normal distribution):
- Analysis of variance
- Variance component models
- General linear models / regression analysis
- Linear mixed models
Non-normal outcome (binary data or count data):
- Logistic or Poisson regression
- Generalized linear mixed models
Not covered:
- Multivariate data (several outcomes at once)
- Censored data (survival analysis)

Recommended reading
- The lecture notes (can be downloaded from the course webpages).
- Brief notes about SAS programming (can be downloaded from the course webpages).
- B.T. West, K.B. Welch and A.T. Galecki: Linear mixed models: a practical guide using statistical software, Chapman & Hall/CRC, 2007.
We teach SAS programming, but the book also covers SPSS, R, Stata, and HLM.

Teaching activities
Lectures: mornings ( )
- Copies of overheads must be downloaded in advance
- Coffee break around
Computer labs: in the afternoon ( ), following each lecture
- Coffee, tea, and cake will be served
- Exercises will be handed out
- Solutions can be downloaded after classes

Course diploma
- To pass the course, 80% attendance is required.
- It is your responsibility to sign the list each morning and each afternoon.
- Note: 5 × 2 = 10 lists, so 80% equals 8 half days.
There is no compulsory homework, but to benefit from the course you need to work with the material at home. We expect you to do so!

What are repeated measurements?
Repeated measurements refer to data where the same outcome has been measured in different situations (or at different spots) on the same individuals.
- Special case: longitudinal data, i.e. measured repeatedly over time.
Repeated measurements are termed clustered data when the same outcome is measured on groups of individuals from the same families/workplaces/school classes/villages/etc.

Paired data
The simplest example of clustered or repeated measurements: two replicates or two subjects per cluster.
Examples of paired data:
- Same person with treatment and placebo (cross-over studies)
- Baseline/follow-up studies
- Twin studies
- Comparison of two measurement methods
- Reliability of a measurement method
A quantitative outcome is analysed with the paired t-test, BUT often the test is not in focus; estimation/quantification is.

Statistical analysis
The usual assumption is that observations are independent. If you have clustered or repeated measurements, the assumption of independence is violated. Your analyses must account for the repetitions/clustering. In this course we will teach you how to do it.
Warning: ignoring the repetitions/clustering and doing a standard analysis most often leads to:
- P-values that are too small or too large.
- Confidence intervals that are too wide or too narrow.

Example: MF vs SV
Two measurement methods, expected to give the same result:
- MF: Transmitral volumetric flow, determined by Doppler echocardiography
- SV: Left ventricular stroke volume, determined by cross-sectional echocardiography
[Data table with columns: subject, MF, SV]

Comparison of measurement methods
Usually a comparison of a new experimental method with an established method (the reference):
- How well do the two measurements agree?
- Is the new method biased compared to the reference?
The data are paired:
- The subjects act as their own controls
- Hence we look at differences within subjects
Set up a statistical model to:
- Describe the typical size of the differences
- Test if the bias (i.e. the mean difference) is zero

Description of the data
Graphical description:
- Scatterplot
- Sample paths
- Bland-Altman plot
- Histogram
Numerical description:
[Table: Mean and Std.Dev for the variables MF, SV, DIF and AVERAGE]

Statistical model for paired data
- $x_i$: MF-measurement for the $i$th subject
- $y_i$: SV-measurement for the $i$th subject
Look at the differences: $d_i = x_i - y_i$, for $i = 1, \ldots, 21$.
The model assumes that the differences are:
- independent
- normally distributed, $d_i \sim N(\delta, \sigma_d^2)$
No assumptions are made about the distribution of the individual flow measurements.

The normal distribution $N(\mu, \sigma^2)$
[Figure: densities of two normal distributions]
- The mean is often denoted $\mu$ or $\alpha$.
- The standard deviation is often denoted $\sigma$ or $\omega$.
- The variance is $\sigma^2$.

Paired t-test in SAS
Can be performed in two different ways:
1. as a paired two-sample test:
     PROC TTEST;
       PAIRED mf*sv;
     RUN;
2. as a one-sample test on the differences:
     PROC UNIVARIATE NORMAL;
       VAR dif;
     RUN;

Output from PROC TTEST (columns only):
  The TTEST Procedure
  Statistics for the difference mf - sv: N, Lower CL Mean, Mean, Upper CL Mean, Lower CL Std Dev, Std Dev, Upper CL Std Dev, Std Err, Minimum, Maximum
  T-Tests for the difference mf - sv: DF, t Value, Pr > |t|

One-sample tests in SAS, for differences
Output from PROC UNIVARIATE (columns and the surviving values):
  The UNIVARIATE Procedure, Variable: dif
  Moments: N 21, Sum Weights 21, Sum Observations 5, Mean, Std Deviation, Variance
  Tests for Location: Mu0=0
    Student's t:  t,        Pr > |t|
    Sign:         M = 2.5,  Pr >= |M|
    Signed Rank:  S = 8,    Pr >= |S|

Estimation of bias
The estimated mean difference is $\bar{d} = 0.24$ cm³.
- The estimate is our best guess, but repeating the experiment would give us a somewhat different result.
- The estimate has a distribution, with an uncertainty called the standard error of the estimate.
The standard error of the mean is given by
  $\mathrm{SEM} = s_d / \sqrt{n} = s_d / \sqrt{21} = 1.52$ cm³

About the paired t-test
Test of the null hypothesis $H_0: \delta = 0$ (no bias). The t-statistic is given by
  $t = (\bar{d} - 0) / \mathrm{SEM} = 0.24 / 1.52 \approx 0.16 \sim t(20)$,
which gives P = 0.88, i.e. no significant bias.
Does this mean that the measurement methods are equally good?
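The t-statistic and P-value above can be re-computed by hand from the two summary numbers quoted on the slides. The following is a minimal sketch, not the course's own program: the variable names are made up for illustration, and the inputs are the rounded values 0.24 and 1.52, so the result should only come out close to the reported P = 0.88.

```sas
/* Sketch: paired t-test from summary statistics (rounded values from the slides) */
DATA _NULL_;
  dbar = 0.24;                            /* mean difference mf - sv */
  sem  = 1.52;                            /* standard error of the mean */
  n    = 21;                              /* number of subjects */
  t    = (dbar - 0) / sem;                /* test statistic for H0: delta = 0 */
  p    = 2 * (1 - PROBT(ABS(t), n - 1));  /* two-sided P-value in t(20) */
  PUT t= 6.3 p= 6.3;                      /* results are written to the SAS log */
RUN;
```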

General confidence intervals
- Confidence intervals tell us what the parameter is likely to be.
- An interval that catches the true mean with 95% probability is called a 95% confidence interval.
- 95% is called the coverage.
The usual construction is:
  Average $\pm\ t_{97.5\%}(n-1) \cdot \mathrm{SEM}$
- Often a good approximation, even if data are not normally distributed (due to the central limit theorem).
- The t-quantile $t_{97.5\%}$ may be looked up in a table or computed by a program (e.g. R).

Confidence limits for the bias
For the differences mf-sv, we get the confidence interval:
  $\bar{d} \pm t_{97.5\%}(20) \cdot \mathrm{SEM} = 0.24 \pm 2.086 \cdot 1.52 = (-2.93, 3.41)$
If there is a bias, it is likely (i.e. with 95% certainty) within the limits (-2.93 cm³, 3.41 cm³).
Conclusion: We cannot rule out a bias of approx. 3 cm³ in either direction.

P-values and confidence intervals
Tests and confidence intervals are equivalent in a certain sense:
- They agree on reasonable values for the mean.
- The confidence interval contains the values $\delta_0$ for which $H_0: \delta = \delta_0$ would be accepted.
But the P-value is less informative than the confidence interval:
- If the study is large, a tiny bias may be significant.
- If the study is small, a large bias may be insignificant.
Better use the confidence interval to judge the clinical implications of the bias!

Note the difference
Standard error (of the mean), SE(M):
- tells us something about the uncertainty of the estimate of the mean
- $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$ is the standard deviation in the distribution of the estimate
- is used for comparisons, relations etc.
Standard deviation, SD:
- tells us something about the variation in our sample, and presumably in the population
- is used when describing the data
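The t-quantile and the confidence limits for the bias can also be computed inside SAS. This is a small sketch using the rounded summary values from the slides (mean difference 0.24, SEM 1.52, n = 21); the variable names are illustrative only.

```sas
/* Sketch: 95% confidence interval for the bias from summary statistics */
DATA _NULL_;
  dbar  = 0.24;                  /* mean difference mf - sv */
  sem   = 1.52;                  /* standard error of the mean */
  n     = 21;
  tq    = TINV(0.975, n - 1);    /* 97.5% quantile of t(20) */
  lower = dbar - tq * sem;
  upper = dbar + tq * sem;
  PUT tq= 6.3 lower= 6.2 upper= 6.2;   /* written to the SAS log */
RUN;
```

With these inputs the limits should agree with the interval (-2.93, 3.41) up to rounding.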

Normal regions
The normal region is an interval containing 95% of the typical observations, i.e. the midrange of the population: 2.5%-quantile to 97.5%-quantile.
If the distribution is normal, $N(\mu, \sigma^2)$, then the 2.5%-quantile to 97.5%-quantile is $\mu \pm 1.96\,\sigma$.
An estimated normal region is given by: Average $\pm\ 2 \cdot$ SD.
But this does not account for parameter uncertainty!

Prediction intervals
A prediction interval has to catch future observations with high probability, say 95%.
- $\bar{x} \pm 2s$ is a good prediction interval if the sample is large.
- But if the sample is small, the coverage will be too low.
95% coverage is attained by the prediction interval:
  $\left( \bar{x} - s\sqrt{1 + 1/n}\; t_{97.5\%}(n-1),\ \ \bar{x} + s\sqrt{1 + 1/n}\; t_{97.5\%}(n-1) \right)$
I.e. the probability that a randomly chosen subject from the population has a value in this interval is 95%, if the data are normal.

Limits of agreement
Limits of agreement is the prediction interval for the difference between two measuring methods. It is important for deciding whether or not two measurement methods may replace each other.
Limits of agreement for mf-sv are given by $\bar{d} \pm t_{97.5\%}(20) \cdot s_d\sqrt{1 + 1/21}$, reported as (-14.97, 15.45).
While "$\bar{x} \pm 2s$" is too narrow / has too low coverage:
  $\bar{d} \pm 2 s_d = 0.24 \pm 13.92 = (-13.68, 14.16)$

Derivation of the prediction interval
Assume that $d_{\text{new}}$ is a new observation. Then
  $d_{\text{new}} - \bar{d} \sim N\left(0,\ \sigma_d^2 (1 + 1/n)\right)$, so that $\dfrac{d_{\text{new}} - \bar{d}}{s_d\sqrt{1 + 1/n}} \sim t(n-1)$,
implying that with 95% probability:
  $t_{2.5\%} < \dfrac{d_{\text{new}} - \bar{d}}{s_d\sqrt{1 + 1/n}} < t_{97.5\%}$
  $\bar{d} + s_d\sqrt{1 + 1/n}\; t_{2.5\%} < d_{\text{new}} < \bar{d} + s_d\sqrt{1 + 1/n}\; t_{97.5\%}$
  $\bar{d} - s_d\sqrt{1 + 1/n}\; t_{97.5\%} < d_{\text{new}} < \bar{d} + s_d\sqrt{1 + 1/n}\; t_{97.5\%}$
since $t_{2.5\%} = -t_{97.5\%}$ by symmetry of the t-distribution.
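How much coverage is lost by using "mean ± 2 SD" as a prediction interval follows directly from the derivation: a new observation falls inside that interval exactly when the t(n-1)-distributed ratio is smaller than $2/\sqrt{1 + 1/n}$ in absolute value. The sketch below tabulates this for a few sample sizes, assuming normally distributed data; the data set name `coverage` and the chosen sample sizes are illustrative.

```sas
/* Sketch: actual coverage of "mean +/- 2 SD" as a prediction interval */
DATA coverage;
  DO n = 5, 10, 21, 50, 200;
    cut   = 2 / SQRT(1 + 1/n);           /* cut-off on the t(n-1) scale */
    cover = 2 * PROBT(cut, n - 1) - 1;   /* P(|T| < cut) for T ~ t(n-1) */
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=coverage NOOBS;
RUN;
```

The coverage should be clearly below 95% for the smallest samples and approach roughly 95% as n grows.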

Assumptions for the paired comparison
The differences:
- are independent, i.e. the subjects are unrelated
- are normally distributed, judged graphically or numerically:
  - by inspection of histograms or QQ-plots
  - by formal tests (e.g. PROC UNIVARIATE NORMAL in SAS)
- have identical variances, judged using the Bland-Altman plot of differences vs. averages
Sometimes it is necessary to transform the data in order to fulfill the assumptions.

Checking normality: the QQ-plot
- Observed quantiles against theoretical normal quantiles
- If the data are normal, the points will be close to the line

Model assumption: Normality?
Assumption: the differences follow a normal distribution. We can check the assumption by e.g. looking at the histogram or the QQ-plot.
But with large samples the assumption is not always necessary:
- The validity of the t-test and the confidence intervals relies only on the distribution of the average $\bar{d}$...
- ... and averages tend to be normal due to the CLT.
However: normal regions (e.g. limits of agreement) require a normal distribution.

The central limit theorem (CLT)
Averages of rolls of dice are more normal than a single roll.
[Figure: four histograms, titled "One dice roll", "10 dice rolls", a third panel whose number of rolls is missing, and "50 dice rolls"; x-axis: Average]
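The graphical and formal normality checks mentioned above can be requested in one PROC UNIVARIATE call. This is a sketch that assumes the differences are stored as `dif` in a data set called `mydata`, as in the program bits at the end of these notes.

```sas
/* Sketch: normality checks for the differences */
PROC UNIVARIATE DATA=mydata NORMAL;       /* NORMAL adds formal tests of normality */
  VAR dif;
  HISTOGRAM dif / NORMAL;                 /* histogram with a fitted normal curve */
  QQPLOT dif / NORMAL(MU=EST SIGMA=EST);  /* QQ-plot with reference line from the estimates */
RUN;
```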

Classical two-sample (unpaired) comparison
If the two treatments were applied to separate groups of subjects, we have independent samples.
Traditional model assumptions:
  $x_{11}, \ldots, x_{1n_1} \sim N(\mu_1, \sigma^2)$
  $x_{21}, \ldots, x_{2n_2} \sim N(\mu_2, \sigma^2)$
- All observations are independent
- Observations follow a normal distribution within each group
- Both groups have the same variance, $\sigma^2$
- The mean values, $\mu_1$ and $\mu_2$, may differ

Paired or unpaired comparison?
Note the consequences for the difference between MF and SV. Estimated mean difference:
- 0.24, CI: (-2.93, 3.41) according to the paired t-test
- 0.24, CI: (-12.71, 13.19) according to the unpaired t-test
i.e. same estimate but a much wider confidence interval. The latter is wrong!
You have to respect your design. Do not forget to take advantage of a subject serving as its own control (higher power with fewer individuals).

Comparing measurement methods
When comparing two measurement methods, we have to determine the proper scale before carrying out the statistical analysis:
- Is the precision of the measurements approximately the same over the entire range? In that case look at differences on an absolute scale: use the differences between the raw measurements.
- Or does the measurement error grow with the size of the quantity being measured? In that case look at differences on a relative scale: make a logarithmic transformation.

Another comparison: REFE vs TEST
Two methods for determining concentration of glucose:
- REFE: Colour test, may be polluted by uric acid
- TEST: Enzymatic test, more specific for glucose
Ref: R.G. Miller et al. (eds): Biostatistics Casebook. John Wiley & Sons,
[Data table with columns: nr., REFE, TEST; summary rows: average, SD]
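To see the difference between the two analyses in practice, the paired data can be stacked into one row per measurement and analysed (wrongly) as two independent samples. The sketch below assumes the data set `mydata` with variables `mf` and `sv` from the program bits; the names `long`, `subject`, `method` and `flow` are chosen here for illustration.

```sas
/* Sketch: unpaired versus paired analysis of the same data */
DATA long;
  SET mydata;
  subject = _N_;                    /* one subject per input row */
  LENGTH method $2;
  method = 'MF'; flow = mf; OUTPUT; /* first row for this subject */
  method = 'SV'; flow = sv; OUTPUT; /* second row for this subject */
  KEEP subject method flow;
RUN;

/* Unpaired: treats all values as independent, which ignores the design */
PROC TTEST DATA=long;
  CLASS method;
  VAR flow;
RUN;

/* Paired: each subject serves as its own control */
PROC TTEST DATA=mydata;
  PAIRED mf*sv;
RUN;
```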

The usual analysis - the naive approach
Do we see a systematic difference? Test $\delta = 0$ assuming
  $d_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$
  $\bar{d} = 9.89$, $s_d = 9.70$
  $t = \bar{d} / \mathrm{SEM} = \bar{d} / (s_d/\sqrt{n}) = 6.92 \sim t(45)$
hence P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are
  $9.89 \pm t_{97.5\%}(45) \cdot s_d\sqrt{1 + 1/46} = (-9.85, 29.64)$
Is this a valid analysis?!?

Plots of the raw data
Scatter plot and Bland-Altman plot:
The variance of the differences increases with the level, so the model assumptions of the usual analysis are violated!

Plots of the log-transformed data
Precision seems to be relative, hence we do a log-transformation.
The plots look better, except for an outlier.

Close up
Following a logarithmic transformation (and omission of the outlier), the Bland-Altman plot looks OK.
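A Bland-Altman plot on the logarithmic scale needs the log-transformed measurements, their differences and their averages. The sketch below assumes a data set called `glucose` with variables `refe` and `test`; these names are hypothetical, since the slides do not show how the glucose data are stored.

```sas
/* Sketch: log-transform both methods and draw a Bland-Altman plot */
DATA glucose_log;
  SET glucose;                          /* hypothetical data set name */
  logrefe = LOG(refe);                  /* natural logarithm of each raw measurement */
  logtest = LOG(test);
  logdif  = logrefe - logtest;          /* difference on log scale */
  logavg  = (logrefe + logtest) / 2;    /* average on log scale */
RUN;

PROC GPLOT DATA=glucose_log;
  PLOT logdif*logavg / VREF=0;          /* differences against averages */
RUN;
QUIT;
```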

Notes on the log-transformation
- It is the original measurements that have to be log-transformed, not the differences!
- Never make a logarithmic transformation on data that might be negative!
- It does not matter which logarithm you choose (i.e. which base of the logarithm), since they are all proportional.
- The procedure with construction of limits of agreement is now repeated for the transformed observations.
- The result can be transformed back to the original scale with the anti-logarithm (exp for the natural logarithm).

The correct analysis
Do we see a systematic difference? Test $\delta = 0$ assuming
  $d_i = \log(\mathrm{REFE}_i) - \log(\mathrm{TEST}_i) \sim N(\delta, \sigma_d^2)$
  $\bar{d} = 0.066$
  $t = \bar{d} / \mathrm{SEM} = \bar{d} / (s_d/\sqrt{n}) \sim t(45)$
P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are
  $0.066 \pm t_{97.5\%}(45) \cdot s_d\sqrt{1 + 1/n} = (-0.020, 0.152)$
... on log-scale!

Back transformation
Limits of agreement on log-scale are (-0.020, 0.152), meaning that for 95% of the subjects we will have:
  $-0.020 < \log(\mathrm{REFE}) - \log(\mathrm{TEST}) < 0.152$,
i.e.
  $-0.020 < \log(\mathrm{REFE}/\mathrm{TEST}) < 0.152$

Limits of agreement on the original scale
Back-transforming (using the exponential function):
  $0.980 = \exp(-0.020) < \mathrm{REFE}/\mathrm{TEST} < \exp(0.152) = 1.164$
or reversed:
  $0.859 = 1/1.164 < \mathrm{TEST}/\mathrm{REFE} < 1/0.980 = 1.02$
So TEST will typically lie 14% below to 2% above REFE.
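The back-transformation is a matter of exponentiating the two limits and, for the reversed ratio, taking reciprocals and swapping the ends. A small sketch using the limits quoted above (variable names are illustrative):

```sas
/* Sketch: back-transform limits of agreement from log scale to ratios */
DATA _NULL_;
  lo_log   = -0.020;             /* lower limit on log scale */
  hi_log   =  0.152;             /* upper limit on log scale */
  lo_ratio = EXP(lo_log);        /* lower limit for REFE/TEST */
  hi_ratio = EXP(hi_log);        /* upper limit for REFE/TEST */
  lo_rev   = 1 / hi_ratio;       /* lower limit for TEST/REFE */
  hi_rev   = 1 / lo_ratio;       /* upper limit for TEST/REFE */
  PUT lo_ratio= 6.3 hi_ratio= 6.3 lo_rev= 6.3 hi_rev= 6.3;
RUN;
```

The reversed limits, read as percentages, correspond to the statement that TEST lies about 14% below to 2% above REFE.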

Non-normal data
If the normal distribution is not a good description:
- Tests and confidence intervals are valid if the sample is sufficiently large (due to the central limit theorem).
- To judge the reliability for a given sample: use resampling techniques, or check with a statistician.
- Normal regions and limits of agreement become untrustworthy!

Example: Fertility and aging
Cross-sectional study: 527 women aged
Objective: How does fertility decline with age?
Outcomes: physiological markers of fertility
- Menstrual cycle length
- Reproductive hormones (FSH, AMH, ...)
- Ovarian volume
- Antral follicle count (AFC)

Simple linear regression for AFC
  $\mathrm{AFC} = \alpha + \beta \cdot \mathrm{age} + \varepsilon$
Is this a good model?

Log-linear regression
A more plausible model is exponential decay, implying a linear model on logarithmic scale:
  $\log(\mathrm{AFC}) = \alpha + \beta \cdot \mathrm{age} + \varepsilon$
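Before the log-linear model can be fitted, the outcome has to be transformed in a data step. The sketch below assumes the raw count is stored as `afc` in the data set `menopause`; the variable name `afc` is an assumption, while `logafc` and `menopause` are the names used in the SAS programs in these notes.

```sas
/* Sketch: create the log-transformed outcome for the regression */
DATA menopause;
  SET menopause;
  /* Natural logarithm. A count of zero cannot be log-transformed:
     SAS sets the result to missing and writes a note to the log. */
  logafc = LOG(afc);
RUN;
```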

Regression with SAS
  PROC GLM DATA=menopause;
    MODEL logafc = age / SOLUTION CLPARM;
  RUN;
Output (columns and the surviving values):
  The GLM Procedure
  R-Square, Coeff Var, Root MSE, logafc Mean
  Source AGE: DF, Type III SS, Mean Square, F Value, Pr > F <.0001
  Parameter: Estimate, Std. Error, t Value, Pr > |t|, 95% Confidence Limits
    Intercept: Pr > |t| <.0001
    AGE:       Pr > |t| <.0001
Note: We could have used PROC REG instead.

Regression equation and estimates
The estimates for the linear regression on logarithmic scale are:
- Intercept $\hat{\alpha} = 4.07$ (95% CI ): the "expected value for age = 0"!
- Regression coefficient $\hat{\beta} =$ (95% CI to ): the expected decrease in log(AFC) with one year of aging.

Rate of decline
We see exponential decay on the natural scale. The expected AFC for age $x$ (median or geometric mean) is
  $\mathrm{AFC}(x) = \exp(\alpha + \beta x)$
With one year of aging, $x \to x + 1$:
  $\mathrm{AFC}(x + 1) = \exp(\alpha + \beta(x + 1)) = \exp(\beta) \cdot \mathrm{AFC}(x)$
The annual rate of change is the factor $\exp(\beta)$, corresponding to the decline $\{1 - \exp(\beta)\} \cdot 100\%$.
Estimated by $\exp(\hat{\beta}) = 0.965$, i.e. a decline of 3.5%.

Multiple regression
The regression could be biased by possible confounders:
- Use of oral contraceptives (yes, no)
- Smoking (current, former or never)
- Prenatal smoking exposure (yes, no)
- BMI (underweight, normal weight, overweight, obese)
Adjust for these in a multiple regression (general linear model):
  $Y_i = \alpha + \beta X_i + \beta_1 X_{i,1} + \cdots + \beta_k X_{i,k} + \varepsilon_i$
with $k$ additional covariates. Some of these are dummy variables coding for relevant groups.

SAS-program
  PROC GLM DATA=menopause;
    CLASS oc smoking prenatsmoke bmigrp;
    MODEL logafc = oc smoking prenatsmoke bmigrp age / SOLUTION CLPARM;
    OUTPUT OUT=diagnostics P=fitted R=residual STUDENT=stres;
  RUN;

SAS-output (columns and the surviving values)
  The GLM Procedure
  Source: DF, Sum of Squares, Mean Square, F Value, Pr > F
    Model: Pr > F <.0001
    Error
    Corrected Total
  R-Square, Coeff Var, Root MSE, logafc Mean
  Source: DF, Type III SS, Mean Square, F Value, Pr > F
    OC: Pr > F <.0001
    SMOKING
    PRENATSMOKE
    BMIGRP
    AGE: Pr > F <.0001
  Parameter: Estimate, Standard Error, t Value, Pr > |t|, 95% Confidence Limits
    Intercept: Pr > |t| <.0001
    OC no (<.0001), OC yes (reference)
    SMOKING never, SMOKING previous, SMOKING smoker (reference)
    PRENATSMOKE no-smoke, PRENATSMOKE smoke (reference)
    BMIGRP normal, BMIGRP over, BMIGRP over, BMIGRP under (reference)
    AGE: Pr > |t| <.0001

Adjusted $\hat{\beta} = -0.047$, i.e. a rate of decline of 4.6%, with 95% confidence interval (-0.062, -0.033), corresponding to a decline between 3.2% and 6.0%.

Interpretation of regression coefficients
Simple regression: $Y = \alpha + \beta \cdot \mathrm{age} + \varepsilon$
  $\beta$ is the expected change in log(AFC) when age increases by one year.
Multiple regression: $Y = \alpha + \beta \cdot \mathrm{age} + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$
  $\beta$ is the expected change in log(AFC) when age increases by one year and all other covariates are held fixed.
Similarly for the other covariates: e.g. exp(0.154) = 1.166, or 16.6% higher AFC, for normal BMI compared to BMI < 18.5, with all other covariates held fixed.
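Translating a slope on log scale into a yearly percentage decline, and a group effect into a percentage difference, only requires the exponential function. The sketch below uses the rounded estimates quoted above, so the results should match the reported percentages only approximately; the variable names are illustrative.

```sas
/* Sketch: from log-scale estimates to percentages */
DATA _NULL_;
  beta  = -0.047;                           /* adjusted slope for age */
  lower = -0.062;                           /* lower 95% confidence limit */
  upper = -0.033;                           /* upper 95% confidence limit */
  decline    = 100 * (1 - EXP(beta));       /* yearly decline in percent */
  decline_lo = 100 * (1 - EXP(upper));      /* smallest decline in the CI */
  decline_hi = 100 * (1 - EXP(lower));      /* largest decline in the CI */
  bmi_pct    = 100 * (EXP(0.154) - 1);      /* normal BMI vs. reference, in percent */
  PUT decline= 5.1 decline_lo= 5.1 decline_hi= 5.1 bmi_pct= 5.1;
RUN;
```

Note that the confidence limits swap roles: the upper limit of the slope gives the smallest decline.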

Hypothesis tests
Does AFC decline with age?
T-test for $H_0: \beta = 0$:
  $t = \hat{\beta} / \mathrm{s.e.}(\hat{\beta})$, with P < 0.0001 in the t-distribution with 497 degrees of freedom.
Equivalent to the F-test:
  $F = \text{Mean Square(Age)} / \text{Mean Square(Error)}$, with P < 0.0001 in the F-distribution with (1, 497) degrees of freedom.
Note: In case of a categorical covariate with more than two levels, only the F-test is generally applicable.

Tests of type I and type III
Mind the difference!
- Type I: Test the effect of each covariate after adjustment for all other covariates above it on the list. Sequential tests, to be read bottom-up.
- Type III: Test the effect of each covariate after adjustment for all other covariates on the list. Non-sequential tests, pick the one that you like.

Predictions (fitted values)
  $\widehat{\log(\mathrm{AFC})} = \hat{\alpha} + \hat{\beta} \cdot \mathrm{age} + \hat{\beta}_1 I(\text{no prenatal smoking}) + \hat{\beta}_2 I(\text{never smoker}) + \hat{\beta}_3 I(\text{previous smoker}) + \hat{\beta}_4 I(\text{normal BMI}) + \cdots + \hat{\beta}_6 I(\text{BMI} > 30) + \hat{\beta}_7 I(\text{no use of oral contraceptives})$
Expected log(AFC) of a 30 year old woman, no smoking, normal weight, non-user of oral contraceptives: log(AFC) = 3.172, i.e. we expect an AFC of exp(3.172) ≈ 23.9.

Model assumptions
The general linear model assumes that:
1. The observations are independent
2. The linear model for the mean is correct
3. Error terms ($\varepsilon_i$'s) are normally distributed with zero mean and equal variances
Use the residuals for model diagnostics:
  $R_i = Y_i - \hat{Y}_i$, "Observed value - Predicted value"
Standardized values are preferred for diagnostics (because of varying estimation uncertainty in the predicted values).
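Both kinds of tests can be requested side by side, which makes the dependence of the Type I tests on the order of the covariates visible. The sketch below re-uses the model from the multiple regression and adds the PROC GLM options SS1 and SS3; it is meant as an illustration rather than the program used for the slides.

```sas
/* Sketch: sequential (Type I) and non-sequential (Type III) tests */
PROC GLM DATA=menopause;
  CLASS oc smoking prenatsmoke bmigrp;
  /* Type I tests depend on the order in which covariates are listed */
  MODEL logafc = oc smoking prenatsmoke bmigrp age / SS1 SS3;
RUN;
QUIT;
```

For the covariate listed last (here age), the Type I and Type III tests coincide, since both adjust for all the other covariates.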

Residual plot
Should be fairly symmetric around zero and with no systematic patterns.

Residuals against covariates
Similar plot, looking for a non-linear relation with a covariate.

Checking normality: the QQ-plot

Example: Maternal age at menopause
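The diagnostic plots can be drawn from the data set written by the OUTPUT statement of the multiple regression. This sketch assumes that data set is called `diagnostics` and contains the fitted values `fitted` and the standardized residuals `stres`, as in the SAS-program of these notes.

```sas
/* Sketch: residual plots and QQ-plot from the OUTPUT data set */
PROC GPLOT DATA=diagnostics;
  PLOT stres*fitted / VREF=0;    /* residuals against fitted values */
  PLOT stres*age / VREF=0;       /* residuals against a covariate */
RUN;
QUIT;

PROC UNIVARIATE DATA=diagnostics;
  QQPLOT stres / NORMAL(MU=0 SIGMA=1);   /* compare with the standard normal */
RUN;
```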

Example: Maternal age at menopause
Does the decline in fertility depend on hereditary factors?
Three groups according to maternal age at menopause:
- Early, ≤ 45 years of age
- Normal, 46 to 54 years of age
- Late, ≥ 55 years of age
We have a log-linear model for each group. Is the rate of decline the same in all three groups?

Analysis of covariance
Another name for a general linear model with one quantitative covariate and one categorical covariate.
We have one regression line for each group:
- Are the lines parallel? If not, we have an interaction between the two covariates.
- Are the lines identical? If not, we have differences among the groups.

Example: Maternal age at menopause
In the late-group fertility seems to increase with age???

Estimating regression lines
Model: $\log(\mathrm{AFC})_{ij} = \alpha_j + \beta_j \cdot \mathrm{age}_{ij}$, $j = 1, 2, 3$
- One set of regression parameters per group
- Re-set the intercept at age = 22 for interpretability
  data menopause;
    set menopause;
    age22 = age - 22;
  run;

  proc glm data=menopause;
    class menogrp;
    model logafc = menogrp age22*menogrp / noint solution clparm;
  run;

ANCOVA-output (columns and the surviving values)
  Dependent Variable: logafc
  The GLM Procedure
  R-Square, Coeff Var, Root MSE, logafc Mean
  Parameter: Estimate, Standard Error, t Value, Pr > |t|, 95% Confidence Limits
    MENOGRP early:         Pr > |t| <.0001
    MENOGRP late:          Pr > |t| <.0001
    MENOGRP normal:        Pr > |t| <.0001
    AGE22*MENOGRP early
    AGE22*MENOGRP late
    AGE22*MENOGRP normal:  Pr > |t| <.0001
Increasing rate in the late maternal menopause group is insignificant (P=0.27).

Rates of decline
When the slopes are back-transformed, they become estimated rates of decline, with 95% confidence intervals:
  Maternal menopause      Rate of change in AFC per year (95% CI)
  Early (≤ 45 years)      -5.1% (-8.2% to -1.9%)
  Normal (46-54 years)    -4.1% (-5.5% to -2.7%)
  Late (≥ 55 years)       +2.2% (-1.7% to +6.3%)
Increasing rate in the late-group might as well be a chance finding.

Re-parametrisation
Same model, other parameters:
  $\log(\mathrm{AFC})_i = \alpha + \beta \cdot \mathrm{age} + \delta_1 I(\text{group}=1) + \delta_2 I(\text{group}=2) + \gamma_1 I(\text{group}=1) \cdot \mathrm{age} + \gamma_2 I(\text{group}=2) \cdot \mathrm{age}$
- Group 3 is the reference, with regression parameters $\alpha$ and $\beta$.
- The $\delta$'s and $\gamma$'s are differences in regression parameters with respect to the reference.
- Allows for testing differences among the groups.
  title1 'ANCOVA';
  proc glm data=menopause;
    class menogrp;
    model logafc = menogrp age22 age22*menogrp / solution;
  run;

ANCOVA-output (columns and the surviving values)
  The GLM Procedure
  R-Square, Coeff Var, Root MSE, logafc Mean
  Source: DF, Type III SS, Mean Square, F Value, Pr > F
    MENOGRP
    AGE22
    AGE22*MENOGRP
  Parameter: Estimate, Standard Error, t Value, Pr > |t|
    Intercept:  Pr > |t| <.0001
    MENOGRP early, MENOGRP late, MENOGRP normal (reference)
    AGE22:      Pr > |t| <.0001
    AGE22*MENOGRP early, AGE22*MENOGRP late, AGE22*MENOGRP normal (reference)
Regression coefficients differ significantly, intercepts do not.
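The link between the two parametrisations can be written out explicitly. The block below is a short worked translation under the slide's convention that group 3 is the reference; the hypotheses at the end restate, in this notation, the two questions of parallel lines and identical intercepts.

```latex
% Group-specific parameters expressed through the re-parametrised model
\begin{align*}
  \alpha_3 &= \alpha,            & \beta_3 &= \beta, \\
  \alpha_j &= \alpha + \delta_j, & \beta_j &= \beta + \gamma_j, \qquad j = 1, 2.
\end{align*}
% Hypotheses of interest
\begin{align*}
  \text{Parallel lines (no interaction):} \quad & H_0\colon \gamma_1 = \gamma_2 = 0, \\
  \text{Identical intercepts:} \quad & H_0\colon \delta_1 = \delta_2 = 0.
\end{align*}
```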

Missing data problem?
We have missing data...
- among younger women whose mothers aren't yet menopausal
- i.e. missing not at random: data from some of the potentially most fertile tend to be missing
- This may cause bias, particularly in the late-group.

Assuming identical intercepts
Leave out the main effect of menogrp.
  title1 'ANCOVA with same intercept at age 22';
  proc glm data=menopause;
    class menogrp;
    model logafc = age22 age22*menogrp / solution clparm;
  run;
Output (columns and the surviving values):
  Source: DF, Type I SS, Mean Square, F Value, Pr > F
    AGE22: Pr > F <.0001
    AGE22*MENOGRP
Rate of decline still differs significantly between groups (P=0.004).

A prettier picture
... when assuming identical intercepts (at age 22).

Estimated rates of decline
Estimated rates of decline with 95% confidence intervals:
  Maternal menopause      Rate of decline in AFC per year (95% CI)
  Early (≤ 45 years)      4.7% (3.1% to 6.3%)
  Normal (46-54 years)    3.7% (2.3% to 4.9%)
  Late (≥ 55 years)       2.0% (0.4% to 3.6%)

Summary statistics
Numerical description of quantitative variables:
Location, center:
- average (mean value), $\bar{x} = (x_1 + \cdots + x_n)/n$
- median (middle observation, 50% above and 50% below)
Variation:
- variance, $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ (quadratic units)
- standard deviation, $s = \sqrt{\text{variance}}$ (units as outcome)
- quantiles, e.g. interquartile range (25% to 75% quantile)
- standard error, $\mathrm{SE} = s/\sqrt{n}$ (uncertainty of mean estimate)

Program bits
The summary statistics for MF vs SV are made using the code:
  DATA mydata;
    INFILE 'mf_sv.txt' FIRSTOBS=2;
    INPUT mf sv;
    dif = mf - sv;
    average = (mf + sv)/2;
  RUN;

  PROC MEANS DATA=mydata MEAN STD;
  RUN;
Note: the data are read in from the file mf_sv.txt (text file with two columns and 21 observations).

Program bits, cont'd
The pictures for MF vs SV are made using the code:
  proc gplot;
    plot mf*sv / haxis=axis1 vaxis=axis2 frame;
    axis1 value=(h=2) minor=none label=(h=2);
    axis2 value=(h=2) minor=none label=(A=90 R=0 H=2);
    symbol1 v=circle i=none c=black l=1 w=2;
  run;

  proc gplot;
    plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
    axis1 value=(h=2) minor=none label=(h=2);
    axis2 value=(h=2) minor=none label=(A=90 R=0 H=2);
    symbol1 v=circle i=join l=1 w=2 r=21;
  run;

  proc gplot;
    plot dif*average / vref=0 lv=1 vref= lv=2 haxis=axis1 vaxis=axis2 frame;
    axis1 value=(h=2) minor=none label=(h=2 'average');
    axis2 order=(-16 to 16 by 4) value=(h=2) minor=none label=(A=90 R=0 H=2 'difference MF-SV');
    symbol1 v=circle i=none l=1 w=2;
    title h=3 'Bland Altman plot';
  run;
  title;

  proc gchart;
    vbar dif;
  run;
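All the summary statistics listed above can be requested from PROC MEANS by adding the corresponding statistic keywords. This is a sketch extending the call shown in the program bits; it assumes the data set `mydata` created there.

```sas
/* Sketch: location and variation statistics in one call */
PROC MEANS DATA=mydata N MEAN MEDIAN STD VAR STDERR Q1 Q3 QRANGE MAXDEC=2;
  VAR mf sv dif average;   /* QRANGE is the interquartile range, Q3 - Q1 */
RUN;
```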
