Correlated data: Introduction. Faculty of Health Sciences. NFA, May 19, 2014.


Faculty of Health Sciences

Correlated data: Introduction
NFA, May 19, 2014
Julie Lyng Forman & Lene Theil Skovgaard
Department of Biostatistics, University of Copenhagen

Introduction
- The idea of the course
- Comparing two types of measurement
- Logarithmic transformation
- Linear regression
- The general linear model
Home page:

Aim of the course
To make the participants able to:
- understand and interpret advanced statistical analyses
- judge the assumptions behind the use of various methods of analysis
- perform their own analyses using SAS
- understand output from a statistical program package in general, i.e. also other than SAS
- present results from a statistical analysis, numerically and graphically
To create a better platform for communication between users of statistics and statisticians, to benefit subsequent collaboration.

We expect students to...
- be interested
- be motivated, ideally by their own (future) research project
- have basic knowledge of statistical concepts such as:
  - mean, average
  - variance, standard deviation, standard error
  - distribution
  - correlation, regression, ANOVA
  - t-test, χ²-test, F-test

Topics for the course
Quantitative data (normal distribution):
- Analysis of variance
- Variance component models
- General linear models / regression analysis
- Linear mixed models
Non-normal outcome (binary data or count data):
- Logistic or Poisson regression
- Generalized linear mixed models
Not covered:
- Multivariate data (several outcomes at once)
- Censored data (survival analysis)

Recommended reading
- The lecture notes (can be downloaded from the course webpages).
- Brief notes about SAS programming (can be downloaded from the course webpages).
- B.T. West, K.B. Welch and A.T. Galecki: Linear mixed models: a practical guide using statistical software, Chapman & Hall/CRC, 2007.
We teach SAS programming, but the book also covers SPSS, R, Stata, and HLM.

Teaching activities
Lectures: mornings
- Copies of overheads must be downloaded in advance
- A coffee break is included
Computer labs: in the afternoon, following each lecture
- Coffee, tea, and cake will be served
- Exercises will be handed out
- Solutions can be downloaded after classes

Course diploma
To pass the course, 80% attendance is required. It is your responsibility to sign the list each morning and each afternoon.
Note: 5 × 2 = 10 lists, so 80% equals 8 half days.
There is no compulsory homework, but to benefit from the course you need to work with the material at home. We expect you to do so!

What are repeated measurements?
Repeated measurements refer to data where the same outcome has been measured in different situations (or at different spots) on the same individuals.
- Special case: longitudinal data, where the outcome is measured repeatedly over time.
- Repeated measurements are termed clustered data when the same outcome is measured on groups of individuals from the same families, workplaces, school classes, villages, etc.

Paired data
The simplest example of clustered or repeated measurements: two replicates or two subjects per cluster.
Examples of paired data:
- Same person with treatment and placebo (cross-over studies)
- Baseline/follow-up studies
- Twin studies
- Comparison of two measurement methods
- Reliability of a measurement method
A quantitative outcome is analysed with the paired t-test, BUT often the test is not in focus; estimation/quantification is.

Statistical analysis
The usual assumption is that observations are independent. If you have clustered or repeated measurements, the assumption of independence is violated. Your analyses must account for the repetitions/clustering. In this course we will teach you how to do it.
Warning: Ignoring the repetitions/clustering and doing a standard analysis most often leads to:
- P-values that are too small or too large
- confidence intervals that are too wide or too narrow

Example: MF vs SV
Two measurement methods, expected to give the same result:
- MF: Transmitral volumetric flow, determined by Doppler echocardiography
- SV: Left ventricular stroke volume, determined by cross-sectional echocardiography
[Table: subject, MF, SV; the values are not preserved]

Comparison of measurement methods
Usually a comparison of a new experimental method with an established method (the reference):
- How well do the two measurements agree?
- Is the new method biased compared to the reference?
The data are paired:
- The subjects act as their own controls
- Hence we look at differences within subjects
Set up a statistical model to:
- Describe the typical size of the differences
- Test if the bias (i.e. the mean difference) is zero

Description of the data
Graphical description:
- Scatterplot
- Sample paths
- Bland-Altman plot
- Histogram

Numerical description
[Table: mean and standard deviation of the variables MF, SV, DIF and AVERAGE; the values are not preserved]

Statistical model for paired data
- $x_i$: MF-measurement for the $i$th subject
- $y_i$: SV-measurement for the $i$th subject
Look at the differences: $d_i = x_i - y_i$, for $i = 1, \dots, 21$.
The model assumes that the differences are:
- independent
- normally distributed, $d_i \sim N(\delta, \sigma_d^2)$
No assumptions are made about the distribution of the individual flow measurements.

The normal distribution
$N(\mu, \sigma^2)$
[Figure: densities of normal distributions; the parameter values are not preserved]
- The mean is often denoted µ or α.
- The standard deviation is often denoted σ or ω.
- The variance is $\sigma^2$.

Paired t-test in SAS
Can be performed in two different ways:
1. as a paired two-sample test:
    PROC TTEST;
      PAIRED mf*sv;
    RUN;
2. as a one-sample test on the differences:
    PROC UNIVARIATE NORMAL;
      VAR dif;
    RUN;
Output from the TTEST procedure for the difference mf - sv (values not preserved): N, mean and standard deviation with lower and upper confidence limits, standard error, minimum, maximum, and the t-test (DF, t Value, Pr > |t|).

One-sample tests in SAS, for differences
Output from the UNIVARIATE procedure, variable dif:
- Moments: N = 21, Sum Weights = 21, Sum Observations = 5 (mean, standard deviation and variance not preserved)
- Tests for location, Mu0 = 0: Student's t (t, Pr > |t|), Sign (M = 2.5, Pr >= |M|), Signed Rank (S = 8, Pr >= |S|); the remaining values are not preserved

Estimation of bias
The estimated mean difference is given by $\bar{d} = 0.24$ cm³.
- The estimate is our best guess, but repeating the experiment would give us a somewhat different result.
- The estimate has a distribution, with an uncertainty called the standard error of the estimate.
The standard error of the mean is given by $\mathrm{SEM} = s_d / \sqrt{n} = 1.52$ cm³.

About the paired t-test
Test of the null hypothesis $H_0: \delta = 0$ (no bias). The t-statistic is given by
$t = \frac{\bar{d} - 0}{\mathrm{SEM}} = \frac{0.24}{1.52} \approx 0.16 \sim t(20)$,
which gives P = 0.88, i.e. no significant bias.
Does this mean that the measurement methods are equally good?

General confidence intervals
- Confidence intervals tell us what the parameter is likely to be.
- An interval that catches the true mean with 95% probability is called a 95% confidence interval.
- 95% is called the coverage.
The usual construction is:
$\text{Average} \pm t_{97.5\%}(n-1) \cdot \mathrm{SEM}$
- Often a good approximation, even if data are not normally distributed (due to the central limit theorem).
- The t-quantile $t_{97.5\%}$ may be looked up in a table or computed by a program (e.g. R).

Confidence limits for the bias
For the differences mf - sv, we get the confidence interval:
$\bar{d} \pm t_{97.5\%}(20) \cdot \mathrm{SEM} = 0.24 \pm 2.086 \cdot 1.52 = (-2.93,\ 3.41)$
If there is a bias, it is likely (i.e. with 95% certainty) within the limits (-2.93 cm³, 3.41 cm³).
Conclusion: We cannot rule out a bias of approx. 3 cm³ in either direction.

P-values and confidence intervals
Tests and confidence intervals are equivalent in a certain sense:
- They agree on reasonable values for the mean.
- The confidence interval contains the values $\delta_0$ for which $H_0: \delta = \delta_0$ would be accepted.
But the P-value is less informative than the confidence interval:
- If the study is large, a tiny bias may be significant.
- If the study is small, a large bias may be insignificant.
Better use the confidence interval to judge the clinical implications of the bias!

Note the difference
Standard error (of the mean), SE(M):
- tells us something about the uncertainty of the estimate of the mean
- $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$ is the standard deviation in the distribution of the estimate
- is used for comparisons, relations etc.
Standard deviation, SD:
- tells us something about the variation in our sample, and presumably in the population
- is used when describing the data
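The confidence limits for the bias can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the course's SAS material: the function name `t_confidence_interval` is hypothetical, and the t-quantile 2.086 for 20 degrees of freedom is hard-coded from a standard table rather than computed.

```python
def t_confidence_interval(mean, sem, t_quantile):
    """Return (lower, upper) for the interval mean ± t_quantile * sem."""
    half_width = t_quantile * sem
    return mean - half_width, mean + half_width


# Values quoted on the slides: mean difference 0.24 cm^3, SEM 1.52 cm^3.
# Assumption: 2.086 is the tabulated 97.5% quantile of t(20).
T_975_DF20 = 2.086

lower, upper = t_confidence_interval(0.24, 1.52, T_975_DF20)
print(f"95% CI for the bias: ({lower:.2f}, {upper:.2f})")
```

With the rounded inputs above this reproduces the interval quoted for mf - sv.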

Normal regions
The normal region is an interval containing 95% of the typical observations, i.e. the midrange of the population: 2.5%-quantile to 97.5%-quantile.
If the distribution is normal, $N(\mu, \sigma^2)$, then the 2.5%-quantile to 97.5%-quantile is $\mu \pm 1.96\,\sigma$.
An estimated normal region is given by: Average ± 2 SD.
But this does not account for parameter uncertainty!

Prediction intervals
A prediction interval has to catch future observations with high probability, say 95%.
- $\bar{x} \pm 2s$ is a good prediction interval if the sample is large.
- But if the sample is small, the coverage will be too low.
95% coverage is attained by the prediction interval:
$\left( \bar{x} - s\sqrt{1 + 1/n}\; t_{97.5\%},\ \ \bar{x} + s\sqrt{1 + 1/n}\; t_{97.5\%} \right)$
I.e. the probability that a randomly chosen subject from the population has a value in this interval is 95%, if the data are normal.

Limits of agreement
Limits of agreement is the prediction interval for the difference between two measuring methods. It is important for deciding whether or not two measurement methods may replace each other.
Limits of agreement for mf - sv are given by:
$0.24 \pm t_{97.5\%}(20) \cdot s_d \sqrt{1 + 1/21} = (-14.97,\ 15.45)$
While "$\bar{x} \pm 2s$" is too narrow / has too low coverage:
$\bar{d} \pm 2 s_d = 0.24 \pm 2 \cdot 6.96 = (-13.68,\ 14.16)$

Derivation of the prediction interval
Assume that $d_{new}$ is a new observation. Then
$d_{new} - \bar{d} \sim N\!\left(0,\ \sigma_d^2 (1 + 1/n)\right) \quad\Rightarrow\quad \frac{d_{new} - \bar{d}}{s_d \sqrt{1 + 1/n}} \sim t(n-1)$
implying that with 95% probability:
$t_{2.5\%} < \frac{d_{new} - \bar{d}}{s_d \sqrt{1 + 1/n}} < t_{97.5\%}$
$\bar{d} + s_d \sqrt{1 + 1/n}\; t_{2.5\%} < d_{new} < \bar{d} + s_d \sqrt{1 + 1/n}\; t_{97.5\%}$
$\bar{d} - s_d \sqrt{1 + 1/n}\; t_{97.5\%} < d_{new} < \bar{d} + s_d \sqrt{1 + 1/n}\; t_{97.5\%}$
since $t_{2.5\%} = -t_{97.5\%}$ by symmetry of the t-distribution.
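The difference between the naive region and the prediction interval can be sketched in Python. This is an illustration under stated assumptions, not the course's own code: the function names are hypothetical, the t-quantile 2.086 for t(20) is taken from a table, and the standard deviation 6.96 is the value implied by the naive limits (-13.68, 14.16) around 0.24.

```python
import math


def naive_interval(mean, sd):
    """Estimated normal region: mean ± 2 * sd (ignores parameter uncertainty)."""
    return mean - 2 * sd, mean + 2 * sd


def prediction_interval(mean, sd, n, t_quantile):
    """Prediction interval: mean ± t_quantile * sd * sqrt(1 + 1/n)."""
    half_width = t_quantile * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Assumed inputs: mean 0.24 and n = 21 from the slides, sd 6.96 inferred
# from the naive limits, 2.086 = tabulated 97.5% quantile of t(20).
naive = naive_interval(0.24, 6.96)
pred = prediction_interval(0.24, 6.96, 21, 2.086)
print("naive:      (%.2f, %.2f)" % naive)
print("prediction: (%.2f, %.2f)" % pred)
```

The prediction interval is always the wider of the two for small samples. With these rounded inputs the sketch gives limits of roughly (-14.6, 15.1), which do not exactly match the limits of agreement quoted on the slide, so the numbers should be read as illustrative only.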

Assumptions for the paired comparison
The differences:
- are independent, i.e. the subjects are unrelated
- are normally distributed: judged graphically or numerically
  - by inspection of histograms or QQ-plots
  - by formal tests (e.g. PROC UNIVARIATE NORMAL in SAS)
- have identical variances: judged using the Bland-Altman plot of differences vs. averages
Sometimes it is necessary to transform the data in order to fulfill the assumptions.

Checking normality: the QQ-plot
Observed quantiles are plotted against theoretical normal quantiles. If the data are normal, the points will be close to the line.

Model assumption: Normality?
Assumption: the differences follow a normal distribution. We can check the assumption by e.g. looking at the histogram or the QQ-plot.
But with large samples the assumption is not always necessary:
- The validity of the t-test and the confidence intervals only relies on the distribution of the average $\bar{d}$...
- ... and averages tend to be normal due to the CLT.
However: Normal regions (e.g. limits of agreement) require a normal distribution.

The central limit theorem (CLT)
Averages of rolls of dice are more normal than a single roll.
[Figure: histograms of one dice roll and of averages of several dice rolls, including 10 and 50 rolls]
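The dice illustration of the central limit theorem can be reproduced without simulation. The sketch below is an assumption-laden illustration, not taken from the slides: it computes the exact distribution of the sum of n fair dice by repeated convolution and uses excess kurtosis (0 for a normal distribution) as a rough measure of how far the average is from normal. The function names are hypothetical.

```python
def distribution_of_sum(n_dice):
    """Exact distribution {total: probability} of the sum of n fair dice."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        new_dist = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new_dist[total + face] = new_dist.get(total + face, 0.0) + prob / 6
        dist = new_dist
    return dist


def excess_kurtosis_of_average(n_dice):
    """Excess kurtosis of the average of n dice (same as for the sum,
    because kurtosis does not change under rescaling)."""
    dist = distribution_of_sum(n_dice)
    mean = sum(t * p for t, p in dist.items())
    var = sum((t - mean) ** 2 * p for t, p in dist.items())
    fourth = sum((t - mean) ** 4 * p for t, p in dist.items())
    return fourth / var ** 2 - 3


for n in (1, 10, 50):
    print(n, round(excess_kurtosis_of_average(n), 4))
```

The excess kurtosis shrinks in proportion to 1/n, so the average of 50 rolls is already very close to normal in this respect, while a single roll is clearly not.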

Classical two-sample (unpaired) comparison
If the two treatments were applied to separate groups of subjects, we have independent samples.
Traditional model assumptions:
$x_{11}, \dots, x_{1 n_1} \sim N(\mu_1, \sigma^2)$
$x_{21}, \dots, x_{2 n_2} \sim N(\mu_2, \sigma^2)$
- All observations are independent
- Observations follow a normal distribution within each group
- Both groups have the same variance, $\sigma^2$
- The mean values, $\mu_1$ and $\mu_2$, may differ

Paired or unpaired comparison?
Note the consequences for the difference between MF and SV. Estimated mean difference:
- 0.24, CI: (-2.93, 3.41) according to the paired t-test
- 0.24, CI: (-12.71, 13.19) according to the unpaired t-test
i.e. the same estimate but a much wider confidence interval. The latter is wrong!
You have to respect your design. Do not forget to take advantage of a subject serving as its own control (higher power with fewer individuals).

Comparing measurement methods
When comparing two measurement methods, we have to determine the proper scale before carrying out the statistical analysis.
- Is the precision of the measurements approximately the same over the entire range? In that case look at differences on an absolute scale: use the differences between the raw measurements.
- Or does the measurement error increase with the size of the quantity being measured? In that case look at differences on a relative scale: make a logarithmic transformation.

Another comparison: REFE vs TEST
Two methods for determining concentration of glucose:
- REFE: Colour test, may be contaminated by uric acid
- TEST: Enzymatic test, more specific for glucose
Ref: R.G. Miller et al. (eds): Biostatistics Casebook. John Wiley & Sons.
[Table: nr., REFE, TEST, with average and SD; the values are not preserved]
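Why the unpaired analysis gives a much wider interval can be seen from the two standard errors. The Python sketch below uses small made-up paired measurements (the MF/SV data are not preserved, so `method_a` and `method_b` are purely hypothetical) and compares the standard error based on within-subject differences with the one that treats the two columns as independent samples.

```python
import math
import statistics

# Hypothetical paired measurements on 8 subjects (not the MF/SV data).
method_a = [47.0, 66.0, 68.0, 69.0, 70.0, 72.0, 75.0, 79.0]
method_b = [43.0, 70.0, 72.0, 81.0, 60.0, 67.0, 72.0, 72.0]
n = len(method_a)

# Paired analysis: work on the within-subject differences.
differences = [a - b for a, b in zip(method_a, method_b)]
mean_diff = statistics.mean(differences)
paired_se = statistics.stdev(differences) / math.sqrt(n)

# Unpaired analysis: pretends the two columns are independent samples.
unpaired_se = math.sqrt(
    statistics.variance(method_a) / n + statistics.variance(method_b) / n
)

print(f"mean difference: {mean_diff:.3f}")
print(f"paired SE:   {paired_se:.3f}")
print(f"unpaired SE: {unpaired_se:.3f}")
```

Both analyses give the same estimated mean difference. When measurements on the same subject are positively correlated, as here, the paired standard error is the smaller one, which is why the paired confidence interval is narrower.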

The usual analysis - the naive approach
Do we see a systematic difference? Test δ = 0 assuming
$d_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$
$\bar{d} = 9.89$, $s_d = 9.70$
$t = \frac{\bar{d}}{\mathrm{SEM}} = \frac{\bar{d}}{s_d / \sqrt{n}} = 6.92 \sim t(45)$
hence P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are
$9.89 \pm t_{97.5\%}(45) \cdot s_d \sqrt{1 + 1/46} = (-9.85,\ 29.64)$
Is this a valid analysis?!?

Plots of the raw data
[Figure: scatter plot and Bland-Altman plot]
The variance of the differences increases with the level, so the model assumptions of the usual analysis are violated!

Plots of the log-transformed data
Precision seems to be relative, hence we do a log-transformation.
[Figure: scatter plot and Bland-Altman plot of the log-transformed data]
The plots look better, except for an outlier.

Close up
Following a logarithmic transformation (and omission of the outlier), the Bland-Altman plot looks OK.

Notes on the log-transformation
- It is the original measurements that have to be transformed with the logarithm, not the differences! Never make a logarithmic transformation on data that might be negative!
- It does not matter which logarithm you choose (i.e. which base of the logarithm) since they are all proportional.
- The procedure with construction of limits of agreement is now repeated for the transformed observations.
- The result can be transformed back to the original scale with the anti-logarithm (exp for the natural logarithm).

The correct analysis
Do we see a systematic difference? Test δ = 0 assuming
$d_i = \log(\mathrm{REFE}_i) - \log(\mathrm{TEST}_i) \sim N(\delta, \sigma_d^2)$
$\bar{d} = 0.066$ (the values of $s_d$ and $t$ are not preserved)
$t = \frac{\bar{d}}{\mathrm{SEM}} = \frac{\bar{d}}{s_d / \sqrt{n}} \sim t(45)$
hence P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are
$0.066 \pm t_{97.5\%}(45) \cdot s_d \sqrt{1 + 1/46} = (-0.020,\ 0.152)$
... on log-scale!

Back transformation
Limits of agreement on log-scale are (-0.020, 0.152), meaning that for 95% of the subjects we will have:
$-0.020 < \log(\mathrm{REFE}) - \log(\mathrm{TEST}) < 0.152$
i.e.
$-0.020 < \log\!\left(\frac{\mathrm{REFE}}{\mathrm{TEST}}\right) < 0.152$

Limits of agreement on the original scale
Back transforming (using the exponential function):
$0.98 = \exp(-0.020) < \frac{\mathrm{REFE}}{\mathrm{TEST}} < \exp(0.152) = 1.16$
or reversed:
$0.86 = \frac{1}{1.16} < \frac{\mathrm{TEST}}{\mathrm{REFE}} < \frac{1}{0.98} = 1.02$
So TEST will typically lie 14% below to 2% above REFE.
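The back transformation is simple enough to check directly. A minimal Python sketch, assuming natural logarithms were used (as the use of exp on the slide suggests); the function name `back_transform_limits` is hypothetical.

```python
import math


def back_transform_limits(log_lower, log_upper):
    """Turn limits for log(REFE) - log(TEST) into limits for the ratios
    REFE/TEST and TEST/REFE (natural logarithm assumed)."""
    refe_over_test = (math.exp(log_lower), math.exp(log_upper))
    # Taking reciprocals swaps the roles of the lower and upper limit.
    test_over_refe = (1 / refe_over_test[1], 1 / refe_over_test[0])
    return refe_over_test, test_over_refe


refe_over_test, test_over_refe = back_transform_limits(-0.020, 0.152)
print("REFE/TEST: (%.2f, %.2f)" % refe_over_test)
print("TEST/REFE: (%.2f, %.2f)" % test_over_refe)
```

The second pair of limits is what leads to the statement that TEST typically lies about 14% below to 2% above REFE.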

Non-normal data
If the normal distribution is not a good description:
- Tests and confidence intervals are valid if the sample is sufficiently large (due to the central limit theorem). To judge the reliability for a given sample:
  - Use resampling techniques
  - Or check with a statistician
- Normal regions and limits of agreement become untrustworthy!

Example: Fertility and aging
Cross-sectional study: 527 women (the age range is not preserved)
Objective: How does fertility decline with age?
Outcomes: Physiological markers of fertility
- Menstrual cycle length
- Reproductive hormones (FSH, AMH, ...)
- Ovarian volume
- Antral follicle count (AFC)

Simple linear regression for AFC
$\mathrm{AFC} = \alpha + \beta \cdot \mathrm{age} + \varepsilon$
Is this a good model?

Log-linear regression
A more plausible model is exponential decay, implying a linear model on logarithmic scale:
$\log(\mathrm{AFC}) = \alpha + \beta \cdot \mathrm{age} + \varepsilon$

Regression with SAS
    PROC GLM DATA=menopause;
      MODEL logafc = age / SOLUTION CLPARM;
    RUN;
Output from the GLM procedure (most values are not preserved): R-Square, Coeff Var, Root MSE and logafc Mean; the Type III test for AGE with Pr > F < .0001; and the parameter table (Estimate, Std. Error, t Value, Pr > |t|, 95% Confidence Limits) for Intercept and AGE, both with Pr > |t| < .0001.
Note: We could have used PROC REG instead.

Regression equation and estimates
The estimates for the linear regression on logarithmic scale are:
- Intercept: $\hat{\alpha} = 4.07$ (95% CI not preserved). The "expected value for age = 0"!
- Regression coefficient: $\hat{\beta}$ (estimate and 95% CI not preserved). The expected decrease in log(AFC) with one year of aging.

Rate of decline
We see exponential decay on the natural scale. The expected AFC for age $x$ (median or geometric mean) is
$\mathrm{AFC}(x) = \exp(\alpha + \beta x)$
With one year of aging, $x \to x + 1$:
$\mathrm{AFC}(x + 1) = \exp(\alpha + \beta (x + 1)) = \exp(\beta) \cdot \mathrm{AFC}(x)$
The annual rate of change is the factor $\exp(\beta)$, corresponding to the decline $\{1 - \exp(\beta)\} \cdot 100\%$.
Estimated by $\exp(\hat{\beta}) = 0.965$, i.e. a decline of 3.5%.

Multiple regression
The regression could be biased by possible confounders:
- Use of oral contraceptives (yes, no)
- Smoking (current, former or never)
- Prenatal smoking exposure (yes, no)
- BMI (underweight, normal weight, overweight, obese)
Adjust for these in a multiple regression (general linear model):
$Y_i = \alpha + \beta x_i + \beta_1 X_{i,1} + \dots + \beta_k X_{i,k} + \varepsilon_i$
with $k$ additional covariates. Some of these are dummy variables coding for relevant groups.

SAS-program
    PROC GLM DATA=menopause;
      CLASS oc smoking prenatsmoke bmigrp;
      MODEL logafc = oc smoking prenatsmoke bmigrp age / SOLUTION CLPARM;
      OUTPUT OUT=diagnostics p=fitted r=residual student=stres;
    RUN;

SAS-output
Output from the GLM procedure (most values are not preserved):
- Overall ANOVA table (Model, Error, Corrected Total): the model test has Pr > F < .0001
- R-Square, Coeff Var, Root MSE, logafc Mean
- Type III tests for OC, SMOKING, PRENATSMOKE, BMIGRP and AGE: Pr > F < .0001 for OC and for AGE
- Parameter table (Estimate, Standard Error, t Value, Pr > |t|) with rows Intercept; OC no, yes; SMOKING never, previous, smoker; PRENATSMOKE no-smoke, smoke; BMIGRP normal, over, over, under; AGE. The last level of each class variable is the reference level.
Adjusted $\hat{\beta} = -0.047$, i.e. a rate of decline of 4.6%.

SAS-output
The 95% confidence limits are listed for the same parameters (values not preserved).
AGE: 95% confidence interval (-0.062, -0.033), corresponding to a decline between 3.2% and 6.0%.

Interpretation of regression coefficients
Simple regression: $Y = \alpha + \beta \cdot \mathrm{age} + \varepsilon$
β is the expected change in log(AFC) when age increases by one year.
Multiple regression: $Y = \alpha + \beta \cdot \mathrm{age} + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon$
β is the expected change in log(AFC) when age increases by one year and all other covariates are held fixed.
Similarly for the other covariates: e.g. exp(0.154) = 1.166, or 16.6% higher AFC, for normal BMI compared to BMI < 18.5, with all other covariates held fixed.
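Converting a coefficient on log scale into a percentage change is the same calculation for the age effect and for the group effects. A small Python sketch, assuming the natural logarithm was used for logafc; the function name `percent_change` is hypothetical.

```python
import math


def percent_change(beta):
    """Relative change in the outcome, in percent, for a one-unit increase
    in a covariate when the model is linear in log(outcome)."""
    return (math.exp(beta) - 1) * 100


# Adjusted age effect and its 95% confidence limits, as quoted on the slides.
print(f"age:        {percent_change(-0.047):.1f}% per year")
print(f"  lower CI: {percent_change(-0.062):.1f}%")
print(f"  upper CI: {percent_change(-0.033):.1f}%")

# Normal BMI versus the reference group (BMI < 18.5).
print(f"normal BMI: {percent_change(0.154):+.1f}%")
```

Negative values are declines, so the age coefficient corresponds to a decline of about 4.6% per year, with confidence limits of about 3.2% and 6.0%.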

Hypothesis tests
Does AFC decline with age?
T-test for $H_0: \beta = 0$:
$t = \hat{\beta} / \mathrm{s.e.}(\hat{\beta})$, with P < 0.0001 in the t-distribution with 497 degrees of freedom (the values of $\hat{\beta}$, $\mathrm{s.e.}(\hat{\beta})$ and $t$ are not preserved).
Equivalent to the F-test:
F = Mean Square(Age) / Mean Square(Error), with P < 0.0001 in the F-distribution with (1, 497) degrees of freedom.
Note: In case of a categorical covariate with more than two levels, only the F-test is generally applicable.

Tests of type I and type III
Mind the difference!
- Type I: Test the effect of each covariate after adjustment for all other covariates above it on the list. Sequential tests, to be read bottom-up.
- Type III: Test the effect of each covariate after adjustment for all other covariates on the list. Non-sequential tests, pick the one that you like.

Predictions (fitted values)
$\log(\mathrm{AFC}) = \hat{\alpha} + \hat{\beta} \cdot \mathrm{age} + \hat{\beta}_1 I(\text{no prenatal smoking}) + \hat{\beta}_2 I(\text{never smoker}) + \hat{\beta}_3 I(\text{previous smoker}) + \hat{\beta}_4 I(\text{normal BMI}) + \dots + \hat{\beta}_6 I(\text{BMI} > 30) + \hat{\beta}_7 I(\text{no use of oral contraceptives})$
Expected log(AFC) of a 30-year-old woman, no smoking, normal weight, non-user of oral contraceptives:
log(AFC) = 3.172 (the individual terms are not preserved), i.e. we expect an AFC of exp(3.172) ≈ 23.9.

Model assumptions
The general linear model assumes that:
1. The observations are independent
2. The linear model for the mean is correct
3. Error terms ($\varepsilon_i$'s) are normally distributed with zero mean and equal variances
Use the residuals for model diagnostics:
$R_i = Y_i - \hat{Y}_i$, "Observed value - Predicted value"
Standardized residuals are preferred for diagnostics (because of varying estimation uncertainty in the predicted values).

Residual plot
[Figure: residuals against fitted values]
Should be fairly symmetric around zero and with no systematic patterns.

Residuals against covariates
[Figure: residuals against a covariate]
Similar plot, looking for a non-linear relation with a covariate.

Checking normality: the QQ-plot
[Figure: QQ-plot of the residuals]

Example: Maternal age at menopause
[Figure]

Example: Maternal age at menopause
Does the decline in fertility depend on hereditary factors?
Three groups according to maternal age at menopause:
- Early, ≤ 45 years of age
- Normal, 46 to 54 years of age
- Late, > 55 years of age
We have a log-linear model for each group. Is the rate of decline the same in all three groups?

Analysis of covariance
Another name for a general linear model with one quantitative covariate and one categorical covariate.
We have one regression line for each group:
- Are the lines parallel? If not, we have an interaction between the two covariates.
- Are the lines identical? If not, we have differences among the groups.

Example: Maternal age at menopause
[Figure: one regression line per group]
In the late-group fertility seems to increase with age???

Estimating regression lines
Model: $\log(\mathrm{AFC})_{ij} = \alpha_j + \beta_j \cdot \mathrm{age}_{ij}$, $j = 1, 2, 3$
- One set of regression parameters per group
- Re-set the intercept at age = 22 for interpretability
    data menopause;
      set menopause;
      age22 = age-22;
    run;
    proc glm data=menopause;
      class menogrp;
      model logafc = menogrp age22*menogrp / noint solution clparm;
    run;
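Why re-setting the intercept at age 22 changes only the interpretation, not the fitted lines, follows from a one-line rewrite of the model. The symbol $\tilde{\alpha}_j$ below is introduced here for illustration and does not appear on the slides.

```latex
\begin{align*}
\log(\mathrm{AFC})_{ij}
  &= \alpha_j + \beta_j \,\mathrm{age}_{ij} \\
  &= \underbrace{(\alpha_j + 22\,\beta_j)}_{\tilde{\alpha}_j}
     + \beta_j \,(\mathrm{age}_{ij} - 22)
\end{align*}
```

The slopes $\beta_j$ are unchanged, while the new intercept $\tilde{\alpha}_j$ is the expected log(AFC) in group $j$ at age 22 rather than at age 0.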

ANCOVA-output
Output from the GLM procedure, dependent variable logafc (most values are not preserved): R-Square, Coeff Var, Root MSE, logafc Mean, and the parameter table (Estimate, Standard Error, t Value, Pr > |t|, 95% Confidence Limits) with rows
- MENOGRP early, late, normal (intercepts at age 22; all with Pr > |t| < .0001)
- AGE22*MENOGRP early, late, normal (slopes; Pr > |t| < .0001 for the normal group)
The increasing rate in the late maternal menopause group is insignificant (P = 0.27).

Rates of decline
When the slopes are back-transformed, they become estimated rates of decline, with 95% confidence intervals:
Maternal menopause      Rate of change in AFC per year (95% CI)
Early (≤ 45 years)      -5.1% (-8.2% to -1.9%)
Normal (46-54 years)    -4.1% (-5.5% to -2.7%)
Late (> 55 years)       +2.2% (-1.7% to +6.3%)
The increasing rate in the late-group might as well be a chance finding.

Re-parametrisation
Same model, other parameters:
$\log(\mathrm{AFC})_i = \alpha + \beta \cdot \mathrm{age} + \delta_1 I(\text{group}=1) + \delta_2 I(\text{group}=2) + \gamma_1 I(\text{group}=1) \cdot \mathrm{age} + \gamma_2 I(\text{group}=2) \cdot \mathrm{age}$
- Group 3 is the reference, with regression parameters α and β.
- δ's and γ's are differences in regression parameters with respect to the reference.
- Allows for testing differences among the groups.
    title1 'ANCOVA';
    proc glm data=menopause;
      class menogrp;
      model logafc = menogrp age22 age22*menogrp / solution;
    run;

ANCOVA-output
Output from the GLM procedure (most values are not preserved): R-Square, Coeff Var, Root MSE, logafc Mean; Type III tests for MENOGRP, AGE22 and AGE22*MENOGRP; and the parameter table with rows Intercept (Pr > |t| < .0001), MENOGRP early, late, normal, AGE22 (Pr > |t| < .0001), and AGE22*MENOGRP early, late, normal. The normal group is the reference level.
Regression coefficients differ significantly, intercepts do not.

Missing data problem?
We have missing data...
- among younger women whose mothers aren't yet menopausal
- i.e. missing not at random: data from some of the potentially most fertile women tend to be missing
This may cause bias, particularly in the late-group.

Assuming identical intercepts
Leave out the main effect of menogrp.
    title1 'ANCOVA with same intercept at age 22';
    proc glm data=menopause;
      class menogrp;
      model logafc = age22 age22*menogrp / solution clparm;
    run;
Output (most values are not preserved): Type I tests for AGE22 (Pr > F < .0001) and AGE22*MENOGRP.
The rates of decline still differ significantly between groups (P = 0.004).

A prettier picture
[Figure: fitted regression lines]
... when assuming identical intercepts (at age 22).

Estimated rates of decline
Estimated rates of decline with 95% confidence intervals:
Maternal menopause      Rate of decline in AFC per year (95% CI)
Early (≤ 45 years)      4.7% (3.1% to 6.3%)
Normal (46-54 years)    3.7% (2.3% to 4.9%)
Late (> 55 years)       2.0% (0.4% to 3.6%)

Summary statistics
Numerical description of quantitative variables:
Location, center
- average (mean value): $\bar{x} = (x_1 + \dots + x_n)/n$
- median (middle observation, 50% above and 50% below)
Variation
- variance: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ (quadratic units)
- standard deviation: $s = \sqrt{\text{variance}}$ (units as outcome)
- quantiles, e.g. the interquartile range (25% to 75% quantile)
- standard error: $\mathrm{SE} = s / \sqrt{n}$ (uncertainty of the mean estimate)

Program bits
The summary statistics for MF vs SV are made using the code below.
Note: the data are read in from the file mf_sv.txt (text file with two columns and 21 observations).
    DATA mydata;
      INFILE 'mf_sv.txt' FIRSTOBS=2;
      INPUT mf sv;
      dif=mf-sv;
      average=(mf+sv)/2;
    RUN;
    PROC MEANS DATA=mydata MEAN STD;
    RUN;

Program bits
The pictures for MF vs SV are made using the code:
    /* Scatterplot */
    proc gplot;
      plot mf*sv / haxis=axis1 vaxis=axis2 frame;
      axis1 value=(h=2) minor=none label=(h=2);
      axis2 value=(h=2) minor=none label=(a=90 R=0 H=2);
      symbol1 v=circle i=none c=black l=1 w=2;
    run;
    /* Sample paths */
    proc gplot;
      plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
      axis1 value=(h=2) minor=none label=(h=2);
      axis2 value=(h=2) minor=none label=(a=90 R=0 H=2);
      symbol1 v=circle i=join l=1 w=2 r=21;
    run;

Program bits, cont'd
    /* Bland-Altman plot; the values of the second vref= are not preserved */
    proc gplot;
      plot dif*average / vref=0 lv=1 vref= lv=2 haxis=axis1 vaxis=axis2 frame;
      axis1 value=(h=2) minor=none label=(h=2 'average');
      axis2 order=(-16 to 16 by 4) value=(h=2) minor=none label=(a=90 R=0 H=2 'difference MF-SV');
      symbol1 v=circle i=none l=1 w=2;
      title h=3 'Bland Altman plot';
    run;
    title;
    /* Histogram */
    proc gchart;
      vbar dif;
    run;
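The summary statistics listed above can also be computed with Python's standard library. This is only a sketch for readers who want to check the formulas outside SAS: the function name `summarize` and the example data are hypothetical (the MF/SV values are not preserved), and the quartiles use the default method of `statistics.quantiles`, which need not match the SAS default.

```python
import math
import statistics


def summarize(values):
    """Location and variation summaries for a list of numbers."""
    n = len(values)
    sd = statistics.stdev(values)  # square root of the (n - 1) variance
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "iqr": (q1, q3),
        "se": sd / math.sqrt(n),  # uncertainty of the mean estimate
    }


# Hypothetical example data, for illustration only.
summary = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(name, value)
```

Note the difference between `sd`, which describes the spread of the data, and `se`, which shrinks as the sample grows.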


More information

A Little Stats Won t Hurt You

A Little Stats Won t Hurt You A Little Stats Won t Hurt You Nate Derby Statis Pro Data Analytics Seattle, WA, USA Edmonton SAS Users Group, 11/13/09 Nate Derby A Little Stats Won t Hurt You 1 / 71 Outline Introduction 1 Introduction

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Power Analysis for One-Way ANOVA

Power Analysis for One-Way ANOVA Chapter 12 Power Analysis for One-Way ANOVA Recall that the power of a statistical test is the probability of rejecting H 0 when H 0 is false, and some alternative hypothesis H 1 is true. We saw earlier

More information

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements: Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1 When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan Topic 22 - Interaction in Two Factor ANOVA - Fall 2013 Outline Strategies for Analysis when interaction not present when interaction present when n ij = 1 when factor(s) quantitative Topic 22 2 General

More information

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,

More information

Correlated data. Repeated measurements over time. Typical set-up for repeated measurements. Traditional presentation of data

Correlated data. Repeated measurements over time. Typical set-up for repeated measurements. Traditional presentation of data Faculty of Health Sciences Repeated measurements over time Correlated data NFA, May 22, 2014 Longitudinal measurements Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics University of

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Business Statistics. Lecture 5: Confidence Intervals

Business Statistics. Lecture 5: Confidence Intervals Business Statistics Lecture 5: Confidence Intervals Goals for this Lecture Confidence intervals The t distribution 2 Welcome to Interval Estimation! Moments Mean 815.0340 Std Dev 0.8923 Std Error Mean

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Overview Scatter Plot Example

Overview Scatter Plot Example Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

More information

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals

More information

Chapter 11. Analysis of Variance (One-Way)

Chapter 11. Analysis of Variance (One-Way) Chapter 11 Analysis of Variance (One-Way) We now develop a statistical procedure for comparing the means of two or more groups, known as analysis of variance or ANOVA. These groups might be the result

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

a. The least squares estimators of intercept and slope are (from JMP output): b 0 = 6.25 b 1 =

a. The least squares estimators of intercept and slope are (from JMP output): b 0 = 6.25 b 1 = Stat 28 Fall 2004 Key to Homework Exercise.10 a. There is evidence of a linear trend: winning times appear to decrease with year. A straight-line model for predicting winning times based on year is: Winning

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

Outline Topic 21 - Two Factor ANOVA

Outline Topic 21 - Two Factor ANOVA Outline Topic 21 - Two Factor ANOVA Data Model Parameter Estimates - Fall 2013 Equal Sample Size One replicate per cell Unequal Sample size Topic 21 2 Overview Now have two factors (A and B) Suppose each

More information

Linear regression and correlation

Linear regression and correlation Faculty of Health Sciences Linear regression and correlation Statistics for experimental medical researchers 2018 Julie Forman, Christian Pipper & Claus Ekstrøm Department of Biostatistics, University

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Introduction to Crossover Trials

Introduction to Crossover Trials Introduction to Crossover Trials Stat 6500 Tutorial Project Isaac Blackhurst A crossover trial is a type of randomized control trial. It has advantages over other designed experiments because, under certain

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek Two-factor studies STAT 525 Chapter 19 and 20 Professor Olga Vitek December 2, 2010 19 Overview Now have two factors (A and B) Suppose each factor has two levels Could analyze as one factor with 4 levels

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Chapter 8 (More on Assumptions for the Simple Linear Regression)

Chapter 8 (More on Assumptions for the Simple Linear Regression) EXST3201 Chapter 8b Geaghan Fall 2005: Page 1 Chapter 8 (More on Assumptions for the Simple Linear Regression) Your textbook considers the following assumptions: Linearity This is not something I usually

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Least Squares Analyses of Variance and Covariance

Least Squares Analyses of Variance and Covariance Least Squares Analyses of Variance and Covariance One-Way ANOVA Read Sections 1 and 2 in Chapter 16 of Howell. Run the program ANOVA1- LS.sas, which can be found on my SAS programs page. The data here

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Topic 28: Unequal Replication in Two-Way ANOVA

Topic 28: Unequal Replication in Two-Way ANOVA Topic 28: Unequal Replication in Two-Way ANOVA Outline Two-way ANOVA with unequal numbers of observations in the cells Data and model Regression approach Parameter estimates Previous analyses with constant

More information

Varians- og regressionsanalyse

Varians- og regressionsanalyse Faculty of Health Sciences Varians- og regressionsanalyse Variance component models Lene Theil Skovgaard Department of Biostatistics Variance component models Definitions and motivation One-way anova with

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Multiple linear regression

Multiple linear regression Multiple linear regression Course MF 930: Introduction to statistics June 0 Tron Anders Moger Department of biostatistics, IMB University of Oslo Aims for this lecture: Continue where we left off. Repeat

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

L6: Regression II. JJ Chen. July 2, 2015

L6: Regression II. JJ Chen. July 2, 2015 L6: Regression II JJ Chen July 2, 2015 Today s Plan Review basic inference based on Sample average Difference in sample average Extrapolate the knowledge to sample regression coefficients Standard error,

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 27, 2018 1 / 84 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences Faculty of Health Sciences Variance component models Definitions and motivation Correlated data Variance component models, I Lene Theil Skovgaard November 29, 2013 One-way anova with random variation The

More information

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models One-way anova with random variation The rabbit example Hierarchical models with several levels Random regression Lene Theil

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222

More information

A Re-Introduction to General Linear Models

A Re-Introduction to General Linear Models A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Comparison of a Population Means

Comparison of a Population Means Analysis of Variance Interested in comparing Several treatments Several levels of one treatment Comparison of a Population Means Could do numerous two-sample t-tests but... ANOVA provides method of joint

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Analysis of variance and regression. December 4, 2007

Analysis of variance and regression. December 4, 2007 Analysis of variance and regression December 4, 2007 Variance component models Variance components One-way anova with random variation estimation interpretations Two-way anova with random variation Crossed

More information

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information