Statistics for exp. medical researchers Regression and Correlation


Faculty of Health Sciences Regression analysis Statistics for exp. medical researchers Regression and Correlation Lene Theil Skovgaard Sept. 28, 2015 Linear regression: estimation and testing Confidence and prediction limits Model checks and diagnostics The correlation coefficient Transformation Comparison of regression lines Home pages: *: indicates that this page may be skipped without serious consequences The nature of the explanatory variable So far, we have been looking at multi-factor designs (ANOVA) and variance component models, where the explanatory variables have been factors: Condition/Treatment (A, B, C, Ctrl), Laboratory (1, 2, 3, 4), Splenectomy, Altitude. Now we turn to quantitative explanatory variables: concentration, temperature, age. Regression: establish a (parsimonious) smooth relation between x and y, making prediction possible. Examples of regression-type problems: dose-response relations, calibration curves.

Quantitative explanatory variable Relation between two quantitative variables: the explanatory variable (x-variable) and the outcome variable (y-variable; also called response variable or dependent variable). Linearity: every smooth curve can be approximated by a straight line, at least locally. Linearity is easy to deal with, and sometimes it takes a transformation to make the relation linear; we choose the scale to suit the linearity assumption (...or perhaps it is not linear on any scale). Example: Cell concentration of tetrahymena The unicellular organism tetrahymena is grown in two different media, with and without glucose. Research question: how does cell concentration x (number of cells in 1 ml of the growth medium) affect the cell size y (average cell diameter, measured in µm)? Quantitative covariate: concentration x. Quantitative outcome: diameter y. Here, we need a log-transformation (more later on, p. 71 ff).

Example (Book, p. 226) Calibration curve for measuring the concentration of selenium. Mathematical formulation: y = α + βx. 6 known concentrations: 0, 20, 40, 80, 120 and 160. Triplicate measurements for each concentration, so close that they cannot be seen individually. Does this look like a straight line? Yes. Parameters of the straight line Intercept α: the expected outcome when the explanatory variable x is zero; units identical to y-units. Slope β: the expected difference in y corresponding to a one-unit difference in x; units in y-units per x-unit. Model for selenium measurements y_ci: the i'th measurement at the c'th concentration. x_c: the corresponding known concentration of selenium. Model: E(y_ci) = α + β x_c. We call this a simple linear regression: simple, because there is only one explanatory variable (concentration); linear, because the explanatory variable has a linear effect. But: we have some issues regarding correlation of triplicate measurements... (p. 50)

Average over triplicates, to avoid the correlation issue: y_c is the average measurement at the c'th concentration, x_c the corresponding known concentration of selenium. Model for linear regression The mean value of the outcome depends linearly on the explanatory variable, and the variance σ², or σ²_{y|x} (the variance of the residuals, i.e. the vertical distances from the regression line), is assumed constant: Y_c = α + β x_c + ε_c, ε_c ~ N(0, σ²), independent. The regression model specifies the conditional distributions of Y, given x, to be Normal, with identical variances and with mean values that depend linearly on x. Method of Least Squares Derived from the general likelihood principle: minimize the residual sum of squares SS_res = Σ_{c=1}^n (y_c − ŷ_c)² = Σ_{c=1}^n (y_c − α − β x_c)², the residuals here being the vertical distances from the observations y_c to the line (ŷ_c = α̂ + β̂ x_c), i.e. r_c = y_c − (α̂ + β̂ x_c).

Technicalities: Estimation of slope β̂ = s_xy / s_x², where s_xy = 1/(n−1) · Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) is the covariance between x and y, and s_x² = 1/(n−1) · Σ_{i=1}^n (x_i − x̄)² is the variance of the covariate. Estimation with SAS proc means nway N mean data=a1; class part Concentration; var Selenium; output out=av mean=average_selenium; run; ods graphics on; proc reg plots=all data=av; model Average_Selenium=Concentration / clb; run; ods graphics off; Results for Selenium averages The REG Procedure, Dependent Variable: Average_Selenium, Number of Observations Used 6. Analysis of Variance: Model Pr > F <.0001 [numeric table values missing]. Parameter Estimates: Intercept; Concentration, Pr > |t| <.0001; with 95% confidence limits for both. Results for Selenium averages, II Taken from output: α̂ = 0.943 (0.814), β̂ = 0.751 (0.009), where s_{y|x} denotes the estimate of the residual standard deviation σ_{y|x} = √(σ²), called "Root MSE" in the SAS output, and estimated as s²_{y|x} = SS_res/(n−2). α̂ and s_{y|x} are measured in the units of the outcome variable; β̂ is measured in outcome-units per x-unit.
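The slope formula above (covariance divided by the variance of x) can be checked with a short Python sketch; the numbers below are invented for illustration and are not the selenium data:

```python
import numpy as np

# Hypothetical calibration-style data (x = known concentration, y = measured value).
x = np.array([0.0, 20.0, 40.0, 80.0, 120.0, 160.0])
y = np.array([1.1, 16.0, 31.2, 61.5, 91.0, 121.3])

n = len(x)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # sample covariance
s_xx = np.sum((x - x.mean()) ** 2) / (n - 1)              # sample variance of x

beta_hat = s_xy / s_xx                       # slope = covariance / variance of x
alpha_hat = y.mean() - beta_hat * x.mean()   # the LS line passes through (x-bar, y-bar)

print(alpha_hat, beta_hat)
```

The same numbers come out of any standard least-squares routine, e.g. `numpy.polyfit(x, y, 1)`.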

Uncertainty of estimated slope SE(β̂) = σ_{y|x} / (s_x √(n−1)). Good precision when: the residual variation σ_{y|x} is small; the sample (n) is large; the variation in the explanatory variable (s_x) is large (i.e. when concentrations vary a lot). Confidence interval for parameters With 95% coverage, here shown for the slope: β̂ ± t-quantile × SE(β̂). Here, n = 6, so df = 6 − 2 = 4, and the corresponding t-quantile is t_{0.975}(4) = 2.776. Therefore, the interval becomes 0.751 ± 2.776 × 0.009 = (0.726, 0.776). Test of zero slope Cut from the output: Intercept; Concentration, Pr > |t| <.0001. T = 0.751/0.009 ≈ 83 ~ t(4), P < 0.0001: strong evidence of a relationship between the actual concentration and the measured response. Is this at all interesting? Maybe test α = 0? We want to know more... to be continued. Use of regression results Prediction: determine y from x, the ordinary way, i.e. we predict observations for a given known concentration of selenium: ŷ_i = α̂ + β̂ x_i. Calibration: determine x from y, the reverse way: we estimate unknown concentrations of selenium from one or several measurements taken. We shall look briefly at that later.
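The standard error and confidence interval can be computed directly from the formulas above; this is a sketch with invented data standing in for the six concentration averages:

```python
import numpy as np
from scipy.stats import t

# Invented data; not the selenium averages themselves.
x = np.array([0.0, 20.0, 40.0, 80.0, 120.0, 160.0])
y = np.array([1.0, 16.1, 31.0, 61.3, 91.2, 121.0])

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
beta = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
alpha = y.mean() - beta * x.mean()

resid = y - (alpha + beta * x)
s_yx = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual SD ("Root MSE"), df = n - 2
s_x = np.std(x, ddof=1)

se_beta = s_yx / (s_x * np.sqrt(n - 1))        # SE(beta-hat) = s_yx / (s_x * sqrt(n-1))
tq = t.ppf(0.975, df=n - 2)                    # two-sided 95% t-quantile, df = 4
ci = (beta - tq * se_beta, beta + tq * se_beta)
print(beta, se_beta, ci)
```

Note that s_x √(n−1) = √Sxx, so this is the familiar SE(β̂) = s_{y|x}/√Sxx in disguise.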

Confidence and prediction limits Confidence limits show the uncertainty in the estimated regression line: they tell us where the line may also be, and they become narrower when the sample size is increased. Here they have almost collapsed onto the line itself. Prediction limits show the (future) variation in the outcome for a given covariate (reference regions): they tell us where future subjects will lie, and they have approximately the same width no matter the sample size. In the plot, the confidence limits can hardly be seen since they are so narrow. Check of model assumptions Look for possible flaws: Linearity: plot residuals vs. the explanatory variable; curves? (p. 28). Test whether a second-order polynomial (a parabola) is better than a straight line (p. 29). Variance homogeneity: plot residuals against predicted values; trumpet shape? (p. 30). Normality: histogram, skewness? Quantile plot, hammock shape? (p. 31). We do not have enough information to reasonably perform these checks here. Residual plot, for check of linearity: curves?
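The distinction can be sketched numerically: at a given x0, the confidence limit for the line uses the standard error of the fitted mean, while the prediction limit adds the residual variance of a single new observation (data invented for illustration):

```python
import numpy as np
from scipy.stats import t

x = np.array([0.0, 20.0, 40.0, 80.0, 120.0, 160.0])
y = np.array([1.0, 16.1, 31.0, 61.3, 91.2, 121.0])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
beta = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
alpha = y.mean() - beta * x.mean()
s_yx = np.sqrt(np.sum((y - alpha - beta * x) ** 2) / (n - 2))
tq = t.ppf(0.975, df=n - 2)

x0 = 100.0
fit = alpha + beta * x0
se_mean = s_yx * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)      # uncertainty of the line
se_pred = s_yx * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)  # spread of a new obs.
conf = (fit - tq * se_mean, fit + tq * se_mean)
pred = (fit - tq * se_pred, fit + tq * se_pred)
print(conf, pred)
```

The extra "1 +" under the square root is why prediction limits stay wide no matter how large n gets.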

Numerical check of linearity Include a second-order term, Concentration2=(Concentration-75)**2; proc reg data=av; model Average_Selenium=Concentration Concentration2; run; which yields the output Parameter Estimates: Intercept; Concentration, Pr > |t| <.0001; Concentration2. No significant deviation from linearity, since the second-order term has a non-significant P-value. Residual plot, for check of variance homogeneity: trumpet shape? Model checks at a glance If assumptions fail: Linearity: transform, or do non-linear regression. Variance homogeneity: transform. Normality: transform. Linearity is the most important assumption, unless the task is to construct prediction intervals! More on transformations later... (p. 71 ff)

Diagnostics Assess the influence of single observations by leaving out one observation at a time: omit the i'th observation from the analysis, obtain new estimates α̂_(−i) and β̂_(−i), and compute the deletion diagnostics dev(α)_i = α̂ − α̂_(−i) and dev(β)_i = β̂ − β̂_(−i), both normalized by the standard error of the estimate. The squared deletion diagnostics are combined into a single diagnostic, Cook's distance Cook(α, β)_i. Deletion diagnostics for selenium averages Influence option (in the MODEL statement of PROC REG): proc reg data=av; model Average_Selenium=Concentration / r clb influence; run; Output: Output Statistics (Dependent Variable, Predicted Value, Std Error Mean Predict, Residual, Std Error Residual, Student Residual, for each observation). Deletion diagnostics for selenium averages, II Output, continued: Cook's D, RStudent, Hat Diag H, Cov Ratio, DFFITS, and DFBETAS for Intercept and Concentration, for each observation. Note the large DFBETAS values for Obs=6. Cook's distance Note the large value of COOKD for Obs=6.
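The leave-one-out idea behind these diagnostics can be sketched directly; the data below are invented, with the last point made deliberately extreme so that it dominates the deletion diagnostics:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept and slope."""
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - beta * x.mean(), beta

x = np.array([0.0, 20.0, 40.0, 80.0, 120.0, 160.0])
y = np.array([1.0, 16.1, 31.0, 61.3, 91.2, 135.0])  # last point deliberately off the line

alpha, beta = fit_line(x, y)
slope_shifts = []
for i in range(len(x)):
    a_i, b_i = fit_line(np.delete(x, i), np.delete(y, i))
    # Raw deletion effect on the slope; PROC REG's DFBETAS also divide by the SE.
    slope_shifts.append(abs(beta - b_i))

print(slope_shifts)
```

Refitting without the extreme high-leverage point moves the slope far more than deleting any of the others, which is exactly what Cook's distance and DFBETAS quantify.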

The correlation coefficient A numerical quantity describing the degree of (linear) relationship between two variables. Pearson correlation, assuming normality of both variables: r = r_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / √( Σ_{i=1}^n (x_i − x̄)² · Σ_{i=1}^n (y_i − ȳ)² ). Spearman correlation: based on ranks. Both of them take on values between −1 and 1 (0 corresponding to independence); +1 and −1 correspond to perfect (linear) relationships, positive and negative respectively. Bivariate Normal distribution, ρ = 0: all vertical slices yield normal distributions with identical mean values and identical variances. Bivariate Normal distribution, ρ = 0.9: all vertical slices yield normal distributions with different mean values but identical variances. Contour curves Contour curves of a Normal distribution are ellipses (or circles in case of ρ = 0). Scatter plots should resemble ellipses.
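A quick sketch of the difference between the two coefficients, with invented data: for a monotone but non-linear relation, the Spearman correlation is exactly 1 while the Pearson correlation stays below 1.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3   # perfectly monotone, but clearly not linear

r_p, p_p = pearsonr(x, y)    # linear (Pearson) correlation
r_s, p_s = spearmanr(x, y)   # rank (Spearman) correlation
print(r_p, r_s)
```

Because Spearman only looks at ranks, any strictly increasing transformation of y leaves it unchanged.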

Regression vs. correlation The regression model Y_c = α + β x_c + ε_c, ε_c ~ N(0, σ²), independent, specifies the conditional distributions of Y, given x, to be Normal, with identical variances and with mean values that depend linearly on x. The assumptions for interpreting a correlation are stronger than for interpreting a slope (they involve normality of both variables). Interpretation of a correlation coefficient is often misleading... and almost always non-informative: the correlation has no units and gives no quantification of the relation. Tests of zero slope and zero correlation are identical, and do not assume anything regarding the distribution of x. Problems with the correlation Test of zero slope or zero correlation is the same thing: the two estimates (for correlation and slope) resemble one another (in formulae), and they become 0 simultaneously: β̂ = S_xy / S_xx, r_xy = S_xy / √(S_xx S_yy), β̂ = r_xy √(S_yy / S_xx), r_xy = β̂ √(S_xx / S_yy). The test for β = 0 is identical to the test for ρ_xy = 0. Formula manipulation yields the equality 1 − r²_xy = s² / (s² + β̂² S_xx / (n − 2)). Fix the sample size n, the slope β and the residual variation s_{y|x}, but increase the variation s_x in the covariate x. What happens? The correlation approaches either 1 or −1!!
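The "fix everything but s_x" thought experiment is easy to simulate: same slope, same residual SD, only the spread of the covariate changes (a sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_r(x_spread):
    # Same slope (beta = 1) and residual SD (sigma = 1); only the x-spread varies.
    x = np.linspace(-x_spread, x_spread, 50)
    y = x + rng.normal(0.0, 1.0, size=len(x))
    return np.corrcoef(x, y)[0, 1]

r_narrow = sample_r(1.0)    # little variation in the covariate
r_wide = sample_r(100.0)    # large variation in the covariate
print(r_narrow, r_wide)
```

The underlying relationship is identical in the two cases; only the design changed, yet the correlation is pushed arbitrarily close to 1. This is why a correlation, unlike a slope, says as much about how the x-values were chosen as about the relation itself.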

Two imaginary investigations Slopes are equal; correlations are not! Can we increase the correlation even further, without obtaining more observations? Yes! When can we use the correlation? When we only want to test whether or not there is a relation between two variables (only a P-value needed); in this case, consider the non-parametric Spearman correlation to avoid the Normality (linearity) assumption. To rank the relatedness of many variables, measured on the same units (e.g. concentrations of different compounds in the same solution). The correlation has no units... and is therefore hard to interpret! Correlation for Selenium Correlation between the average selenium measurements and the corresponding known concentrations: proc corr pearson spearman data=av; var Average_Selenium Concentration; run; with output: Simple Statistics (N, Mean, Std Dev, Median, Minimum, Maximum for Average_Selenium and Concentration), and Pearson Correlation Coefficients, N = 6, Prob > |r| under H0: Rho=0, both <.0001. The very high correlation indicates a close-to-linear relationship. And so what?

Correlation for Selenium, III Spearman Correlation Coefficients, N = 6, Prob > |r| under H0: Rho=0: the Spearman correlation between Average_Selenium and Concentration is 1 (P < .0001), indicating a perfect monotone relationship, not necessarily linear. And so what? Analysis of all Selenium measurements Originally, we had 18 measurements, triplicates for each known concentration. Could we use all of these 18 measurements to obtain better estimates and narrower confidence intervals? No, probably not, since triplicates are not independent. We expect several sources of variation in this investigation: error in the known concentration, observer variation, temperature effects... and measurement error. If triplicates are measured on the same solution, an error in the known concentration will affect all three measurements equally, i.e. they would all be too large or too small; they would be correlated. Analysis of all Selenium measurements, II In case of correlated measurements, using the same simple regression model applied to all individual measurements would be wrong, and would give too small P-values for anything and too narrow confidence intervals. Instead, we could build a mixed model, and the result would be identical to the analysis of averages, unless the design is unbalanced, e.g. due to missing observations. Naive (= wrong) analysis of all 18 observations The REG Procedure, Dependent Variable: Selenium, Number of Observations Used 18. Analysis of Variance: Model Pr > F <.0001. Parameter Estimates: Intercept; Concentration, Pr > |t| <.0001; with 95% confidence limits. (Book, p. 251)

Residual plot from naive analysis Note the correlation between residuals from the same concentration. Comparison of the two analyses The naive one (wrong), with all 18 measurements, vs. the analysis of averages: Naive: estimated slope with SE 0.005, test for zero slope P < .0001, n = 18. Averages: estimated slope with SE 0.009, test for zero slope P < .0001, n = 6. The discrepancy cannot be seen in the P-values, because the slope is so strongly significant in both analyses. Consequences of naive model The naive model based on all 18 observations ignores the correlation between triplicates: we think we have too much information, the standard errors become too small, the confidence intervals become too narrow, and the conclusions become exaggerated. Mixed model We are dealing with two variance components: the variation of the means around the regression line (ω²) and the measurement error (the variation between triplicates, σ²), formulated as Y_ci = α + β x_c + A_c + ε_ci, A_c ~ N(0, ω²), ε_ci ~ N(0, σ²), Corr(Y_ci, Y_cj) = ρ = ω² / (ω² + σ²).

Mixed model in SAS We have to specify that all observations regarding the same concentration (i.e. the triplicates) are correlated. This can be done in two ways: 1. directly specifying the triplicates to have a CS (Compound Symmetry) structured correlation (see p. 58); 2. specifying two variance components, one between concentrations (ω²) and one within concentration (the triplicate variation, σ²), see p. 60. Both of these structures require a copy of the covariate (cconcentration=concentration) specified as a factor, i.e. a CLASS variable. Specification I: proc mixed cl data=a1; class cconcentration; model Selenium=Concentration / s cl ddfm=satterth; repeated / subject=cconcentration type=cs r rcorr; run; Here, we directly specify type=cs, i.e. a correlation structure with 1 on the diagonal and the common correlation ρ in all off-diagonal positions, and get the output shown on the next page. Output from specification I: Estimated R Matrix, Estimated R Correlation Matrix, Covariance Parameter Estimates (CS cconcentration; Residual), and Solution for Fixed Effects (Intercept; Concentration, Pr > |t| <.0001) with confidence limits. Specification II: proc mixed plots=all cl data=a1; class cconcentration; model Selenium=Concentration / s cl ddfm=satterth; random intercept / subject=cconcentration; run; gives more or less the same output: Covariance Parameter Estimates (Intercept cconcentration; Residual) and Solution for Fixed Effects (Intercept; Concentration, Pr > |t| <.0001) with confidence limits.

Model check Comment on Mixed model in SAS Note: the results are identical to those for the analysis of averages, but we get extra information here: the estimated correlation ρ̂ (p. 59). This clearly violates the independence assumption, which is why the naive approach using all 18 measurements will provide wrong results. Effect of correlation on the number of independent pieces of information For n different doses (here n = 6) and k repetitions for each dose (here k = 3): how much do we gain by taking duplicates, triplicates etc. instead of just taking a single measurement? That depends on the correlation ρ. We ought to have n·k pieces of information, but due to the correlation, we only have m < n·k, with m = n·k / (1 + ρ(k − 1)).
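The formula for m can be sketched as a small function, showing the two extremes:

```python
def effective_n(n_doses: int, k_reps: int, rho: float) -> float:
    """Effective number of independent observations: m = n*k / (1 + rho*(k - 1))."""
    return n_doses * k_reps / (1 + rho * (k_reps - 1))

# rho = 0: all n*k measurements count; rho = 1: replicates add nothing beyond n.
print(effective_n(6, 3, 0.0))   # 18.0
print(effective_n(6, 3, 1.0))   # 6.0
print(effective_n(6, 3, 0.5))   # 9.0
```

With the design above (n = 6, k = 3), a within-triplicate correlation of 0.5 already halves the information from 18 to 9 independent pieces.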

Reminder: prediction What about the other way around? Calibration involves prediction/estimation the reverse way, i.e. estimation of an unknown x-value based on y-observations: Take a soil sample with an unknown concentration (c_0, say) of selenium. Measure with some instrument a couple of times (e.g. 3, as here), and get observations Y_01, Y_02, Y_03, with average Ȳ_0. Make a qualified guess of the unknown concentration, with confidence interval, based on the average measurement Ȳ_0. Since E(y_0) = α + β x_0, we must estimate ĉ_0 = (Ȳ_0 − α̂) / β̂. But what is the uncertainty in this expression? *Calibration uncertainty Based on k measurements (Y_0i, i = 1,..., k) of an unknown concentration (c_0), the standard error is (σ_{y|x} / β̂) · √( 1/k + 1/n + (ȳ_0 − ȳ)² / (β̂² Σ_{i=1}^n (x_i − x̄)²) ). How to do this in SAS? Not so easy, unfortunately... Example: Cell concentration of tetrahymena The unicellular organism tetrahymena is grown in two different media, with and without glucose. Research question: how does cell concentration x (number of cells in 1 ml of the growth medium) affect the cell size y (average cell diameter, measured in µm)? Quantitative covariate: concentration x. Quantitative outcome: diameter y.
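The calibration estimate and its uncertainty can be sketched in Python. The fitted line (α̂ = 0.943, β̂ = 0.751) is taken from the slides, while the residual SD and the three new readings are invented for illustration:

```python
import numpy as np

alpha_hat, beta_hat = 0.943, 0.751     # fitted calibration line (from the slides)
s_yx = 1.0                             # residual SD ("Root MSE"); value assumed here
x = np.array([0.0, 20.0, 40.0, 80.0, 120.0, 160.0])   # known concentrations
n = len(x)
# Mean outcome over the design; equals the observed y-mean, since the LS line
# passes through (x-bar, y-bar).
y_bar = alpha_hat + beta_hat * x.mean()

y_new = np.array([61.0, 62.1, 60.4])   # k = 3 invented readings on an unknown sample
k = len(y_new)
y0 = y_new.mean()

c0_hat = (y0 - alpha_hat) / beta_hat   # invert y = alpha + beta * x

Sxx = np.sum((x - x.mean()) ** 2)
se_c0 = (s_yx / beta_hat) * np.sqrt(1 / k + 1 / n
                                    + (y0 - y_bar) ** 2 / (beta_hat ** 2 * Sxx))
print(c0_hat, se_c0)
```

The three terms under the square root mirror the formula above: averaging over k readings, uncertainty in the line from n calibration points, and extra penalty when ȳ_0 lies far from the calibration mean.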

Scatter plot For the no glucose medium: the relation is clearly not linear. Residual plot for naive linear regression Note the curved shape, indicating that linearity between cell diameter and concentration is not appropriate. Power relationship Suggested relationship between diameter (y) and concentration (x): y = α x^β. Interpretation of the parameters: α denotes the cell size for a concentration of x = 1, an extrapolation to the extreme lower end of the concentration range, as seen from the scatter plot. β determines the effect of concentration: when the concentration x is doubled, the diameter changes by a factor 2^β. Logarithmic transformation Transforming the diameter (y) with a logarithm yields the theoretical relationship log10(y) = log10(α) + β log10(x), or in terms of observations: E(y*_i) = α* + β x*_i, where y*_i = log10(y_i), x*_i = log10(x_i), and α* = log10(α) is the intercept.

Scatter plot on double logarithmic scale: looks pretty linear. Regression on double logarithmic scale ods graphics on; proc reg plots=(diagnostics(unpack) residuals(smooth)) data=a1; where glucose="no"; model logdiameter = logconcentration / clb; run; ods graphics off; The REG Procedure, Dependent Variable: logdiameter, Number of Observations Used 19. Parameter Estimates: Intercept, Pr > |t| <.0001; logconcentration, Pr > |t| <.0001; with 95% confidence limits. Model check for logarithmic analysis: looks much better. Estimates for the multiplicative model, taken from the output on p. 74: α̂* = 1.635 (0.0202); β̂ with SE 0.0041. Back-transforming The effect of a doubling of the concentration is estimated to 2^β̂ = 0.959, a 4.1% reduction of diameter. Confidence limits: (0.954, 0.965), i.e. between a 3.5% and a 4.6% reduction.
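The back-transformation is simple arithmetic; here is a sketch, where the slope and the media difference (which appears later, in the analysis of covariance) are taken as assumed round values consistent with the back-transformed results on the slides:

```python
import math

beta = -0.0603        # slope on the double-log10 scale (assumed value, chosen so
                      # that 2**beta matches the slide's 0.959)
doubling_factor = 2 ** beta            # effect of doubling the concentration
pct_change = (doubling_factor - 1) * 100
print(doubling_factor, pct_change)     # about 0.959, i.e. roughly a 4.1% reduction

diff_log10 = 0.0282   # media difference on the log10 scale (assumed value, chosen
                      # to match the slide's factor 1.067)
media_factor = 10 ** diff_log10        # glucose vs. no glucose
print(media_factor)                    # about 1.067, i.e. roughly 6.7% larger diameters
```

The rule of thumb: a slope on a log10 outcome scale is multiplicative after back-transformation, via 2^β per doubling of x (or 10^δ for a group difference δ).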

Two media: with and without glucose Two parallel regression lines: an effect of concentration, assumed to be the same for both media, and a difference between the two media, assumed to be the same for all concentrations. This is called multiple regression with two covariates (explanatory variables), or analysis of covariance. It is an additive model, with no interaction. Analysis of covariance in SAS proc glm plots=all data=a1; class glucose; model logdiameter = logconcentration glucose /solution clparm; run; with output The GLM Procedure, Class Level Information: glucose, 2 levels (no, yes); Number of Observations Read 51; Dependent Variable: logdiameter; Model Pr > F <.0001. Type III tests: logconcentration <.0001; glucose <.0001. Parameter estimates: Intercept, logconcentration, glucose no (all Pr > |t| <.0001), glucose yes (reference level), with 95% confidence limits. The slope is the estimated effect of log10(Concentration) on log10(diam), so we must back-transform for interpretation:

Interpretation of output, slope The effect of concentration after back-transforming: the effect of a doubling of the concentration is estimated to 2^β̂ = 0.962, a 3.8% reduction of diameter. Confidence limits: (0.959, 0.965), i.e. between a 3.5% and a 4.1% reduction. Note that this is almost the same as when we considered one medium alone. Interpretation of output, difference between media The intercept is an estimate of α* (see p. 72), and since α* = log10(α), we have α = 10^{α*}. Therefore, the difference between the two media (glucose vs. no glucose) is a factor 1.067, i.e. a 6.7% higher cell diameter when glucose is added. Confidence limits: (1.054, 1.080), i.e. between a 5.4% and an 8.0% increase. Interaction? If the effect of one explanatory variable (X1) depends on the value of another (X2), we say that there is an interaction between X1 and X2: if the effect of concentration depends on the medium, or equivalently if the difference between the two media varies with concentration, we have interaction between concentration and media. Interaction means the two regression lines are not parallel; they have different slopes. Model with interaction in SAS proc glm plots=all data=tetrahymena; class glucose; model logdiam=logconc glucose logconc*glucose/solution clparm; estimate 'slope, glucose=0' logconc 1 logconc*glucose 1 0; estimate 'slope, glucose=1' logconc 1 logconc*glucose 0 1; output out=check r=res p=pred; run; The GLM Procedure, Class Level Information: glucose, 2 levels; Number of Observations Used 51; Dependent Variable: logdiam; Model Pr > F <.0001.

Interaction?, II Output, continued: Type III tests: logconc <.0001; glucose; logconc*glucose. Parameter estimates for the two slopes (slope, glucose=0 and slope, glucose=1), both Pr > |t| <.0001, with 95% confidence limits. Interaction?, III Output, continued: parameter estimates for Intercept, logconc, glucose and logconc*glucose, with 95% confidence limits. The test for no interaction gives P = 0.19, i.e. no significance. Interpretation of output, slopes We now have two different estimates of slope, depending on the presence of glucose. We back-transform to the effect of a doubling of the concentration: No glucose: 2^β̂ = 0.959, a 4.1% reduction of diameter (CI: 0.954, 0.965). Glucose: 2^β̂ = 0.964, a 3.6% reduction of diameter (CI: 0.960, 0.968). Note that for no glucose, we get the same results as on p. 76. Interpretation of output, media The difference between the two media (glucose vs. no glucose) now depends upon the concentration of cells! The estimate shown in the output (p. 86) refers to the difference between media when the explanatory variable is zero. Since our explanatory variable is the logarithm of the cell concentration, this corresponds to a cell concentration of 1, and only this particular value. This is way out of range.

Model fit, with interaction Interpretation of marginal effects Important: as long as an interaction is present (in the model), do not try to interpret the marginal effects of either explanatory variable, even if the interaction is seen to be insignificant. Instead, leave the interaction out of the model and run it again. This will make us return to the earlier additive analysis (the analysis of covariance).


More information

Notes 6. Basic Stats Procedures part II

Notes 6. Basic Stats Procedures part II Statistics 5106, Fall 2007 Notes 6 Basic Stats Procedures part II Testing for Correlation between Two Variables You have probably all heard about correlation. When two variables are correlated, they are

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

Regression models. Categorical covariate, Quantitative outcome. Examples of categorical covariates. Group characteristics. Faculty of Health Sciences

Regression models. Categorical covariate, Quantitative outcome. Examples of categorical covariates. Group characteristics. Faculty of Health Sciences Faculty of Health Sciences Categorical covariate, Quantitative outcome Regression models Categorical covariate, Quantitative outcome Lene Theil Skovgaard April 29, 2013 PKA & LTS, Sect. 3.2, 3.2.1 ANOVA

More information

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences Faculty of Health Sciences Longitudinal data Correlated data Longitudinal measurements Outline Designs Models for the mean Covariance patterns Lene Theil Skovgaard November 27, 2015 Random regression Baseline

More information

6. Multiple regression - PROC GLM

6. Multiple regression - PROC GLM Use of SAS - November 2016 6. Multiple regression - PROC GLM Karl Bang Christensen Department of Biostatistics, University of Copenhagen. http://biostat.ku.dk/~kach/sas2016/ kach@biostat.ku.dk, tel: 35327491

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Analysis of variance and regression. November 22, 2007

Analysis of variance and regression. November 22, 2007 Analysis of variance and regression November 22, 2007 Parametrisations: Choice of parameters Comparison of models Test for linearity Linear splines Lene Theil Skovgaard, Dept. of Biostatistics, Institute

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Models for longitudinal data

Models for longitudinal data Faculty of Health Sciences Contents Models for longitudinal data Analysis of repeated measurements, NFA 016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA

More information

ST Correlation and Regression

ST Correlation and Regression Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)

More information

Multicollinearity Exercise

Multicollinearity Exercise Multicollinearity Exercise Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program below into the SAS editor window and run it.] You do not need to submit any output, so there

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 1 / 96 Overview One-way anova with random variation The rabbit example Hierarchical

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 One-way anova with random variation The rabbit example Hierarchical

More information

Regression and correlation

Regression and correlation 6 Regression and correlation The main object of this chapter is to show how to perform basic regression analyses, including plots for model checking and display of confidence and prediction intervals.

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 27, 2018 1 / 84 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models One-way anova with random variation The rabbit example Hierarchical models with several levels Random regression Lene Theil

More information

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income Scatterplots Quantitative Research Methods: Introduction to correlation and regression Scatterplots can be considered as interval/ratio analogue of cross-tabs: arbitrarily many values mapped out in -dimensions

More information

Lecture notes on Regression & SAS example demonstration

Lecture notes on Regression & SAS example demonstration Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also

More information

3 Variables: Cyberloafing Conscientiousness Age

3 Variables: Cyberloafing Conscientiousness Age title 'Cyberloafing, Mike Sage'; run; PROC CORR data=sage; var Cyberloafing Conscientiousness Age; run; quit; The CORR Procedure 3 Variables: Cyberloafing Conscientiousness Age Simple Statistics Variable

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

Measuring relationships among multiple responses

Measuring relationships among multiple responses Measuring relationships among multiple responses Linear association (correlation, relatedness, shared information) between pair-wise responses is an important property used in almost all multivariate analyses.

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Chapter 2 Inferences in Simple Linear Regression

Chapter 2 Inferences in Simple Linear Regression STAT 525 SPRING 2018 Chapter 2 Inferences in Simple Linear Regression Professor Min Zhang Testing for Linear Relationship Term β 1 X i defines linear relationship Will then test H 0 : β 1 = 0 Test requires

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture 11 Multiple Linear Regression

Lecture 11 Multiple Linear Regression Lecture 11 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 11-1 Topic Overview Review: Multiple Linear Regression (MLR) Computer Science Case Study 11-2 Multiple Regression

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Week 3: Simple Linear Regression

Week 3: Simple Linear Regression Week 3: Simple Linear Regression Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Linear Regression Linear Regression with One Independent Variable 2.1 Introduction In Chapter 1 we introduced the linear model as an alternative for making inferences on means of one or

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved.

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved. LINEAR REGRESSION LINEAR REGRESSION REGRESSION AND OTHER MODELS Type of Response Type of Predictors Categorical Continuous Continuous and Categorical Continuous Analysis of Variance (ANOVA) Ordinary Least

More information

Analysis of variance and regression. December 4, 2007

Analysis of variance and regression. December 4, 2007 Analysis of variance and regression December 4, 2007 Variance component models Variance components One-way anova with random variation estimation interpretations Two-way anova with random variation Crossed

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

EXST7015: Estimating tree weights from other morphometric variables Raw data print

EXST7015: Estimating tree weights from other morphometric variables Raw data print Simple Linear Regression SAS example Page 1 1 ********************************************; 2 *** Data from Freund & Wilson (1993) ***; 3 *** TABLE 8.24 : ESTIMATING TREE WEIGHTS ***; 4 ********************************************;

More information

Failure Time of System due to the Hot Electron Effect

Failure Time of System due to the Hot Electron Effect of System due to the Hot Electron Effect 1 * exresist; 2 option ls=120 ps=75 nocenter nodate; 3 title of System due to the Hot Electron Effect ; 4 * TIME = failure time (hours) of a system due to drift

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

STOR 455 STATISTICAL METHODS I

STOR 455 STATISTICAL METHODS I STOR 455 STATISTICAL METHODS I Jan Hannig Mul9variate Regression Y=X β + ε X is a regression matrix, β is a vector of parameters and ε are independent N(0,σ) Es9mated parameters b=(x X) - 1 X Y Predicted

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

Handout 1: Predicting GPA from SAT

Handout 1: Predicting GPA from SAT Handout 1: Predicting GPA from SAT appsrv01.srv.cquest.utoronto.ca> appsrv01.srv.cquest.utoronto.ca> ls Desktop grades.data grades.sas oldstuff sasuser.800 appsrv01.srv.cquest.utoronto.ca> cat grades.data

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences Faculty of Health Sciences Variance component models Definitions and motivation Correlated data Variance component models, I Lene Theil Skovgaard November 29, 2013 One-way anova with random variation The

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and

More information

Detecting and Assessing Data Outliers and Leverage Points

Detecting and Assessing Data Outliers and Leverage Points Chapter 9 Detecting and Assessing Data Outliers and Leverage Points Section 9.1 Background Background Because OLS estimators arise due to the minimization of the sum of squared errors, large residuals

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD Paper: ST-161 Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop Institute @ UMBC, Baltimore, MD ABSTRACT SAS has many tools that can be used for data analysis. From Freqs

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen Outline Data in wide and long format

More information

ssh tap sas913, sas

ssh tap sas913, sas B. Kedem, STAT 430 SAS Examples SAS8 ===================== ssh xyz@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Multiple Regression ====================== 0. Show

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich Biostatistics Correlation and linear regression Burkhardt Seifert & Alois Tschopp Biostatistics Unit University of Zurich Master of Science in Medical Biology 1 Correlation and linear regression Analysis

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

The General Linear Model. April 22, 2008

The General Linear Model. April 22, 2008 The General Linear Model. April 22, 2008 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model

More information

The General Linear Model. November 20, 2007

The General Linear Model. November 20, 2007 The General Linear Model. November 20, 2007 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model

More information

Chapter 8 Quantitative and Qualitative Predictors

Chapter 8 Quantitative and Qualitative Predictors STAT 525 FALL 2017 Chapter 8 Quantitative and Qualitative Predictors Professor Dabao Zhang Polynomial Regression Multiple regression using X 2 i, X3 i, etc as additional predictors Generates quadratic,

More information