Answer to exercise 'height vs. age' (Juul)

Question 1

Fitting a straight line to height for males in the age range 5-20, and making the corresponding illustration, is performed by writing:

    proc reg data=juul;
      where age ge 5 and age le 20 and sex='male';
      model height=age;
    run;

    proc gplot data=juul;
      where age ge 5 and age le 20 and sex='male';
      plot height*age / haxis=axis1 vaxis=axis2 frame;
      axis1 value=(h=3) minor=none label=(h=3);
      axis2 value=(h=3) minor=none label=(a=90 R=0 H=3);
      symbol1 v=circle i=rl c=black h=2 l=1 w=2;
    run;

It looks as if the variation increases somewhat with age, but for ease of interpretation we abstain from performing any transformation. More importantly, linearity does not look entirely adequate, at least not beyond the age of 16. This is not very surprising.

Question 2

Graphical fits of parabolas and cubic functions may be obtained simply by using i=rq or i=rc in the symbol statement in gplot. We get

Quadratic fits: [figure]

Cubic fits: [figure]

In order to fit the models for males and females simultaneously, we may use two different parametrisations. First, we choose the estimation-friendly version:

    proc glm data=juul;
      where age ge 5 and age le 20;
      class sex;
      model height=sex sex*age sex*age2 sex*age3 / noint solution;
    run;

which gives

    The GLM Procedure
    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Uncorrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    sex                                                           <.0001
    age*sex                                                       <.0001
    age2*sex                                                      <.0001
    age3*sex                                                      <.0001

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    sex                                                             <.0001
    age*sex
    age2*sex                                                        <.0001
    age3*sex                                                        <.0001

    Parameter            Estimate   Standard Error   t Value   Pr > |t|
    sex       female                                           <.0001
    sex       male                                             <.0001
    age*sex   female
    age*sex   male
    age2*sex  female
    age2*sex  male                                             <.0001
    age3*sex  female                                           <.0001
    age3*sex  male                                             <.0001

or with the test-friendly parametrisation:

    proc glm data=juul;
      where age ge 5 and age le 20;
      class sex;
      model height=age age2 age3 sex sex*age sex*age2 sex*age3 / solution;
    run;

which gives

    The GLM Procedure
    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    age                                                           <.0001
    age2                                                          <.0001
    age3                                                          <.0001
    sex                                                           <.0001
    age*sex                                                       <.0001
    age2*sex                                                      <.0001
    age3*sex

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    age
    age2                                                            <.0001
    age3                                                            <.0001
    sex
    age*sex
    age2*sex
    age3*sex

    Parameter            Estimate   Standard Error   t Value   Pr > |t|
    Intercept                   B                              <.0001
    age                         B
    age2                        B                              <.0001
    age3                        B                              <.0001
    sex       female            B
    sex       male              B   ...

    age*sex   female            B
    age*sex   male              B   ...
    age2*sex  female            B
    age2*sex  male              B   ...
    age3*sex  female            B
    age3*sex  male              B   ...

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

We see that the two genders do not deviate in the coefficient of the third-order term (P=0.32), but (seen from Type I) they do differ in the coefficient of the second-order term (F=44.04, P < 0.0001). From the estimation-friendly parametrisation, we may also note that the linear term is not significant for girls (P=0.67), but this is rather hard to interpret.

Question 3

Automatic fit with e.g. i=sm40 gives the result [figure], which pretty much resembles the parabolic fits.

Question 4

We now consider cutpoints at the ages 10, 12, 13 and 15 and define variables denoting the number of years above these respective thresholds.

    data juul;
      set juul;
      /* definition of new intercept, at the age 5 */
      age5=age-5;
      extra_age10=max(age-10,0);
      extra_age12=max(age-12,0);
      extra_age13=max(age-13,0);
      extra_age15=max(age-15,0);
    run;

We now use these new variables to define a linear spline, and at the same time we test linearity using a contrast statement:

    /* by-group processing assumes that juul is sorted by sex */
    proc glm data=juul;
      where age ge 5 and age le 20;
      by sex;
      model height=age5 extra_age15 extra_age13 extra_age12 extra_age10 / solution;
      contrast 'all' extra_age10 1, extra_age12 1, extra_age13 1, extra_age15 1;
    run;

From this we get

    sex=male

    The GLM Procedure
    Number of Observations Used   502
    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    age5                                                          <.0001
    extra_age15                                                   <.0001
    extra_age13
    extra_age12
    extra_age10

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    age5                                                            <.0001
    extra_age15                                                     <.0001
    extra_age13
    extra_age12
    extra_age10

    Contrast             DF   Contrast SS   Mean Square   F Value   Pr > F
    all                                                             <.0001

    Parameter            Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                  <.0001
    age5                                                       <.0001
    extra_age15                                                <.0001
    extra_age13
    extra_age12
    extra_age10

Question 5

We find the first deviation from the straight line at the age of 13 (based on the Type I tests, P=0.0002).

Question 6

For girls, we get similarly

    sex=female

    The GLM Procedure
    Number of Observations Used   628
    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    age5                                                          <.0001
    extra_age15                                                   <.0001
    extra_age13                                                   <.0001
    extra_age12
    extra_age10

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    age5                                                            <.0001
    extra_age15
    extra_age13
    extra_age12
    extra_age10

    Contrast             DF   Contrast SS   Mean Square   F Value   Pr > F
    all                                                             <.0001

    Parameter            Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                  <.0001
    age5                                                       <.0001
    extra_age15
    extra_age13
    extra_age12
    extra_age10

and again, we see the first deviation at the age of 13 (F=17.95, P < 0.0001).

Question 7

A model for the two genders simultaneously is defined as

    proc glm data=juul;
      where age ge 5 and age le 20;
      class sex;
      model height=age5 extra_age10 extra_age12 extra_age13 extra_age15
                   sex sex*extra_age15 sex*extra_age13 sex*extra_age12
                   sex*extra_age10 sex*age5 / solution;
    run;

and yields

    The GLM Procedure
    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    age5                                                          <.0001
    extra_age10                                                   <.0001
    extra_age12                                                   <.0001
    extra_age13                                                   <.0001
    extra_age15                                                   <.0001
    sex                                                           <.0001
    extra_age15*sex                                               <.0001
    extra_age13*sex                                               <.0001
    extra_age12*sex
    extra_age10*sex
    age5*sex

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    age5                                                            <.0001
    extra_age10
    extra_age12
    extra_age13
    extra_age15                                                     <.0001
    sex
    extra_age15*sex
    extra_age13*sex
    extra_age12*sex
    extra_age10*sex
    age5*sex

    Parameter                 Estimate   Standard Error   t Value   Pr > |t|
    Intercept                        B                              <.0001
    age5                             B                              <.0001
    extra_age10                      B
    extra_age12                      B
    extra_age13                      B
    extra_age15                      B                              <.0001
    sex              female          B
    sex              male            B   ...
    extra_age15*sex  female          B
    extra_age15*sex  male            B   ...
    extra_age13*sex  female          B
    extra_age13*sex  male            B   ...

    extra_age12*sex  female          B
    extra_age12*sex  male            B   ...
    extra_age10*sex  female          B
    extra_age10*sex  male            B   ...
    age5*sex         female          B
    age5*sex         male            B   ...

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

We see some rather small signs that the genders differ already from the age of 10, although this does not reach significance. From the age of 13 and onwards they clearly differ, since the boys grow more quickly than the girls. A scatter plot of the predictions looks like

[figure]

If we reduce the model to include only changes in the slope from the age of 13 and onwards, we get

    The GLM Procedure

    Class Level Information
    Class   Levels   Values
    sex     2        female male

    Number of Observations Read   1142
    Number of Observations Used   1130

    Dependent Variable: height

    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    R-Square   Coeff Var   Root MSE   height Mean

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    age5                                                          <.0001
    extra_age                                                     <.0001
    extra_age                                                     <.0001
    sex                                                           <.0001
    extra_age15*sex                                               <.0001
    extra_age13*sex                                               <.0001

    Source               DF   Type III SS   Mean Square   F Value   Pr > F
    age5                                                            <.0001
    extra_age
    extra_age                                                       <.0001
    sex
    extra_age15*sex
    extra_age13*sex                                                 <.0001

    Parameter                 Estimate   Standard Error   t Value   Pr > |t|
    Intercept                        B                              <.0001
    age5                                                            <.0001
    extra_age                        B                              <.0001
    extra_age                        B                              <.0001
    sex              female          B
    sex              male            B   ...
    extra_age15*sex  female          B
    extra_age15*sex  male            B   ...
    extra_age13*sex  female          B                              <.0001
    extra_age13*sex  male            B   ...

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

with a new plot of predicted values: [figure]
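The linear-spline coding from Question 4 can also be illustrated outside SAS. The following is a minimal Python sketch, not part of the original answer: the function names and all coefficient values are hypothetical and are not estimates from the juul data. It shows how the variables age5 and extra_age10, ..., extra_age15 are formed, and that each extra_age coefficient is the change in slope at its cutpoint.

```python
def spline_basis(age, knots=(10, 12, 13, 15), origin=5):
    """Return [age - origin] followed by max(age - k, 0) for each knot k."""
    return [age - origin] + [max(age - k, 0) for k in knots]


def spline_height(age, intercept, coefs, knots=(10, 12, 13, 15), origin=5):
    """Piecewise-linear prediction: intercept + sum of coefficient * basis value."""
    basis = spline_basis(age, knots, origin)
    return intercept + sum(c * b for c, b in zip(coefs, basis))


# Hypothetical coefficients (NOT estimates from the juul data):
# 5 cm/year from age 5, 2 cm/year extra above age 13, 4 cm/year less above age 15.
coefs = [5.0, 0.0, 0.0, 2.0, -4.0]

for a in (5, 12, 14, 16):
    print(a, spline_basis(a), spline_height(a, 110.0, coefs))

# The slope between two ages in the same segment is the sum of the active coefficients.
slope_13_to_15 = spline_height(15, 110.0, coefs) - spline_height(14, 110.0, coefs)
slope_after_15 = spline_height(17, 110.0, coefs) - spline_height(16, 110.0, coefs)
print(slope_13_to_15, slope_after_15)
```

With these made-up numbers the yearly growth is 5 cm up to age 13, 5 + 2 = 7 cm between 13 and 15, and 7 - 4 = 3 cm after 15, which is how the extra_age estimates in the GLM output are read.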

From the output as well as from the figures, we note that between the ages 13 and 15, boys grow an extra 4.4 cm (s.e.=0.62 cm) a year on average, compared to the girls. Using the estimation-friendly version of this model gives us

    Parameter                 Estimate   Standard Error   t Value   Pr > |t|
    age5                                                            <.0001
    sex              female                                         <.0001
    sex              male                                           <.0001
    extra_age15*sex  female
    extra_age15*sex  male                                           <.0001
    extra_age13*sex  female                                         <.0001
    extra_age13*sex  male                                           <.0001

Question 8

The Gompertz growth model takes the following form:

$\text{height} = \alpha \exp(-\beta \exp(-\gamma \cdot \text{age}))$

It is a bit tricky to determine starting values for this equation, since we have no possibility of transforming to linearity. First of all, we realize that when age increases, we approach an asymptote, which is α. Hence, a reasonable guess of α is a maximum height, i.e. around 200 cm. The parameter β is related to the timing of growth, whereas γ describes the growth velocity. Setting α to be 200, we may rearrange the above equation and transform with the (natural) logarithm:

$\text{height}/200 \approx \exp(-\beta \exp(-\gamma \cdot \text{age}))$

$\log(200/\text{height}) \approx \beta \exp(-\gamma \cdot \text{age})$

$\log(\log(200/\text{height})) \approx \log(\beta) - \gamma \cdot \text{age}$

By defining the transformed variable y=log(log(200/height)), we may therefore estimate β and γ using linear regression, e.g. for males:

    data juul;
      set juul;
      /* create rather crude starting values for nonlinear regression */
      y=log(log(200/height));
    run;

    proc reg data=juul;
      model y=age;
      where age ge 5 and age le 20 and sex='male';
    run;

from which we get

    The REG Procedure
    Dependent Variable: y

    Analysis of Variance
    Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                              <.0001
    Error
    Corrected Total

    Root MSE         R-Square
    Dependent Mean   Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                        <.0001
    age                                                              <.0001

We therefore take the starting values to be α = 200, β = exp(intercept) = 1.54, and γ equal to minus the estimated slope for age.

Question 9

We now fit the nonlinear model using these starting values:

    /* by-group processing assumes that juul is sorted by sex */
    proc nlin data=juul;
      where age ge 5 and age le 20;
      by sex;
      parms alfa=200 beta=1.5 gamma=0.15;
      model height=alfa*exp(-beta*exp(-gamma*age));
      output out=ny p=yhat r=resid;
    run;

    sex=female

    The NLIN Procedure
    Dependent Variable height

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    alfa
    beta
    gamma

    sex=male

    The NLIN Procedure

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    alfa
    beta
    gamma
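The starting-value recipe from Question 8 can be checked numerically. The sketch below is an illustration in Python rather than SAS, and rests on stated assumptions: the data are noise-free and generated from hypothetical parameters (α = 200, β = 1.5, γ = 0.15), not taken from the juul data, and the helper names are made up for this example. It regresses log(log(200/height)) on age and converts the intercept and slope back to β and γ.

```python
import math


def gompertz(age, alpha, beta, gamma):
    """Gompertz curve: alpha * exp(-beta * exp(-gamma * age))."""
    return alpha * math.exp(-beta * math.exp(-gamma * age))


def ols_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def starting_values(ages, heights, alpha=200.0):
    """Crude starting values: beta = exp(intercept), gamma = -slope.

    Requires every height to be below alpha, otherwise log(log(...)) is undefined.
    """
    y = [math.log(math.log(alpha / h)) for h in heights]
    intercept, slope = ols_line(ages, y)
    return math.exp(intercept), -slope


# Synthetic, noise-free heights from hypothetical parameters (not the juul data).
ages = [5 + 0.5 * i for i in range(31)]  # ages 5.0, 5.5, ..., 20.0
heights = [gompertz(a, 200.0, 1.5, 0.15) for a in ages]

beta0, gamma0 = starting_values(ages, heights)
print(beta0, gamma0)
```

On real data the recovered values are only rough, both because of noise and because α = 200 is itself a guess, which is why they serve as starting values for proc nlin rather than as final estimates.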

The output statement creates a new data set called ny, from which we may make residual plots or, as shown here, plots showing the fitted relations:

[figure]

We note the slight increase in variance with age, which could be eliminated by a logarithmic transformation of height. Therefore we could instead fit the model as

    data juul;
      set juul;
      lheight=log(height);
    run;

    proc nlin data=juul;
      where age ge 5 and age le 20;
      by sex;
      parms alfa=200 beta=1.5 gamma=0.15;
      model lheight=log(alfa)-beta*exp(-gamma*age);
      output out=ny p=yhat r=resid;
    run;

    sex=female

    The NLIN Procedure

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    alfa
    beta
    gamma

    sex=male

    The NLIN Procedure

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    alfa
    beta
    gamma

Question 10

We may compare males and females in various ways. From the estimates above, we may perform simple t-tests, e.g. for the parameter β, where the test statistic is 1.11. This is not significant, no matter how many degrees of freedom this quantity might (approximately) have. We may also perform a test in SAS by analysing the two genders simultaneously:

    proc nlin data=juul;
      where age ge 5 and age le 20;
      parms alfa=200 beta=1.5 gamma=0.15 alfadif=0 betadif=0 gammadif=0;
      if sex='male' then
        model lheight=log(alfa)-beta*exp(-gamma*age);
      if sex='female' then
        model lheight=log(alfa+alfadif)-(beta+betadif)*exp(-(gamma+gammadif)*age);

yielding

    The NLIN Procedure
    Dependent Variable lheight

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    alfa
    beta
    gamma
    alfadif
    betadif
    gammadif

We may conclude that the αs and the γs differ between males and females, whereas the βs do not seem to differ. A cautious interpretation is that boys grow to become taller than girls (α) but that their growth appears to happen at a slower pace (γ). No difference can be found in the timing of growth between the two genders.
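The simple test mentioned in Question 10 compares two estimates obtained from separate fits. A minimal Python sketch of that calculation follows. It assumes the two estimates are independent (boys and girls are different individuals) and uses the normal approximation for the P-value. The function name and the numbers are hypothetical and are not the estimates from the juul output.

```python
import math


def compare_estimates(est1, se1, est2, se2):
    """Test statistic and two-sided normal P-value for est1 - est2.

    Assumes the two estimates are independent, so their variances add.
    """
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    # Two-sided P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimates and standard errors, for illustration only.
z, p = compare_estimates(1.60, 0.06, 1.50, 0.08)
print(z, p)
```

A statistic of about 1.1, as quoted above for β, corresponds to a two-sided P-value well above 0.05 under this approximation, in line with the conclusion that the βs do not differ.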

Answer to additional questions for the BOC exercise

Question 1

We fit the nonlinear relation directly using the procedure nlin:

    proc nlin data=boc;
      parms beta=0.8 gamma=228;
      model boc=gamma*exp(-beta/days);
      output out=ny p=yhat r=resid;
    run;

which creates the following output (plus some more, which is not shown here)

    Dependent Variable boc
    Method: Gauss-Newton

    Iterative Phase
    Iter   beta   gamma   Sum of Squares

    NOTE: Convergence criterion met.
    NOTE: An intercept was not specified for this model.

    Source               DF   Sum of Squares   Mean Square   F Value   Approx Pr > F
    Regression                                                         <.0001
    Residual
    Uncorrected Total
    Corrected Total

    Parameter   Estimate   Approx Std Error   Approximate 95% Confidence Limits
    beta
    gamma

This provides a direct estimation of both parameters, and we may compare the new confidence intervals with the ones obtained from linearisation of the data:

    method                   β                 γ
    linearisation            (0.727, 0.888)    (219.6, 237.7)
    non-linear regression    (0.713, 0.952)    (221.2, 240.0)

The differences are not pronounced, but the nonlinear regression gives slightly wider confidence intervals, especially for β.

Question 2

The linear regression of the transformed relation is to be preferred, since the assumption of variance homogeneity is better satisfied for this relation. This may also be seen from the residual plot below, which is based on the nonlinear regression and which shows a clear 'trumpet' pattern.

[figure]

However, the fits for the two models do not differ much, as can be seen from the plots below:

[figure]
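The linearisation referred to in Question 1 is not written out in this answer. Assuming it is the usual logarithmic transformation of the model fitted by nlin above, it would read:

```latex
\begin{align*}
\mathrm{boc} &= \gamma \exp\!\left(-\frac{\beta}{\mathrm{days}}\right) \\
\log(\mathrm{boc}) &= \log(\gamma) - \beta \cdot \frac{1}{\mathrm{days}}
\end{align*}
```

Under that assumption, a linear regression of log(boc) on 1/days has intercept log(γ) and slope −β, and the residual variance is taken to be constant on the log scale rather than on the original scale. That difference in variance assumption is what the 'trumpet' pattern in Question 2 speaks to.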


Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

Statistics 512: Applied Linear Models. Topic 1

Statistics 512: Applied Linear Models. Topic 1 Topic Overview This topic will cover Course Overview & Policies SAS Statistics 512: Applied Linear Models Topic 1 KNNL Chapter 1 (emphasis on Sections 1.3, 1.6, and 1.7; much should be review) Simple linear

More information

ssh tap sas913, sas

ssh tap sas913, sas B. Kedem, STAT 430 SAS Examples SAS8 ===================== ssh xyz@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Multiple Regression ====================== 0. Show

More information

Topic 28: Unequal Replication in Two-Way ANOVA

Topic 28: Unequal Replication in Two-Way ANOVA Topic 28: Unequal Replication in Two-Way ANOVA Outline Two-way ANOVA with unequal numbers of observations in the cells Data and model Regression approach Parameter estimates Previous analyses with constant

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

Chapter 2 Inferences in Simple Linear Regression

Chapter 2 Inferences in Simple Linear Regression STAT 525 SPRING 2018 Chapter 2 Inferences in Simple Linear Regression Professor Min Zhang Testing for Linear Relationship Term β 1 X i defines linear relationship Will then test H 0 : β 1 = 0 Test requires

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

Chapter 11. Analysis of Variance (One-Way)

Chapter 11. Analysis of Variance (One-Way) Chapter 11 Analysis of Variance (One-Way) We now develop a statistical procedure for comparing the means of two or more groups, known as analysis of variance or ANOVA. These groups might be the result

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th Name 171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th Use the selected SAS output to help you answer the questions. The SAS output is all at the back of the exam on pages

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Fall, 2013 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the weight percent

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Course Information Text:

Course Information Text: Course Information Text: Special reprint of Applied Linear Statistical Models, 5th edition by Kutner, Neter, Nachtsheim, and Li, 2012. Recommended: Applied Statistics and the SAS Programming Language,

More information

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis PLS205!! Lab 9!! March 6, 2014 Topic 13: Covariance Analysis Covariable as a tool for increasing precision Carrying out a full ANCOVA Testing ANOVA assumptions Happiness! Covariable as a Tool for Increasing

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

Chapter 12: Multiple Regression

Chapter 12: Multiple Regression Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x

More information

Comparison of a Population Means

Comparison of a Population Means Analysis of Variance Interested in comparing Several treatments Several levels of one treatment Comparison of a Population Means Could do numerous two-sample t-tests but... ANOVA provides method of joint

More information

1. (Problem 3.4 in OLRT)

1. (Problem 3.4 in OLRT) STAT:5201 Homework 5 Solutions 1. (Problem 3.4 in OLRT) The relationship of the untransformed data is shown below. There does appear to be a decrease in adenine with increased caffeine intake. This is

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

The program for the following sections follows.

The program for the following sections follows. Homework 6 nswer sheet Page 31 The program for the following sections follows. dm'log;clear;output;clear'; *************************************************************; *** EXST734 Homework Example 1

More information

Lab 11. Multilevel Models. Description of Data

Lab 11. Multilevel Models. Description of Data Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level

More information

Lecture 13 Extra Sums of Squares

Lecture 13 Extra Sums of Squares Lecture 13 Extra Sums of Squares STAT 512 Spring 2011 Background Reading KNNL: 7.1-7.4 13-1 Topic Overview Extra Sums of Squares (Defined) Using and Interpreting R 2 and Partial-R 2 Getting ESS and Partial-R

More information

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.

More information

ST Correlation and Regression

ST Correlation and Regression Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)

More information

EXST 7015 Fall 2014 Lab 08: Polynomial Regression

EXST 7015 Fall 2014 Lab 08: Polynomial Regression EXST 7015 Fall 2014 Lab 08: Polynomial Regression OBJECTIVES Polynomial regression is a statistical modeling technique to fit the curvilinear data that either shows a maximum or a minimum in the curve,

More information

Residuals from regression on original data 1

Residuals from regression on original data 1 Residuals from regression on original data 1 Obs a b n i y 1 1 1 3 1 1 2 1 1 3 2 2 3 1 1 3 3 3 4 1 2 3 1 4 5 1 2 3 2 5 6 1 2 3 3 6 7 1 3 3 1 7 8 1 3 3 2 8 9 1 3 3 3 9 10 2 1 3 1 10 11 2 1 3 2 11 12 2 1

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Linear Regression Linear Regression with One Independent Variable 2.1 Introduction In Chapter 1 we introduced the linear model as an alternative for making inferences on means of one or

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

STA 6207 Practice Problems Nonlinear Regression

STA 6207 Practice Problems Nonlinear Regression STA 6207 Practice Problems Nonlinear Regression Q.1. A study was conducted to measure the effects of pea density (X 1, in plants/m 2 ) and volunteer barley density (X 2, in plants/m 2 ) on pea seed yield

More information

Stat 501, F. Chiaromonte. Lecture #8

Stat 501, F. Chiaromonte. Lecture #8 Stat 501, F. Chiaromonte Lecture #8 Data set: BEARS.MTW In the minitab example data sets (for description, get into the help option and search for "Data Set Description"). Wild bears were anesthetized,

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Topic 2. Chapter 3: Diagnostics and Remedial Measures

Topic 2. Chapter 3: Diagnostics and Remedial Measures Topic Overview This topic will cover Regression Diagnostics Remedial Measures Statistics 512: Applied Linear Models Some other Miscellaneous Topics Topic 2 Chapter 3: Diagnostics and Remedial Measures

More information

EXAM IN TMA4255 EXPERIMENTAL DESIGN AND APPLIED STATISTICAL METHODS

EXAM IN TMA4255 EXPERIMENTAL DESIGN AND APPLIED STATISTICAL METHODS Norges teknisk naturvitenskapelige universitet Institutt for matematiske fag Side 1 av 8 Contact during exam: Bo Lindqvist Tel. 975 89 418 EXAM IN TMA4255 EXPERIMENTAL DESIGN AND APPLIED STATISTICAL METHODS

More information

Non-Linear Models. Estimating Parameters from a Non-linear Regression Model

Non-Linear Models. Estimating Parameters from a Non-linear Regression Model Non-Linear Models Mohammad Ehsanul Karim Institute of Statistical Research and training; University of Dhaka, Dhaka, Bangladesh Estimating Parameters from a Non-linear Regression Model

More information

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points Economics 102: Analysis of Economic Data Cameron Spring 2016 May 12 Department of Economics, U.C.-Davis Second Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

Topic 29: Three-Way ANOVA

Topic 29: Three-Way ANOVA Topic 29: Three-Way ANOVA Outline Three-way ANOVA Data Model Inference Data for three-way ANOVA Y, the response variable Factor A with levels i = 1 to a Factor B with levels j = 1 to b Factor C with levels

More information