BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users
|
|
- Alvin Dalton
- 6 years ago
- Views:
Transcription
1 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing a simple linear regression using an R data set called week0.rdata. This data set has n=3 observations of boiling points (Y=boiling) and temperature (X=temp). Tip In this exercise you will need to create a new variable newy = 00*log 0 (boiling) Carry out an exploratory analysis to determine whether the relationship between temperature and boiling point is better represented using (i) (ii) Y = β 0+βX or ' ' 00 log (Y) = β +βx 0 0 In developing your answer, try our hand at producing (a) Estimates of the regression line parameters (b) Analysis of variance tables (c) R (d) Scatter plot with overlay of fitted line. Complete your answer with a one paragraph text that is an interpretation of your work. Take your time with this and have fun. # Input data that you have downloaded and saved to your desktop (edit my desktop) setwd( /Users/cbigelow/Desktop ) load(file= week0.rdata ) # Look at first 6 rows of data using command head( ) head(week0) temp boiling sol_regression( of ) R USERS.docx Page of 4
2 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # I will create newvar=oldvar as follows: newvar <- dataframe$oldvar x <- week0$temp y <- week0$boiling m <- lm(y ~ x) # OR: m <- lm(boiling~temp, data=dat) # OR: m <- lm(dat$boiling ~ dat$temp) summary(m) Call: lm(formula = y ~ x) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 4 *** x < e- 6 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error:.57 on 9 degrees of freedom Multiple R- squared: 0.907, Adjusted R- squared: F- statistic: on and 9 DF, p- value: <.e- 6 m$coefficients # output the intercept and slope (Intercept) x # confint(m) # output the CI for the parameters in the fitted model anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x <.e- 6 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' plot(x, y, main="simple Linear Regression of Y=boiling on X=temp", xlab="temp", ylab="boiling", pch=8) # pch means "plot character", can be to 8; abline(m, col="red", lty=, lwd=) # lty: "line type"; lwd: "line width". sol_regression( of ) R USERS.docx Page of 4
3 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users From the above summary(m) output, we know that: (a): Parameter Estimates: and 0.44, so regression line: Y= *X; (b): ANOVA (Analysis of Variance) table is obtained above using the anova(m) code; (c): R-square=0.907; (d): The scatterplot with the fitted line is plotted above. (). Now, instead of Y=boiling, we want to use newy=00*log0(boiling) newy=00*log0(y) m <- lm(newy ~ x) summary(m) Call: lm(formula = newy ~ x) Residuals: Min Q Median 3Q Max Coefficients: sol_regression( of ) R USERS.docx Page 3 of 4
4 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Estimate Std. Error t value Pr(> t ) (Intercept) *** x e- 5 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error:.96 on 9 degrees of freedom Multiple R- squared: 0.885, Adjusted R- squared: F- statistic: 3.7 on and 9 DF, p- value: 3.63e- 5 m$coefficients # output the intercept and slope (Intercept) x # confint(m) # output the CI for the parameters in the fitted model anova(m) Analysis of Variance Table Response: newy Df Sum Sq Mean Sq F value Pr(>F) x e- 5 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' plot(x, newy, main="simple Linear Regression of Y=00*log0(boiling) on X=temp", xlab="te mp", ylab="00*log0(boiling)", pch=8) # pch means "plot character", can be to 8; abline(m, col="red", lwd=) # lwd means "line width" sol_regression( of ) R USERS.docx Page 4 of 4
5 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users From the above summary(m) output, we know that: (a): Parameter Estimates: and 0.93, so regression line: 00log0(Y)= *X; (b): ANOVA (Analysis of Variance) table is obtained above using the anova(m) code; (c): R-square=0.885; (d): The scatterplot with the fitted line is plotted above. Solution (one paragraph of text that is interpretation of analysis): Did you notice that the scatter plot of these data reveal two outlying values? Their inclusion may or may not be appropriate. If all n=3 data points are included in the analysis, then the model that explains more of the variability in boiling point is Y=boiling point modeled linearly in X=temp. It has a greater R^ (9.07% vs. 88.5%). Be careful - It would not make sense to compare the residual mean squares of the two models because the scales of measurement involved are different. sol_regression( of ) R USERS.docx Page 5 of 4
6 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users. A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X ) and hostile suspiciousness (X ). (a) The least squares estimation equation involving both independent variables is given by Y = (X ) 7.47(X ) Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 5 observed for this patient? Ŷ = X X with X =.80 and X =7.0 Ŷ = 5.53 This value is lower than the observed value of 5 (b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X and X, X alone, and X alone. Source DF Sum of Squares Regression on X 546 Residual 5 46 Source DF Sum of Squares Regression on X 60 Residual sol_regression( of ) R USERS.docx Page 6 of 4
7 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Source DF Sum of Squares Regression on X, X 784 Residual Model Containing X and X F = SSQ regression on X and X /Regression df SSQ residual / Residual df =,784 /,008 / 50 = 6.37 on DF=,50 p-value= à Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00356), prompting statistical rejection of the null hypothesis. The fitted linear model in X and X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F = SSQ Regression on X /DF Regression SSQ residual/df Residual = 546 /,46 / 5 = on DF=,5 p-value=0.047 Here, too, application of the null hypothesis model has led to an extremely unlikely result (p-value =.04), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F = SSQ Regresion on X /DF Regression SSQ Residual/DF Residual = 60 / 3,63 / 5 = on DF=,5 p-value= Here, application of the null hypothesis model has not led to an extremely unlikely result (p-value =.44). The null hypothesis is therefore not rejected. The fitted linear model in X does not explain statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. (c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y? X explains a significant proportion of the variability in Y when modelled as a linear predictor. X does not. (However, we don't know if a different functional form might have been important.) sol_regression( of ) R USERS.docx Page 7 of 4
8 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (d) What are the R values for the three regressions referred to in part (b)? Total SSQ= (Regression SSQ) + (Residual SSQ) is constant. Therefore total SSQ can be calculated from just one anova table: Total (SSQ)=,546 +,46 = 3,79 ( ) R X only = (Re gression SSQ)/(Total SSQ) R (X only) = (60)/(3,79) = 0.06 ( ) ( ) = (546)/(3,79) = 0. R Xand X = 784 /(3, 79) = 0.09 (e) What is the best model involving either one or both of the two independent variables? Eliminate from consideration model with X only. Compare model with X alone versus X and X using partial F test. Partial F = {(SSQ Regression on X,X ) - (SSQ Regression on X )}/VDF = SSQ Residual for model w X,X /Residual DF = on DF =,50 P-value =0.06 Addition of X to model containing X is statistically significant (p-value =.0). More appropriate model includes X and X ( ) / (,008) / 50 sol_regression( of ) R USERS.docx Page 8 of 4
9 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users 3. In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log 0 (dose) and log 0 (larva weight) to log 0 (survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (ie time until death) are given in the table. Also given are relevant computer results and the analysis of variance table. Larva Y = log 0 (survival time) X =log 0 (dose) X =log 0 (weight) Larva Y = log 0 (survival time) X =log 0 (dose) X =log 0 (weight) Y = (X ) Y = (X ) Y = (X ) + ).87 (X ) Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X, X Residual sol_regression( of ) R USERS.docx Page 9 of 4
10 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (a) Test for the significance of the overall regression involving both independent variables X and X. X and X F = (SSQ regression on X and X ) / = (0.464) / = 59.8 on DF =, (SSQ Residual) / (0.047) / P value < Application of the null hypothesis model has led to an extremely unlikely result (p-value =.000), prompting statistical rejection of the null hypothesis. The fitted linear model in X and X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. (b) Test to see whether using X alone significantly helps in predicting survival time. X alone F = (0.3633) / (0.480) / 3 = (SSQ Regression on X ) / = 3.95 on DF =,3 (SSQ Residual) / 3 P value = Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00008), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. (c) Test to see whether using X alone significantly helps in predicting survival time. X alone F = (SSQ Regression on X ) / (SSQ Residual) / 3 P value = = (0.3367) / = 5.07 on DF =,3 (0.746) / 3 Application of the null hypothesis model has led to an extremely unlikely result (p-value =.0007), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in log 0 (survival time) (Y) than is explained by Y (the intercept model) alone. sol_regression( of ) R USERS.docx Page 0 of 4
11 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users (d) Compute R for each of the three models. TotalSSQ = 0.53 ( ) R X and X = / 0.53 = R (X alone) = / 0.53 = ( ) R X alone = / 0.53 = (e) Which independent predictor do you consider to be the best single predictor of survival time? Using just the criteria of the overall F test and comparison of containing X is better. R, the single predictor model (f) Which model involving one or both of the independent predictors do you prefer and why? Partial F for comparing model with Xalone versus model with X and X ( ΔRegressionSSQ)/ΔReg DF ( = = )/ - ( ResidualSSQ-model w X, X )/ResidualDF-model w X, X 0.047/ = on DF=, P-value= Choose model with both X and X ( ) sol_regression( of ) R USERS.docx Page of 4
12 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users 4. Using R, try your hand at reproducing the analysis of variance tables you worked with in problem #3. load(file= larvae.rdata ) dat <- larvae # model regression on x alone m <- lm(y ~ x, data=dat) summary(m) Call: lm(formula = y ~ x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 5 *** x e- 05 *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: on 3 degrees of freedom Multiple R- squared: 0.705, Adjusted R- squared: F- statistic: 3.9 on and 3 DF, p- value: 7.944e- 05 anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x e- 05 *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page of 4
13 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # model regression on x alone m <- lm(y ~ x, data=dat) summary(m) Call: lm(formula = y ~ x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 3 *** x *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: 0.59 on 3 degrees of freedom Multiple R- squared: , Adjusted R- squared: 0.63 F- statistic: 5.07 on and 3 DF, p- value: anova(m) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page 3 of 4
14 BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users # model regression on x and x m3 <- lm(y ~ x + x, data=dat) summary(m3) Call: lm(formula = y ~ x + x, data = dat) Residuals: Min Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e- 3 *** x e- 05 *** x *** Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' Residual standard error: on degrees of freedom Multiple R- squared: , Adjusted R- squared: F- statistic: 59.8 on and DF, p- value: 6.085e- 07 anova(m3) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x e- 07 *** x *** Residuals Signif. codes: 0 '***' 0.00 '**' 0.0 '*' 0.05 '.' 0. ' ' sol_regression( of ) R USERS.docx Page 4 of 4
BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users
Unit Regression and Correlation 1 of - Practice Problems Solutions Stata Users 1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week0.dta.
More informationUnit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA
PubHlth 640. Regression and Correlation Page 1 of 19 Unit Regression and Correlation Practice Problems SOLUTIONS Version STATA 1. A regression analysis of measurements of a dependent variable Y on an independent
More informationUnit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016
BIOSTATS 640 Spring 2016 2. Regression and Correlation Page 1 of 5 Unit 2 Regression and Correlation WEEK 3 - Practice Problems Due: Wednesday February 10, 2016 1. A psychiatrist wants to know whether
More informationSTAT 3022 Spring 2007
Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so
More information36-707: Regression Analysis Homework Solutions. Homework 3
36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationRegression and Models with Multiple Factors. Ch. 17, 18
Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationST430 Exam 1 with Answers
ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.
More informationLab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model
Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationSecond Midterm Exam Name: Solutions March 19, 2014
Math 3080 1. Treibergs σιι Second Midterm Exam Name: Solutions March 19, 2014 (1. The article Withdrawl Strength of Threaded Nails, in Journal of Structural Engineering, 2001, describes an experiment to
More informationTests of Linear Restrictions
Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some
More informationChapter 16: Understanding Relationships Numerical Data
Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear
More informationMODELS WITHOUT AN INTERCEPT
Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level
More informationDensity Temp vs Ratio. temp
Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationUNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student
More informationWeek 7 Multiple factors. Ch , Some miscellaneous parts
Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires
More informationComparing Nested Models
Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationMultiple Regression: Example
Multiple Regression: Example Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1,..., n may be expressed as where O i is output l i is labour input c
More informationUNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75
More informationIntroduction and Single Predictor Regression. Correlation
Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationPractice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh?
Admistrivia Practice 2 due today. Assignment from Berndt due Monday. 1 Story: Pair programming Mythical man month If you double the number of programmers the amount of time it takes doubles. Huh? Invention
More informationVariance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf
More information1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species
Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for
More information1 Multiple Regression
1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationBIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression
BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More informationStatistics - Lecture Three. Linear Models. Charlotte Wickham 1.
Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions
More informationcor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )
Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation
More informationInferences on Linear Combinations of Coefficients
Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you
More informationExample: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA
s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation
More informationLecture 19: Inference for SLR & Transformations
Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner
More informationSwarthmore Honors Exam 2012: Statistics
Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationFigure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim
0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#
More informationGarvan Ins)tute Biosta)s)cal Workshop 16/6/2015. Tuan V. Nguyen. Garvan Ins)tute of Medical Research Sydney, Australia
Garvan Ins)tute Biosta)s)cal Workshop 16/6/2015 Tuan V. Nguyen Tuan V. Nguyen Garvan Ins)tute of Medical Research Sydney, Australia Introduction to linear regression analysis Purposes Ideas of regression
More informationStat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb
Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More information> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))
School of Mathematical Sciences MTH5120 Statistical Modelling I Tutorial 4 Solutions The first two models were looked at last week and both had flaws. The output for the third model with log y and a quadratic
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationRegression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.
Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More information1 Use of indicator random variables. (Chapter 8)
1 Use of indicator random variables. (Chapter 8) let I(A) = 1 if the event A occurs, and I(A) = 0 otherwise. I(A) is referred to as the indicator of the event A. The notation I A is often used. 1 2 Fitting
More informationChapter 8: Correlation & Regression
Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates
More informationHandout 4: Simple Linear Regression
Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:
More informationStat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015
Stat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015 One Variable x = spacing of plants (either 4, 8 12 or 16 inches), and y = plant yield (bushels per acre). Each condition
More informationIntroduction and Background to Multilevel Analysis
Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and
More informationQuantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression
Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly
More informationNo other aids are allowed. For example you are not allowed to have any other textbook or past exams.
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In
More informationExercise 2 SISG Association Mapping
Exercise 2 SISG Association Mapping Load the bpdata.csv data file into your R session. LHON.txt data file into your R session. Can read the data directly from the website if your computer is connected
More informationChapter 8 Conclusion
1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationCAS MA575 Linear Models
CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers
More informationWorkshop 7.4a: Single factor ANOVA
-1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More informationSTAT 215 Confidence and Prediction Intervals in Regression
STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:
More informationLinear Model Specification in R
Linear Model Specification in R How to deal with overparameterisation? Paul Janssen 1 Luc Duchateau 2 1 Center for Statistics Hasselt University, Belgium 2 Faculty of Veterinary Medicine Ghent University,
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More informationRecall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:
1 Joint hypotheses The null and alternative hypotheses can usually be interpreted as a restricted model ( ) and an model ( ). In our example: Note that if the model fits significantly better than the restricted
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa17.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More informationSimple Linear Regression
Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University
More informationChapter 12: Linear regression II
Chapter 12: Linear regression II Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 14 12.4 The regression model
More informationStudy Sheet. December 10, The course PDF has been updated (6/11). Read the new one.
Study Sheet December 10, 2017 The course PDF has been updated (6/11). Read the new one. 1 Definitions to know The mode:= the class or center of the class with the highest frequency. The median : Q 2 is
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationDiagnostics and Transformations Part 2
Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics
More informationFoundations of Correlation and Regression
BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations
More informationMultiple Linear Regression (solutions to exercises)
Chapter 6 1 Chapter 6 Multiple Linear Regression (solutions to exercises) Chapter 6 CONTENTS 2 Contents 6 Multiple Linear Regression (solutions to exercises) 1 6.1 Nitrate concentration..........................
More informationStatistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz
Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationRegression Analysis Chapter 2 Simple Linear Regression
Regression Analysis Chapter 2 Simple Linear Regression Dr. Bisher Mamoun Iqelan biqelan@iugaza.edu.ps Department of Mathematics The Islamic University of Gaza 2010-2011, Semester 2 Dr. Bisher M. Iqelan
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationMultiple Linear Regression
Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).
More informationLinear Modelling: Simple Regression
Linear Modelling: Simple Regression 10 th of Ma 2018 R. Nicholls / D.-L. Couturier / M. Fernandes Introduction: ANOVA Used for testing hpotheses regarding differences between groups Considers the variation
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationMath 2311 Written Homework 6 (Sections )
Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.
More informationSTK4900/ Lecture 3. Program
STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies
More informationSTAT 350 Final (new Material) Review Problems Key Spring 2016
1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,
More informationExample: 1982 State SAT Scores (First year state by state data available)
Lecture 11 Review Section 3.5 from last Monday (on board) Overview of today s example (on board) Section 3.6, Continued: Nested F tests, review on board first Section 3.4: Interaction for quantitative
More informationAnalysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29
Analysis of variance Gilles Guillot gigu@dtu.dk September 30, 2013 Gilles Guillot (gigu@dtu.dk) September 30, 2013 1 / 29 1 Introductory example 2 One-way ANOVA 3 Two-way ANOVA 4 Two-way ANOVA with interactions
More informationLecture 18 Miscellaneous Topics in Multiple Regression
Lecture 18 Miscellaneous Topics in Multiple Regression STAT 512 Spring 2011 Background Reading KNNL: 8.1-8.5,10.1, 11, 12 18-1 Topic Overview Polynomial Models (8.1) Interaction Models (8.2) Qualitative
More information(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.
FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December
More informationChapter 3 - Linear Regression
Chapter 3 - Linear Regression Lab Solution 1 Problem 9 First we will read the Auto" data. Note that most datasets referred to in the text are in the R package the authors developed. So we just need to
More informationSTAT 572 Assignment 5 - Answers Due: March 2, 2007
1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.
More informationBooklet of Code and Output for STAC32 Final Exam
Booklet of Code and Output for STAC32 Final Exam December 7, 2017 Figure captions are below the Figures they refer to. LowCalorie LowFat LowCarbo Control 8 2 3 2 9 4 5 2 6 3 4-1 7 5 2 0 3 1 3 3 Figure
More informationLAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
More informationSTAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR
STAT 571A Advanced Statistical Regression Analysis Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous
More informationUnit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)
BIOSTATS 540 Fall 2015 Introductory Biostatistics Page 1 of 10 Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS Consider the following study of the relationship
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationInference for Regression Simple Linear Regression
Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating
More information