Solutions - Homework #2


1. Problem 1: Biological Recovery

(a) A scatterplot of the biological recovery percentages versus time is given to the right. In viewing this plot, there is a negative, slightly nonlinear relationship between recovery percentage and time. There also appears to be more variability in the recoveries at early times, in that the relationship is most clearly defined at greater times. Hence, there appears to be some variance heterogeneity present. [Figure: Scatterplot of Recovery Percentage vs. Time]

(b) A scatterplot of the log biological recovery percentages versus time is given to the right. In viewing this plot, the relationship is still negative as before, but is now fairly linear. Additionally, the amount of variability about this linear pattern appears to be homogeneous over all times, indicating variance homogeneity for this relationship. Hence, a simple linear regression model on this log scale seems more appropriate than fitting a nonlinear model on the original scale, due to the variance heterogeneity on the raw scale. [Figure: Scatterplot of Log Recovery vs. Time]

(c) The simple linear regression model y_i = β0 + β1 x_i + ε_i (with y = log biological recovery and x = time) was fit to these data, producing the following parameter estimates for the intercept and slope: β̂0 ≈ 3.97 and β̂1 = −0.0368. The R²-statistic as reported from the MatLab output was R² = 0.9287, meaning that 92.9% of the variation in log recovery percentages was explained by the model on time. The relevant output from the tstat regression output structure is shown below, and all code used in this problem is given at the end of these solutions.

Coefficients:
                Value      Std. Error    p-value
    Intercept    3.97        0.1088      < .0005
    time        -0.0368      0.0031      < .0005

Multiple R-Squared: 0.9287
F-statistic on 1 and 11 degrees of freedom, p-value on the order of 1e-07

(d) An estimate of σ² is given by the mean square error (MSE), which is reported in MatLab as out.mse in the out regression structure. Hence, MSE = 0.0431. The standard errors of β̂0 and β̂1 are given in the Coefficients table above as SE(β̂0) = 0.1088 and SE(β̂1) = 0.0031. Using these standard errors and computing the t-critical value with n − p = 13 − 2 = 11 degrees

of freedom at the 99% level (t.995(11) = 3.1058), individual 99% confidence intervals for β0 and β1 were computed via MatLab as:

For β0: β̂0 ± t SE(β̂0) = 3.97 ± 3.1058(0.1088) = 3.97 ± 0.3379 → (3.63, 4.30).
For β1: β̂1 ± t SE(β̂1) = −0.0368 ± 3.1058(0.0031) = −0.0368 ± 0.0096 → (−0.0464, −0.0273).

Hence, we are 99% confident that the mean log recovery percentage at time 0 (the intercept) is between 3.63 and 4.30 log(%). We are also (individually) 99% confident that the slope of the regression of log recovery percentage on time is between −0.0464 and −0.0273 log(%) per minute. Since neither confidence interval contains the value 0, both β0 and β1 appear to be significant (significantly different from 0) in this model.

(e) The confidence bands were computed using the confregplot function from the course webpage, as shown in the scatterplot to the right below with the fitted regression line. The prediction bands are also plotted. To get at the gains in precision in estimating the mean of y for different values of x, the table to the left below gives the margins of error in the confidence interval for E(y) for several values of x. As expected, the further we get from the mean x-value of 30, the greater the variability in estimating E(y). To be more precise, the margin of error at either extreme of the times (0 or 60) is roughly 41% larger (0.29 to 0.41) than that at the middle time (30 minutes). [Table of x vs. margin of error; Figure: fitted line with confidence and prediction bands, log recovery percentage vs. time (in minutes)]

2. Problem 2: Logistic reparameterization: Suppose we begin with the logistic growth model parameterized as:

    u(t) = M u / [u + (M − u) exp(−kt)]

where (M, u, k) are the model parameters. First, divide the numerator and denominator by u. Doing so gives:

    u(t) = M / [1 + (M/u − 1) exp(−kt)]
         = M / (1 + exp[log(M/u − 1) − kt])

(since M/u − 1 = exp[log(M/u − 1)]). Defining a = log(M/u − 1) and b = −k, this can be written u(t) = M / [1 + exp(a + bt)], as desired.
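The reparameterization in Problem 2 can be verified numerically: both forms should agree at every t. A minimal sketch in Python, with the parameter values (M, u, k) chosen arbitrarily for the check:

```python
import math

# Logistic growth model, original parameterization (M, u, k):
# u(t) = M*u / (u + (M - u)*exp(-k*t))
def logistic_orig(t, M, u, k):
    return M * u / (u + (M - u) * math.exp(-k * t))

# Reparameterized form with a = log(M/u - 1), b = -k:
# u(t) = M / (1 + exp(a + b*t))
def logistic_reparam(t, M, a, b):
    return M / (1 + math.exp(a + b * t))

# Hypothetical parameter values, used only for this check
M, u, k = 11000.0, 500.0, 0.6
a, b = math.log(M / u - 1), -k

for t in [0.0, 1.0, 5.0, 16.0]:
    assert abs(logistic_orig(t, M, u, k) - logistic_reparam(t, M, a, b)) < 1e-6
```

Note that at t = 0 the original form reduces to u, confirming that u plays the role of the initial value.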

3. Problem 3: Weibull fit (problem 14)

(a) A scatterplot of the growth amounts vs. time is given to the right. [Figure: Growth (orgs./ml) vs. Time (in days)] In viewing this plot, the growth amounts initially increase very rapidly but seem to level off around 10,000 orgs./0.5 ml. It is also worth noting that there seems to be more variability in these growth amounts as their values increase. The resulting pattern may be well described by an exponential growth curve, although we may require the additional flexibility provided by the Weibull model.

(b) Using the Weibull growth model parameterized as y_i = α{1 − exp[−(t_i/σ)^γ]} + ε_i, we first recognize that α is the upper asymptote of the curve, since as time increases, the exponential piece goes to zero. Eyeballing where this limit occurs from the scatterplot, we choose α = 11,000 as the starting value for α in a nonlinear least squares fit. Picking two points from the scatterplot, we can see that (x, y) = (2, 2480) and (6, 9440) roughly fit the curved pattern seen. Substituting these values into the Weibull model gives the following pair of equations:

    2480 = 11000{1 − exp[−(2/σ)^γ]}   and   9440 = 11000{1 − exp[−(6/σ)^γ]}.

Solving this system of two equations in the two unknowns (σ, γ):

    8520 = 11000 exp[−(2/σ)^γ]   and   1560 = 11000 exp[−(6/σ)^γ]
    ⇒ ln(8520/11000) = −(2/σ)^γ   and   ln(1560/11000) = −(6/σ)^γ
    ⇒ ln(8520/11000) / ln(1560/11000) = (2/6)^γ = (1/3)^γ   (dividing the equations)
    ⇒ γ = ln[ln(8520/11000)/ln(1560/11000)] / ln(1/3) = 1.85
    ⇒ (6/σ)^γ = −ln(1560/11000)   ⇒   σ = 6 exp[−ln(−ln(1560/11000))/γ] = 6 exp[−ln(1.9533)/1.85] = 4.18.

Hence, the starting values I used to fit the Weibull model were (α, σ, γ) = (11000, 4.18, 1.85).

(c) With these starting values found above, nlinfit was used to fit a Weibull growth curve to these data, resulting in the fitted curve plotted above. The resulting parameter estimates are: α̂ = 10,369, σ̂ = 3.9876, γ̂ = 2.4840.

(d) A residual plot and normal quantile plot are shown below.
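The residuals in these plots are the observed growths minus the fitted Weibull mean. A minimal sketch of that calculation in Python, taking the parameter values to be the fitted estimates read from the solution above (an assumption, since only the curve itself is shown):

```python
import math

# Weibull growth mean function: E[y] = alpha * (1 - exp(-(t/sigma)^gamma))
def weibull_mean(t, alpha, sigma, gamma):
    return alpha * (1.0 - math.exp(-((t / sigma) ** gamma)))

# Fitted values as read from the solution (assumed readings)
alpha_hat, sigma_hat, gamma_hat = 10369.0, 3.9876, 2.4840

# Predicted values over the observed time range; a residual would be
# observed growth minus the corresponding fitted value
fitted = [weibull_mean(t, alpha_hat, sigma_hat, gamma_hat) for t in range(0, 17)]
```

The curve starts at 0, rises monotonically, and approaches the asymptote alpha_hat for large t, matching the description in parts (a) and (b).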
In viewing these plots, the residual plot exhibits a potential pattern, with most of the data appearing at the largest predicted values. However, this is a byproduct of having roughly 10 of the values near the asymptote of 10,000. The points at this largest predicted value do have more variability, possibly indicating variance heterogeneity in the larger growth values. The normal quantile plot shows a reasonably linear relationship between the residuals and the standard normal quantiles, indicating no serious departures from normality for the residuals.
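Before moving to the interval estimates, the two-equation solve for the starting values in part (b) can be double-checked numerically. A short sketch in Python, assuming the anchor values used in the solution (asymptote guess 11000, with eyeballed points (2, 2480) and (6, 9440)):

```python
import math

# Anchor points read off the scatterplot in part (b) (assumed values)
alpha0 = 11000.0
t1, y1 = 2.0, 2480.0
t2, y2 = 6.0, 9440.0

# From y = alpha0 * (1 - exp(-(t/sigma)^gamma)):
#   (t/sigma)^gamma = -log(1 - y/alpha0)
A1 = -math.log(1.0 - y1 / alpha0)   # = (t1/sigma)^gamma
A2 = -math.log(1.0 - y2 / alpha0)   # = (t2/sigma)^gamma

# Dividing the two equations: (t1/t2)^gamma = A1/A2
gamma0 = math.log(A1 / A2) / math.log(t1 / t2)

# Back-substituting: sigma = t2 / A2^(1/gamma)
sigma0 = t2 / A2 ** (1.0 / gamma0)

print(round(gamma0, 2), round(sigma0, 2))   # → 1.85 4.18
```

This reproduces the starting values (σ, γ) = (4.18, 1.85) derived by hand in the solution.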

[Figures: Residual Plot (residuals vs. predicted values) and Normal Quantile Plot (residuals vs. standard normal quantiles)]

(e) The nlparci function in MatLab was used to find individual 95% confidence intervals for the three model parameters. The resulting intervals are reported below, along with the by-hand calculations using the standard errors reported in the next part. For each confidence interval to be at the 95% level individually, since there are n = 16 data pairs and p = 3 parameters being estimated, there are n − p = 13 degrees of freedom available. Under the assumption that the sampling distributions of the parameter estimates are normal, we use a t-based confidence interval for α, σ, and γ. The critical t-value is t* = t13(.975) = 2.1604, as found via MatLab. This resulted in the following set of 3 individual 95% confidence intervals for the 3 parameters:

For α: α̂ ± t* SE(α̂) = 10369 ± 2.1604(342.5) = 10369 ± 740 = (9629, 11109),
For σ: σ̂ ± t* SE(σ̂) = 3.9876 ± 2.1604(0.2939) = 3.9876 ± 0.635 = (3.3526, 4.6226),
For γ: γ̂ ± t* SE(γ̂) = 2.4840 ± 2.1604(0.5892) = 2.4840 ± 1.273 = (1.2110, 3.7569).

The confidence interval for α can be interpreted as: we are 95% confident that the true value of α in the Weibull model relating growth amount to time is between 9629 and 11109. More precisely, we are 95% confident that the maximum growth reached is between 9629 and 11109 orgs./ml. The others can be interpreted similarly.

To have 3 confidence intervals simultaneously at the 95% level, we need to use the Bonferroni correction as discussed briefly in class. To do this, instead of finding the critical t-value at the .975 percentile of the t-distribution with 13 degrees of freedom, we divide the lower tail probability (.025) by 3 (the number of CIs desired) to get a tail probability of .025/3 ≈ .0083, and then compute t.9917(13) to get the critical value. Doing so gives t.9917(13) ≈ 2.746 and a wider set of CIs:

For α: (9428, 11310)    For σ: (3.181, 4.795)    For γ: (0.866, 4.102)

in which we are simultaneously 95% confident that the three parameters fall inside their respective intervals.
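In MatLab the two critical values come from tinv(.975,13) and tinv(1-.025/3,13). They can also be reproduced without MatLab; the sketch below inverts the t CDF numerically in plain Python (the Simpson-rule and bisection machinery is illustrative scaffolding, not part of the assignment):

```python
import math

def t_pdf(x, df):
    # Density of the t-distribution with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    # Simpson's rule on [0, x] plus the 0.5 mass below zero (valid for x >= 0)
    if x == 0:
        return 0.5
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3

def t_quantile(p, df):
    # Bisection for the quantile, for p >= 0.5
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_ind = t_quantile(0.975, 13)             # individual 95% critical value
t_bonf = t_quantile(1 - 0.025 / 3, 13)    # Bonferroni value for 3 intervals
```

As expected, the Bonferroni critical value is larger than the individual one, which is why the simultaneous intervals are wider.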
(f) As indicated in the previous part, the operative t-critical value is t* = t13(.975) = 2.1604. The lengths of the three confidence intervals, divided by 2, give the margins of error for the three intervals. Dividing these margins of error by t* gives the standard errors for each of the three parameter estimates. These calculations are summarized in the table below.

Parameter   Estimate    CI                  Margin of Error                 Standard Error
α           10369       (9629, 11109)       (11109 - 9629)/2 = 740          740/t* = 342.53
σ           3.9876      (3.3526, 4.6226)    (4.6226 - 3.3526)/2 = 0.635     0.635/t* = 0.2939
γ           2.4840      (1.2110, 3.7569)    (3.7569 - 1.2110)/2 = 1.273     1.273/t* = 0.5893

(g) Since the individual 95% confidence interval for γ clearly does not include the value γ = 1, we have evidence at the .05 level that γ differs from 1 and that this extra Weibull parameter is significant in the model. If we instead consider the Bonferroni-corrected 95% confidence intervals (which is really the correct thing to do), then since the CI for γ contains the value 1 (albeit barely), we would conclude that this parameter is unnecessary in the model, i.e., that the model is overspecified. If we omit this parameter, which we are justified in doing, we are left with the 2-parameter exponential growth model, which was fully discussed in class.

MatLab Code Used

clear all;

% ============================================================ %
% Problem 1: Plot of biological recovery percentages vs. time  %
% ============================================================ %
load ../data/cloud.mat              % Reads in the data
time = cloud.time;                  % Renames the time variable
recov = cloud.recovery;             % Renames the recovery variable

figure(1)
plot(time,recov,'ko')               % Plots recoveries vs. time
xlim([-5 65])                       % Sets x-limits on plot
xlabel('Time (in minutes)','fontsize',14)
ylabel('Biological Recovery Percentage','fontsize',14)
title('Scatterplot of Recovery Percentage vs. Time','fontsize',14,'fontweight','b')

figure(2)
logrecov = log(recov);              % Log of recoveries
plot(time,logrecov,'ko')            % Plots log recovery vs. time
xlim([-5,65])                       % Sets x-limits on plot
xlabel('Time (in minutes)','fontsize',14)
ylabel('Log Biological Recovery Percentage','fontsize',14)
title('Scatterplot of Log Recovery vs. Time','fontsize',14,'fontweight','b')

% ===================================================================== %
% Problem 1: Regression of log biological recovery percentages vs. time %
% ===================================================================== %
out = regstats(logrecov,time);      % Regresses log recovery (y) on time (x)
out.tstat                           % Requests relevant parameter estimate info
out.rsquare                         % Multiple R-Squared value
out.mse                             % Mean Squared Error (MSE)
out.fstat                           % Model F-Statistic, df, pval

n = length(time);                   % Sample size
p = 2;                              % Defines # parameters
bhat = out.beta;                    % Vector of parameter estimates

seb = sqrt(diag(out.covb));         % Vector of standard errors
tbonf = tinv(1-.005,n-p);           % 99% uncorrected t*-value
ci_b = [bhat-tbonf*seb,...          % Confidence intervals for betas
        bhat+tbonf*seb];            % in 2 columns (lower,upper)
ci_b                                % Prints confidence intervals

% ================================================= %
% Problem 1: Confidence bands for E(log recoveries) %
% ================================================= %
xlab = 'Time (in minutes)';         % X-axis label
ylab = 'Log Recovery Percentage';   % Y-axis label
% The confregplot function plots the confidence and prediction
% bands for E(y). This function requires as inputs the x & y
% variables, labels for these variables, and the confidence level.
confregplot(time,logrecov,xlab,ylab,99)

% =============================================== %
% Problem 3: Weibull fit to Growth Data with plot %
% =============================================== %
load ../data/paramecium.mat         % Loads the growth data
growth = paramecium.growth;         % Defines growth vector
time = paramecium.time;             % Defines time vector
plot(time,growth,'ko')              % Plots growth vs. time
xlim([0 16])                        % x-axis plotting limits
xlabel('Time (in days)','fontsize',14,'fontweight','b');
ylabel('Growth (orgs./ml)','fontsize',14,'fontweight','b');
title('Growth vs. Time','fontsize',14,'fontweight','b');
hold on;                            % Hold the current plot

% ======================================================= %
% Problem 3c - Nonlinear Weibull model fit to growth data %
% ======================================================= %
beta1 = [11000 4.18 1.85];          % Parameter starting values (from part (b))
[betahat1 resid1 J1] = nlinfit(time,...   % Performs nonlinear Weibull fit
    growth,@weibull,beta1);               % returning betahats, resids, Jacobian
time1 = 0:.1:16;                    % Vector of times from 0 to 16
yhat1 = betahat1(1)*(1-exp(-(time1./...   % Computes Weibull predicted
    betahat1(2)).^betahat1(3)));          % values (yhat's)
plot(time1,yhat1);                  % Plots the fitted line
hold off;                           % End hold on current plot
nlintool(time,growth,@weibull,beta1);     % Plots 95% confidence bands

% =========================================================== %
% Problem 3d - Residual and normal quantile plot of residuals %
% =========================================================== %
figure(1)                           % 1st Figure
yhat.weib = growth - resid1;        % Computes Weibull predicted values
plot(yhat.weib,resid1,'ko');        % Plots residuals vs. predicted y-values
xlabel('Predicted Values','fontsize',14,'fontweight','b');

ylabel('Residuals','fontsize',14,'fontweight','b');
title('Residual Plot','fontsize',14,'fontweight','b');

figure(2);                          % 2nd Figure
qqplot(resid1);                     % Normal quantile plot of residuals
xlabel('Standard Normal Quantiles','fontsize',14,'fontweight','b');
ylabel('Residuals','fontsize',14,'fontweight','b');
title('Normal Quantile Plot','fontsize',14,'fontweight','b');

% ======================================================================= %
% Problem 3e,f - Individual 95% Confidence intervals for the 3 parameters %
% ======================================================================= %
ci = nlparci(betahat1,resid1,J1);   % Computes CIs for (alpha,sigma,gamma)
moe = (ci(:,2)-ci(:,1))/2;          % Margin of error from CIs
se = moe/tinv(.975,13);             % Standard errors from CIs
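The last three MatLab lines, which back out margins of error and standard errors from the nlparci intervals, can be mirrored in a few lines of Python. The interval endpoints below are taken from part (e) of these solutions (treated here as given values):

```python
# Confidence intervals for (alpha, sigma, gamma), as reported in part (e)
ci = {
    "alpha": (9629.0, 11109.0),
    "sigma": (3.3526, 4.6226),
    "gamma": (1.2110, 3.7569),
}
t_crit = 2.1604                     # t_{.975}(13), as used in the solution

# Margin of error = half the interval length; SE = margin of error / t*
moe = {name: (hi - lo) / 2 for name, (lo, hi) in ci.items()}
se = {name: m / t_crit for name, m in moe.items()}

print(moe["alpha"], round(se["sigma"], 4))   # → 740.0 0.2939
```

This reproduces the Margin of Error and Standard Error columns of the table in part (f).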


More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Ratio of Polynomials Fit Many Variables

Ratio of Polynomials Fit Many Variables Chapter 376 Ratio of Polynomials Fit Many Variables Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Instead of a single independent variable, these polynomials

More information

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS SOCY5601 DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS More on use of X 2 terms to detect curvilinearity: As we have said, a quick way to detect curvilinearity in the relationship between

More information

Review for Final Exam Stat 205: Statistics for the Life Sciences

Review for Final Exam Stat 205: Statistics for the Life Sciences Review for Final Exam Stat 205: Statistics for the Life Sciences Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 205: Statistics for the Life Sciences 1 / 20 Overview of Final Exam

More information

Analysis of Variance

Analysis of Variance Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also

More information

Introduction to Simple Linear Regression

Introduction to Simple Linear Regression Introduction to Simple Linear Regression 1. Regression Equation A simple linear regression (also known as a bivariate regression) is a linear equation describing the relationship between an explanatory

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Topic 14: Inference in Multiple Regression

Topic 14: Inference in Multiple Regression Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

15: Regression. Introduction

15: Regression. Introduction 15: Regression Introduction Regression Model Inference About the Slope Introduction As with correlation, regression is used to analyze the relation between two continuous (scale) variables. However, regression

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

TOPIC 9 SIMPLE REGRESSION & CORRELATION

TOPIC 9 SIMPLE REGRESSION & CORRELATION TOPIC 9 SIMPLE REGRESSION & CORRELATION Basic Linear Relationships Mathematical representation: Y = a + bx X is the independent variable [the variable whose value we can choose, or the input variable].

More information

ECON 497 Midterm Spring

ECON 497 Midterm Spring ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain

More information

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false.

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false. ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false. 1. A study was carried out to examine the relationship between the number

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 5.1. (a) In a log-log model the dependent and all explanatory variables are in the logarithmic form. (b) In the log-lin model the dependent variable

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants. The idea of ANOVA Reminders: A factor is a variable that can take one of several levels used to differentiate one group from another. An experiment has a one-way, or completely randomized, design if several

More information

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1 Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maximum likelihood Consistency Confidence intervals Properties of the mean estimator Properties of the

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

Lecture 10: F -Tests, ANOVA and R 2

Lecture 10: F -Tests, ANOVA and R 2 Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally

More information

Supplementary materials Quantitative assessment of ribosome drop-off in E. coli

Supplementary materials Quantitative assessment of ribosome drop-off in E. coli Supplementary materials Quantitative assessment of ribosome drop-off in E. coli Celine Sin, Davide Chiarugi, Angelo Valleriani 1 Downstream Analysis Supplementary Figure 1: Illustration of the core steps

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered) Test 3 Practice Test A NOTE: Ignore Q10 (not covered) MA 180/418 Midterm Test 3, Version A Fall 2010 Student Name (PRINT):............................................. Student Signature:...................................................

More information