Simple linear regression: estimation, diagnostics, prediction
UPPSALA UNIVERSITY
Department of Mathematics
Mathematical statistics
Regression and Analysis of Variance
Autumn 2015

COMPUTER SESSION 1: Regression

In the first computer exercise we will study the following subjects:

- Simple linear regression: estimation, diagnostics, prediction
- Multiple regression
- Examples of transformations

Let's begin!

1 Simple linear regression

For simplicity, we use the built-in data set mtcars for large parts of the session. Load the data by writing:

data(mtcars); attach(mtcars)

In simple linear regression we assume the model

y_i = β_0 + β_1 x_i + ε_i,   i = 1, ..., n

where β_0 and β_1 are regression coefficients, (x_i, y_i) are the observed values and ε_i ~ NID(0, σ²). Let us, to begin with, study the effect of weight on fuel consumption; that is, we take mpg as our y and wt as our x. We can plot the data by writing:

plot(mpg ~ wt, xlab = "Weight", ylab = "miles per gallon")

As we will see below, mpg ~ wt is read as "mpg as a function of wt".

1.1 Estimation of the parameters

A useful R routine for linear regression is lm (short for "linear model"). The model is written using a symbolic notation suggested by Wilkinson and Rogers:

y = β_0 + β_1 x_1                            corresponds to   y ~ x1
y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1 x_2    corresponds to   y ~ x1 + x2 + x1*x2
y = β_0 + β_1 x_1 + β_2 x_1^2                corresponds to   y ~ x1 + I(x1^2)

The routine for fitting a linear model by the least-squares method is lm. With the following command, the fitted model is stored in an object called m1:

m1 <- lm(mpg ~ wt, mtcars)
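As a quick check on the theory, the least-squares estimates can also be computed from their closed-form expressions. This is a small addition, not part of the original handout; it assumes mtcars is still attached as above:

# Slope: beta1_hat = S_xy / S_xx; intercept: beta0_hat = mean(y) - beta1_hat * mean(x)
b1 <- cov(wt, mpg) / var(wt)
b0 <- mean(mpg) - b1 * mean(wt)
c(b0, b1)        # should agree with coef(m1)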
The general R command str gives an overview of an object; type str(m1) in this case. Moreover, an output summarizing the important information is given by summary(m1). In the output from the summary command, try to find where R² and the estimates of β_i are located and which variables are significant.

It is possible to access the numerical values of an object. Try for instance

m1$coefficients
summary(m1)$coefficients[2,1]

We can add the estimated curve (actually, a straight line) in red to the figure that we plotted earlier by using the following command (note that the short form coef works for coefficients):

abline(coef(m1), col = "red")

Compare the figure with the estimated coefficients and convince yourself that the fit seems reasonable.

Confidence interval for the regression coefficients

The standard errors of the estimators are given in the summary table. To calculate a confidence interval for β_1 we need a t-quantile, which we obtain using the function qt.

# Some preparations
beta1 <- summary(m1)$coefficients[2,1]
sterror <- summary(m1)$coefficients[2,2]
f <- m1$df.res                 # degrees of freedom
quantile <- qt(0.975, f)       # the 97.5% (or 0.975) t-quantile
# The interval
c(beta1 - quantile*sterror, beta1 + quantile*sterror)

1.2 Diagnostics

To study the impact of randomness we study the residuals y_i − ŷ_i. These are extracted from the model by

residuals(m1)

Let us, to begin with, plot the sequence of raw residuals:

plot(residuals(m1))

Does there seem to be some evident pattern in the sequence? If so, we need to examine our data more carefully. For a (very) rough check of the normality assumption we can look at a histogram of the residuals:

hist(residuals(m1))
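For a slightly richer visual comparison, one can draw the histogram on the density scale and overlay a normal density with the estimated residual standard deviation. This is a minimal sketch added here, not part of the original handout:

hist(residuals(m1), freq = FALSE, main = "Histogram of residuals")
# Overlay a normal density with mean 0 and the estimated residual standard error
curve(dnorm(x, mean = 0, sd = summary(m1)$sigma), add = TRUE, col = "red")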
Does the histogram resemble the normal bell curve? Looking at a Q-Q plot is probably a better way to investigate the normality assumption. We get one by writing

qqnorm(residuals(m1)); qqline(residuals(m1))

Do the points follow the line? A popular (and usually powerful) formal test for normality is the Shapiro-Wilk test:

shapiro.test(residuals(m1))

Remind yourself what the null hypothesis of the test is. What is the conclusion given by the p-value?

Extracting the design matrix

Given a model from a call of lm we can extract the so-called design matrix, i.e. X in the matrix formulation of the regression model, y = Xβ. The following command extracts the matrix and calculates the useful matrix (X^T X)^(-1):

X <- model.matrix(m1); solve(t(X) %*% X)

EXERCISE. Recall from theory that the estimated covariance matrix for the regression coefficients is σ̂²(X^T X)^(-1). For an object m1, this can be found as vcov(m1) in R. Now, verify the elements in the covariance matrix by multiplying suitable elements of (X^T X)^(-1) found above by the estimated variance of the residuals (which is found in the summary table). Compare your answer to what is found elsewhere in the summary table (squaring the standard errors for the individual parameter estimates).

Diagnostic plots

Next we'll see how we can plot different residuals and leverage/influence measures. To plot some useful figures we can write

# Plots are wanted in two rows and two columns:
par(mfrow = c(2,2))
# Sequence of residuals:
plot(residuals(m1))
# Residuals against fitted values:
plot(m1$fit, m1$res)
# R-Student residuals:
plot(rstudent(m1))
# Cook's distance:
plot(cooks.distance(m1))

Judging by Cook's distance, it seems that some points have large influence. To find out which, use the commands below. A crosshair will appear in the figure; click on interesting points using the left mouse button to identify them, and click the right mouse button to end the identification procedure.

car.models <- row.names(mtcars)
identify(1:32, cooks.distance(m1), car.models)
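If you prefer a non-interactive check, the named vector returned by cooks.distance can be inspected directly. A minimal sketch (the 4/n cutoff is a common rule of thumb, not taken from this handout):

cooksd <- cooks.distance(m1)
sort(cooksd, decreasing = TRUE)[1:3]     # the three most influential cars, by name
which(cooksd > 4/length(cooksd))         # observations above the rough 4/n cutoff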
If we are only interested in the index of the observation, we leave out the car.models part of the identify call.

Given an lm object, e.g. m1 in our case, R can produce a number of figures for residual diagnostics. Simply write

plot(m1)

Press Enter on the keyboard to go to the next figure.

1.3 Prediction

With a model from lm we can easily do prediction:

predict(m1, mtcars)

Compare the predicted values ŷ_i to the original observations y_i. Prediction intervals are obtained by adding another argument:

predict(m1, mtcars, interval = "prediction")

Let us now assume that we want to predict at arbitrary values and that we want to show the result in a figure. We plot confidence intervals (for the curve) as well as prediction intervals (for the values that we want to predict; wider and more "insecure").

attach(mtcars)
# Sequence of x-values that we want to do prediction for
pred.frame <- data.frame(wt = seq(1.5, 5.5, 0.5))
# Calculate prediction and confidence intervals
pp <- predict(m1, int = "p", newdata = pred.frame)
pc <- predict(m1, int = "c", newdata = pred.frame)
# Graphics (introducing the command matlines)
plot(wt, mpg, ylim = range(mpg, pp, na.rm = T))
pred.mpg <- pred.frame$wt
matlines(pred.mpg, pc, lty = c(1,2,2), col = "blue")
matlines(pred.mpg, pp, lty = c(1,3,3), col = "black")

For the graphical options with lty, see the help text for par. For instance, lty = 1, 2 or 3 corresponds to solid, dashed or dotted, respectively.

2 Multiple regression

Next we'll use a second explanatory variable: the power of the engine measured in horsepower (hp). An lm call storing the model in m2 is now

m2 <- lm(mpg ~ wt + hp); summary(m2)

Feel free to examine residuals, normality and influence as we did above, this time using m2. It might also be of interest to look at the design matrix. Compare the R² of m1 with that of m2. Was it what you expected?

Prediction: Assume that we want to predict the fuel consumption of a new, rather heavy, car with a small engine: wt = 3.5, hp = 90. The commands are:
x0 <- data.frame(wt = 3.5, hp = 90)
yhat <- predict(m2, x0)

EXERCISE. The hat matrix H, satisfying ŷ = Hy, is given by H = X(X^T X)^(-1) X^T. Plot studentized residuals for the fitted object m2. Do you find, perhaps, three spurious observations? Find the related values of the hat matrix, either by typing them in from the definition (using model.matrix and diag) or with the ready-to-use command hatvalues(m2). Use the rule of thumb that hat values for leverage points exceed 2(k + 1)/n, where k is the number of explanatory variables and n the number of observations, and draw conclusions for the observations.

2.1 Polynomial regression and model choice

Suppose we want to introduce a model with a quadratic term when modelling mpg as a function of wt. (Could this seem plausible from the original plot?) The commands in R are as follows:

mqua <- lm(mpg ~ wt + I(wt^2), mtcars); summary(mqua)

Compare the R² values between models m1 and mqua (also the adjusted ones).

Now, consider an example also including a qualitative variable. An engineer wants to estimate the expected time E[Y] per month (in hours) that machines are out of use due to maintenance, as a function of the explanatory variables machine type (1 or 2) and the age of the machine (in years). The following model is suggested:

E[Y] = β_0 + β_1 x_1 + β_2 x_1² + β_3 x_2

where x_1 is the age of the machine and x_2 the type of machine (x_2 = 1: type 1, x_2 = 0: type 2). The data are collected in the file shutdown.dat. Save it into a suitable directory, read it into R and study the data structure:

mask <- read.table("shutdown.dat")
mask
str(mask)
attach(mask)

EXERCISE.

(a) Estimation of parameters:

mask0 <- lm(V1 ~ V2 + I(V2^2) + V4)
summary(mask0)

Does the model seem reasonable? Which variables are significant?
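To see what the fitted model in (a) implies, one can plot the estimated mean downtime against age for each machine type. A minimal sketch, not part of the original handout, assuming mask is attached and that V2 is the age and V4 the 0/1 type indicator as in the model above:

age <- seq(min(V2), max(V2), length.out = 50)
plot(V2, V1, xlab = "Age (years)", ylab = "Downtime (hours)")
lines(age, predict(mask0, data.frame(V2 = age, V4 = 1)), col = "red")    # type 1
lines(age, predict(mask0, data.frame(V2 = age, V4 = 0)), col = "blue")   # type 2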
(b) A simpler model is considered, and one wants to test β_1 = β_2 = 0 (at the level α = 0.10). More precisely,

H_0: β_1 = β_2 = 0
H_1: at least one β_i ≠ 0, i = 1, 2.

We may use the fact (Sundberg, page 75) that

F = [(R²_0 − R²_1)/(k − l)] / [(1 − R²_0)/(N − k)]  ~  F(k − l, N − k)

where R²_0 and R²_1 are the R² values of the general (full) model and the hypothesis model, respectively, and we have N = 20, k = 3 and l = 1. If the observed value of the test statistic F is larger than the F quantile, we reject the hypothesis of the simpler model and conclude that the repair time depends on the age of the machine. Find out whether this is the case for our data. (A quantile from the F distribution is found by using qf.)

3 Transformations

Transformations are often useful in regression. We'll study the effect of heating on the strength of vegetables (e.g. carrots). Such experiments are of importance for packing strategies for food. The data come from an experiment in Belgium. The temperature was fixed at 90 °C, the heating times were measured in seconds and the force in N. We import the data (download the file skalla.dat from the course page of the student portal) and plot the strength as a function of the heating time. We also plot a figure where we've taken the logarithm of the strength:

skalla <- read.table("skalla.dat", col.names = c("Temp", "Time", "Force"))
attach(skalla)
plot(Force ~ Time)
plot(log(Force) ~ Time)

Do you think that taking the logarithm will give us a better linear regression and a more homogeneous variance? The regression models are fitted as usual:

m3 <- lm(Force ~ Time)
m4 <- lm(log(Force) ~ Time)

Look at the results of the regressions: significant variables, R² and so on.

If you have the time, use the boxcox function in R to find out whether taking the logarithm is a good transformation. When selecting a power transformation of the response variable with the Box-Cox method, a suggested exponent of 0 corresponds to taking the logarithm. The routine is found in the MASS package; activate it by typing library(MASS). If it is not already installed in your computer environment, it can be installed with install.packages(), choosing Sweden (as the CRAN mirror) and then choosing MASS. For the routine, use help(boxcox) to get instructions on its usage.
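As a pointer for that last step, a minimal sketch of how boxcox might be applied to the untransformed model (not part of the original handout):

library(MASS)
bc <- boxcox(m3)          # plots the profile log-likelihood over a grid of lambda values
bc$x[which.max(bc$y)]     # lambda with the highest likelihood; a value near 0 supports the log transform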