Handout 4: Simple Linear Regression
Handout 4: Simple Linear Regression
By: Brandon Berman

The following problem comes from Kokoska's Introductory Statistics: A Problem-Solving Approach. The data can be read into R using the following code:

> msm = read.csv("...")  # data-file URL truncated in this transcription

1 Background and Maximum Likelihood Estimates

The European Food Safety Authority recently issued a scientific opinion on the public health risks related to mechanically separated meat (MSM). The analysis suggested that calcium could be used to distinguish between MSM and non-MSM products. A random sample of MSM poultry was obtained, and the deboner head pressure (in psi) and the amount of calcium (in ppm) were measured for each. The data are given in the following table.

Based on this data set we want to build a model that can predict how much calcium (Y_i) a sample will contain depending on the pressure (x_i) the deboner used on the poultry. Our model will be:

\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, 2, \ldots, 18
\]

A consequence of the definition above is that the Y_i are independently distributed N(\beta_0 + \beta_1 x_i, \sigma^2) for all i = 1, 2, ..., n. In order to fit the model we first must find the maximum likelihood estimates of \beta_0, \beta_1, and \sigma^2. (Note that n = 18, but we're going to pretend for the time being that we don't know that fact.) The likelihood and log-likelihood equations are:

\[
L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2} \right) \right]
\]

\[
\ell(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}
\]
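Since the handout's data file is not reproduced in this transcription, a small simulated stand-in can be used to check that the log-likelihood written above matches what R computes from the normal density directly. All data values and the seed below are hypothetical:

```r
# Sketch with simulated stand-in data (the handout's msm data are not
# reproduced here); all numbers are hypothetical.
set.seed(1)
n <- 18
x <- runif(n, 100, 200)                # hypothetical pressures (psi)
y <- 500 + 1*x + rnorm(n, sd = 10)     # hypothetical calcium (ppm)

# Log-likelihood exactly as written above
loglik <- function(beta0, beta1, sigma2) {
  -n/2 * log(2*pi*sigma2) - sum((y - beta0 - beta1*x)^2) / (2*sigma2)
}

# The same quantity via R's normal density, as a sanity check
loglik2 <- function(beta0, beta1, sigma2) {
  sum(dnorm(y, mean = beta0 + beta1*x, sd = sqrt(sigma2), log = TRUE))
}

all.equal(loglik(500, 1, 100), loglik2(500, 1, 100))  # TRUE
```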
[Table: Pressure (in psi) and Calcium (in ppm) for each of the 18 samples; the values did not survive this transcription.]

Now we must take the derivative of the log-likelihood with respect to \beta_0, \beta_1, and \sigma^2:

\[
\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} \frac{y_i - \beta_0 - \beta_1 x_i}{\sigma^2}
\]

\[
\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n} \frac{x_i (y_i - \beta_0 - \beta_1 x_i)}{\sigma^2}
\]

\[
\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2(\sigma^2)^2}
\]

Set the three equations above equal to zero and simultaneously solve for \beta_0, \beta_1, and \sigma^2. The results are the maximum likelihood estimates. Prove to yourself that the following are the maximum likelihood estimates:
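As a quick numerical check of the derivatives above, the three score equations should evaluate to (numerically) zero when the closed-form maximum likelihood estimates are plugged in. The data below are simulated stand-ins, since the handout's data set is not reproduced here:

```r
# Sketch with simulated stand-in data; the closed-form MLEs should make
# all three score equations vanish.
set.seed(2)
n <- 18
x <- runif(n, 100, 200)
y <- 500 + x + rnorm(n, sd = 10)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
s2 <- sum((y - b0 - b1*x)^2) / n

score.b0 <- sum(y - b0 - b1*x) / s2
score.b1 <- sum(x * (y - b0 - b1*x)) / s2
score.s2 <- -n/(2*s2) + sum((y - b0 - b1*x)^2) / (2*s2^2)

c(score.b0, score.b1, score.s2)  # all three are numerically zero
```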
\[
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
\]

\[
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

\[
\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2
\]

To find the maximum likelihood estimates for \beta_0 and \beta_1 in R, we can use the following code:

> n = dim(msm)[1]
> x = msm$Pressure
> y = msm$Calcium
> beta1 = sum( ( x - mean(x) ) * ( y - mean(y) ) )/sum( ( x - mean(x) )^2 )
> beta0 = mean(y) - beta1 * mean(x)
> beta0
[1] ...
> beta1
[1] ...

To interpret the value of \hat\beta_0, we would say that the expected calcium, given the machine is set to a pressure of 0 psi, is about 505 ppm. Note that oftentimes interpretations of \hat\beta_0 are nonsensical, as in this case: when the machine is set to 0 psi it can't separate the meat. Often what is of scientific interest is the interpretation of \hat\beta_1. In this example, one way to interpret \hat\beta_1 is to say that for a 1 psi increase in pressure, the calcium concentration is expected to increase by about 1.01 ppm.

Typically, we don't use the maximum likelihood estimator for \sigma^2 because it is a biased estimator (prove this fact to yourself). Instead, we use the unbiased estimate we sometimes refer to as MSE:

\[
MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2} = \frac{\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n-2}
\]

To find the value of MSE using R, we can use the following code:

> yhat = beta0 + beta1*x
> MSE = sum( (y - yhat)^2 )/(n-2)
> MSE
[1] ...
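The claimed bias of the maximum likelihood estimator of \sigma^2 can be checked by simulation. This is a sketch with a hypothetical design and hypothetical true parameter values, not the handout's data:

```r
# Sketch: simulate many data sets and compare the average of the MLE
# (divide by n) against MSE (divide by n - 2). Design and parameters
# are hypothetical.
set.seed(3)
n <- 18; beta0 <- 500; beta1 <- 1; sigma2 <- 100
x <- seq(100, 200, length.out = n)

one.rep <- function() {
  y   <- beta0 + beta1*x + rnorm(n, sd = sqrt(sigma2))
  b1  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0  <- mean(y) - b1 * mean(x)
  rss <- sum((y - b0 - b1*x)^2)
  c(mle = rss/n, mse = rss/(n - 2))
}

est <- replicate(5000, one.rep())
rowMeans(est)
# The mle average is close to (n-2)/n * sigma2 (about 88.9), while
# the mse average is close to the true sigma2 = 100.
```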
2 Hypothesis Testing: H_0: \beta_1 = 0 vs. H_a: \beta_1 \neq 0

According to the assumptions we made,

\[
\hat\beta_1 \sim N\!\left( \beta_1, \ \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)
\]

If we wanted to test the null hypothesis H_0: \beta_1 = 0 vs. H_a: \beta_1 \neq 0, then our test statistic might be:

\[
\text{test statistic} = \frac{\hat\beta_1 - 0}{\sqrt{\sigma^2 \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]

However, there is a problem with the test statistic above: we don't know the value of \sigma^2, so we substitute the unbiased estimate we previously found:

\[
\text{test statistic} = \frac{\hat\beta_1 - 0}{\sqrt{MSE \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]

Then, when the null hypothesis is true,

\[
\frac{\hat\beta_1 - 0}{\sqrt{MSE \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim t_{(n-2)}
\]

To carry out the equivalent test in R, we could use the following code:

> test.stat = beta1/sqrt( MSE / sum( ( x - mean(x) )^2 ) )
> test.stat
[1] ...
> alpha = 0.05
> # Rejection region approach
> cutoff = qt( c(alpha/2, 1-alpha/2), df = n - 2 )
> cutoff
[1] ...
> (test.stat <= cutoff[1]) | (test.stat >= cutoff[2])
[1] TRUE
>
> # p-value approach
> p.value = 2*pt( test.stat, df = n - 2, lower.tail = F)
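One way to convince yourself that the studentized statistic really has a t_{(n-2)} distribution under H_0 is to simulate data sets with \beta_1 = 0 and look at the rejection rate. This sketch uses a hypothetical design and error standard deviation, not the handout's data:

```r
# Sketch: under H0 (beta1 = 0) the test should reject at about the
# nominal rate alpha = 0.05. Design and error sd are hypothetical.
set.seed(4)
n <- 18
x <- seq(100, 200, length.out = n)

t.stat <- replicate(2000, {
  y   <- 500 + rnorm(n, sd = 10)   # beta1 = 0, so H0 is true
  b1  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0  <- mean(y) - b1 * mean(x)
  mse <- sum((y - b0 - b1*x)^2) / (n - 2)
  b1 / sqrt(mse / sum((x - mean(x))^2))
})

# Rejection rate using the same two-sided t cutoffs as the handout:
mean(abs(t.stat) > qt(0.975, df = n - 2))  # close to 0.05
```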
> p.value
[1] ...
> p.value <= alpha
[1] TRUE

From our hypothesis tests above we can now make a conclusion. If we choose the rejection region approach, then we reject H_0 if the test statistic falls into either the (-\infty, -2.12] or the [2.12, \infty) interval. Since our test statistic is 3.57, we reject H_0 and conclude significance. Of course, no conclusion is complete without referencing the context of the problem, so here we would conclude that calcium concentration in MSM poultry is linearly associated with the pressure of the separation machine.

If we choose the p-value approach to hypothesis testing, then we compare our p-value against the pre-selected significance level of \alpha = 0.05. Since the p-value is less than 0.05, we reject the null hypothesis and conclude there is a significant relationship. Like before, to complete the conclusion we need to explain which relationship is significant, so we say that the linear relationship between calcium concentration in MSM poultry and the pressure of the separation machine is significant.

3 Confidence Interval for the Estimated Mean and a New Observation at a Given Point

There are a few other things that we might be interested in examining with our model. Suppose we wanted to create a confidence interval for the estimated mean response at a given point x = x_h. From class we know that such a confidence interval has the following formula:

\[
\hat{Y}_h \pm t_{(n-2);\,1-\alpha/2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)}
\]

Suppose we are interested in generating a 95% confidence interval for the mean calcium at 100 psi. To do this in R we could use the following code:

> y_100 = beta0 + beta1*100
> y_100
[1] ...
> y_100 + c(-1,1)*qt(1-0.05/2, df = n-2)*sqrt(MSE*(1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
[1] 589.1 624.1
Some students find the concept of using vectors in R challenging, so an alternative way is to produce the lower and upper endpoints of the interval separately, like so:

> lower = y_100 - qt(1-0.05/2, df = n-2)*sqrt(MSE*(1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
> upper = y_100 + qt(1-0.05/2, df = n-2)*sqrt(MSE*(1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
> lower
[1] 589.1
> upper
[1] 624.1

Notice that both ways produce equivalent results. The 95% confidence interval for the mean calcium when pressure is 100 psi is (589.1, 624.1). We interpret this confidence interval by saying, "We are 95% confident that the mean amount of calcium at 100 psi is between 589.1 and 624.1 ppm." In R, there is a built-in function that will achieve the same results:

> predict(mod, newdata = list(Pressure = 100), level = 0.95, interval = "confidence")
       fit   lwr   upr
1      ...   ...   ...

Now suppose there was a new observation at 100 psi. To calculate an interval for that new observation (a prediction interval) we use the formula:

\[
\hat{Y}_{new} \pm t_{(n-2);\,1-\alpha/2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)}
\]

To calculate a 95% prediction interval for the calcium of a new observation at 100 psi, we could use the following R code:

> y_100 + c(-1,1)*qt(1-0.05/2, df = n-2)*sqrt(MSE*(1 + 1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
[1] ...

or,

> lower = y_100 - qt(1-0.05/2, df = n-2)*sqrt(MSE*(1 + 1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
> upper = y_100 + qt(1-0.05/2, df = n-2)*sqrt(MSE*(1 + 1/n +
+   (100-mean(x))^2/sum( (x-mean(x))^2 ) ) )
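The only difference between the mean-response formula and the new-observation formula is the extra "1 +" under the square root, so the prediction interval is always wider than the confidence interval for the mean at the same x_h. A quick check with predict() on hypothetical data:

```r
# Sketch: compare the two intervals from predict() at the same point.
# The data frame here is simulated, not the handout's msm data.
set.seed(5)
d <- data.frame(Pressure = seq(100, 200, length.out = 18))
d$Calcium <- 500 + d$Pressure + rnorm(18, sd = 10)
fit <- lm(Calcium ~ Pressure, data = d)

ci.mean <- predict(fit, newdata = list(Pressure = 150), interval = "confidence")
pi.new  <- predict(fit, newdata = list(Pressure = 150), interval = "prediction")
ci.mean
pi.new
# Both intervals are centered at the same fitted value, but the
# prediction interval's (lwr, upr) strictly contains the confidence one's.
```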
> lower
[1] ...
> upper
[1] ...

So the 95% prediction interval we just solved for would have the following interpretation: "We are 95% confident that a new observation with a pressure of 100 psi will be between ... and ... ppm." In R the same built-in function can be used to solve for the prediction interval:

> predict(mod, newdata = list(Pressure = 100), level = 0.95, interval = "prediction")
       fit   lwr   upr
1      ...   ...   ...

4 Sums of Squares

Finally, we can find the Sum of Squares due to Regression, the Sum of Squares due to Error, and the Total Sum of Squares. Recall the formulas:

\[
SSE \text{ (sometimes called RSS)} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]

\[
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
\]

\[
SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]

In R, we can find these values easily using the following code:

> yhat = beta0 + beta1*x
> SSE = sum( (y - yhat)^2 )  # called RSS, residual sum of squares
> SSE
[1] ...
> SSReg = sum( (yhat - mean(y))^2 )  # SS Regression
> SSReg
[1] ...
> SSTO = sum( (y - mean(y))^2 )
> SSTO
[1] ...
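The three sums of squares are not independent: whenever the model includes an intercept, SSTO = SSE + SSR. A short check on simulated data (the handout's data are not reproduced here):

```r
# Sketch: verify SSTO = SSE + SSR on hypothetical data.
set.seed(6)
x <- seq(100, 200, length.out = 18)
y <- 500 + x + rnorm(18, sd = 10)

fit  <- lm(y ~ x)
yhat <- fitted(fit)

SSE  <- sum((y - yhat)^2)
SSR  <- sum((yhat - mean(y))^2)
SSTO <- sum((y - mean(y))^2)

all.equal(SSTO, SSE + SSR)  # TRUE
```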
> Rsquared = SSReg/SSTO
> Rsquared
[1] ...

Of course, there is an easy way to do all of these tasks in R without having to calculate all of this by hand:

> mod = lm(Calcium ~ Pressure, data = msm)
> summary(mod)

Call:
lm(formula = Calcium ~ Pressure, data = msm)

Residuals:
   Min     1Q Median     3Q    Max
   ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-12 ***
Pressure         ...        ...     ...      ... **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ... on 16 degrees of freedom
Multiple R-squared: ...,  Adjusted R-squared: ...
F-statistic: ... on 1 and 16 DF,  p-value: ...

5 Checking Model Assumptions

Regardless, we need to check that the assumptions made for linear regression are correct. One of the assumptions we made was that the variance is constant for all observations. To check this assumption we can plot the residuals vs. the fitted values:

> stdresid = scale(y - yhat)  # standardized residuals; scale() centers and rescales the raw residuals
> plot(x = yhat, y = stdresid, xlab = "Predicted Calcium (in ppm)", ylab =
+   "Standardized Residuals", main = "Standardized Residuals\nvs. Fitted")
> abline(h = 0, lwd = 1, lty = 2, col = "grey")

The plot that results from the code above is Figure 1.
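A useful consistency check on the summary() output: in simple linear regression (one predictor), the overall F statistic equals the square of the slope's t statistic, which is why the two tests reach the same conclusion. A sketch on hypothetical data:

```r
# Sketch: F = t^2 for the slope in simple linear regression.
# Simulated stand-in data, not the handout's msm data.
set.seed(7)
d <- data.frame(Pressure = seq(100, 200, length.out = 18))
d$Calcium <- 500 + d$Pressure + rnorm(18, sd = 10)

smod <- summary(lm(Calcium ~ Pressure, data = d))

t.slope <- coef(smod)["Pressure", "t value"]
F.stat  <- smod$fstatistic[["value"]]

all.equal(F.stat, t.slope^2)  # TRUE
```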
[Figure 1: Residuals vs. Fitted Values. The cloud of points should be centered around zero and remain fairly constant.]

The next assumption we can check is that the data are normally distributed. To check this assumption we can create a QQ plot of the residuals. In R:

> qqnorm( scale(y-yhat) )
> qqline( scale(y-yhat), lty = 2, lwd = 1, col = "grey" )

The plot the code generates is in Figure 2. Finally, one of the plots often included in a write-up is a scatter plot with the regression line added (see Figure 3). The following code generates that:

> plot(x = x, y = y, xlab = "Pressure (in psi)", ylab = "Calcium (in ppm)",
+   main = "Scatterplot of data with\nregression line added")
> curve(beta0 + beta1*x, from = min(x), to = max(x), add = TRUE, lwd = 2, lty = 1)
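Scaling the raw residuals with scale() is a reasonable approximation, but R also provides rstandard(), which standardizes each residual by its own estimated standard deviation (accounting for leverage). A sketch on hypothetical data:

```r
# Sketch: built-in standardized (internally studentized) residuals.
# Hypothetical data, not the handout's msm data.
set.seed(8)
x <- seq(100, 200, length.out = 18)
y <- 500 + x + rnorm(18, sd = 10)
fit <- lm(y ~ x)

r <- rstandard(fit)   # residual i divided by its estimated sd, leverage-adjusted
head(round(r, 2))
```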
[Figure 2: QQ plot. The points should follow the line y = x if the data are normally distributed. This looks pretty close.]
[Figure 3: Scatter plot with regression line.]
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationMultiple Regression Analysis
Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,
More informationGenerating OLS Results Manually via R
Generating OLS Results Manually via R Sujan Bandyopadhyay Statistical softwares and packages have made it extremely easy for people to run regression analyses. Packages like lm in R or the reg command
More informationChapter 8 Conclusion
1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationNonstationary time series models
13 November, 2009 Goals Trends in economic data. Alternative models of time series trends: deterministic trend, and stochastic trend. Comparison of deterministic and stochastic trend models The statistical
More informationBiostatistics for physicists fall Correlation Linear regression Analysis of variance
Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationProblems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B
Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationSTAT2012 Statistical Tests 23 Regression analysis: method of least squares
23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)
More informationReplication of Examples in Chapter 6
Replication of Examples in Chapter 6 Zheng Tian 1 Introduction This document is to show how to perform hypothesis testing for a single coefficient in a simple linear regression model. I replicate examples
More information