Introduction to Linear Regression
Rebecca C. Steorts
September 15, 2015
Today

- (Re-)Introduction to linear models and the model space
- What is linear regression?
- Basic properties of linear regression
- Using data frames for statistical purposes
- Manipulation of data into more convenient forms
- How do we do exploratory analysis to see if linear regression is appropriate?

Linear regression

Let Y (n x 1) be the response (poverty rate). Let X (n x p) be a matrix of covariates (age, education level, state you live in, etc.), so each observation has p covariates. Let β be an unknown parameter vector that can help us estimate future poverty rates:

Y_{n x 1} = X_{n x p} β_{p x 1} + ε_{n x 1}, where ε ~ N(0, σ² I).

Estimation and Prediction

We seek the estimate β̂ that minimizes f(β) = ||Y − Xβ||², i.e. β̂ = arg min_β ||Y − Xβ||². Setting the gradient to zero,

∂f/∂β = ∂/∂β [(Y − Xβ)^T (Y − Xβ)] = −2 X^T (Y − Xβ) = 0,

we thus find that β̂ = (X^T X)^{−1} X^T Y.

Predicting Ŷ

We can now predict new observations via

Ŷ = X β̂ = X (X^T X)^{−1} X^T Y = H Y,

where H is often called the hat matrix.
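As a quick sanity check (a sketch with simulated data, not from the slides), the closed-form estimate and the hat matrix can be computed directly and compared against R's built-in lm():

```r
# Hypothetical illustration: verify beta-hat = (X'X)^{-1} X'Y and
# Y-hat = H Y against lm() on simulated data.
set.seed(1)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                        # design matrix with intercept column
beta.hat <- solve(t(X) %*% X, t(X) %*% y)    # (X'X)^{-1} X'Y, via solve()
H <- X %*% solve(t(X) %*% X) %*% t(X)        # the hat matrix
y.hat <- H %*% y                             # fitted values

fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta.hat), as.numeric(coef(fit)))   # TRUE
all.equal(as.numeric(y.hat), as.numeric(fitted(fit)))    # TRUE
```

Using solve(A, b) rather than explicitly inverting X'X is the numerically preferred form of the same formula.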
How can we evaluate a linear model?

- Residual plots
- Outliers
- Collinearity

Residual plots

A graphical tool for identifying non-linearity. For each observation i, the residual is e_i = y_i − ŷ_i. Plotting the residuals means plotting e_i versus one covariate x_i for all i.

Residuals for multiple covariates

With multiple covariates, we plot the residuals versus the fitted values; that is, we plot e_i versus ŷ_i. (Think about why on your own.) A strong pattern in the residuals indicates non-linearity. Instead: try a non-linear transformation of the covariates.

Outliers

An outlier is a point for which y_i is far from the value predicted by the model. We can identify these from residual plots, but it is often hard to know how large a residual should be before we call a point an outlier. Instead, we can plot the studentized residuals, computed by dividing each residual e_i by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers. What to do if you think you have an outlier? Remove it.

High-leverage points

Recall that outliers are observations for which the response y_i is unusual given the predictor x_i. Observations with high leverage instead have an unusual value of x_i.

Correlation

It is important that the error terms e are all uncorrelated. Uncorrelated means that the sign of e_i provides little or no information about the sign of e_{i+1}. The fitted values (and standard errors) are computed assuming the e are uncorrelated. If there is correlation, the estimated standard errors will tend to underestimate the true standard errors. This implies confidence and prediction intervals will be too narrow.
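A minimal sketch (simulated data, not from the slides) of flagging possible outliers with studentized residuals, using R's rstudent() helper:

```r
# Hypothetical example: flag observations whose studentized
# residuals exceed 3 in absolute value.
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
y[50] <- y[50] + 10                # plant an obvious outlier

fit <- lm(y ~ x)
r.stud <- rstudent(fit)            # residual / estimated standard error
which(abs(r.stud) > 3)             # the planted point (observation 50) is flagged
plot(fitted(fit), resid(fit))      # residuals vs fitted values
```

rstudent() is the externally studentized version (observation i is left out when estimating its standard error), which is the usual choice for outlier detection.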
Figure 1: Left: Observation 41 is a high-leverage point, while 20 is not. The red line is the fit to all the data, and the dotted line is the fit with observation 41 removed. Center: the red observation is not unusual in terms of its X1 value or its X2 value, but still falls outside the bulk of the data, and hence has high leverage. Right: observation 41 has high leverage and a high residual.

Similarly, p-values will be lower than they should be (and this could cause a parameter to be thought statistically significant when it is not). Where does this happen? Often with time series data, where things are correlated.

Collinearity

Collinearity refers to the situation in which two or more predictor variables are closely related to each other. Suppose we are predicting the balance of a credit card. Credit limit and age, when plotted, appear to have no obvious relation. However, credit limit and credit rating do (and this is also intuitive if you have credit and own a credit card): credit limit and rating are highly correlated.

Collinearity (continued)

The presence of collinearity can pose problems in the regression context, since it can be difficult to separate out the individual effects of collinear variables on the response. Collinearity reduces the accuracy of the estimates of the regression coefficients and causes the standard error of β̂_j to grow. Elements of the predictors' correlation matrix that are large in absolute value indicate pairs of highly correlated variables. Another check is computing the variance inflation factor (VIF): a VIF of 1 says no collinearity; a VIF above 5 is a problematic amount.

How to deal with collinearity?

The first option is to drop one of the problematic variables from the regression. The second is to combine the collinear variables into a single predictor (the better solution, since we are not throwing away data). Example: take the average of standardized versions of limit and rating to create a new variable that measures creditworthiness.
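The VIF can be computed by hand: regress each predictor on the remaining predictors and take 1/(1 − R²). A sketch with made-up variables mimicking the credit example (the names limit, rating, age are illustrative, not a real data set):

```r
# Hypothetical illustration of the variance inflation factor:
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors.
set.seed(3)
n <- 200
limit  <- rnorm(n)
rating <- limit + rnorm(n, sd = 0.1)   # nearly collinear with limit
age    <- rnorm(n)                      # unrelated to both

vif <- function(j, X) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
}
X <- cbind(limit, rating, age)
v <- sapply(seq_len(ncol(X)), vif, X = X)
v   # limit and rating have large VIFs (well above 5); age is near 1
```

Packages such as car provide a vif() function that does the same computation directly on a fitted model.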
So You've Got A Data Frame

What can we do with it?

- Plot it: examine multiple variables and distributions
- Test it: compare groups of individuals to each other
- Check it: does it conform to what we'd like for our needs?

Test Case: Birth weight data

Included in R already:

library(MASS)
data(birthwt)
summary(birthwt)

[Summary table: for each of the ten variables (low, age, lwt, race, smoke, ptl, ht, ui, ftv, bwt), the minimum, quartiles, mean, and maximum; e.g. mother's age runs from 14 to 45 (mean 23.24), mother's weight from 80 to 250 (mean 129.8), and birth weight from 709 to 4990 grams (mean 2945).]

From R help

Go to R help for more info, because someone documented this (thanks, someone!):

help(birthwt)

Make it readable!
colnames(birthwt)

[1] "low" "age" "lwt" "race" "smoke" "ptl" "ht" "ui"
[9] "ftv" "bwt"

colnames(birthwt) <- c("birthwt.below.2500", "mother.age", "mother.weight", "race",
                       "mother.smokes", "previous.prem.labor", "hypertension",
                       "uterine.irr", "physician.visits", "birthwt.grams")

Make it readable, again!

Let's make all the factors more descriptive.

birthwt$race <- factor(c("white", "black", "other")[birthwt$race])
birthwt$mother.smokes <- factor(c("No", "Yes")[birthwt$mother.smokes + 1])
birthwt$uterine.irr <- factor(c("No", "Yes")[birthwt$uterine.irr + 1])
birthwt$hypertension <- factor(c("No", "Yes")[birthwt$hypertension + 1])

Make it readable, again!

summary(birthwt)

[Summary table, now with readable factor levels: race has black: 26, other: 67, white: 96; mother.smokes has No: 115, Yes: 74; hypertension has No: 177, Yes: 12; uterine.irr has No: 161, Yes: 28; the numeric variables are summarized as before.]
Explore it!

R's basic plotting functions go a long way.

plot(birthwt$race)
title(main = "Count of Mother's Race in Springfield MA, 1986")

[Bar plot: counts of mother's race (black, other, white)]

Explore it!

R's basic plotting functions go a long way.

plot(birthwt$mother.age)
title(main = "Mother's Ages in Springfield MA, 1986")
[Scatter plot: birthwt$mother.age versus index]

Explore it!

R's basic plotting functions go a long way.

plot(sort(birthwt$mother.age))
title(main = "(Sorted) Mother's Ages in Springfield MA, 1986")
[Scatter plot: sort(birthwt$mother.age) versus index]

Explore it!

R's basic plotting functions go a long way.

plot(birthwt$mother.age, birthwt$birthwt.grams)
title(main = "Birth Weight by Mother's Age in Springfield MA, 1986")
[Scatter plot: birthwt$birthwt.grams versus birthwt$mother.age]

Basic statistical testing

Let's fit some models to the data pertaining to our outcome(s) of interest.

plot(birthwt$mother.smokes, birthwt$birthwt.grams,
     main = "Birth Weight by Mother's Smoking Habit",
     ylab = "Birth Weight (g)", xlab = "Mother Smokes")
[Box plot: Birth Weight (g) by Mother Smokes (No / Yes)]

Basic statistical testing

Tough to tell! Simple two-sample t-test. We're testing

H0: μ1 = μ2 versus Ha: μ1 ≠ μ2,

where μ1 is the mean birth weight when the mother smokes and μ2 is the mean birth weight when the mother doesn't smoke.

t.test(birthwt$birthwt.grams[birthwt$mother.smokes == "Yes"],
       birthwt$birthwt.grams[birthwt$mother.smokes == "No"])

[Welch Two Sample t-test output: t statistic with df = 170.1, a small p-value, a 95 percent confidence interval for the difference in means, and the two sample means]

We reject the null hypothesis in favor of the alternative that the two means are not the same.
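In fact, the two-sample comparison can equivalently be fit as a linear model with a binary predictor: the slope coefficient is exactly the difference in group means. A generic sketch with simulated data (not the birthwt data):

```r
# Hypothetical check that a two-group lm() slope reproduces the
# difference in group means (the quantity the t-test compares).
set.seed(4)
g <- factor(rep(c("No", "Yes"), each = 30))
y <- c(rnorm(30, mean = 5), rnorm(30, mean = 7))

fit <- lm(y ~ g)
slope <- unname(coef(fit)["gYes"])                     # coefficient for the "Yes" group
diff.means <- mean(y[g == "Yes"]) - mean(y[g == "No"])
all.equal(slope, diff.means)   # TRUE
```

(The p-values differ slightly: lm() assumes equal variances in the two groups, while t.test() defaults to the Welch correction.)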
Basic statistical testing

Does this difference match the linear model?

linear.model.1 <- lm(birthwt.grams ~ mother.smokes, data = birthwt)
linear.model.1

Call:
lm(formula = birthwt.grams ~ mother.smokes, data = birthwt)

Coefficients:
(Intercept)  mother.smokesYes
[estimates omitted]

Basic statistical testing

Does this difference match the linear model?

summary(linear.model.1)

Call:
lm(formula = birthwt.grams ~ mother.smokes, data = birthwt)

[Summary output: residual quartiles; a coefficient table in which the intercept is significant at < 2e-16 and mother.smokesYes at the 0.01 level; residual standard error on 187 degrees of freedom; multiple and adjusted R-squared; F-statistic on 1 and 187 DF.]

Basic statistical testing

Does this difference match the linear model?

linear.model.2 <- lm(birthwt.grams ~ mother.age, data = birthwt)
linear.model.2

Call:
lm(formula = birthwt.grams ~ mother.age, data = birthwt)

Coefficients:
(Intercept)  mother.age
[estimates omitted]

Basic statistical testing

summary(linear.model.2)

Call:
lm(formula = birthwt.grams ~ mother.age, data = birthwt)

[Summary output: residual quartiles; a coefficient table in which the intercept is significant at < 2e-16 but mother.age is not significant; residual standard error on 187 degrees of freedom; small R-squared; F-statistic on 1 and 187 DF.]

Basic statistical testing

Diagnostics: R tries to make it as easy as possible (but no easier). Try in R proper:

plot(linear.model.2)
[Diagnostic plots for lm(birthwt.grams ~ mother.age): Residuals vs Fitted, and Normal Q-Q of the standardized residuals]
[Diagnostic plots for lm(birthwt.grams ~ mother.age), continued: Scale-Location, and Residuals vs Leverage with Cook's distance contours]

Detecting Outliers

These are the default diagnostic plots for the analysis. Note that our oldest mother and her heaviest child are greatly skewing this analysis, as we suspected.
birthwt.noout <- birthwt[birthwt$mother.age <= 40, ]
linear.model.3 <- lm(birthwt.grams ~ mother.age, data = birthwt.noout)
linear.model.3

Call:
lm(formula = birthwt.grams ~ mother.age, data = birthwt.noout)

Coefficients:
(Intercept)  mother.age
[estimates omitted]

Detecting Outliers

summary(linear.model.3)

Call:
lm(formula = birthwt.grams ~ mother.age, data = birthwt.noout)

[Summary output: residual quartiles; a coefficient table in which the intercept is significant at < 2e-16 but mother.age is still not significant; residual standard error on 186 degrees of freedom; R-squared; F-statistic on 1 and 186 DF.]

More complex models

Add in smoking behavior:

linear.model.3b <- lm(birthwt.grams ~ mother.age + mother.smokes*race, data = birthwt.noout)
summary(linear.model.3b)

Call:
lm(formula = birthwt.grams ~ mother.age + mother.smokes * race, data = birthwt.noout)

Residuals:
[Residual quartiles; a coefficient table for (Intercept), mother.age, mother.smokesYes, raceother, racewhite, mother.smokesYes:raceother, and mother.smokesYes:racewhite, in which the intercept (p < 2e-16) and racewhite (significant at the 0.01 level) stand out; residual standard error on 181 degrees of freedom; R-squared; F-statistic on 6 and 181 DF.]

More complex models

plot(linear.model.3b)

[Diagnostic plot for lm(birthwt.grams ~ mother.age + mother.smokes * race): Residuals vs Fitted]
[Diagnostic plots for lm(birthwt.grams ~ mother.age + mother.smokes * race), continued: Normal Q-Q, and Scale-Location]
[Diagnostic plot for lm(birthwt.grams ~ mother.age + mother.smokes * race), continued: Residuals vs Leverage with Cook's distance contours]

Everything Must Go (In)

Let's do a kitchen sink model on this new data set:

linear.model.4 <- lm(birthwt.grams ~ ., data = birthwt.noout)
linear.model.4

Call:
lm(formula = birthwt.grams ~ ., data = birthwt.noout)

Coefficients:
(Intercept), birthwt.below.2500, mother.age, mother.weight, raceother, racewhite, mother.smokesYes, previous.prem.labor, hypertensionYes, uterine.irrYes, physician.visits
[estimates omitted]

Everything Must Go (In), Except What Must Not

Whoops! One of those variables was birthwt.below.2500, which is a function of the outcome.
linear.model.4a <- lm(birthwt.grams ~ . - birthwt.below.2500, data = birthwt.noout)
summary(linear.model.4a)

Call:
lm(formula = birthwt.grams ~ . - birthwt.below.2500, data = birthwt.noout)

[Summary output: residual quartiles; a coefficient table in which the intercept (p on the order of 1e-13), mother.weight (significant at 0.01), racewhite (at 0.001), mother.smokesYes (at 0.01), hypertensionYes (at 0.01), and uterine.irrYes (at 0.001) are significant, while mother.age, raceother, previous.prem.labor, and physician.visits are not. Residual standard error: 638 on 178 degrees of freedom; F-statistic on 9 and 178 DF, p-value: 8.255e-08.]

Everything Must Go (In), Except What Must Not

Whoops! One of those variables was birthwt.below.2500, which is a function of the outcome.

plot(linear.model.4a)
[Diagnostic plots for lm(birthwt.grams ~ . - birthwt.below.2500): Residuals vs Fitted, and Normal Q-Q]
[Diagnostic plots for lm(birthwt.grams ~ . - birthwt.below.2500), continued: Scale-Location, and Residuals vs Leverage with Cook's distance]
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationLinear Regression. Furthermore, it is simple.
Linear Regression While linear regression has limited value in the classification problem, it is often very useful in predicting a numerical response, on a linear or ratio scale. Furthermore, it is simple.
More informationCAS MA575 Linear Models
CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers
More informationChapter 8 Conclusion
1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial
More informationPsychology Seminar Psych 406 Dr. Jeffrey Leitzel
Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting
More informationData Analysis and Statistical Methods Statistics 651
y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent
More informationStat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS
Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i
More informationChapter 12: Multiple Linear Regression
Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as
More informationMultiple Linear Regression. Chapter 12
13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.
More informationTests of Linear Restrictions
Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some
More informationLecture 2. Simple linear regression
Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short
More informationHypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima
Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s
More informationLinear Probability Model
Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If
More informationCollinearity: Impact and Possible Remedies
Collinearity: Impact and Possible Remedies Deepayan Sarkar What is collinearity? Exact dependence between columns of X make coefficients non-estimable Collinearity refers to the situation where some columns
More informationIntroduction to Regression
Regression Introduction to Regression If two variables covary, we should be able to predict the value of one variable from another. Correlation only tells us how much two variables covary. In regression,
More informationPrepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti
Prepared by: Prof Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang M L Regression is an extension to
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationCOMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION
COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,
More informationVariance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationSimple Linear Regression
Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring
More informationModeling kid s test scores (revisited) Lecture 20 - Model Selection. Model output. Backward-elimination
Modeling kid s test scores (revisited) Lecture 20 - Model Selection Sta102 / BME102 Colin Rundel November 17, 2014 Predicting cognitive test scores of three- and four-year-old children using characteristics
More informationMatrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =
Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationHomework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.
Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for
More informationLecture 1 Intro to Spatial and Temporal Data
Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1
More informationStatistical View of Least Squares
May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples
More information