Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model.
- Benedict Hubbard
- 5 years ago
Linear Regression
Linear Regression

In this lecture we will study a particular type of regression model: the linear regression model. We will first consider the case of the model with one predictor variable: simple linear regression. We will then consider the case of the model with more than one predictor variable: multiple linear regression.
Linear Regression Models

Recall a regression model is defined by:
1. a random response variable
2. a list of predictor variables
3. a regression equation
4. a distribution for the value of the random response variable
Simple Linear Regression

The regression equation for simple linear regression is:

EY_i = µ_i = α + β x_i

Note that the link function g is the identity function for linear regression. The assumption here is that the relationship between x and EY_i is a straight line. The slope of the line is β.
Example: Full Blood Count

A clinical full blood count takes a standard volume of blood and measures:
- # of blood cells (platelets, white, red)
- haemoglobin concentration

Empirically, there is a log-linear relationship between the number of red cells and haemoglobin concentration.
Example: log(haemoglobin) on log(rbc) [figure]
Interpretation of α

To interpret α, put x_i = 0 into the regression equation EY_i = α + β x_i; then

EY_i = α

α is the average value of the response variable amongst study subjects for which the predictor variable is zero.
Interpretation of β

To interpret β, put x_i = z and x_i′ = z + 1 for study subjects i and i′ into the regression equation to obtain:

EY_i = α + β z (1)
EY_i′ = α + β (z + 1) (2)

then subtract equation (1) from equation (2):

EY_i′ − EY_i = β (3)
Interpretation: Blood Count Example

E log(HGB) = α + β log(#RBC)

HGB = haemoglobin concentration; #RBC = red cell count.

For the blood example, when log(#rbc) = 0 the average log(hgb) = 1.50. Why is this interpretation silly? If log(#rbc) increases by 1, log(hgb) increases on average by 0.74.
Linear Regression Response

For linear regression the response distribution is assumed to be normal (sometimes called Gaussian):

Y_i ∼ N(µ_i, σ²)

or equivalently

Y_i ∼ N(α + β x_i, σ²)
Linear Regression Errors

The quantity

ϵ_i = Y_i − µ_i = Y_i − (α + β x_i)

is the error corresponding to study subject i. The distributional assumption of linear regression is equivalent to the assumption that the errors are normally distributed, with mean zero:

ϵ_i ∼ N(0, σ²)

The error variance σ² is the same for each study subject. σ² can be estimated from the data using maximum likelihood.
Linear Regression Error Assumption

We can put the regression equation and distribution assumption into a single statement:

Y_i = α + β x_i + ϵ_i,  ϵ_i ∼ N(0, σ²)
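As a sanity check, we can simulate data from this single-statement model and recover the parameters with lm(). The parameter values here (α = 2, β = 0.5, σ = 0.3) are illustrative assumptions, not the lecture's blood-count values:

```r
# Simulate from Y_i = alpha + beta*x_i + eps_i, eps_i ~ N(0, sigma^2), then refit
set.seed(1)
alpha <- 2; beta <- 0.5; sigma <- 0.3
x <- runif(200, 0, 10)
y <- alpha + beta * x + rnorm(200, mean = 0, sd = sigma)
fit <- lm(y ~ x)
coef(fit)   # estimates close to the true alpha = 2 and beta = 0.5
```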
Linear Regression Residuals

We define the residual for study subject i by:

r_i = Y_i − (α̂ + β̂ x_i)

Recall that the error for study subject i is defined by

ϵ_i = Y_i − (α + β x_i)

Note that residuals and errors are not the same. Errors are unknown because we don't know α and β. Residuals can be computed from the data. Residuals can be thought of as estimates of errors.
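With simulated data the true errors are known, so the residual/error distinction can be made concrete. The parameter values below are illustrative assumptions:

```r
# Compare residuals (computable) with the true errors (normally unobservable)
set.seed(2)
x <- runif(100, 0, 5)
eps <- rnorm(100)                # the errors: known only because we simulated
y <- 1 + 2 * x + eps
fit <- lm(y ~ x)
r <- y - (coef(fit)[1] + coef(fit)[2] * x)   # residuals, by the definition above
mean(r)       # essentially zero
cor(r, eps)   # high but not 1: residuals only estimate the errors
```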
Linear Regression Residuals [figure]
Properties of Linear Regression Residuals

Although residuals and errors are not the same, residuals have similar properties to errors:
1. The mean (and sum) of the residuals for a study sample is equal to zero
2. Residuals are normally distributed
3. The variance of the residuals should not depend on the value of the predictors

The first property holds regardless of the validity of the modelling assumptions. The second and third properties only hold if the model assumptions are valid, specifically only if:
1. The relationship between x and Y is linear
2. The errors are normally distributed
3. The variance of the errors is constant
Checking Modelling Assumptions

Before we rely on an inference made from a linear regression model, we should always verify that the modelling assumptions hold. Specifically, we should check that:
1. EY is a linear function of x
2. The properties of the residuals are consistent with the assumption about the distribution of Y
Check Linearity

Suppose the R variable y is a vector containing data from a response variable Y, and the R variable x is a vector containing data from a predictor variable x. We can generate a plot of y against x with the command:

> plot(x, y)

We will deal with non-linearity in a subsequent lecture.
Fitting a Linear Model in R

We can fit a linear regression in R using the lm function:

> fit.obj = lm(y~x)

This fits the regression equation EY = α + β x. The result of the model fit is stored in the R object fit.obj.
Extracting the Residuals

The residuals can be extracted from the linear regression object using fit.obj$residuals. For example, to draw a histogram of the residuals you can type:

> hist(fit.obj$residuals)

Alternatively you can do the fitting and plotting in one statement, without storing a model object:

> hist(lm(y~x)$residuals)
Histogram of the Residuals

By examining a histogram of the residuals we can check that the normality assumption holds. [figure]
QQ-plot of the Residuals

A Q-Q plot (short for quantile-quantile plot) is a graphical method for comparing two probability distributions. We can use it to compare the observed distribution of the residuals with the distribution of a N(0, 1) random variable. If the residuals are normally distributed, the plot should follow an approximately straight line.
QQ-plot of the Residuals

Suppose we have n data points, so n residuals.
1. Draw n lines on a normal density to divide it into n + 1 regions of equal probability
2. Plot the x-axis values of the dotted lines against the ordered residuals
QQ-plot of the Residuals [figure]

R commands to generate this plot in the next session.
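A minimal base-R sketch of such a plot (these are not necessarily the exact commands the lecturer has in mind for the next session):

```r
# QQ-plot of residuals: ordered residuals against standard normal quantiles
set.seed(3)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100, 0, 0.5)
r <- lm(y ~ x)$residuals
qqnorm(r)   # points: residual quantiles vs N(0, 1) quantiles
qqline(r)   # reference line: roughly straight if residuals are normal
```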
Plot the Residuals vs. the Predictor

Plot the residuals against the predictor variable to verify that the distribution of the residuals is independent of x:

> plot(x, lm(y~x)$residuals)
Maximum Likelihood Estimation

For linear regression there are formulae for the maximum likelihood estimates of the regression coefficients:

β̂ = Σ_{i=1}^n (x_i − x̄)(Y_i − Ȳ) / Σ_{i=1}^n (x_i − x̄)²

α̂ = Ȳ − β̂ x̄

However we do not need to worry about these too much, as R will do the calculations for us.
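The closed-form MLEs can be checked directly against lm() on any dataset; here a small simulated one with assumed parameter values:

```r
# Verify the closed-form MLE formulae against lm()
set.seed(4)
x <- rnorm(50)
y <- 3 + 1.5 * x + rnorm(50)
beta.hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha.hat <- mean(y) - beta.hat * mean(x)
c(alpha.hat, beta.hat)
coef(lm(y ~ x))   # identical, up to floating-point rounding
```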
Viewing Model Fit Information in R

The simplest way to view model fit information in R is to type the name of a fitted model object and hit return:

> fit.obj = lm(y~x)
> fit.obj

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

This prints the MLEs of the coefficients.
Printing Confidence Intervals Using R

95% confidence limits can be computed with the confint function:

> confint(fit.obj)
             2.5 %  97.5 %
(Intercept)
x

A different confidence level can be specified if desired, e.g. 99%:

> confint(fit.obj, level=0.99)
             0.5 %  99.5 %
(Intercept)
x
The display command

> display(fit.obj)
lm(formula = y ~ x)
            coef.est coef.se
(Intercept)
x
n = 500, k = 2
residual sd = 0.51, R-Squared = 0.94

This prints: the MLEs of the coefficients, the standard errors of the coefficient MLEs, the residual standard deviation and R².
Standard Errors of the MLEs

Back to the idea of imaginary repeated experiments. Suppose, in an imaginary world, we:
1. repeat our experiment very many times
2. generate a new dataset on each occasion
3. estimate a new MLE β̂ using each dataset

The MLE is a random variable under this replication process. The standard error of β̂, denoted SE(β̂), is defined as the standard deviation of the MLE.
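The imaginary repeated experiments can be made real by simulation. The closing comparison uses the standard closed-form value SE(β̂) = σ / √Σ(x_i − x̄)², a textbook result that the lecture does not derive:

```r
# Approximate SE(beta-hat) as the sd of beta-hat over replicated datasets
set.seed(5)
alpha <- 1; beta <- 2; sigma <- 1
x <- runif(100)                        # predictor values held fixed across replications
beta.hats <- replicate(2000, {
  y <- alpha + beta * x + rnorm(100, 0, sigma)
  coef(lm(y ~ x))[[2]]
})
sd(beta.hats)                          # empirical SE(beta-hat)
sigma / sqrt(sum((x - mean(x))^2))     # theoretical value it approximates
```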
Proportion of Variance Explained

R² is the proportion of the variance in the response which is explained by the predictor. R² is a number between 0 and 1. R² is a measure of the correlation between x and y. When R² = 1, x is perfectly correlated with y and the residuals are all equal to 0. When R² = 0, x contains no information about y.
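Both readings of R² (squared correlation, and proportion of variance explained) can be verified numerically on simulated data:

```r
# Two equivalent views of R^2 in simple linear regression
set.seed(6)
x <- rnorm(100)
y <- 2 + x + rnorm(100)
fit <- lm(y ~ x)
r2 <- summary(fit)$r.squared
all.equal(r2, cor(x, y)^2)                                       # squared correlation
all.equal(r2, 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2))  # variance explained
```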
Residual Standard Deviation

The residual standard deviation is what it says on the tin:

sd(ϵ̂) = √( (1/n) Σ_{i=1}^n (ϵ̂_i − ϵ̄)² )
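One detail worth knowing when comparing with R output: the definition above divides by n, whereas the "residual standard error" printed by summary() divides by n − 2 (the residual degrees of freedom), so the two differ slightly:

```r
# Residual sd as defined above, vs R's residual standard error
set.seed(7)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, 0, 0.4)
fit <- lm(y ~ x)
e <- residuals(fit)
sqrt(mean((e - mean(e))^2))   # divides by n (the definition above)
summary(fit)$sigma            # divides by n - 2: sqrt(sum(e^2) / (n - 2))
```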
The R summary Command

> summary(fit.obj)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              <2e-16 ***
x                                        <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 498 degrees of freedom
Multiple R-squared:      Adjusted R-squared:
F-statistic: 7462 on 1 and 498 DF, p-value: < 2.2e-16
p-values

The p-value in the Pr(>|t|) column of the summary output is a measure of the weight of evidence against the null hypothesis that the regression coefficient in that row is equal to zero. The null hypothesis is so called because it refers to the assumed position that there is no association between the predictor and the response. Usually the evidence must be strong before a null hypothesis is rejected. A p-value is a number between 0 and 1: the smaller the number, the greater the evidence against the null hypothesis. Typically a p-value at least as small as 0.05 is required to reject a null hypothesis.
Interpretation of p-values

The interpretation of p-values is based on the idea of imaginary repeated experiments. Suppose, in an imaginary world, we:
1. repeat our experiment very many times
2. generate a new dataset on each occasion
3. calculate a new p-value using each dataset

Then, assuming the null hypothesis is true, α × 100% of the calculated p-values should be less than α. Small p-values are rare when the null hypothesis is true.
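This property can also be checked by simulation: when the null hypothesis is true, p-values are uniformly distributed, so about 5% of them fall below 0.05.

```r
# p-values under a true null hypothesis (no association between x and y)
set.seed(8)
x <- runif(50)
pvals <- replicate(2000, {
  y <- rnorm(50)                          # response generated independently of x
  summary(lm(y ~ x))$coefficients[2, 4]   # Pr(>|t|) for the slope
})
mean(pvals < 0.05)   # close to 0.05, as the interpretation predicts
```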
Computing Confidence Intervals Manually

Although R provides the confint function, confidence intervals can also be computed manually from standard errors. Not all statistical software provides functions to compute confidence intervals, so this is a useful skill. Standard errors are listed in the second column of the summary output. (They are also printed by the display command.) Manual calculation of confidence intervals is based on the assumption that the MLE of the regression coefficient follows a normal distribution.
Computing Confidence Intervals Manually

We can compute a 95% confidence interval for a regression coefficient using a normal approximation:

β̂ − 1.96 SE(β̂) < β < β̂ + 1.96 SE(β̂)
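The manual interval can be compared against confint on simulated data; confint uses the exact t distribution rather than the normal approximation, so the two agree closely for moderate sample sizes:

```r
# Manual 95% CI from the coefficient table, vs confint()
set.seed(9)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
b  <- coef(summary(fit))["x", "Estimate"]
se <- coef(summary(fit))["x", "Std. Error"]
c(b - 1.96 * se, b + 1.96 * se)   # normal approximation
confint(fit)["x", ]               # t-based interval; nearly identical here
```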
Multiple Linear Regression

Multiple linear regression is very similar to simple linear regression. More than one predictor is now allowed on the right-hand side of the equation:

EY_i = µ_i = α + β₁ x_i1 + β₂ x_i2 + ... + β_p x_ip

The assumptions about the distribution of Y_i (normal, homogeneous variance) are the same as those for simple linear regression.
Fitting a Multiple Linear Regression

A multiple linear regression can be fitted with the lm command:

> fit.obj=lm(y~x1+x2)

Information can be extracted from the model object using the functions already seen: confint, display and summary.
When to Use Multiple Linear Regression

Multiple linear regression is useful when more than one predictor is thought to associate with the response simultaneously. By fitting both predictors in the same model we can get more precise estimates of the regression coefficients.
Fitting a Multiple Linear Regression

> summary(fit.obj)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                         e-07 ***
x2                                               *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 47 degrees of freedom
Multiple R-squared:      Adjusted R-squared:
F-statistic: on 2 and 47 DF, p-value: 6.864e-07
Multiple Linear Regression: Interpretation of α

To interpret α, put x_ij = 0 into the regression equation for each predictor:

EY_i = µ_i = α + β₁ x_i1 + β₂ x_i2 + ... + β_p x_ip

then

EY_i = α

α is the average value of the response variable amongst study subjects for which every predictor variable is zero.
Interpretation of β_j

To interpret β_j, the regression coefficient for the jth predictor variable, put x_ij = z and x_i′j = z + 1 for study subjects i and i′ (whose other predictor values are the same) into the regression equation to obtain:

EY_i = α + β₁ x_i1 + ... + β_j z + ... + β_p x_ip (4)
EY_i′ = α + β₁ x_i1 + ... + β_j (z + 1) + ... + β_p x_ip (5)

then subtract equation (4) from equation (5):

EY_i′ − EY_i = β_j

β_j is the difference in the average value of the response variable between groups of study subjects for which the jth predictor variable differs by one unit.
Multiple Linear Regression with Interactions

Multiple linear regression assumes that the effect of a unit change in a predictor on the mean of the response is independent of the values of the other predictors, e.g. increasing the predictor value x_ij by one unit increases EY_i by the amount β_j, whatever the values of the other predictor variables. Interaction models allow us to relax this assumption.
Interactions in Linear Regression

An interaction model is one where the interpretation of the effect of one predictor depends on the value of another, and vice versa. The simplest interaction model includes a predictor variable formed by multiplying two ordinary predictors:

EY_i = α + β₁ x_i1 + β₂ x_i2 + β₃ x_i1 x_i2

where β₃ x_i1 x_i2 is the interaction term.
Interaction Between 2 Variables

Consider a linear model where the main predictors of Y (blood pressure in mmHg) are age (in years) and weight (in kg):

EY = α + β₁ weight + β₂ age + β₃ (weight × age)
Interpreting β₁ and β₂

EY = α + β₁ weight + β₂ age + β₃ (weight × age)

We would know how to interpret β₁ if the interaction term were not there, since in that case we would just have an ordinary multivariate linear model. This happens when the age of a study subject is equal to 0; then:

EY = α + β₁ weight + β₂ × 0 + β₃ (weight × 0) = α + β₁ weight
Interpreting β₁ and β₂

Amongst study subjects aged 0 years:

EY = α + β₁ weight

We know how to interpret β₁ in this case as it's a simple linear model: β₁ is the difference in the expected BP between individuals whose weight differs by 1 kg and who are aged 0 years. This interpretation is factually correct, but practically not very useful. The data aren't likely to contain many 0-year-olds. Is the model valid in this range?
Interpreting β₁ and β₂

EY = α + β₁ weight + β₂ age + β₃ (weight × age)

To interpret β₂ we need to get rid of the interaction term without getting rid of the β₂ age term. Use the same argument as before, but now set weight = 0:

EY = α + β₁ × 0 + β₂ age + β₃ (0 × age) = α + β₂ age

β₂ is the difference in the expected BP between individuals whose age differs by 1 year and who weigh 0 kg.
Interpreting β₃

EY = α + β₁ weight + β₂ age + β₃ (weight × age)

To interpret β₃, rewrite the regression equation:

EY = α + [β₁ + β₃ age] weight + β₂ age

This looks like a multivariate regression model with weight and age as predictors where:
- β₁ + β₃ age is the regression coefficient for weight
- β₂ is the regression coefficient for age

β₃ is the difference between the regression coefficients for weight for study subjects whose age differs by 1 year.
Interpreting β₃

EY = α + β₁ weight + β₂ age + β₃ (weight × age)

We could just as well have rewritten the equation this way:

EY = α + β₁ weight + [β₂ + β₃ weight] age

β₃ is the difference between the regression coefficients for age for study subjects whose weight differs by 1 kg. So we have two ways of thinking about β₃:
1. either as modification of the effect of weight by age
2. or as modification of the effect of age by weight
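The rewritten equation can be used directly on a fitted model: the regression coefficient for weight at a given age is β₁ + β₃ × age. The data below are simulated, with parameter values chosen only for illustration; the variable names mirror the lecture's hypothetical blood-pressure example:

```r
# Effect of weight at age 40, read off a fitted interaction model
set.seed(10)
n <- 300
weight <- runif(n, 50, 100)
age    <- runif(n, 20, 80)
bp     <- 80 + 0.3 * weight + 0.5 * age + 0.01 * weight * age + rnorm(n, 0, 5)
fit <- lm(bp ~ weight * age)
b <- coef(fit)
# slope of weight among subjects aged 40: beta1 + beta3 * 40
b[["weight"]] + b[["weight:age"]] * 40   # close to the true 0.3 + 0.01*40 = 0.7
```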
Fitting an Interaction Term in R

An interaction model can be fitted with the lm command, using * between the predictors:

> summary(lm(y~x1*x2))

Call:
lm(formula = y ~ x1 * x2)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1
x2
x1:x2                                            **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Red Blood Cell Count in UK Biobank

The distribution of the number of red blood cells in a unit of blood in middle-aged males and females in the UK. [figure]
Categorical Predictors in Regression

So far we have implicitly assumed that the predictor variable x is numerical and that the data contain a range of values for x. The previous example shows how we might wish to use a categorical variable such as sex as a predictor in a regression model. How do we put a categorical variable into a regression equation?

EY_i = µ_i = α + β × "Female"

does not make sense: only numbers can be put into equations.
Dummy Variables

The solution to this problem is to use dummy variables. A dummy variable is a 0/1 variable which acts as a proxy for the value of a categorical variable. For example, if x is a categorical variable with possible categories "Male"/"Female", we can substitute a dummy variable δ with:

δ_i = 0 if and only if x_i = "Male"
δ_i = 1 if and only if x_i = "Female"

The regression equation is now:

EY_i = µ_i = α + β δ_i
Dummy Variables

The regression equation:

EY_i = µ_i = α + β δ_i

δ_i = 0 if and only if x_i = "Male"
δ_i = 1 if and only if x_i = "Female"

The mean value of Y_i in females is α + β. The mean value of Y_i in males is α. The interpretation of the regression coefficients depends on the coding chosen: we could have chosen to code females as 0 and males as 1.
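With 0/1 coding, the fitted coefficients reproduce the sample group means exactly, which can be checked on simulated data (the group means 4.8 and 4.3 below are illustrative assumptions, not UK Biobank values):

```r
# Dummy coding: alpha is the baseline-group mean, alpha + beta the other group's mean
set.seed(11)
sex   <- rep(c("Male", "Female"), each = 50)
delta <- as.numeric(sex == "Female")          # Male = 0, Female = 1
y <- ifelse(sex == "Female", 4.3, 4.8) + rnorm(100, 0, 0.2)
fit <- lm(y ~ delta)
coef(fit)[[1]]                   # alpha: mean of y among males
coef(fit)[[1]] + coef(fit)[[2]]  # alpha + beta: mean of y among females
tapply(y, sex, mean)             # the same two numbers
```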
Dummy Variables

When x has more than two possible categories we need more than one dummy variable to code the categories numerically. For example, suppose x has possible categories "Male"/"Pre-Menopausal Female"/"Post-Menopausal Female". We need to choose a baseline category, which corresponds to all the dummy variables being equal to zero. This choice changes the interpretation of the coefficients but has no effect on statistical inferences.
Dummy Variables

With "Male" as the baseline category:

δ_i1 = 1 if and only if x_i = "Pre-Menopausal Female", and δ_i1 = 0 otherwise
δ_i2 = 1 if and only if x_i = "Post-Menopausal Female", and δ_i2 = 0 otherwise

The regression equation becomes:

EY_i = µ_i = α + β₁ δ_i1 + β₂ δ_i2

The mean value of Y_i in pre-menopausal females is α + β₁. The mean value of Y_i in post-menopausal females is α + β₂. The mean value of Y_i in males is α.
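R's model.matrix function shows exactly this coding for a three-category factor: an intercept column plus one 0/1 dummy per non-baseline category.

```r
# How R expands a three-category factor into two dummy variables
# ("Male" listed first, so it becomes the baseline)
x <- factor(c("Male", "Pre-Menopausal Female", "Post-Menopausal Female"),
            levels = c("Male", "Pre-Menopausal Female", "Post-Menopausal Female"))
model.matrix(~ x)   # columns: intercept, pre-menopausal dummy, post-menopausal dummy
```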
Fitting Categorical Predictors in R

R calls categorical variables "factor" variables. R automatically converts factor variables into dummy variables when they are put into a regression equation:

> lm(rbc~sex)

Call:
lm(formula = rbc ~ sex)

Coefficients:
(Intercept)      sexmale
Summary

This lecture has been about linear regression. Linear regression is used to model the association between the mean of a random variable and one or more predictors. We've covered:
- univariate and multiple regression
- interpretation of regression coefficients
- interaction terms
- dummy variables
More informationAcknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression
INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical
More informationUnit 6 - Simple linear regression
Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable
More informationSimple Linear Regression
Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring
More informationLecture 6: Linear Regression
Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i
More informationSLR output RLS. Refer to slr (code) on the Lecture Page of the class website.
SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationTopics on Statistics 2
Topics on Statistics 2 Pejman Mahboubi March 7, 2018 1 Regression vs Anova In Anova groups are the predictors. When plotting, we can put the groups on the x axis in any order we wish, say in increasing
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some
More informationIntroduction to Linear Regression Rebecca C. Steorts September 15, 2015
Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationDe-mystifying random effects models
De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012 Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point,
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationSummer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.
Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall
More informationSimple Linear Regression: One Qualitative IV
Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression
More informationReview of the General Linear Model
Review of the General Linear Model EPSY 905: Multivariate Analysis Online Lecture #2 Learning Objectives Types of distributions: Ø Conditional distributions The General Linear Model Ø Regression Ø Analysis
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationApplied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections
Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................
More informationRegression and Models with Multiple Factors. Ch. 17, 18
Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least
More informationSTA442/2101: Assignment 5
STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to
More informationInferences on Linear Combinations of Coefficients
Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationIntroduction to the Analysis of Hierarchical and Longitudinal Data
Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models
More informationHow to mathematically model a linear relationship and make predictions.
Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College (Compile date: Mon Apr 28 20:50:28
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationChapter 27 Summary Inferences for Regression
Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test
More informationHow to mathematically model a linear relationship and make predictions.
Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College Redistribution of this material
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationNonstationary time series models
13 November, 2009 Goals Trends in economic data. Alternative models of time series trends: deterministic trend, and stochastic trend. Comparison of deterministic and stochastic trend models The statistical
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationRegression With a Categorical Independent Variable
Regression ith a Independent Variable ERSH 8320 Slide 1 of 34 Today s Lecture Regression with a single categorical independent variable. Today s Lecture Coding procedures for analysis. Dummy coding. Relationship
More informationRegression With a Categorical Independent Variable
Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent
More information