Multiple regression. Partial regression coefficients


Multiple regression

We now generalise the results of simple linear regression to the case where there is one response variable $Y$ and two predictor variables, $X$ and $Z$. Data consist of $n$ triplets of values $(X_1, Z_1, Y_1), \ldots, (X_n, Z_n, Y_n)$. We want to predict the value of $Y$ associated with a particular combination of values of $X$ and $Z$, or to describe the relationship between $Y$, $X$ and $Z$, or to estimate the effect of changes in $X$ and $Z$ on $Y$. Once again there are two situations in which this type of problem arises.

a) Predictors $X$, $Z$ and response $Y$ are all random.
b) Predictors $X$ and $Z$ are fixed, e.g. by experimental design.

In either case, there is a prediction equation

$$Y = b_0 + b_1 X + b_2 Z + e \qquad (1)$$

The prediction error $e$ is assumed $N(0, \sigma^2)$. Geometrically, we can imagine a 3-d picture with perpendicular horizontal axes $X$ and $Z$ and a vertical axis $Y$. The equation $Y = b_0 + b_1 X + b_2 Z$ represents a plane surface whose position is determined by $b_0$ and its orientation by $b_1$ and $b_2$.

Partial regression coefficients

The coefficients $b_1$ and $b_2$ in equation (1) are partial regression coefficients. For example, $b_1$ represents the effect on $Y$ of changing $X$ while holding $Z$ constant. The parameters $b_1$ and $b_2$ (together with the intercept $b_0$) can be estimated by least squares, i.e. chosen so that the sum of squares

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i - b_2 Z_i)^2$$

is minimized. The least squares estimates $b_1$ and $b_2$ are solutions of the two equations

$$\begin{bmatrix} S_{xx} & S_{xz} \\ S_{zx} & S_{zz} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} S_{xy} \\ S_{zy} \end{bmatrix}$$

For the case where all variables $X$, $Z$ and $Y$ are random, the corrected sums of squares and products would be replaced by variances and covariances. Solving the equations gives

$$b_1 = \frac{S_{xy} - S_{xz} S_{yz} / S_{zz}}{S_{xx} - S_{xz}^2 / S_{zz}}$$

The formula for $b_2$ is obtained by switching $x$ and $z$ in the formula for $b_1$.

It sometimes helps to give regression coefficients more informative labels. For example, if we denote the regression coefficient of $Y$ on $X$ (ignoring $Z$) by $b_{yx}$ and the partial regression coefficient of $Y$ on $X$ (accounting for $Z$) by $b_{yx,z}$, then

$$b_{yx} = b_{yx,z} + b_{yz,x}\, b_{zx}$$
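
The closed-form solution and the labelling identity above can be checked numerically. The following is a minimal sketch in R, assuming simulated data: the sample size, the true coefficients and the helper function `S()` are all made up for illustration and are not part of the original notes.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
n <- 50
x <- rnorm(n)
z <- 0.6 * x + rnorm(n)              # z is correlated with x
y <- 1 + 2 * x - 1.5 * z + rnorm(n)

# Corrected sum of squares / products (hypothetical helper)
S <- function(u, v) sum((u - mean(u)) * (v - mean(v)))

# Partial regression coefficients from the closed-form solution
b1 <- (S(x, y) - S(x, z) * S(z, y) / S(z, z)) / (S(x, x) - S(x, z)^2 / S(z, z))
b2 <- (S(z, y) - S(x, z) * S(x, y) / S(x, x)) / (S(z, z) - S(x, z)^2 / S(x, x))
b0 <- mean(y) - b1 * mean(x) - b2 * mean(z)

# The same estimates from lm()
fit <- lm(y ~ x + z)
print(rbind(formula = c(b0, b1, b2), lm = unname(coef(fit))))

# Check the identity b_yx = b_yx,z + b_yz,x * b_zx
b_yx <- S(x, y) / S(x, x)            # slope of Y on X, ignoring Z
b_zx <- S(x, z) / S(x, x)            # slope of Z on X
print(c(ignoring_z = b_yx, decomposed = b1 + b2 * b_zx))
```

The two rows of the first printout, and the two entries of the second, should agree up to rounding error, whatever seed is used.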

If $X$ changes by a small amount $\delta x$, there is a small concomitant change in $Z$ of amount $\delta z = b_{zx}\,\delta x$, which is invisible when we regress $Y$ on $X$ (ignoring $Z$). The total change in $Y$ is $\delta y = b_{yx,z}\,\delta x + b_{yz,x}\,\delta z$ and $b_{yx} = \delta y / \delta x$.

Residuals and fitted values

The fitted value for the $i$th observation is $\hat{Y}_i = b_0 + b_1 X_i + b_2 Z_i$, and the equation which we derived for simple linear regression still holds:

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

The regression sum of squares (first term on r.h.s.) simplifies to $b_1 S_{xy} + b_2 S_{zy}$, with 2 d.f. The total sum of squares (on l.h.s.) has $n - 1$ d.f., as before. The residual sum of squares (calculated as the difference between total and regression sums of squares) has $n - 3$ d.f. The ANOVA table now has the form

    Source of variation    DF
    Regression             2
    Residual               n - 3
    Total                  n - 1

Mean squares for regression and residual are calculated in the usual way, and $F$ is the ratio of these two mean squares. The $F$ statistic is used to test $H_0: b_1 = b_2 = 0$, i.e. that $E(Y) = b_0$ (a constant).

Extra sums of squares

The regression sum of squares with 2 d.f. is usually split into two components: the SSQ associated with fitting $X$ alone (1 d.f.), and the extra SSQ obtained when $Z$ is added to the equation (this also has 1 d.f.). Alternatively, it is the SSQ associated with fitting $Z$ alone, plus the extra SSQ obtained by adding $X$ to the equation. Note that, for example, the SSQ associated with fitting $X$ alone is not generally the same as the extra SSQ obtained by fitting $X$ after $Z$. However, whichever way we choose to do the split, the two components add up to the regression SSQ with 2 d.f.

Hypothesis testing

As well as testing the hypothesis that $b_1 = b_2 = 0$, we can test each partial regression coefficient separately for significance. The test of $H_0: b_1 = 0$ is based on the extra sum of squares obtained by fitting $X$ after $Z$: if $H_0$ is true, (extra sum of squares)$/s^2 \sim F$ with 1 and $n - 3$ d.f., where $s^2$ is the residual mean square.
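
The decomposition and the two ways of splitting the regression sum of squares can be verified on a small example. This is a sketch only, assuming simulated data; the variable names, the coefficients used to generate `y` and the helper `S()` are hypothetical.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(2)
n <- 40
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 3 + 1.2 * x + 0.8 * z + rnorm(n)

# Corrected sum of squares / products (hypothetical helper)
S <- function(u, v) sum((u - mean(u)) * (v - mean(v)))

fit_xz <- lm(y ~ x + z)   # X first, then Z
fit_zx <- lm(y ~ z + x)   # Z first, then X
b <- coef(fit_xz)

total_ss <- S(y, y)                                        # n - 1 d.f.
resid_ss <- sum(residuals(fit_xz)^2)                       # n - 3 d.f.
reg_ss   <- unname(b["x"] * S(x, y) + b["z"] * S(z, y))    # 2 d.f.

# Sequential sums of squares for each order of fitting
ss_xz <- anova(fit_xz)[c("x", "z"), "Sum Sq"]
ss_zx <- anova(fit_zx)[c("z", "x"), "Sum Sq"]

print(c(total = total_ss, regression = reg_ss, residual = resid_ss))
print(rbind(x_then_z = ss_xz, z_then_x = ss_zx))

# Overall F test of H0: b1 = b2 = 0
F_stat <- (reg_ss / 2) / (resid_ss / (n - 3))
print(c(F = F_stat, p = pf(F_stat, 2, n - 3, lower.tail = FALSE)))
```

The individual entries in the two rows of sequential sums of squares generally differ, but each row should sum to the regression sum of squares with 2 d.f.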

Two examples

1) In animal breeding, $Y$ might be the breeding value of an animal for a particular trait, and $X$ and $Z$ values of the trait measured on two of its relatives. For example, if $Y$ is the milk yield breeding value of a heifer, and $X$ and $Z$ milk yields of its mother and paternal half-sister, we might predict $Y$ from $X$ and $Z$. The covariance matrix for $X$, $Z$ and $Y$ is

$$\begin{bmatrix} V_P & 0 & \tfrac{1}{2} V_A \\ 0 & V_P & \tfrac{1}{4} V_A \\ \tfrac{1}{2} V_A & \tfrac{1}{4} V_A & V_A \end{bmatrix}$$

The prediction equation is $\hat{Y} = b_1 X + b_2 Z$, where $b_1 V_P = \tfrac{1}{2} V_A$ and $b_2 V_P = \tfrac{1}{4} V_A$, so

$$\hat{Y} = h^2 \left( \tfrac{1}{2} X + \tfrac{1}{4} Z \right)$$

where $h^2 = V_A / V_P$ is the heritability of the trait.

2) The difficulty of a hill race is measured by (i) the total distance covered, and (ii) the total climb required. The file hills.csv gives the distance (miles), climb ('000 feet), and record time (minutes) for 35 Scottish hill races. Multiple regression can be used to find a relationship between record time and the two measures of difficulty. Fitting distance (time = … distance) produces a reasonable fit, but there are large residuals for two difficult races, Bens of Jura (+75.7) and Lairig Ghru (−39.4). With both distance and climb fitted (time = … distance + ….75 climb), there is a significant improvement in overall fit. For example, the residuals for both Bens of Jura (+27.9) and Lairig Ghru (+3.3) are considerably reduced. Using lm( ) to fit this model to the hill race data gives the following estimates for the partial regression coefficients:

                Estimate   Std. Error   t value
    Distance    …          …            …
    Climb       …          …            …

and the ANOVA table is:

                 Df   Sum Sq   Mean Sq   F value
    Distance     …    …        …         …
    Climb        …    …        …         …
    Residuals    …    …        …

The t statistic for the Climb partial regression coefficient is the square root of the F statistic for the effect of fitting Climb after Distance. These two tests are equivalent.
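
The coefficients in the animal breeding example follow from the same two equations as before, with sums of squares and products replaced by variances and covariances. A short worked version is sketched here, under the assumption (not stated in the notes) that $X$, $Z$ and $Y$ are expressed as deviations from their means, so that no intercept appears.

```latex
\begin{align*}
\begin{bmatrix} V_P & 0 \\ 0 & V_P \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
&=
\begin{bmatrix} \tfrac{1}{2} V_A \\ \tfrac{1}{4} V_A \end{bmatrix}
\\[4pt]
b_1 &= \frac{\tfrac{1}{2} V_A}{V_P} = \tfrac{1}{2} h^2,
\qquad
b_2 = \frac{\tfrac{1}{4} V_A}{V_P} = \tfrac{1}{4} h^2
\\[4pt]
\hat{Y} &= b_1 X + b_2 Z = h^2 \left( \tfrac{1}{2} X + \tfrac{1}{4} Z \right)
\end{align*}
```

Because the covariance between $X$ and $Z$ is zero here, the two equations decouple and each partial regression coefficient equals the corresponding simple regression coefficient.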

Miscellaneous

The results given above describe how to predict $Y$ using two r.v.s $X$ and $Z$. These results generalise in a straightforward way to the problem of predicting $Y$ from any number of predictors $X$, $Z$, $W$, etc. Comments already made about residuals, outliers, cause and effect, etc., for simple linear regression remain relevant for multiple regression.

It often happens that one of the predictor variables ($X$, say) is of primary interest, and the other ($Z$) is included as a potential confounder. In other words, $Z$ is included not because we are interested in its effect, but to ensure that the effect of $X$ is adjusted for the effect of $Z$. For example, in a study to determine whether drinking coffee might have a beneficial effect on health, we might also include annual earnings. Unless we do this, an association between good health and high earnings might appear misleadingly as an association with level of coffee consumption (if, for example, high earners consume large amounts of coffee and low earners do not, because they cannot afford to).

If there is a strong correlation between $X$ and $Z$, the partial regression coefficients will have large standard errors. In extreme cases, when the correlation is close to $\pm 1$ ("collinearity"), the fitting procedure breaks down and one or other variable must be dropped from the regression equation.

Multiple regression in R

The lm( ) function which we have already used for simple linear regression also deals with multiple regression. Diagnostic plots (residuals against fitted values, etc.) and analysis of variance tables are produced as for simple linear regression. R code for the hill-race example is given below.

If fit is a fitted lm object, summary(fit) produces estimates and standard errors for partial regression coefficients. These are adjusted for all other effects in the model, and do not depend on the order of terms in the model equation. anova(fit) produces sequential sums of squares, which depend on the order of terms.

In the example below, the first row of the anova table will have a sum of squares (1 d.f.) for the effect of distance (unadjusted). The second row will have a sum of squares (1 d.f.) for the effect of climb, adjusted for the effect of distance. If the model equation had been given as time ~ climb + dist, a different ANOVA would be produced with sums of squares for climb (unadjusted) and distance (adjusted for climb). The sum of the two sums of squares (unadjusted and adjusted) would be the same in both cases.

    hills <- read.table("hills.txt")                    # read the hill-race data
    fit <- lm(time ~ dist + climb, data = hills)        # distance first, then climb
    summary(fit)                                        # partial regression coefficients
    anova(fit)                                          # sequential sums of squares
    plot(fit)                                           # diagnostic plots
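
The behaviour of summary( ) and anova( ) described above can be seen without the data file. The sketch below assumes a simulated stand-in for the hill-race data: the values, the generating coefficients and the data frame name `sim` are hypothetical and are not the real race records.

```r
# Simulated stand-in for the hill-race data (hypothetical values)
set.seed(3)
n <- 35
dist  <- runif(n, 2, 28)                              # miles
climb <- 0.2 * dist + runif(n, 0.3, 3)                # thousands of feet
time  <- 6 * dist + 10 * climb + rnorm(n, sd = 10)    # minutes
sim <- data.frame(dist, climb, time)

fit_dc <- lm(time ~ dist + climb, data = sim)
fit_cd <- lm(time ~ climb + dist, data = sim)

# Partial regression coefficients do not depend on the order of terms
print(coef(summary(fit_dc))[c("dist", "climb"), ])
print(coef(summary(fit_cd))[c("dist", "climb"), ])

# Sequential sums of squares do depend on the order
print(anova(fit_dc))
print(anova(fit_cd))

# t for climb (adjusted for dist), squared, equals F for climb fitted after dist
t_climb <- coef(summary(fit_dc))["climb", "t value"]
F_climb <- anova(fit_dc)["climb", "F value"]
print(c(t_squared = t_climb^2, F = F_climb))
```

Only the last term in the formula has a sequential sum of squares that is fully adjusted, which is why the t and F tests coincide for climb in the first fit and for dist in the second.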

Factors and dummy variables

The following is a typical example of an analysis of variance model. We will consider such models next week. Here we show that the anova model can be considered as a special case of multiple regression.

A flock of sheep consists of three breeds: Scottish Blackface, Welsh Mountain and the Blackface × Welsh cross. A blood sample is taken from a random sample of each breed, and the Cu content measured. Do the breeds differ in Cu concentrations? The measurements (five per breed, in the order Blackface, Welsh, Cross) are listed in the R code at the end of this section.

Assume the data are normally distributed with constant variance and a mean value that depends on breed. This can be treated as a multiple regression $E(Y) = b_0 + b_1 X + b_2 Z$, where the values assigned to $X$ and $Z$ depend on breed as follows:

    Breed        X    Z    b_0 + b_1 X + b_2 Z
    Blackface    0    0    b_0
    Welsh        1    0    b_0 + b_1
    Cross        0    1    b_0 + b_2

The parameters $b_0$, $b_1$, $b_2$ represent the mean value for the Blackface breed, the difference between Welsh and Blackface, and the difference between Cross and Blackface. The multiple regression $F$ test (with 2 d.f. in the numerator) tests $H_0: b_1 = b_2 = 0$ (no difference among breeds). This is easily generalized to compare any number of groups.

Dummy variables usually arise through use of factors in model formulas. For example, in the R code given below, the formula ~ breed is equivalent to ~ X + Z, where X and Z are the dummy variables described above. However, in this case, the usual splitting of the regression sums of squares does not take place.

    breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5,5,5)))
    # Assumption: the transcription dropped the digit "1" from four values;
    # they are restored here as 8.1, 10.4, 11.1 and 10.6 (11.1 is the least certain).
    Cu <- c(6.5,7.9,7.4,6.8,8.1,10.4,9.8,11.1,10.6,9.2,6.9,9.2,8.4,7.6,9.7)
    fit <- lm(Cu ~ breed)    # one-way anova fitted as a regression on dummy variables
    anova(fit)               # single row for breed, with 2 d.f.
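
The equivalence between a factor and hand-coded dummy variables can be checked directly. The sketch below assumes made-up measurements (not the lecture's Cu data) and sets the factor levels explicitly so that the dummy columns R generates appear in the same order as $X$ and $Z$ in the table above; by default R would order the levels alphabetically.

```r
# Hypothetical measurements for three groups of five (not the lecture's data)
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), each = 5),
                levels = c("Blackface", "Welsh", "Cross"))
set.seed(4)
y <- rnorm(15, mean = rep(c(7, 10, 8), each = 5))

# Dummy variables coded by hand, as in the table
X <- as.numeric(breed == "Welsh")
Z <- as.numeric(breed == "Cross")

fit_factor <- lm(y ~ breed)
fit_dummy  <- lm(y ~ X + Z)

print(coef(fit_factor))          # intercept, breedWelsh, breedCross
print(coef(fit_dummy))           # intercept, X, Z: same values
group_means <- tapply(y, breed, mean)
print(group_means)               # b0 is the Blackface mean
print(anova(fit_factor))         # one row for breed, 2 d.f.
```

The intercept should equal the Blackface group mean, and the other two coefficients the differences of the Welsh and Cross means from it.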

Example model formulas

Suppose x and z are numeric, and that A is a factor. Some possibilities for the right-hand side of an lm( ) formula are:

    Formula      Interpretation
    x            simple linear regression
    x + z        multiple regression
    poly(x,2)    polynomial regression
    A            one-way analysis of variance
    A + x        parallel lines (analysis of covariance)
    A * x        separate lines (intercept and slope) for each level of A

For example, if factor A has two levels, the model formula for parallel lines gives $E(Y)$ as $b_0 + b_2 x$ for the first level of A and $b_0 + b_1 + b_2 x$ for the second level. The common slope of the two lines is $b_2$, the intercept for the first level of A is $b_0$, and $b_1$ is the difference between the intercepts (the constant vertical separation of the two lines). This model ("analysis of covariance") is often used when primary interest is in the factor A, and x is a potential confounder.
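
One way to see what a formula means is to inspect the columns of the design matrix that R builds from it. The sketch below assumes a hypothetical two-level factor `A` with levels `a1` and `a2` and a made-up numeric covariate `x`.

```r
# Hypothetical factor with two levels and a numeric covariate
A <- factor(rep(c("a1", "a2"), each = 4))
x <- c(1, 2, 3, 4, 1, 2, 3, 4)

# Parallel lines: separate intercepts, common slope
cols_parallel <- colnames(model.matrix(~ A + x))
print(cols_parallel)

# Separate lines: each level of A has its own intercept and slope
cols_separate <- colnames(model.matrix(~ A * x))
print(cols_separate)
```

For the parallel-lines model the columns correspond to $b_0$, $b_1$ (the difference in intercepts) and $b_2$ (the common slope); the separate-lines model adds one interaction column for the difference in slopes.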


More information

Models with multiple random effects: Repeated Measures and Maternal effects

Models with multiple random effects: Repeated Measures and Maternal effects Models with multiple random effects: Repeated Measures and Maternal effects 1 Often there are several vectors of random effects Repeatability models Multiple measures Common family effects Cleaning up

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Formula for the t-test

Formula for the t-test Formula for the t-test: How the t-test Relates to the Distribution of the Data for the Groups Formula for the t-test: Formula for the Standard Error of the Difference Between the Means Formula for the

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000 Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information
