Multiple linear regression


Course MF 930: Introduction to statistics, June 0
Tron Anders Moger, Department of biostatistics, IMB, University of Oslo

Aims for this lecture: Continue where we left off.
1. Repeat the most important things from last lecture
2. Learn tests for checking whether the slope of the regression line is different from zero
3. Look at what happens if more variables are included in the model. Learn how to handle:
   - Binary independent variables
   - Categorical independent variables

Example:
[Figure: scatter plot of birthweight against weight in pounds]

Repetition: Simple linear regression
We define a model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $Y_i$ is the dependent variable, $x_i$ is the independent variable, and the errors $\varepsilon_i$ are independent, normally distributed, with equal variance $\sigma^2$.
Wish to fit a line as close to the observed data (two normally distributed variables) as possible.
Example: Birth weight $= \beta_0 + \beta_1 \cdot$ mother's weight
The estimate for $\beta_0$ is called $a$, the estimate for $\beta_1$ is called $b$.

Least squares regression
[Figure: scatter plot of birthweight against weight in pounds with fitted regression line; R Sq Linear = 0,035]

Find the best fitting line by minimizing the squared distance from each data point to the line, summed over all data.
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote the points in the plane.
Find $a$ and $b$ so that $y = a + bx$ fits the points by minimizing
$$SSE = (a + bx_1 - y_1)^2 + (a + bx_2 - y_2)^2 + \cdots + (a + bx_n - y_n)^2 = \sum_{i=1}^{n} (a + bx_i - y_i)^2$$
Solution:
$$b = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
$$a = \frac{\sum y_i - b \sum x_i}{n} = \bar{y} - b\bar{x}$$
where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, and all sums are done for $i = 1, \ldots, n$.
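The closed-form solution above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((a + b*x_i - y_i)**2) over all points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of b, in the "sum minus n * mean" form
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the example points lie exactly on a line, the fit recovers that line; with real data the residuals would be nonzero.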

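How close are the data to the fitted line? $R^2$
[Figure: scatter plot with fitted line, marking for a data point $(x_i, y_i)$ the distances $y_i - \bar{y}$ (SST), $y_i - \hat{y}_i$ (SSE) and $\hat{y}_i - \bar{y}$ (SSR)]
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
The residual for point $i$ is $\varepsilon_i = y_i - \hat{y}_i$.
Predicted value = any point on the regression line: $\hat{y}_i = a + bx_i$
$R^2$, the proportion of the total variance in the $y_i$'s in the data explained by the regression line, is given by $SSR/SST$.

Also remember: Residuals (distance from data points to the regression line) have to be normally distributed!!
Plots for checking this are easily obtained from SPSS:
- Histograms
- Q-Q plots (which SPSS calls P-P plots in regression)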
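As an illustration of how the three sums of squares relate, here is a small Python sketch (not from the slides; the name `r_squared` and the four data points are hypothetical). It fits the line by least squares first, since SST = SSR + SSE only holds for the least squares fit.

```python
def r_squared(xs, ys):
    """Fit y = a + b*x by least squares and return (sst, ssr, sse, r2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals
    return sst, ssr, sse, ssr / sst


# Made-up data: the decomposition SST = SSR + SSE holds up to rounding
sst, ssr, sse, r2 = r_squared([1, 2, 3, 4], [2, 1, 4, 3])
print(sst, ssr + sse, r2)
```

For these points the fitted line explains roughly 36% of the variation in y.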

Example: Regression of birth weight with mother's weight as independent variable

Model Summary (predictors: (Constant), weight in pounds; dependent variable: birthweight)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,86 ,035 ,09 78,470

ANOVA (predictors: (Constant), weight in pounds; dependent variable: birthweight)
  Rows: Regression, Residual, Total
  Columns: Sum of Squares, df, Mean Square, F, Sig.
  Values: ,30 6,686 ,00

Coefficients (dependent variable: birthweight)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): 369,67 8,43 0,374 ,000 99,040 80,304
  weight in pounds: 4,49 ,73 ,86 ,586 ,00 ,050 7,809

Annotations on the output:
- R is Pearson's r, and R Square is $R^2$.
- The sums of squares in the ANOVA table are SSR (Regression), SSE (Residual) and SST (Total).
- The Std. Error of the Estimate is the estimate for $\sigma$.
- Sig. in the ANOVA table is the P-value for the test on whether there is a significant relationship between the variables in the model. The null hypothesis is no relationship.
- The B column of the Coefficients table holds the estimates for $\beta_0$ and $\beta_1$; the remaining columns give P-values, confidence intervals etc. for the $\beta$'s.

But how to answer questions like:
- Given that a positive slope ($b$) has been estimated: Does it give a reproducible indication that there is a positive trend, or is it a result of random variation?
- What is a confidence interval for the estimated slope?

Confidence intervals for simple regression
In a simple regression model,
- $a$ estimates $\beta_0$
- $b$ estimates $\beta_1$
- $\hat{\sigma}^2 = SSE/(n-2)$ estimates $\sigma^2$
Also,
$$\frac{b - \beta_1}{S_b} \sim t_{n-2}, \qquad \text{where } S_b^2 = \frac{\hat{\sigma}^2}{(n-1)s_x^2} \text{ estimates the variance of } b$$
and $s_x^2$ is the sample variance of the $x_i$.
So a confidence interval for $\beta_1$ is given by $b \pm t_{n-2,\alpha/2} S_b$.

Hypothesis testing for simple regression
Choose hypotheses: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
Test statistic: $b/S_b \sim t_{n-2}$
Reject $H_0$ if $b/S_b < -t_{n-2,\alpha/2}$ or $b/S_b > t_{n-2,\alpha/2}$
For the example: Test $H_0: \beta_{\text{mother's weight}} = 0$ on 5%-sig. level.
Get 4.49/.73=.586. Look up the 2.5 and 97.5-percentiles in the t-distribution with 87 degrees of freedom (use normal dist.).
Find p-value<0.05, reject $H_0$.
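The test and confidence interval for the slope can be sketched in Python as follows. This is an illustrative sketch only: the function name `slope_inference` and the data are made up, and it uses the normal distribution in place of the t-distribution (as the slide suggests for large n), so for small samples the interval is too narrow.

```python
import math
from statistics import NormalDist


def slope_inference(xs, ys, alpha=0.05):
    """Estimate the slope b, its standard error S_b, and test H0: beta_1 = 0.

    Assumption: normal quantiles replace t_{n-2} quantiles (large-sample shortcut).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = sse / (n - 2)               # estimate of sigma^2
    s_b = math.sqrt(sigma2_hat / sxx)        # estimated std. deviation of b
    stat = b / s_b                           # test statistic b / S_b
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return {
        "b": b,
        "s_b": s_b,
        "stat": stat,
        "ci": (b - z * s_b, b + z * s_b),
        "p_value": p_value,
        "reject": abs(stat) > z,
    }


# Hypothetical data with a clear upward trend
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1, 9.9, 11.2]
print(slope_inference(xs, ys))
```

For an exact small-sample analysis, the normal quantile would be replaced by the corresponding quantile of the t-distribution with n-2 degrees of freedom.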

More than one independent variable: Multiple regression
Assume we have data of the type $(x_{11}, x_{21}, x_{31}, y_1), (x_{12}, x_{22}, x_{32}, y_2), \ldots$
We want to explain $y$ from the $x$-values by fitting the following model:
$$y = a + bx_1 + cx_2 + dx_3$$
Just like before, one can produce formulas for $a, b, c, d$ minimizing the sum of the squares of the errors.

Multiple regression model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_n x_{ni} + \varepsilon_i$$
(on this slide $n$ denotes the number of explanatory variables)
- The errors $\varepsilon_i$ are independent random (normal) variables with expected value zero and variance $\sigma^2$.
- The explanatory variables $x_{1i}, x_{2i}, \ldots, x_{ni}$ cannot be linearly related, that is, measuring almost the same thing.

Indicator variables
- Binary variables (yes/no, male/female, ...) can be represented as 1/0, and used as independent variables.
- Also called dummy variables in the book.
- When used directly, they influence only the constant term of the regression.
- It is also possible to use a binary variable so that it changes both constant term and slope of the regression line (interaction).

Example: Regression of birth weight with mother's weight and smoking status as independent variables

Model Summary (predictors: (Constant), smoking status, weight in pounds; dependent variable: birthweight)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,59 ,067 , ,83567

ANOVA (predictors: (Constant), smoking status, weight in pounds; dependent variable: birthweight)
  Rows: Regression, Residual, Total
  Columns: Sum of Squares, df, Mean Square, F, Sig.
  Values: ,65 6,7 ,00

Coefficients (dependent variable: birthweight)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): 500,74 30,833 0,83 , , ,56
  weight in pounds: 4,38 ,690 ,78 ,508 ,03 ,905 7,57
  smoking status: -70,03 05,590 -,8 -,557 ,0 -478,3 -6,705

Interpretation:
Have fitted the model
Birth weight = a + b * mother's weight - 70.03 * smoking status,
where a and b are the estimates in the (Constant) and weight in pounds rows of the Coefficients output.
If the mother starts to smoke (and her weight remains constant), what is the predicted influence on the infant's birth weight?
-70.03 * 1 = -70 grams
What is the predicted weight of the child of a 50 pound, smoking woman?
Inserting mother's weight = 50 and smoking status = 1 in the fitted model gives 866 grams.

Confounding
See that the estimated effect of mother's weight has changed a little compared to the univariate analysis (where it was 4.49).
Mother's weight is slightly confounded by smoking.
[Diagram: mother's weight (Mwt), smoking (Smk), birth weight (Bwt)]
Confounder: An independent variable that causes a great change (at least 0%) in the effect of other independent variables (the $\beta$'s), when it is included in the model.

Confounding cont'd.
- A confounder is differently distributed for different values of the variable it confounds.
- E.g. if lean mothers smoked more than obese mothers, a univariate effect of mother's weight on birth weight would partly be due to smoking!!
- Including smoking in the model removes this effect; you get a more correct estimate of the effect of mother's weight.

What if a categorical variable has more than two values?
- Example: Ethnicity; black, white, other
- For categorical variables with $m$ possible values, use $m-1$ indicators.
- It is common to choose a large group as baseline; otherwise the estimation becomes unstable.
- A model with two indicator variables will assume that the effect of one indicator adds to the effect of the other.
- If this may be unsuitable, use an additional interaction variable (product of indicators).
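The coding of a categorical variable with m values into m-1 indicators can be sketched as below. The helper name `make_indicators` and the four example observations are hypothetical; the category left out of the indicator list acts as the baseline.

```python
def make_indicators(values, baseline):
    """Code a categorical variable as 0/1 indicators, one per non-baseline level."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed categories")
    # m levels give m - 1 indicator columns; the baseline gets all zeros
    names = [level for level in levels if level != baseline]
    rows = [[1 if value == name else 0 for name in names] for value in values]
    return names, rows


# Made-up observations, with "white" as baseline
names, rows = make_indicators(["white", "black", "other", "white"], baseline="white")
print(names)  # ['black', 'other']
print(rows)   # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Each coefficient of an indicator is then read as the difference in predicted outcome between that category and the baseline category.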

Birth weight as a function of ethnicity
Have constructed variables black = 0 or 1 and other = 0 or 1:
Birth weight = a + b*black + c*other
Get

Coefficients (dependent variable: birthweight)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): 303,740 7,88 4,586 , , ,5
  black: -384,047 57,874 -,8 -,433 ,06 -695,50 -7,593
  other: -99,75 3,678 -,97 -,637 ,009 -53,988 -75,46

Hence, predicted birth weight decreases by 384 grams for blacks and 99 grams for others.
Predicted birth weight for whites is 304 grams.

Multiple regression: Traffic deaths in 976
Want to find if there is any relationship between highway death rate (deaths per 000 per state) in the U.S. and the following variables:
- Average car age (in months)
- Average car weight (in 000 pounds)
- Percentage light trucks
- Percentage imported cars
All data are per state.

First: Scatter plots:
[Figures: scatter plots of deaths against carage, vehwt, lghttrks and impcars]

Univariate effects (including one independent variable at a time!):

Deaths per 000 = a + b*car age (in months)
Model Summary (predictors: (Constant), carage; dependent variable: deaths)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,49 ,4 ,6 ,0506
Coefficients (dependent variable: deaths)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): 4,56 ,34 3,98 ,000 ,33 6,800
  carage: -,06 ,06 -,49 -3,834 ,000 -,094 -,09
Hence: If all else is equal, if average car age increases by one month, you get 0.06 fewer deaths per 000 inhabitants; increase age by months, you get *0.06=0.74 fewer deaths per 000 inhabitants.

Deaths per 000 = a + b*car weight (in pounds)
Model Summary (predictors: (Constant), vehwt; dependent variable: deaths)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,8 ,079 ,059 ,05740
Coefficients (dependent variable: deaths)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): -,7 , -,7 ,6 -,76 ,74
  vehwt: ,4 ,06 ,8 ,983 ,053 -,00 ,49

Univariate effects cont'd (one independent variable at a time!):

Model Summary (predictors: (Constant), lghttrks; dependent variable: deaths)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,76 ,5 ,50 ,0478
Coefficients (dependent variable: deaths)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): ,046 ,08 ,478 ,07 ,009 ,083
  lghttrks: ,007 ,00 ,76 6,947 ,000 ,005 ,00
Hence: Increase prop. light trucks by 0 means 0*0.007=0.4 more deaths per 000 inhabitants.

Model Summary (predictors: (Constant), impcars; dependent variable: deaths)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,308 ,095 ,075 ,05690
Coefficients (dependent variable: deaths)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): ,06 ,00 0,46 ,000 ,66 ,46
  impcars: -,004 ,00 -,308 -,93 ,033 -,007 ,000
Predicted number of deaths per 000 if prop. imported cars is 0%: *0=0.7

Building a multiple regression model, exploratory analysis:
Forward regression:
1. Try all independent variables, one at a time, and keep the variable with the lowest p-value.
2. Repeat step 1, with the independent variable from the first round now included in the model.
3. Repeat until no more variables can be added to the model (no more significant variables).
Backward regression:
1. Include all independent variables in the model, and remove the variable with the highest p-value.
2. Continue until only significant variables are left.
However: In health sciences you would often keep age, gender etc. in the model even though they are not significant.

Two better methods of model building:
1. All independent variables chosen for the study have strong medical reasons for being interesting, and you have a large enough study. Then all might be included in the final model regardless of significance.
2. Middle road: use a cut-off saying that all variables with p-value < e.g. 0. in simple analyses can be included in the final model.

For the traffic deaths, end up with:
Deaths per 000 = a + b*car age + c*perc. light trucks, with a, b and c taken from the Coefficients output below.

Model Summary (predictors: (Constant), lghttrks, carage; dependent variable: deaths)
  Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate
  Values: ,768 ,590 ,57 ,0387
Coefficients (dependent variable: deaths)
  Columns: B, Std. Error, Beta (blank for the constant), t, Sig., 95% Confidence Interval for B (lower bound, upper bound)
  (Constant): ,668 ,895 ,98 ,005 ,865 4,470
  carage: -,037 ,03 -,95 -,930 ,005 -,063 -,0
  lghttrks: ,006 ,00 ,6 6,8 ,000 ,004 ,009

Conclusion: Did a multiple linear regression on traffic deaths, with car age, car weight, prop. light trucks and prop. imported cars as independent variables. Car age (in months, β=-0.037, 95% CI=(-0.063, -0.0)) and prop. light trucks (β=0.006, 95% CI=(0.004, 0.009)) were significant on 5%-level.

Check of assumptions: Are residuals normally distributed?
[Figures: histogram and normal P-P plot of the regression standardized residuals; dependent variable: deaths; Std. Dev. = 0,978, N = 48]

Least squares estimation in multiple regression
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$$
The least squares estimates of $\beta_0, \beta_1, \ldots, \beta_K$ are the values $b_0, b_1, \ldots, b_K$ minimizing
$$SSE = \sum_{i=1}^{n} (b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki} - y_i)^2$$
They can be computed with similar but more complex formulas than those for simple regression.
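One standard way to compute these estimates, sketched here as an assumption since the slides do not spell out the formulas, is to solve the normal equations $(X^T X) b = X^T y$. The function name `fit_multiple` and the data are made up; statistical software typically uses numerically more stable methods.

```python
def fit_multiple(rows, ys):
    """Least squares estimates [b0, b1, ..., bK] via the normal equations.

    rows: one tuple (x_1i, ..., x_Ki) per observation; ys: the responses y_i.
    """
    n = len(rows)
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for b0
    p = len(X[0])
    # Build (X^T X) and (X^T y)
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    rhs = [sum(X[i][j] * ys[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly related")
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        tail = sum(A[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (rhs[j] - tail) / A[j][j]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
print([round(v, 6) for v in fit_multiple(rows, ys)])
```

The singularity check mirrors the requirement that the explanatory variables must not be linearly related: if they are, the normal equations have no unique solution.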

$R^2$ is defined just as before:
Defining
$$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}$$
we define
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
We get as before
$$SST = SSR + SSE, \qquad R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Adjusted coefficient of determination
Adding more independent variables will generally increase SSR and decrease SSE.
Thus the coefficient of determination will tend to indicate that models with many variables always fit better.
To avoid this effect, the adjusted coefficient of determination may be used:
$$\bar{R}^2 = 1 - \frac{SSE/(n-K-1)}{SST/(n-1)}$$
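A short numerical illustration of why the adjustment matters, written as a Python sketch with made-up sums of squares (the function name `adjusted_r2` is hypothetical):

```python
def adjusted_r2(sse, sst, n, k):
    """Adjusted coefficient of determination for n observations and k variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters (n > k + 1)")
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Made-up example with n = 20 and SST = 100.
# Adding a third variable lowers SSE only from 40.0 to 39.5:
print(1 - 40.0 / 100, adjusted_r2(40.0, 100.0, n=20, k=2))  # plain and adjusted R^2, K = 2
print(1 - 39.5 / 100, adjusted_r2(39.5, 100.0, n=20, k=3))  # plain and adjusted R^2, K = 3
```

In this example the plain $R^2$ rises slightly when the extra variable is added, while the adjusted value falls, which signals that the extra variable does not earn its degree of freedom.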

Drawing inference about the model parameters in multiple regression
Similar to simple regression, we get that the following statistic has a t distribution with $n-K-1$ degrees of freedom:
$$t_{b_j} = \frac{b_j - \beta_j}{s_{b_j}}$$
where $b_j$ is the least squares estimate for $\beta_j$ and $s_{b_j}$ is its estimated standard deviation.
- $K$ is the number of independent variables.
- $s_{b_j}$ is computed from SSE and the correlation between independent variables.

Confidence intervals and hypothesis tests
A confidence interval for $\beta_j$ becomes
$$b_j \pm t_{n-K-1,\alpha/2}\, s_{b_j}$$
Testing the hypothesis $H_0: \beta_j = 0$ vs $H_1: \beta_j \neq 0$:
Reject if
$$\frac{b_j}{s_{b_j}} < -t_{n-K-1,\alpha/2} \quad \text{or} \quad \frac{b_j}{s_{b_j}} > t_{n-K-1,\alpha/2}$$

Testing sets of parameters
We can also test the null hypothesis that a specific set of the betas are simultaneously zero. The alternative hypothesis is that at least one beta in the set is nonzero.
But we will not go into details here.

What if the relationship between x and y is non-linear?
- The most common thing to do is to categorize the independent variable.
- E.g. categorize age into 0-0 yrs, -40 yrs, 4-60 yrs and so on.
- Choose a baseline category, and estimate a slope b for each of the other categories.
- Then, it does not matter what relationship you have between the outcome and the independent variable.

Other options if the relationship is non-linear: Transformed variables
The relationship between variables may not be linear.
Example: The natural model may be
$$y = ae^{bx}$$
We want to find $a$ and $b$ so that the curve $y = ae^{bx}$ approximates the points as well as possible.

Example (cont.)
When $y = ae^{bx}$ then $\log(y) = \log(a) + bx$.
Use standard formulas on the pairs $(x_1, \log(y_1)), (x_2, \log(y_2)), \ldots, (x_n, \log(y_n))$.
We get estimates for $\log(a)$ and $b$, and thus $a$ and $b$.
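The log-transform approach can be sketched in Python as follows. The function name `fit_exponential` and the data are hypothetical; the sketch applies the simple regression formulas to the pairs (x, log y) and transforms the intercept back.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on the pairs (x_i, log(y_i))."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform requires all y values to be positive")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    log_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (l - log_bar) for x, l in zip(xs, logs))
    b = sxy / sxx
    log_a = log_bar - b * x_bar   # intercept on the log scale
    return math.exp(log_a), b     # back-transform to get a


# Made-up points generated exactly from y = 2 * exp(0.5 * x)
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```

Note that this minimizes the squared errors of log(y), not of y itself, so with noisy data the result can differ from a direct non-linear least squares fit on the original scale.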

Doing a regression analysis
- Plot the data first, to investigate whether there is a natural relationship.
- Linear or transformed model?
- Are there outliers which will unduly affect the result?
- Fit a model. Different models with the same number of parameters may be compared with $R^2$.
- Check the assumptions!
- Make tests / confidence intervals for parameters.
- A lot of practice is needed!


More information

WORKSHOP 3 Measuring Association

WORKSHOP 3 Measuring Association WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV)

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV) Program L13 Relationships between two variables Correlation, cont d Regression Relationships between more than two variables Multiple linear regression Two numerical variables Linear or curved relationship?

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed) Institutional Assessment Report Texas Southern University College of Pharmacy and Health Sciences "An Analysis of 2013 NAPLEX, P4-Comp. Exams and P3 courses The following analysis illustrates relationships

More information

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511 STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not

More information

Correlation and Regression Bangkok, 14-18, Sept. 2015

Correlation and Regression Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Correlation and Regression Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Correlation The strength

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

16.400/453J Human Factors Engineering. Design of Experiments II

16.400/453J Human Factors Engineering. Design of Experiments II J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Analysis of variance

Analysis of variance Analysis of variance Tron Anders Moger 3.0.007 Comparing more than two groups Up to now we have studied situations with One observation per subject One group Two groups Two or more observations per subject

More information

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects Economics 113 Simple Regression Models Simple Regression Assumptions Simple Regression Derivation Changing Units of Measurement Nonlinear effects OLS and unbiased estimates Variance of the OLS estimates

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

1 Correlation and Inference from Regression

1 Correlation and Inference from Regression 1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

The Multiple Regression Model

The Multiple Regression Model Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Inference in Regression Analysis

Inference in Regression Analysis Inference in Regression Analysis Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1 Today: Normal Error Regression Model Y i = β 0 + β 1 X i + ǫ i Y i value

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 2013

QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 2013 QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 3 Introduction Objectives of course: Regression and Forecasting

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information

Lecture 19: Inference for SLR & Transformations

Lecture 19: Inference for SLR & Transformations Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Chapter 14 Simple Linear Regression (A)

Chapter 14 Simple Linear Regression (A) Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables

More information

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means Stat 529 (Winter 2011) Analysis of Variance (ANOVA) Reading: Sections 5.1 5.3. Introduction and notation Birthweight example Disadvantages of using many pooled t procedures The analysis of variance procedure

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

( ), which of the coefficients would end

( ), which of the coefficients would end Discussion Sheet 29.7.9 Qualitative Variables We have devoted most of our attention in multiple regression to quantitative or numerical variables. MR models can become more useful and complex when we consider

More information

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger Example: Multple lear regresso 5000,00 4000,00 Tro Aders Moger 0.0.007 brthweght 3000,00 000,00 000,00 0,00 50,00 00,00 50,00 00,00 50,00 weght pouds Repetto: Smple lear regresso We defe a model Y = β0

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Statistics and Quantitative Analysis U4320. Segment 10 Prof. Sharyn O Halloran

Statistics and Quantitative Analysis U4320. Segment 10 Prof. Sharyn O Halloran Statistics and Quantitative Analysis U4320 Segment 10 Prof. Sharyn O Halloran Key Points 1. Review Univariate Regression Model 2. Introduce Multivariate Regression Model Assumptions Estimation Hypothesis

More information