Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA

Size: px
Start display at page:

Download "Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA"

Transcription

1 PubHlth 640. Regression and Correlation Page 1 of 19 Unit Regression and Correlation Practice Problems SOLUTIONS Version STATA 1. A regression analysis of measurements of a dependent variable Y on an independent variable X produces a statistically significant association between X and Y. Drawing upon your education in introductory biostatistics, the theory of epidemiology, the scientific method, etc see how many explanations you can offer for this finding. Hint I get seven (7) (i) X causes Y Example: X=smoking Y=cancer (ii) Y causes X Example: X=cancer Y=smoking (iii) Positive confounding Example: X=alcohol Y=cancer Confounder = Z = smoking (iv) Necessary but not sufficient Example: X=sun exposure Y=melanoma Sun exposure alone does not cause melanoma. Melanoma is the result of a gene-environment interaction. (v) Intermediary Example: X = asthma Y=lesions on the lung X=asthma is an intermediary in the pathway coal exposure asthma lesions on lung (vi) Confounding Example: X=yellow finger Y=lung cancer Z=smoking causes both yellow finger and lung cancer. (vii) Anomoly There is no association. An event of low probability has occurred.

2 PubHlth 640. Regression and Correlation Page of 19. Below is a figure summarizing some data for which a simple linear regression analysis has been performed. The point denoted X that appears on the line is (x,y). The two points indicated by open circles were NOT included in the original analysis.

3 PubHlth 640. Regression and Correlation Page 3 of 19 Multiple Choice (Choose ONE): Suppose the two points represented by the open circles are added to the data set and the regression analysis is repeated. What is the effect of adding these points on: 1. The estimated slope. (a) increase (b) decrease X (c) no change. The residual sum of squares. (a) increase (b) decrease X_ (c) no change 3. The degrees of freedom. X (a) increase (b) decrease (c) no change 4. Standard error of the estimated slope. (a) increase X (b) decrease (c) no change 5. The predicted value of y at x=4. (a) increase (b) decrease X (c) no change

4 PubHlth 640. Regression and Correlation Page 4 of The course website page REGRESSION AND CORRELATION will have some examples of code to produce regression analyses in STATA and SAS. The data in the table below are values of boiling points (Y) and temperature (X). Carry out an exploratory analysis to determine whether the relationship between temperature and boiling point is better represented using (i) (ii) Y = β 0+β1X or ' ' 100 log (Y) = β +β X In developing your answer, use whatever statistical software you like (SAS, STATA, or Minitab). Try your hand at producing (a) Estimates of the regression line parameters (b) Analysis of variance tables (c) R (d) Scatter plot with overlay of fitted line. Complete your answer with a one paragraph text that is an interpretation of your work. Take your time with this and have fun. X=Temp Y=Boiling Pt X=Temp Y=Boiling Pt X=Temp Y=Boiling Pt

5 PubHlth 640. Regression and Correlation Page 5 of 19 Solution (one paragraph of text that is interpretation of analysis): Did you notice that the scatter plot of these data reveal two outlying values? Their inclusion may or may not be appropriate. If all n=31 data points are included in the analysis, then the model that explains more of the variability in boiling point is Y=boiling point modeled linearly in X=temperature. It has a greater R (9% v 89%). Be careful - It would not make sense to compare the residual mean squares of the two models because the scales of measurement involved are different. (a). Estimates of Regression Line Parameters i. Y= *X ii. 100log 10 (Y)= *X Table 1. Parameters estimations for dependent = Boiling Point Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > t Intercept Intercept <.0001 Temp Temperature <.0001 Table. Parameters estimations for dependent = 100 log 10 (Boiling Point) Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > t Intercept Intercept Temp Temperature <.0001 (b). Analysis of Variance Tables Y=Boiling Point Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total

6 PubHlth 640. Regression and Correlation Page 6 of 19 Y = 100 log 10 (Boiling Point) Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total (c). R Y=Boiling Point: R = Y = 100 log 10 (Boiling Point) R = (d) Scatterplot with Overlay of Fitted Line Y=Boiling Point

7 PubHlth 640. Regression and Correlation Page 7 of 19 Y = 100 log 10 (Boiling Point)

8 PubHlth 640. Regression and Correlation Page 8 of 19 For STATA users (version 10.1) *comments begin with an asterisk and are shown in green Read in data and create NEWY = 100 log 10 (BOILING). * toggle off the screen by screen pausing of results. set more off. * read into memory the data for exercise #3. It is named week0.dta. use " * Use the command GENERATE to create a new variable called newy. generate newy=100*log10(boiling) Model 1: Y=Boiling Point Least squares estimation and analysis of variance table. regress boiling temp You should see.. Source SS df MS Number of obs = F( 1, 9) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = boiling Coef. Std. Err. t P> t [95% Conf. Interval] temp _cons Scatter plot with overlay of fitted line.. graph twoway (scatter boiling temp, msymbol(d)) (lfit boiling temp), title("simple Linear Regression") subtitle("y=boiling v X=temp") note("unit_graph01.png")

9 PubHlth 640. Regression and Correlation Page 9 of 19 You should see.. Model : NEWY=100*log10(Boiling Point) Least squares estimation and analysis of variance table. regress newy temp You should see.. Source SS df MS Number of obs = F( 1, 9) = 3.69 Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = newy Coef. Std. Err. t P> t [95% Conf. Interval] temp _cons

10 PubHlth 640. Regression and Correlation Page 10 of 19 Scatter plot with overlay of fitted line. Note - This requires saving the predicted values. Here I ve saved them as hat_temp. graph twoway (scatter newy temp, msymbol(d)) (lfit newy temp), title("simple Linear Regression") subtitle("y=100*log10(boiling) v X=temp") note("unit_graph0.png") You should see..

11 PubHlth 640. Regression and Correlation Page 11 of A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X 1 ) and hostile suspiciousness (X ). (a) The least squares estimation equation involving both independent variables is given by Y = (X 1 ) 7.147(X ) Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 5 observed for this patient? Y = X X X1 X with =.80 and =7.0 Y = This value is lower than the observed value of 5 (b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X 1 and X, X 1 alone, and X alone. Source DF SS Regression on X Residual Source DF SS Regression on X Residual

12 PubHlth 640. Regression and Correlation Page 1 of 19 Source DF SS Regression on X 1, X 784 Residual Model Containing X 1 and X F,784 / = = ,008/ 50 on DF=,50 p-value= Conclude linear model in X 1 and X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X 1 ALONE F 1546 /1 = = ,46 / 51 on DF=1,51 p-value= Conclude linear model in X 1 explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F 160 /1 = = ,63 / 51 on DF=1,51 p-value= Conclude linear model in X does NOT explain statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone.

13 PubHlth 640. Regression and Correlation Page 13 of 19 (c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y? X explains a significant proportion of the variability in Y when modelled as a linear predictor. 1 X does not. (However, we don't know if a different functional form might have been important.) (d) What are the R values for the three regressions referred to in part (b)? Total SSQ= (Regression SSQ) + (Regression SSQ) is constant. Therefore total SSQ can be calculated from just one anova table: Total (SSQ)= 1, ,46 = 13,79 ( ) R X1 only = (Re gression SSQ)/(Total SSQ) R (X only) = (160)/(13,79) = ( ) ( ) = (1546)/(13,79) = R X1and X = 784 /(13, 79) = (e) What is the best model involving either one or both of the two independent variables? Eliminate from consideration model with X only. Compare model with X1 alone versus X1 and X using partial F test. Re gression SSQ / DF ( ) /1 Partial F = = Re sidual SSQ( full) / DF (11,008) / 50 = on DF=1,50 P-value = Most appropriate model includes X and X 1

14 PubHlth 640. Regression and Correlation Page 14 of In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log 10 (dose) and log 10 (larva weight) to log 10 (survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (ie time until death) are given in the table. Also given are relevant computer results and the analysis of variance table. Larva Y = log 10 (survival time) X 1 =log 10 (dose) X =log 10 (weight) Larva Y = log 10 (survival time) X 1 =log 10 (dose) X =log 10 (weight) Y = (X 1 ) Y = (X ) Y = (X 1 ) + ).871 (X ) Source DF SS Regression on X Residual Source DF SS Regression on X Residual Source DF SS Regression on X 1, X Residual

15 PubHlth 640. Regression and Correlation Page 15 of 19 (a) Test for the significance of the overall regression involving both independent variables X 1 and X. X1and X (0.464) / F = = (0.0471) /1 on DF =,1 P value < (b) Test to see whether using X 1 alone significantly helps in predicting survival time. X1 alone (0.3633) /1 F = = on DF = 1,13 (0.1480) /13 P value = (c) Test to see whether using X alone significantly helps in predicting survival time. X alone (0.3367) /1 F = = 5.07 (0.1746) /13 on = 1,13 P value = (d) Compute R for each of the three models. TotalSSQ = ( ) R X1 and X = / = R (X1 alone) = / = ( ) R X alone = / = (e) Which independent predictor do you consider to be the best single predictor of survival time? Based on overall F test and comparison of, the single predictor model containing X is better. R 1

16 PubHlth 640. Regression and Correlation Page 16 of 19 (f) Which model involving one or both of the independent predictors do you prefer and why? Partial F for comparing model with X1alone versus model with X1 an d X ( gressionssq) ( sidual SSQ) ( ) Re / Re g DF /1 = = Re / Re sidual DF /1 = on DF=1,1 P-value= Choose model with both X and X 1 6. Using whatever software package you like, try your hand at reproducing the analysis of variance tables you worked with in problem #5. For Stata users Green comments (note: comments begin with an asterisk) Black stata syntax Blue output (note: I have shown only the analysis of variance table portions of the output). * Use FILE > OPEN to read in data from week03.dta. use " * Anova table for X1. regress y x1 Source SS df MS Number of obs = F( 1, 13) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = * Anova table for X. regress y x Source SS df MS Number of obs = F( 1, 13) = 5.07 Model Prob > F = Residual R-squared = Adj R-squared = 0.63 Total Root MSE =.11589

17 PubHlth 640. Regression and Correlation Page 17 of 19. * Anova table for X1 and X. regress y x1 x Source SS df MS Number of obs = F(, 1) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = An educator examined the relationship between number of hours devoted to reading each week (Y) and the independent variables social class (X 1 ), number of years school completed (X ), and reading speed measured by pages read per hour (X 3 ). The analysis of variance table obtained from a stepwise regression analysis on data for a sample of 19 women over the age of 60 is shown. Source DF SSQ Regression (X 3 ) (X X 3 ) (X 1 X,X 3 ) Residual (a) Test the significance of each variable as it enters the model. Total SSQ = X 3 Step 1 Enters Regression SSQ = on DF=1 Residual SSQ = ( ) + (37.98) + ( )= F= / /17 = on DF=1,17 P-value = on DF=17 Step X 3 already in X Enters Additional Regression SSQ = on DF=1 Residual SSQ = ( ) + (37.98)=401.8 on DF= /1 F= /16 P-value = = 7.36 on DF=1,16 Step 3 X and X 3 already exist in X 1 Enters Additional Regression SSQ = on DF=1 Residual SSQ = on DF=15

18 PubHlth 640. Regression and Correlation Page 18 of /1 F= 363.3/15 P-value=0.964 = on DF=1,15 (b) Test H O : β 1 = β = 0 in the model Y = β 0 + β 1 X 1 + β X + β 3 X 3 + E. Models to Compare: X alone versus (X 1, X and X 3) 3 Partial F = Re gression SSQ / Re g DF Re sidual SSQ / Re s DF P-value= ( + ) / = = / on DF (c) Why can t we test H O : β 1 = β 3 = 0 using the ANOVA table given? What formula would you use for this test? The regression SSQ for the model containing X alone is not available. Re g SSQ(mod el w X1and X and X 3) Re g SSQ( X alone) / Partial F = Re sid SSQ( X, X and X ) / (d) What is your overall evaluation concerning the appropriate model to use given the results in parts (a) and (b)? The most appropriate model is the one with two predictors, X3and X ( R = ). 1 The additional predictive information in X (change in R = 0.031) is not statistically significant (p=0.3)

19 PubHlth 640. Regression and Correlation Page 19 of Consider the following analysis of variance table. Source DF SS Regression (X 1 ) 1 18, (X 3 X 1 ) 1 7, (X X 1,X 3 ) Residual 16,48.3 8,.3 Using a type I error of 0.05, (a) Provide a test to compare the following two models: Y = β 0 + β 1 X 1 + β X + β 3 X 3 + E. VERSUS Y = β 0 + β 1 X 1 + E. RegSSQ(X 1, X,X 3)-RegSSQ(X 1)/ Partial F= ResSSQ(X,X X )/ P-value=< ( )/ = = 4.983on DF=,16 (48.3)/16 (b) Provide a test to compare the following two models: Y = β 0 + β 1 X 1 + β 3 X 3 + E. VERSUS Y = β 0 + E. RegSSQ(X 1,X )/ (18, )/ Overall F= = ResSSQ(X,X )/17 (, )/ P-value< = on DF=,17 (c) State which two models are being compared in computing: (18, , )/3 F = (48.3)/16 Y = β + β X + β X + β X + E Y= β + E versus

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users Unit Regression and Correlation 1 of - Practice Problems Solutions Stata Users 1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week0.dta.

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing

More information

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016 BIOSTATS 640 Spring 2016 2. Regression and Correlation Page 1 of 5 Unit 2 Regression and Correlation WEEK 3 - Practice Problems Due: Wednesday February 10, 2016 1. A psychiatrist wants to know whether

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930) BIOSTATS 540 Fall 2015 Introductory Biostatistics Page 1 of 10 Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS Consider the following study of the relationship

More information

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam June 8 th, 2016: 9am to 1pm Instructions: 1. This is exam is to be completed independently. Do not discuss your work with

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics C1.1 Use the data set Wage1.dta to answer the following questions. Estimate regression equation wage =

More information

Sociology Exam 2 Answer Key March 30, 2012

Sociology Exam 2 Answer Key March 30, 2012 Sociology 63993 Exam 2 Answer Key March 30, 2012 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher has constructed scales

More information

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval]

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval] Problem Set #3-Key Sonoma State University Economics 317- Introduction to Econometrics Dr. Cuellar 1. Use the data set Wage1.dta to answer the following questions. a. For the regression model Wage i =

More information

sociology 362 regression

sociology 362 regression sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,

More information

Lab 6 - Simple Regression

Lab 6 - Simple Regression Lab 6 - Simple Regression Spring 2017 Contents 1 Thinking About Regression 2 2 Regression Output 3 3 Fitted Values 5 4 Residuals 6 5 Functional Forms 8 Updated from Stata tutorials provided by Prof. Cichello

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Problem Set 1 ANSWERS

Problem Set 1 ANSWERS Economics 20 Prof. Patricia M. Anderson Problem Set 1 ANSWERS Part I. Multiple Choice Problems 1. If X and Z are two random variables, then E[X-Z] is d. E[X] E[Z] This is just a simple application of one

More information

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS SOCY5601 DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS More on use of X 2 terms to detect curvilinearity: As we have said, a quick way to detect curvilinearity in the relationship between

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Empirical Application of Simple Regression (Chapter 2)

Empirical Application of Simple Regression (Chapter 2) Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget

More information

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.

More information

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Problem 1. The files

More information

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income Scatterplots Quantitative Research Methods: Introduction to correlation and regression Scatterplots can be considered as interval/ratio analogue of cross-tabs: arbitrarily many values mapped out in -dimensions

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo Last updated: January 26, 2016 1 / 49 Overview These lecture slides covers: The linear regression

More information

Interpreting coefficients for transformed variables

Interpreting coefficients for transformed variables Interpreting coefficients for transformed variables! Recall that when both independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable

More information

Analysis of repeated measurements (KLMED8008)

Analysis of repeated measurements (KLMED8008) Analysis of repeated measurements (KLMED8008) Eirik Skogvoll, MD PhD Professor and Consultant Institute of Circulation and Medical Imaging Dept. of Anaesthesiology and Emergency Medicine 1 Day 2 Practical

More information

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF).

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF). CHAPTER Functional Forms of Regression Models.1. Consider the following production function, known in the literature as the transcendental production function (TPF). Q i B 1 L B i K i B 3 e B L B K 4 i

More information

sociology 362 regression

sociology 362 regression sociology 36 regression Regression is a means of studying how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,

More information

Interaction effects between continuous variables (Optional)

Interaction effects between continuous variables (Optional) Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, https://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat

More information

Correlation and regression. Correlation and regression analysis. Measures of association. Why bother? Positive linear relationship

Correlation and regression. Correlation and regression analysis. Measures of association. Why bother? Positive linear relationship 1 Correlation and regression analsis 12 Januar 2009 Monda, 14.00-16.00 (C1058) Frank Haege Department of Politics and Public Administration Universit of Limerick frank.haege@ul.ie www.frankhaege.eu Correlation

More information

Six Sigma Black Belt Study Guides

Six Sigma Black Belt Study Guides Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem -

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem - First Exam: Economics 388, Econometrics Spring 006 in R. Butler s class YOUR NAME: Section I (30 points) Questions 1-10 (3 points each) Section II (40 points) Questions 11-15 (10 points each) Section III

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

especially with continuous

especially with continuous Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei UK Stata Users meeting, London, 13-14 September 2012 Interactions general concepts General idea of

More information

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006 SplineLinear.doc 1 # 9 Problem:... 2 Objective... 2 Reformulate... 2 Wording... 2 Simulating an example... 3 SPSS 13... 4 Substituting the indicator function... 4 SPSS-Syntax... 4 Remark... 4 Result...

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5. Economics 3 Introduction to Econometrics Winter 2004 Professor Dobkin Name Final Exam (Sample) You must answer all the questions. The exam is closed book and closed notes you may use calculators. You must

More information

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor ECON4150 - Introductory Econometrics Lecture 4: Linear Regression with One Regressor Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 4 Lecture outline 2 The OLS estimators The effect of

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Lecture (chapter 13): Association between variables measured at the interval-ratio level Lecture (chapter 13): Association between variables measured at the interval-ratio level Ernesto F. L. Amaral April 9 11, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015.

More information

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section #

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section # Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Sylvan Herskowitz Section #10 4-1-15 1 Warm-Up: Remember that exam you took before break? We had a question that said this A researcher wants to

More information

PubHlth 540 Fall Summarizing Data Page 1 of 18. Unit 1 - Summarizing Data Practice Problems. Solutions

PubHlth 540 Fall Summarizing Data Page 1 of 18. Unit 1 - Summarizing Data Practice Problems. Solutions PubHlth 50 Fall 0. Summarizing Data Page of 8 Unit - Summarizing Data Practice Problems Solutions #. a. Qualitative - ordinal b. Qualitative - nominal c. Quantitative continuous, ratio d. Qualitative -

More information

Introduction to Regression

Introduction to Regression Introduction to Regression ιατµηµατικό Πρόγραµµα Μεταπτυχιακών Σπουδών Τεχνο-Οικονοµικά Συστήµατα ηµήτρης Φουσκάκης Introduction Basic idea: Use data to identify relationships among variables and use these

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

Homework Solutions Applied Logistic Regression

Homework Solutions Applied Logistic Regression Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests ECON4150 - Introductory Econometrics Lecture 7: OLS with Multiple Regressors Hypotheses tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 7 Lecture outline 2 Hypothesis test for single

More information

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points Economics 102: Analysis of Economic Data Cameron Spring 2016 May 12 Department of Economics, U.C.-Davis Second Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

Unit 2. Regression and Correlation

Unit 2. Regression and Correlation BIOSTATS 640 - Spring 016. Regression and Correlation Page 1 of 88 Unit. Regression and Correlation Don t let us quarrel, the White Queen said in an anxious tone. What is the cause of lighting? The cause

More information

1 The basics of panel data

1 The basics of panel data Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Related materials: Steven Buck Notes to accompany fixed effects material 4-16-14 ˆ Wooldridge 5e, Ch. 1.3: The Structure of Economic Data ˆ Wooldridge

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

S o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7,

S o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7, S o c i o l o g y 63993 E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7, 2 0 0 9 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain

More information

CAMPBELL COLLABORATION

CAMPBELL COLLABORATION CAMPBELL COLLABORATION Random and Mixed-effects Modeling C Training Materials 1 Overview Effect-size estimates Random-effects model Mixed model C Training Materials Effect sizes Suppose we have computed

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Working with Stata Inference on the mean

Working with Stata Inference on the mean Working with Stata Inference on the mean Nicola Orsini Biostatistics Team Department of Public Health Sciences Karolinska Institutet Dataset: hyponatremia.dta Motivating example Outcome: Serum sodium concentration,

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

Sociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame,

Sociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame, Sociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ I. True-False. (20 points) Indicate whether the following statements

More information

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models Chapter 14 Multiple Regression Models 1 Multiple Regression Models A general additive multiple regression model, which relates a dependent variable y to k predictor variables,,, is given by the model equation

More information

Specification Error: Omitted and Extraneous Variables

Specification Error: Omitted and Extraneous Variables Specification Error: Omitted and Extraneous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 5, 05 Omitted variable bias. Suppose that the correct

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

Nonlinear relationships Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Nonlinear relationships Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Nonlinear relationships Richard Williams, University of Notre Dame, https://www.nd.edu/~rwilliam/ Last revised February, 5 Sources: Berry & Feldman s Multiple Regression in Practice 985; Pindyck and Rubinfeld

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th Name 171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th Use the selected SAS output to help you answer the questions. The SAS output is all at the back of the exam on pages

More information

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments. Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and

More information

Analysis of Bivariate Data

Analysis of Bivariate Data Analysis of Bivariate Data Data Two Quantitative variables GPA and GAES Interest rates and indices Tax and fund allocation Population size and prison population Bivariate data (x,y) Case corr&reg 2 Independent

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister

More information

This module focuses on the logic of ANOVA with special attention given to variance components and the relationship between ANOVA and regression.

This module focuses on the logic of ANOVA with special attention given to variance components and the relationship between ANOVA and regression. WISE ANOVA and Regression Lab Introduction to the WISE Correlation/Regression and ANOVA Applet This module focuses on the logic of ANOVA with special attention given to variance components and the relationship

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

9) Time series econometrics

9) Time series econometrics 30C00200 Econometrics 9) Time series econometrics Timo Kuosmanen Professor Management Science http://nomepre.net/index.php/timokuosmanen 1 Macroeconomic data: GDP Inflation rate Examples of time series

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

8 Analysis of Covariance

8 Analysis of Covariance 8 Analysis of Covariance Let us recall our previous one-way ANOVA problem, where we compared the mean birth weight (weight) for children in three groups defined by the mother s smoking habits. The three

More information

Econometrics. 8) Instrumental variables

Econometrics. 8) Instrumental variables 30C00200 Econometrics 8) Instrumental variables Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Thery of IV regression Overidentification Two-stage least squates

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Meta-analysis of epidemiological dose-response studies

Meta-analysis of epidemiological dose-response studies Meta-analysis of epidemiological dose-response studies Nicola Orsini 2nd Italian Stata Users Group meeting October 10-11, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.

More information