BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

Size: px
Start display at page:

Download "BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users"

Transcription

1 Unit Regression and Correlation 1 of - Practice Problems Solutions Stata Users 1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week0.dta. This data set has n=31 observations of boiling points (Y=boiling) and temperature (X=temp). Tip In this exercise you will need to create a new variable newy = 100*log 10 (boiling). generate newy=100*log10(boiling) Carry out an exploratory analysis to determine whether the relationship between temperature and boiling point is better represented using (i) (ii) Y = β 0+βX or 1 ' ' 100 log (Y) = β +βx In developing your answer, try our hand at producing (a) Estimates of the regression line parameters (b) Analysis of variance tables (c) R (d) Scatter plot with overlay of fitted line. Complete your answer with a one paragraph text that is an interpretation of your work. Take your time with this and have fun. sol_regression(1 of ) STATA.docx Page 1 of 18

2 Tip! A really good practice is to maintain a log of your session. Here is how. Step 1: From the main menu at upper left, click: FILE > LOG > BEGIN sol_regression(1 of ) STATA.docx Page of 18

3 Step : At the dialog box SAVE AS: Type in a file name of your choosing. Mine is carol_log Note that I have NOT typed an extension. Stata does this for you. Do NOT click on SAVE just yet. Step 3: At the dialog box WHERE: Click on the drop down menu and select a path of your choosing. Again, do NOT click on SAVE just yet Step 4: At the dialog box FILE FORMAT: click on drop down menu at right. Step 5: From the drop down menu, choose STATA LOG. Now you can click on SAVE sol_regression(1 of ) STATA.docx Page 3 of 18

4 My Stata Session Note I imported my log into an MS Word document and, there, edited all my misspellings and errors Key BLACK commands. Comments begin with asterisk and are in bold. Blue output name: <unnamed> log: /Users/carolbigelow/Desktop/hw_week0.log log type: text **** Turn off screen pause of output. set more off **** Read data that you have already downloaded into stata workspace **** At top left, FILE > OPEN **** Preliminary) Use command codebook with option compact for a quick look at data. codebook, compact Variable Obs Unique Mean Min Max Label temp boiling **** Preliminary) Use command generate to create newy = 100* log10(boiling). generate newy=100*log10(boiling) **** Simple Linear Regression * a-c) Use command regress yvariable xvariable for: fitted line, anova, and R-squared. regress boiling temp Source SS df MS Number of obs = F( 1, 9) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = boiling Coef. Std. Err. t P> t [95% Conf. Interval] temp _cons sol_regression(1 of ) STATA.docx Page 4 of 18

5 * d) Use command graph twoway for scatter plot with overlay of fitted line.** No frills graph. graph twoway (scatter boiling temp) (lfit boiling temp) * Graph with aesthetics.. graph twoway (scatter boiling temp, symbol(d)) (lfit boiling temp), title("simple Linear Regression") subtitle("y=boiling on X=temp") caption("boiling.png", size(vsmall)) Tip I used the option caption( )so that the name of my saved file (which I am about to save) will appear on the graph itself. Handy! sol_regression(1 of ) STATA.docx Page 5 of 18

6 Tip! Save your graph as a.png image so that you can paste it elsewhere. Here is how. Step 1: Create your graph (see previous page ) Step : With the graph window still active, click on the SAVE icon. Step 3: From drop down menu for File format: Choose PORTABLE NETWORK GRAPHICS (*.png) sol_regression(1 of ) STATA.docx Page 6 of 18

7 **** Simple Linear Regression of Y=100*log10(boiling) on X=temp * a-c) Use command regress yvariable xvariable for: fitted line, anova, and R-squared. regress newy temp Source SS df MS Number of obs = F( 1, 9) = 3.69 Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE =.9618 newy Coef. Std. Err. t P> t [95% Conf. Interval] temp _cons * d) Scatter plot with overlay of fitted line using the command graph twoway * No frills graph. graph twoway (scatter newy temp) (lfit newy temp) sol_regression(1 of ) STATA.docx Page 7 of 18

8 * Graph with aesthetics. graph twoway (scatter newy temp, symbol(d)) (lfit newy temp), title("simple Linear Regression") subtitle("y=100*log10(boiling) on X=temp") caption("newy.png", size(vsmall)) sol_regression(1 of ) STATA.docx Page 8 of 18

9 Putting it all together (a) Estimates of Regression Line Parameters i. Y= *X ii. 100log 10 (Y)= *X Table 1. Parameters estimations for dependent = y = boiling (Boiling Point) Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > t Intercept Intercept <.0001 Temp Temperature <.0001 Table. Parameters estimations for dependent = y = newy (100 log 10 (Boiling Point)) Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > t Intercept Intercept Temp Temperature <.0001 (b) Analysis of Variance Tables Y=boiling (Boiling Point) Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Y = newy (100 log 10 (Boiling Point)) Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total sol_regression(1 of ) STATA.docx Page 9 of 18

10 (c) R Y=boiling (Boiling Point): R = Y = newy (100 log 10 (Boiling Point)) R = (d) Scatterplot with Overlay of Fitted Line Y=boiling (Boiling Point) Y = newy (100 log 10 (Boiling Point)) Solution (one paragraph of text that is interpretation of analysis): Did you notice that the scatter plot of these data reveal two outlying values? Their inclusion may or may not be appropriate. If all n=31 data points are included in the analysis, then the model that explains more of the variability in boiling point is Y=boiling point modeled linearly in X=temperature. It has a greater R (9% v 89%). Be careful - It would not make sense to compare the residual mean squares of the two models because the scales of measurement involved are different. sol_regression(1 of ) STATA.docx Page 10 of 18

11 . A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X 1 ) and hostile suspiciousness (X ). (a) The least squares estimation equation involving both independent variables is given by Y = (X 1 ) 7.147(X ) Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 5 observed for this patient? Ŷ = X X with X =.80 and 1 X =7.0 Ŷ = This value is lower than the observed value of 5 (b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X 1 and X, X 1 alone, and X alone. Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X Residual sol_regression(1 of ) STATA.docx Page 11 of 18

12 Source DF Sum of Squares Regression on X 1, X 784 Residual Model Containing X 1 and X F = SSQ regression on X and X /Regression df 1 SSQ residual / Residual df =,784 / 11,008 / 50 = 6.37 on DF=,50 p-value= à Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00356), prompting statistical rejection of the null hypothesis. The fitted linear model in X 1 and X explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X 1 ALONE F = SSQ Regression on X /DF Regression 1 SSQ residual/df Residual = 1546 / 1 1,46 / 51 = on DF=1,51 p-value= Here, too, application of the null hypothesis model has led to an extremely unlikely result (p-value =.014), prompting statistical rejection of the null hypothesis. The fitted linear model in X 1 explains statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. Model Containing X ALONE F = SSQ Regresion on X /DF Regression SSQ Residual/DF Residual = 160 / 1 13,63 / 51 = on DF=1,51 p-value= Here, application of the null hypothesis model has not led to an extremely unlikely result (p-value =.44). The null hypothesis is therefore not rejected. The fitted linear model in X does not explain statistically significantly more of the variability in level of pathology (Y) than is explained by Y (the intercept model) alone. (c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y? X explains a significant proportion of the variability in Y when modelled as a linear predictor. 1 X does not. (However, we don't know if a different functional form might have been important.) sol_regression(1 of ) STATA.docx Page 1 of 18

13 (d) What are the R values for the three regressions referred to in part (b)? Total SSQ= (Regression SSQ) + (Residual SSQ) is constant. Therefore total SSQ can be calculated from just one anova table: Total (SSQ)= 1, ,46 = 13,79 ( ) R X1 only = (Re gression SSQ)/(Total SSQ) R (X only) = (160)/(13,79) = ( ) ( ) = (1546)/(13,79) = R X1and X = 784 /(13, 79) = (e) What is the best model involving either one or both of the two independent variables? Eliminate from consideration model with X only. Compare model with X alone versus 1 X and 1 X using partial F test. Partial F = {(SSQ Regression on X,X ) - (SSQ Regression on X )}/VDF 1 1 = SSQ Residual for model w X 1,X /Residual DF = on DF =1,50 P-value =0.016 Addition of X to model containing X 1 is statistically significant (p-value =.0). More appropriate model includes X 1 and X ( ) / 1 (11,008) / 50 sol_regression(1 of ) STATA.docx Page 13 of 18

14 3. In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log 10 (dose) and log 10 (larva weight) to log 10 (survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (ie time until death) are given in the table. Also given are relevant computer results and the analysis of variance table. Larva Y = log 10 (survival time) X 1 =log 10 (dose) X =log 10 (weight) Larva Y = log 10 (survival time) X 1 =log 10 (dose) X =log 10 (weight) Y = (X 1 ) Y = (X ) Y = (X 1 ) + ).871 (X ) Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X Residual Source DF Sum of Squares Regression on X 1, X Residual sol_regression(1 of ) STATA.docx Page 14 of 18

15 (a) Test for the significance of the overall regression involving both independent variables X 1 and X. X 1 and X F = (SSQ regression on X and X ) / 1 = (0.464) / = on DF =,1 (SSQ Residual) / 1 (0.0471) / 1 P value < Application of the null hypothesis model has led to an extremely unlikely result (p-value =.0001), prompting statistical rejection of the null hypothesis. The fitted linear model in X 1 and X explains statistically significantly more of the variability in log 10 (survival time) (Y) than is explained by Y (the intercept model) alone. (b) Test to see whether using X 1 alone significantly helps in predicting survival time. X 1 alone F = (0.3633) / 1 (0.1480) / 13 = (SSQ Regression on X ) / 1 1 = on DF = 1,13 (SSQ Residual) / 13 P value = Application of the null hypothesis model has led to an extremely unlikely result (p-value =.00008), prompting statistical rejection of the null hypothesis. The fitted linear model in X 1 explains statistically significantly more of the variability in log 10 (survival time) (Y) than is explained by Y (the intercept model) alone. (c) Test to see whether using X alone significantly helps in predicting survival time. X alone F = (SSQ Regression on X ) / 1 (SSQ Residual) / 13 P value = = (0.3367) / 1 = 5.07 on DF = 1,13 (0.1746) / 13 Application of the null hypothesis model has led to an extremely unlikely result (p-value =.0007), prompting statistical rejection of the null hypothesis. The fitted linear model in X explains statistically significantly more of the variability in log 10 (survival time) (Y) than is explained by Y (the intercept model) alone. sol_regression(1 of ) STATA.docx Page 15 of 18

16 (d) Compute R for each of the three models. TotalSSQ = ( ) R X1 and X = / = R (X1 alone) = / = ( ) R X alone = / = (e) Which independent predictor do you consider to be the best single predictor of survival time? Using just the criteria of the overall F test and comparison of containing X is better. 1 R, the single predictor model (f) Which model involving one or both of the independent predictors do you prefer and why? Partial F for comparing model with X1alone versus model with X 1 and X ( ΔRegressionSSQ)/ΔReg DF ( = = )/ -1 ( ResidualSSQ-model w X 1, X )/ResidualDF-model w X 1, X /1 = on DF=1,1 P-value= Choose model with both X 1 and X ( ) sol_regression(1 of ) STATA.docx Page 16 of 18

17 4. Using Stata, try your hand at reproducing the analysis of variance tables you worked with in problem #3. The following assumes you: a) downloaded larvae.dta; and b) opened it Stata commands describe and notes to get information on data set Tip You need to issue the command clear before you load larvae.dta.describe Contains data from /Users/carolbigelow/Desktop/larvae.dta obs: 15 PubHlth 640 Unit Regression - Larvae data vars: 4 10 Feb :36 size: 40 (_dta has notes) storage display value variable name type format label variable label id float %9.0g larva id y float %9.0g log10(survival) x1 float %9.0g log10(dose) x float %9.0g log10(weight) Sorted by:. notes _dta: 1. "Week 3 homework assignment exercises and 3" **** ) Fit model containing x1 alone using command regress yvariable xvariable. regress y x1 Source SS df MS Number of obs = F( 1, 13) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE =.1067 y Coef. Std. Err. t P> t [95% Conf. Interval] x _cons sol_regression(1 of ) STATA.docx Page 17 of 18

18 **** ) Model containing x alone. regress y x Source SS df MS Number of obs = F( 1, 13) = 5.07 Model Prob > F = Residual R-squared = Adj R-squared = 0.63 Total Root MSE = y Coef. Std. Err. t P> t [95% Conf. Interval] x _cons **** ) Model containing both x1 and x. regress y x1 x Source SS df MS Number of obs = F(, 1) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE =.0663 y Coef. Std. Err. t P> t [95% Conf. Interval] x x _cons sol_regression(1 of ) STATA.docx Page 18 of 18

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA PubHlth 640. Regression and Correlation Page 1 of 19 Unit Regression and Correlation Practice Problems SOLUTIONS Version STATA 1. A regression analysis of measurements of a dependent variable Y on an independent

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing

More information

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016

Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016 BIOSTATS 640 Spring 2016 2. Regression and Correlation Page 1 of 5 Unit 2 Regression and Correlation WEEK 3 - Practice Problems Due: Wednesday February 10, 2016 1. A psychiatrist wants to know whether

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Empirical Application of Simple Regression (Chapter 2)

Empirical Application of Simple Regression (Chapter 2) Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS SOCY5601 DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS More on use of X 2 terms to detect curvilinearity: As we have said, a quick way to detect curvilinearity in the relationship between

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Interpreting coefficients for transformed variables

Interpreting coefficients for transformed variables Interpreting coefficients for transformed variables! Recall that when both independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable

More information

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval]

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval] Problem Set #3-Key Sonoma State University Economics 317- Introduction to Econometrics Dr. Cuellar 1. Use the data set Wage1.dta to answer the following questions. a. For the regression model Wage i =

More information

Using SPSS for One Way Analysis of Variance

Using SPSS for One Way Analysis of Variance Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between- subjects analysis of variance and related post-hoc tests. This tutorial

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Lab 6 - Simple Regression

Lab 6 - Simple Regression Lab 6 - Simple Regression Spring 2017 Contents 1 Thinking About Regression 2 2 Regression Output 3 3 Fitted Values 5 4 Residuals 6 5 Functional Forms 8 Updated from Stata tutorials provided by Prof. Cichello

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam June 8 th, 2016: 9am to 1pm Instructions: 1. This is exam is to be completed independently. Do not discuss your work with

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics C1.1 Use the data set Wage1.dta to answer the following questions. Estimate regression equation wage =

More information

Unit 2. Regression and Correlation

Unit 2. Regression and Correlation BIOSTATS 640 - Spring 016. Regression and Correlation Page 1 of 88 Unit. Regression and Correlation Don t let us quarrel, the White Queen said in an anxious tone. What is the cause of lighting? The cause

More information

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Michele Aquaro University of Warwick This version: July 21, 2016 1 / 31 Reading material Textbook: Introductory

More information

sociology 362 regression

sociology 362 regression sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. The Sharp RD Design 3.

More information

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Problem 1. The files

More information

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930) BIOSTATS 540 Fall 2015 Introductory Biostatistics Page 1 of 10 Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS Consider the following study of the relationship

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

****Lab 4, Feb 4: EDA and OLS and WLS

****Lab 4, Feb 4: EDA and OLS and WLS ****Lab 4, Feb 4: EDA and OLS and WLS ------- log: C:\Documents and Settings\Default\Desktop\LDA\Data\cows_Lab4.log log type: text opened on: 4 Feb 2004, 09:26:19. use use "Z:\LDA\DataLDA\cowsP.dta", clear.

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

At this point, if you ve done everything correctly, you should have data that looks something like:

At this point, if you ve done everything correctly, you should have data that looks something like: This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows

More information

Problem Set 1 ANSWERS

Problem Set 1 ANSWERS Economics 20 Prof. Patricia M. Anderson Problem Set 1 ANSWERS Part I. Multiple Choice Problems 1. If X and Z are two random variables, then E[X-Z] is d. E[X] E[Z] This is just a simple application of one

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Introductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1

Introductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1 Introductory Econometrics Lecture 13: Hypothesis testing in the multiple regression model, Part 1 Jun Ma School of Economics Renmin University of China October 19, 2016 The model I We consider the classical

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Analysis of repeated measurements (KLMED8008)

Analysis of repeated measurements (KLMED8008) Analysis of repeated measurements (KLMED8008) Eirik Skogvoll, MD PhD Professor and Consultant Institute of Circulation and Medical Imaging Dept. of Anaesthesiology and Emergency Medicine 1 Day 2 Practical

More information

1 Independent Practice: Hypothesis tests for one parameter:

1 Independent Practice: Hypothesis tests for one parameter: 1 Independent Practice: Hypothesis tests for one parameter: Data from the Indian DHS survey from 2006 includes a measure of autonomy of the women surveyed (a scale from 0-10, 10 being the most autonomous)

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

CHAPTER 10. Regression and Correlation

CHAPTER 10. Regression and Correlation CHAPTER 10 Regression and Correlation In this Chapter we assess the strength of the linear relationship between two continuous variables. If a significant linear relationship is found, the next step would

More information

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF).

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF). CHAPTER Functional Forms of Regression Models.1. Consider the following production function, known in the literature as the transcendental production function (TPF). Q i B 1 L B i K i B 3 e B L B K 4 i

More information

Quantitative Methods Final Exam (2017/1)

Quantitative Methods Final Exam (2017/1) Quantitative Methods Final Exam (2017/1) 1. Please write down your name and student ID number. 2. Calculator is allowed during the exam, but DO NOT use a smartphone. 3. List your answers (together with

More information

Six Sigma Black Belt Study Guides

Six Sigma Black Belt Study Guides Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes Lecture 2.1 Basic Linear LDA 1 Outline Linear OLS Models vs: Linear Marginal Models Linear Conditional Models Random Intercepts Random Intercepts & Slopes Cond l & Marginal Connections Empirical Bayes

More information

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

Working with Stata Inference on the mean

Working with Stata Inference on the mean Working with Stata Inference on the mean Nicola Orsini Biostatistics Team Department of Public Health Sciences Karolinska Institutet Dataset: hyponatremia.dta Motivating example Outcome: Serum sodium concentration,

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES 4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES FOR SINGLE FACTOR BETWEEN-S DESIGNS Planned or A Priori Comparisons We previously showed various ways to test all possible pairwise comparisons for

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

sociology 362 regression

sociology 362 regression sociology 36 regression Regression is a means of studying how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,

More information

MULTIPLE LINEAR REGRESSION IN MINITAB

MULTIPLE LINEAR REGRESSION IN MINITAB MULTIPLE LINEAR REGRESSION IN MINITAB This document shows a complicated Minitab multiple regression. It includes descriptions of the Minitab commands, and the Minitab output is heavily annotated. Comments

More information

Week 3: Simple Linear Regression

Week 3: Simple Linear Regression Week 3: Simple Linear Regression Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline

More information

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income Scatterplots Quantitative Research Methods: Introduction to correlation and regression Scatterplots can be considered as interval/ratio analogue of cross-tabs: arbitrarily many values mapped out in -dimensions

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section #

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section # Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Sylvan Herskowitz Section #10 4-1-15 1 Warm-Up: Remember that exam you took before break? We had a question that said this A researcher wants to

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users BIOSTATS 640 Spring 2017 Review of Introductory Biostatistics STATA solutions Page 1 of 16 Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users #1. The following table lists length of

More information

Practice 2SLS with Artificial Data Part 1

Practice 2SLS with Artificial Data Part 1 Practice 2SLS with Artificial Data Part 1 Yona Rubinstein July 2016 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 1 / 16 Practice with Artificial Data In this note we use artificial

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

Autocorrelation. Think of autocorrelation as signifying a systematic relationship between the residuals measured at different points in time

Autocorrelation. Think of autocorrelation as signifying a systematic relationship between the residuals measured at different points in time Autocorrelation Given the model Y t = b 0 + b 1 X t + u t Think of autocorrelation as signifying a systematic relationship between the residuals measured at different points in time This could be caused

More information

Testing methodology. It often the case that we try to determine the form of the model on the basis of data

Testing methodology. It often the case that we try to determine the form of the model on the basis of data Testing methodology It often the case that we try to determine the form of the model on the basis of data The simplest case: we try to determine the set of explanatory variables in the model Testing for

More information

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points Economics 102: Analysis of Economic Data Cameron Spring 2016 May 12 Department of Economics, U.C.-Davis Second Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

Passing-Bablok Regression for Method Comparison

Passing-Bablok Regression for Method Comparison Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

Sociology Exam 2 Answer Key March 30, 2012

Sociology Exam 2 Answer Key March 30, 2012 Sociology 63993 Exam 2 Answer Key March 30, 2012 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher has constructed scales

More information

Handout 12. Endogeneity & Simultaneous Equation Models

Handout 12. Endogeneity & Simultaneous Equation Models Handout 12. Endogeneity & Simultaneous Equation Models In which you learn about another potential source of endogeneity caused by the simultaneous determination of economic variables, and learn how to

More information

9) Time series econometrics

9) Time series econometrics 30C00200 Econometrics 9) Time series econometrics Timo Kuosmanen Professor Management Science http://nomepre.net/index.php/timokuosmanen 1 Macroeconomic data: GDP Inflation rate Examples of time series

More information

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

Lecture 7: OLS with qualitative information

Lecture 7: OLS with qualitative information Lecture 7: OLS with qualitative information Dummy variables Dummy variable: an indicator that says whether a particular observation is in a category or not Like a light switch: on or off Most useful values:

More information

especially with continuous

especially with continuous Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei UK Stata Users meeting, London, 13-14 September 2012 Interactions general concepts General idea of

More information

Empirical Application of Panel Data Regression

Empirical Application of Panel Data Regression Empirical Application of Panel Data Regression 1. We use Fatality data, and we are interested in whether rising beer tax rate can help lower traffic death. So the dependent variable is traffic death, while

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Lecture (chapter 13): Association between variables measured at the interval-ratio level Lecture (chapter 13): Association between variables measured at the interval-ratio level Ernesto F. L. Amaral April 9 11, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015.

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

Monday 7 th Febraury 2005

Monday 7 th Febraury 2005 Monday 7 th Febraury 2 Analysis of Pigs data Data: Body weights of 48 pigs at 9 successive follow-up visits. This is an equally spaced data. It is always a good habit to reshape the data, so we can easily

More information

Nonlinear relationships Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Nonlinear relationships Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Nonlinear relationships Richard Williams, University of Notre Dame, https://www.nd.edu/~rwilliam/ Last revised February, 5 Sources: Berry & Feldman s Multiple Regression in Practice 985; Pindyck and Rubinfeld

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo Last updated: January 26, 2016 1 / 49 Overview These lecture slides covers: The linear regression

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

More information

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006

SplineLinear.doc 1 # 9 Last save: Saturday, 9. December 2006 SplineLinear.doc 1 # 9 Problem:... 2 Objective... 2 Reformulate... 2 Wording... 2 Simulating an example... 3 SPSS 13... 4 Substituting the indicator function... 4 SPSS-Syntax... 4 Remark... 4 Result...

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

MORE ON MULTIPLE REGRESSION

MORE ON MULTIPLE REGRESSION DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MORE ON MULTIPLE REGRESSION I. AGENDA: A. Multiple regression 1. Categorical variables with more than two categories 2. Interaction

More information

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5. Economics 3 Introduction to Econometrics Winter 2004 Professor Dobkin Name Final Exam (Sample) You must answer all the questions. The exam is closed book and closed notes you may use calculators. You must

More information

LEARNING WITH MINITAB Chapter 12 SESSION FIVE: DESIGNING AN EXPERIMENT

LEARNING WITH MINITAB Chapter 12 SESSION FIVE: DESIGNING AN EXPERIMENT LEARNING WITH MINITAB Chapter 12 SESSION FIVE: DESIGNING AN EXPERIMENT Laura M Williams, RN, CLNC, MSN MOREHEAD STATE UNIVERSITY IET603: STATISTICAL QUALITY ASSURANCE IN SCIENCE AND TECHNOLOGY DR. AHMAD

More information