MULTIPLE REGRESSION, part 1. Christopher Adolph, Department of Political Science and Center for Statistics and the Social Sciences

CSSS/SOC/STAT 321: Case-Based Statistics I
MULTIPLE REGRESSION, part 1
Christopher Adolph
Department of Political Science and Center for Statistics and the Social Sciences
University of Washington, Seattle

Chris Adolph (University of Washington) Multiple Regression, part 1 1 / 33

Motivating Example: Cross-national determinants of fertility

We have cross-national data from several sources:

Fertility: The average number of children born per adult female, in 2000 (United Nations)
Education Ratio: The ratio of girls to boys in primary and secondary education, in 2000 (World Bank Development Indicators)
GDP per capita: Economic activity in thousands of dollars, purchasing power parity, in 2000 (Penn World Tables)
Agricultural Labor: Percentage of the labor force working in agriculture, in 2000 (International Labor Organization)

Note the addition of a fourth variable.

Motivating Example: Cross-national determinants of fertility

All three independent variables might cause the fertility rate.
More agricultural nations may have more children to bolster the labor force on family farms.
Let's look at the univariate summaries and bivariate regression results for this new covariate.

Summary of Univariate Distribution: Agricultural Labor

[Figure: histogram of agricultural workers as % of labor force (x-axis, 0 to 80) against frequency (y-axis, 0 to 40). Median = 8.1%, mean = 16.0%, std dev = 17.9%.]

How would you describe this distribution?

Regression of Fertility on Agricultural Labor

Variable              Estimate   se       t-stat   p-value
Intercept             1.83       (0.15)   12.34    <0.001
Agricultural Labor    0.02       (0.01)   3.52     <0.001

N                     72
R^2                   0.15
RMSE                  0.93

How do we read this table?
Note the reduction in N: lots of cases are missing data on agricultural labor.
Any cases missing any covariates need to be deleted from the data before using regression (listwise deletion).
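Listwise deletion can be sketched in a few lines. The records below are invented for illustration (they are not the lecture's data), and the helper name `listwise_delete` is hypothetical; the sketch uses plain Python so it runs without third-party libraries.

```python
# Hypothetical records; None marks a missing value. The numbers are invented
# for illustration and are not taken from the lecture's dataset.
rows = [
    {"country": "A", "fertility": 2.1, "ag_labor": 8.0},
    {"country": "B", "fertility": 5.9, "ag_labor": None},   # missing covariate
    {"country": "C", "fertility": 1.6, "ag_labor": 4.5},
    {"country": "D", "fertility": None, "ag_labor": 60.0},  # missing outcome
    {"country": "E", "fertility": 3.4, "ag_labor": 30.2},
]


def listwise_delete(records, variables):
    """Keep only the records with an observed value for every model variable."""
    return [r for r in records if all(r[v] is not None for v in variables)]


complete = listwise_delete(rows, ["fertility", "ag_labor"])
print(len(rows), "cases before;", len(complete), "cases after listwise deletion")
```

Any case with even one missing model variable is dropped, which is why N can shrink sharply when a sparsely recorded covariate is added.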

[Figure: scatterplot of Fertility Rate (y-axis, 2 to 8) against agricultural workers as % of labor force (x-axis, 0 to 50).]

What looks different about this scatterplot?
The high fertility cases seem to be missing (deleted due to missing data).

[Figure: the same scatterplot of Fertility Rate (y-axis, 2 to 8) against agricultural workers as % of labor force (x-axis, 0 to 50), with each point labeled by country name.]

[Figure: scatterplot of Fertility Rate (y-axis, 2 to 8) against agricultural workers as % of labor force (x-axis, 0 to 50).]

Is this a strong relationship?
How many datapoints would have to move to reduce the slope to 0?

[Figure: scatterplot of Fertility Rate (y-axis, 2 to 8) against agricultural workers as % of labor force (x-axis, 0 to 50).]

Which are larger, the residuals or the explained variance?

[Figure: density plot of the residuals from Fertility vs Agriculture (x-axis, -2 to 3; y-axis, density 0.1 to 0.4).]

What is the standard deviation of this distribution called?
The RMSE, or standard error of the regression: how much predictions from this model tend to miss by.
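As a rough sketch of how the RMSE relates to the residuals, the snippet below fits a bivariate regression to a handful of invented data points and computes the RMSE by hand. The data are made up, and dividing by n - 2 (one degree of freedom per estimated parameter) is an assumption about the convention in use; some software divides by n instead.

```python
import math

# Invented data for illustration (not the lecture's data).
x = [2.0, 5.0, 10.0, 20.0, 35.0, 50.0]   # e.g. agricultural labor, %
y = [1.5, 2.4, 1.9, 3.1, 2.6, 4.0]       # e.g. fertility rate

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Bivariate least squares: slope = cov(x, y) / var(x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = actual value minus fitted value
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# RMSE with a degrees-of-freedom correction: two parameters were estimated.
rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, RMSE={rmse:.3f}")
```

The RMSE is in the units of the dependent variable, so it reads directly as a typical prediction miss.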

[Figure: scatterplot with fitted regression line; y-axis Fertility Rate (2 to 8), x-axis (0 to 50) labeled "Residuals from Fertility vs Agriculture".]

How confident are we that this line has a positive slope?
Are we as confident as we were for the other models?

Confounders and Omitted Variable Bias

Which (if any) of the three models we've looked at are right?
Do Education, GDP, and Ag Labor all affect Fertility?
What if Education, GDP, and Ag Labor are correlated?
If we regress Fertility on Education, and Education is correlated with GDP and Ag, might it proxy all three variables?
Yes: if countries which educate women also tend to be rich and have few ag workers, then the bivariate results will blur all three relationships.

Confounders and Omitted Variable Bias

Should we be worried?

Correlation between Education & GDP is 0.46
Correlation between GDP & Ag is -0.64
Correlation between Education & Ag is -0.41

(What do these numbers mean?)

Omitted variable bias: Leaving any of these variables out of our model could lead to misleading estimates of the effects of any variables we do include.
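Omitted variable bias can be made concrete with a small arithmetic sketch. The data and the "true" slopes below are invented, and y is constructed without noise so that the standard identity holds exactly: the slope from regressing y on x1 alone equals beta1 + beta2 * delta, where delta is the slope from regressing the omitted x2 on x1.

```python
def slope(x, y):
    """Bivariate least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


# Invented, noise-free data so the arithmetic is exact.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [6.5, 4.0, 5.0, 2.5, 2.0, 1.0]     # negatively correlated with x1
beta1, beta2 = 2.0, 3.0                  # assumed "true" partial slopes
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)   # y regressed on x1 only, with x2 omitted
delta = slope(x1, x2)        # how the omitted x2 moves with x1

print("slope with x2 omitted:", round(short_slope, 4))
print("beta1 + beta2 * delta:", round(beta1 + beta2 * delta, 4))
```

In this made-up example the omitted covariate is strongly negatively related to x1, so the bivariate slope is not just too small but has the wrong sign, even though the true partial slope of x1 is positive.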

The linear regression model, redux

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

Our dependent variable likely depends on many covariates.
For example, $x_1$ might be the education ratio, $x_2$ might be GDP per capita, and so on for as many covariates as we have, up to our $k$th covariate.
This leads to the above model, with multiple partial slopes $\beta_1, \beta_2, \beta_3, \dots$

The linear regression model, redux

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

This model is still a linear regression model. It is sometimes called a multiple regression model to distinguish it from a bivariate regression, but mathematically, they are equivalent.
Henceforth, we will assume a linear regression can have many covariates.
How many covariates are allowed? Up to $N - 1$, where $N$ is the number of observations.
Each covariate added uses up a degree of freedom; once they are gone, there is nothing left for an additional covariate to explain.

The linear regression model, redux

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

How do we interpret the $\beta$'s? Just as before.
The $\beta$'s are still slopes, or the amount $y_i$ changes on average for a 1 unit increase in $x$, all else held equal.
If we increase $x_1$ by 1 unit, and hold $x_2$ fixed at its present level, then $y$ goes up by $\beta_1$.
We've finally found a way to control for confounders using observational data!

The linear regression model, redux

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

Aside for calculus-users: The $\beta$'s are partial derivatives with respect to the $x$ they multiply.
To see this, imagine a model with three covariates: $x_1$, $x_2$, and $x_3$.
What is the effect of a tiny change in $x_2$ on $y$, holding other $x$'s constant?

$$\frac{\partial y}{\partial x_2} = \frac{\partial \left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\right)}{\partial x_2} = \beta_2$$

This makes $\beta_k$ a very useful summary of the effect of $x_k$ on $y$.

Multiple regression: just like bivariate

1. Our estimates, $\hat\beta_k$, are the $\beta_k$'s that minimize the sum of the squared residuals (least squares).
2. The uncertainty of each $\hat\beta_k$ is given by its standard error.
3. We can still perform t-tests and calculate confidence intervals for each $\hat\beta_k$.
4. We can still calculate the fitted value $\hat{y}_i$ of any observation $i$: this is the model prediction for that case.
5. We can still summarize goodness of fit using such measures as RMSE and $R^2$.
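The least-squares step in the list above can be sketched in plain Python by solving the normal equations $X'X\hat\beta = X'y$. This is only an illustration under stated assumptions: the data and the disturbances are invented, the helper names `solve` and `ols` are hypothetical, and real analyses would normally rely on a statistics package rather than hand-rolled linear algebra.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(covariates, y):
    """Least-squares estimates [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(row) for row in covariates]    # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented data: each row is (x1, x2); y includes small hand-picked disturbances.
data = [(60, 2), (70, 5), (80, 3), (90, 12), (95, 20), (100, 30), (105, 25)]
noise = [0.2, -0.1, 0.1, -0.2, 0.1, -0.1, 0.0]
y = [11.0 - 0.08 * x1 - 0.05 * x2 + e for (x1, x2), e in zip(data, noise)]

beta_hat = ols(data, y)
fitted = [beta_hat[0] + beta_hat[1] * x1 + beta_hat[2] * x2 for x1, x2 in data]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

n, k = len(y), len(beta_hat) - 1
rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))
y_bar = sum(y) / n
r2 = 1 - sum(e ** 2 for e in residuals) / sum((yi - y_bar) ** 2 for yi in y)
print([round(b, 3) for b in beta_hat], round(rmse, 3), round(r2, 3))
```

The same fitted values and residuals then feed the goodness-of-fit summaries, exactly as in the bivariate case; only the number of slopes has changed.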

Fertility as function of Education and GDP per capita

Let's start small, with a model with two covariates:

$$\text{Fertility}_i = \hat\beta_0 + \hat\beta_1\, \text{EduRatio}_i + \hat\beta_2\, \text{GDPpc}_i$$

$$\text{Fertility}_i = 11.24 - 0.08\, \text{EduRatio}_i - 0.05\, \text{GDPpc}_i$$

We can present this result in several ways:

1. In a table by itself
2. In a table compared to other models
3. Through graphics
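To see what the fitted equation above implies, it can be evaluated for a hypothetical country. The coefficients are the rounded values from the equation; the covariate values (an education ratio of 100 and GDP per capita of $10k) and the function name `predict_fertility` are assumptions chosen only for illustration.

```python
# Rounded coefficients from the fitted equation above.
B0, B_EDU, B_GDP = 11.24, -0.08, -0.05


def predict_fertility(edu_ratio, gdp_pc_k):
    """Fitted fertility for a given education ratio and GDP per capita (in $k)."""
    return B0 + B_EDU * edu_ratio + B_GDP * gdp_pc_k


# Hypothetical country: education ratio of 100, GDP per capita of $10k.
base = predict_fertility(100, 10)
more_edu = predict_fertility(101, 10)   # raise education by 1 unit, hold GDP fixed

print("predicted fertility:", round(base, 2))
print("change for +1 education ratio, GDP held fixed:", round(more_edu - base, 2))
```

The second number is just the education coefficient: changing one covariate by one unit while holding the other fixed moves the prediction by that covariate's slope.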

Regression of Fertility on Education Ratio & GDP

Variable              Estimate   se       t-stat   p-value
Intercept             11.25      (0.73)   15.46    <0.001
Education Ratio       -0.08      (0.01)   -9.93    <0.001
GDP per capita ($k)   -0.05      (0.01)   -5.32    <0.001

N                     130
R^2                   0.64
RMSE                  1.01

How do we interpret the above?

Three regression models of fertility

Variable              Model 1   Model 2   Model 3
Intercept             12.59     4.13      11.25
                      (0.75)    (0.17)    (0.73)
Education Ratio       -0.10               -0.08
                      (0.01)              (0.01)
GDP per capita                  -0.10     -0.05
                                (0.01)    (0.01)
N                     130       130       130
R^2                   0.55      0.35      0.64
RMSE                  1.12      1.35      1.01

Standard errors in parentheses.

How do we interpret the above table?

Three regression models of fertility

Variable          Model 1           Model 2           Model 3
Intercept         12.59             4.13              11.25
                  [11.11, 14.08]    [3.80, 4.46]      [9.81, 12.69]
Education Ratio   -0.10                               -0.08
                  [-0.12, -0.08]                      [-0.10, -0.06]
GDP per capita                      -0.10             -0.05
                                    [-0.12, -0.08]    [-0.07, -0.03]
N                 130               130               130
R^2               0.55              0.35              0.64
RMSE              1.12              1.35              1.01

95% confidence intervals in brackets.

This table presents the same information, but is easier to digest.
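The bracketed intervals can be approximately recovered from the estimates and standard errors. The sketch below uses the Model 3 values from the tables; the critical value 1.96 is the large-sample Normal approximation and is an assumption here (the exact t critical value for this sample size is slightly larger), and because the inputs are rounded the results can differ from the published table in the last digit.

```python
def t_stat(estimate, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return estimate / se


def conf_int(estimate, se, critical=1.96):
    """Approximate 95% confidence interval: estimate +/- critical * se."""
    half_width = critical * se
    return (estimate - half_width, estimate + half_width)


# Model 3 estimates and standard errors, as reported (rounded) in the tables.
model3 = {
    "Intercept": (11.25, 0.73),
    "Education Ratio": (-0.08, 0.01),
    "GDP per capita": (-0.05, 0.01),
}

for name, (est, se) in model3.items():
    lo, hi = conf_int(est, se)
    print(f"{name}: {est} [{lo:.2f}, {hi:.2f}], t = {t_stat(est, se):.2f}")
```

An interval that excludes zero corresponds to a coefficient that is significant at roughly the 5% level, which is one reason the interval version of the table is quicker to read.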

[Figure: actual data, Fertility (y-axis, 2 to 8) plotted against model fitted values, Fertility hat (x-axis, 2 to 8).]

To see the residuals, compare the model fit with reality.

[Figure: actual data, Fertility (y-axis, 2 to 8) plotted against model fitted values, Fertility hat (x-axis, 2 to 8).]

Note that in the multivariate case, we need to plot against $\hat{y}_i$, not $x_i$, because there is more than one $x_i$.

[Figure: actual data, Fertility (y-axis, 2 to 8) plotted against model fitted values, Fertility hat (x-axis, 2 to 8), with each point labeled by country name.]

Examining which cases are big outliers may suggest additional variables to include as covariates.
Think of what the missing cases have in common.

[Figure: 3D plot of the linear predictor as a function of Education Ratio and GDP per capita.]

Visualizing the modelled relationship between many variables is tricky.
We can do it with a 3D plot for 2 covariates, but not for 3 or more.

[Figure: two panels of predicted Fertility Rate (y-axis, 2 to 8). Left panel, "Vary Education; GDP at mean": x-axis Education Ratio (60 to 110). Right panel, "Vary GDP; Education at mean": x-axis GDP pc $k (10 to 50).]

An alternative that works for any number of covariates:
Plot out the model predictions as a function of each covariate, while holding the other covariates fixed, e.g., at their means.
Then predict what fertility rate we would expect on average if the country had average GDP but variable Education (or vice versa).
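The "vary one covariate, hold the other at its mean" idea can be sketched as follows. The coefficients are the rounded estimates from the regression table for the two-covariate model; the mean GDP per capita of $10k is a hypothetical placeholder, since the sample mean is not reported in these slides.

```python
# Rounded estimates for the model with Education Ratio and GDP per capita.
B0, B_EDU, B_GDP = 11.25, -0.08, -0.05

# Hypothetical value: the sample mean of GDP per capita is not given here.
GDP_MEAN = 10.0


def predict(edu_ratio, gdp_pc_k):
    """Model prediction of fertility for the given covariate values."""
    return B0 + B_EDU * edu_ratio + B_GDP * gdp_pc_k


# Sweep Education Ratio over the plotted range while GDP stays at its mean.
edu_grid = list(range(60, 111, 10))
curve = [(edu, predict(edu, GDP_MEAN)) for edu in edu_grid]

for edu, fert in curve:
    print(f"Education Ratio {edu:3d} -> predicted fertility {fert:.2f}")
```

Plotting the pairs in `curve` gives one panel of the figure; repeating the sweep over GDP with education held at its mean gives the other, and the approach extends to any number of covariates.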

[Figure: two panels of predicted Fertility Rate (y-axis, 2 to 8). Left panel, "Vary Education; GDP at mean": x-axis Education Ratio (60 to 110). Right panel, "Vary GDP; Education at mean": x-axis GDP pc $k (10 to 50).]

Let's compare the multiple regression estimates (in color) with the bivariate regression results (in black).
How are they different? Are the bivariate results affected by omitted variable bias?

Regression models including Agricultural Labor

Variable              Model 1   Model 2   Model 3   Model 4
Intercept             11.15     2.76      1.83      8.95
                      (2.64)    (0.18)    (0.15)    (2.79)
Education Ratio       -0.09                         -0.06
                      (0.03)                        (0.03)
GDP per capita ($k)             -0.04               -0.03
                                (0.01)              (0.01)
Agricultural Labor                        0.02      0.004
                                          (0.01)    (0.008)
N                     72        72        72        72
R^2                   0.13      0.17      0.14      0.26
RMSE                  0.94      0.92      0.93      0.88

Standard errors in parentheses.

Regression models including Agricultural Labor

Variable    Model 1           Model 2           Model 3         Model 4
Intercept   11.15             2.76              1.83            8.95
            [5.90, 16.41]     [2.39, 3.13]      [1.53, 2.12]    [3.38, 14.52]
Edu Ratio   -0.09                                               -0.06
            [-0.14, -0.04]                                      [-0.12, -0.01]
GDP pc                        -0.04                             -0.03
                              [-0.06, -0.02]                    [-0.06, -0.003]
Ag Labor                                        0.02            0.004
                                                [0.01, 0.03]    [-0.01, 0.02]
N           72                72                72              72
R^2         0.13              0.17              0.14            0.26
RMSE        0.94              0.92              0.93            0.88

95% confidence intervals in brackets.

[Figure: three panels of predicted Fertility Rate (y-axis, 2 to 8). "Vary Edu; GDP & Ag at mean": x-axis Education Ratio (60 to 110). "Vary GDP; Edu & Ag at mean": x-axis GDP pc $k (10 to 50). "Vary Ag; Edu & GDP at mean": x-axis Ag Labor % (0 to 70).]

How do we interpret these plots?
The dashed lines indicate extrapolation: no observed data have these values for the covariates.

[Figure: three panels of predicted Fertility Rate (y-axis, 2 to 8). "Vary Edu; GDP & Ag at mean": x-axis Education Ratio (60 to 110). "Vary GDP; Edu & Ag at mean": x-axis GDP pc $k (10 to 50). "Vary Ag; Edu & GDP at mean": x-axis Ag Labor % (0 to 70).]

The black lines show the bivariate results. Was there omitted variable bias?
YES. The apparent effect of Ag Labor was a mirage: just the omitted effect of GDP per capita. If we control for GDP, we see Ag Labor has no effect.

Warning!

Linear regression is powerful, but easy to misuse.
We mentioned one assumption last time: that the error term is Normally distributed.
To this we now add two additional assumptions:

Correct specification: The model contains all the covariates that produce $Y$. If any omitted cause of $Y$ is correlated with the included $X$'s, then $\hat\beta$ can no longer be trusted.
No endogeneity of Y: None of the included $X$'s are caused by $Y$.

More on these assumptions next time.