
Multivariate Regression
Prof. Jacob M. Montgomery
Quantitative Political Methodology (L32 363)
November 14, 2016
Lecture 20 (QPM 2016) Multivariate Regression November 14, 2016 1 / 44

Class business
PS is due Wed. You should take notes on this one.

Overview: A big day
Introducing multivariate regression
- An example (time for change model)
- (Hyper)planes in (hyper)space
- Specifying and estimating the regression model
Three ways to think about regression
- Hyperplanes
- Lines within groups
- Added variable plots
Inference for multivariate regression
A brief word on Simpson's paradox

So far we have looked at data like this
[Scatterplot: incumbent party share of vote (45-60) against Q2 GDP growth, with the elections of 1948-2016 labeled]

Motivating example: Presidential elections
But what if it's time for a change?

Table: Success of Incumbent Party Candidate in Presidential Elections by Type of Election, 1948-2016

Results        First-Term   Second- or Later
Won            8            2
Lost           1            8
Average vote   55.3         49.3

Accounting for time in office
Estimate a more complex equation:

µ_y = b_0 + b_1 x_1 + b_2 x_2

where:
µ_y is the mean presidential vote share
b_0 is the y-intercept ("constant")
b_1 is the slope ("coefficient") for Q2 GDP growth
x_1 is Q2 GDP growth in the election year
b_2 is the slope ("coefficient") for TFC ("time for a change")
x_2 is an indicator ("dummy") variable for TFC (1 = first term; 0 = second term or later)

Multivariate regression
Equation for the graph:

Vote share = 46.59 + 0.76 (Q2 GDP) + 6.02 (FirstTermInc)

or

Vote share (TFC) = 46.59 + 0.76 (Q2 GDP)
Vote share (Not TFC) = 52.61 + 0.76 (Q2 GDP)
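The intercept-shift logic of the dummy variable can be sketched in a few lines of Python, using the coefficients reported on this slide (the function name is illustrative, not from the lecture):

```python
# Predicted incumbent-party vote share from the slide's fitted equation:
# 46.59 + 0.76*(Q2 GDP) + 6.02*(FirstTermInc).
def vote_share(q2_gdp, first_term_inc):
    return 46.59 + 0.76 * q2_gdp + 6.02 * first_term_inc

# The dummy shifts only the intercept: the two fitted lines are parallel,
# separated by 6.02 points at every level of GDP growth.
gap = vote_share(3.0, 1) - vote_share(3.0, 0)
```

Setting the dummy to 0 recovers the 46.59 intercept for TFC elections; setting it to 1 gives the 52.61 intercept for first-term incumbents.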

Multivariate regression
[Scatterplot: incumbent party share of vote against Q2 GDP growth with two parallel fitted lines, elections 1948-2016 labeled]



Beyond two dimensions

Incumbent party vote share
                    Model 1   Model 2
Intercept           49.27     49.35
                    (1.35)    (4.51)
2nd Qtr GDP         0.754     0.451
                    (0.248)   (0.161)
June Polling                  0.147
                              (0.085)
Multiple R-Squared  0.366     0.781
Standard errors are in parentheses. N = 18.

Two questions to try to understand:
What do the coefficients (and standard errors) mean?
Why did the 2nd Quarter GDP coefficient change?

Now we need to think about data like this
[Scatterplot matrix: vote share, 2nd quarter GDP, and June approval plotted against one another]

Which can also be thought of like this
[3D scatterplot: vote share against 2nd quarter GDP and June approval]


To draw the best line we wanted to minimize error

Residuals: e_i = (Y_i - Ŷ_i) = (Y_i - â - b̂ X_i)

[Scatterplot: residuals for the presidential regression; incumbent party share of vote against 2nd quarter GDP growth]

Residuals for multiple regression
[3D plot: vote share against 2nd quarter GDP and June approval, with the fitted regression plane]

Multidimensional linear models
On average, we are hypothesizing that the world looks like this:

E(Y) = a + b_1 X_1 + ... + b_k X_k

Overall, we think that the data looks like this:

Y_i = a + b_1 X_{1,i} + ... + b_k X_{k,i} + e_i,   e_i ~ N(0, s^2)

Just like before, we need to decide on a rule to choose the best estimates: â, ŝ^2, b̂_1, b̂_2, ...
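As a sketch of how such estimates are obtained, here is ordinary least squares fit to simulated data (all data and variable names here are illustrative, not from the lecture):

```python
import numpy as np

# Simulate data from Y = 2 + 1.5*X1 - 0.7*X2 + e and recover the
# coefficients by least squares.
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + e

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a_hat, b1_hat, b2_hat = coef
```

With 200 observations the estimates land close to the true values (2, 1.5, -0.7).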

Residuals, SSE, and ŝ^2

Residuals: e_i = (Y_i - Ŷ_i) = (Y_i - â - b̂_1 X_{1i} - ... - b̂_k X_{ki})

Sum of Squared Error:
SSE = Σ_{i=1}^n (Y_i - Ŷ_i)^2 = Σ_{i=1}^n (Y_i - â - b̂_1 X_{1i} - ... - b̂_k X_{ki})^2

Conditional Variance: estimate of the variance around the hyperplane in the population
ŝ^2 = SSE / (n - (k+1)) = Σ(Y_i - Ŷ_i)^2 / (n - (k+1))
ŝ = sqrt( SSE / (n - (k+1)) )
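These quantities can be computed directly; a sketch on illustrative simulated data (note the divisor n - (k+1), not n):

```python
import numpy as np

# Residuals, SSE, and the conditional variance estimate s^2-hat,
# with k = 2 predictors (simulated data, true error sd = 1).
rng = np.random.default_rng(1)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ coef
resid = Y - Y_hat                 # e_i = Y_i - Y_hat_i
SSE = np.sum(resid ** 2)          # sum of squared errors
s2_hat = SSE / (n - (k + 1))      # divide by n - (k+1), not n
s_hat = np.sqrt(s2_hat)
```

With an intercept in the model, the residuals sum to zero by construction, and s_hat estimates the error standard deviation around the hyperplane.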

Coefficient of multiple determination

Same as before:
TSS = Σ(Y_i - Ȳ)^2
SSE = Σ(Y_i - Ŷ_i)^2

R-squared:
R^2 = Explained Variance / Total Variance
    = (Total Variance - Unexplained Variance) / Total Variance
    = (TSS - SSE) / TSS
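The R^2 identity can be checked directly in code (illustrative simulated data):

```python
import numpy as np

# R^2 = (TSS - SSE) / TSS for a two-predictor regression.
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.0, 1.0]) + rng.normal(size=n)

coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ coef
TSS = np.sum((Y - Y.mean()) ** 2)   # total sum of squares
SSE = np.sum((Y - Y_hat) ** 2)      # unexplained sum of squares
R2 = (TSS - SSE) / TSS
```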



Conceptualization #1: Planes
[3D plot: the fitted regression plane for vote share against 2nd quarter GDP and June approval]

Conceptualization #2: Lines within groups
Let X_i take on a value of only 0 or 1.

Y_i = a + b_1 X_i + e_i

E(Y_i | X_i = 0) = a
E(Y_i | X_i = 1) = a + b_1

(This provides the same inference as a t-test)
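A quick check of the lines-within-groups claim: with a single 0/1 regressor, the OLS intercept and slope reproduce the two group means exactly (hypothetical data):

```python
import numpy as np

# Regression on a single dummy recovers the group means:
# intercept = mean of the X=0 group, intercept + slope = mean of the X=1 group.
rng = np.random.default_rng(3)
x = np.repeat([0, 1], 50)
y = 10 + 3 * x + rng.normal(size=100)

X = np.column_stack([np.ones(100), x])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

mean0, mean1 = y[x == 0].mean(), y[x == 1].mean()
```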

Nominal data
X_1 = {Blue, Not blue}, X_2 = {Brown, Not brown}

Y_i = a + b_1 X_{i,1} + b_2 X_{i,2} + e_i

E(Y_i | Blue) = a + b_1
E(Y_i | Brown) = a + b_2
E(Y_i | Green) = a

Ordinal data
X = {1, 2, 3, 4, 5}
[Two panels: the same Y-vs-X data fitted (left) treating the ordinal variable as continuous and (right) treating it as distinct categories]

Example: 2009 health care poll
Support for Obama health care plan, CNN/ORC poll, September 11-13, 2009
[Bar chart: proportions responding Strongly oppose, Moderately oppose, Moderately support, Strongly support]

Example: 2009 health care poll
[Bar chart: responses by age group (18-29, 30-44, 45-64, 65+)]

Example: 2009 health care poll
[Plot: support for Obama health care plan (Strongly oppose to Strongly favor) by age group (18-29, 30-44, 45-64, 65+)]

Dummy variable regression
What is the association between age and support for HCR, controlling for party?
Goal: Recode the age variable (18-29=1, 30-44=2, 45-64=3, 65+=4) into dummy variables
Equation:
HCR support = b_0 + b_1 Party + b_2 Age30-44 + b_3 Age45-64 + b_4 Age65+
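The recoding step can be sketched as follows (the age vector is hypothetical; 18-29 is the baseline, so it gets no dummy):

```python
# Recode an ordinal age variable (1=18-29, 2=30-44, 3=45-64, 4=65+)
# into k-1 = 3 dummy variables, with 18-29 as the baseline category.
age = [1, 2, 3, 4, 2, 1, 4]

age_30_44 = [1 if a == 2 else 0 for a in age]
age_45_64 = [1 if a == 3 else 0 for a in age]
age_65_up = [1 if a == 4 else 0 for a in age]

# Each respondent is flagged in at most one dummy;
# baseline-group respondents are zero in all three.
```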

Dummy variable regression results

Variable     Estimate (SE)
Constant     1.421 (0.116)
Party        0.914 (0.031)
Age 30-44    0.77 (0.13)
Age 45-64    0.65 (0.117)
Age 65+      0.16 (0.121)

N = 981, R^2 = 0.4799

Dummy variable regression
[Plot: predicted support for Obama health care plan by age group, with separate lines for GOP, Independent, and Democrat]

Things to note
For k levels of your categorical variable, you need to create k - 1 dummy variables.
The choice of baseline is arbitrary, but you need to know which is the baseline category in order to interpret the results correctly.
All effects are relative to the baseline category.

Conceptualization #3: Added variable plots
[Added-variable plots: residuals of vote share against residuals of q2gdp, and residuals of vote share against residuals of junetrial, each after partialling out the other predictors]
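The added-variable (partial regression) idea can be verified numerically: regress Y on the other predictors, regress X1 on the other predictors, then regress the first residuals on the second. The slope equals the multiple-regression coefficient on X1 (the Frisch-Waugh-Lovell result; data here is illustrative):

```python
import numpy as np

# Added-variable plot slope = multiple-regression coefficient on X1.
rng = np.random.default_rng(4)
n = 200
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)     # correlated predictors
Y = 1 + 2 * X1 - 1 * X2 + rng.normal(size=n)

ones = np.ones(n)
full = np.column_stack([ones, X1, X2])
b_full = np.linalg.lstsq(full, Y, rcond=None)[0]     # [a, b1, b2]

Z = np.column_stack([ones, X2])                       # the "other" predictors
ry = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]     # Y residuals
rx = X1 - Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]   # X1 residuals
slope = (rx @ ry) / (rx @ rx)   # simple-regression slope of ry on rx
```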

Thinking about X's from this point of view
You are going to be doing this for the homework.
The slope of these lines corresponds to the b estimates in the table.
Adding additional controls positively correlated with both vote share and GDP growth will lower the b for GDP growth.
If we have many variables that are highly collinear (often called multicollinearity), the coefficients are estimated less precisely (and are therefore less likely to be significant).
It is difficult to decide on the right variables, but DO NOT use stepwise methods.
When in doubt, use theory.


Inference in regression coefficients

Y_i = a + b_1 X_{1,i} + b_2 X_{2,i} + e_i,   e_i ~ N(0, s^2)

b_1: the coefficient for X_1.
Interpretation: a one-unit increase in X_1 leads to a b_1 increase in Y, controlling for the independent effect of X_2.

We want to test whether X_1 has any effect on Y independent of X_2:

b̂_1 / ŝ_{b̂_1} ~ t with df = n - (k+1)

Just read these values off of the tables, but watch your degrees of freedom.
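A sketch of the t-statistic computation on illustrative simulated data; the standard errors come from the usual estimate ŝ^2 (X'X)^{-1} (an assumption standard for OLS, though the slide itself only states the t ratio):

```python
import numpy as np

# t-statistics for a two-predictor regression: t = b_hat / se(b_hat),
# compared to a t distribution with n - (k+1) degrees of freedom.
rng = np.random.default_rng(5)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)

coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef
s2 = resid @ resid / (n - (k + 1))      # conditional variance estimate
cov = s2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
se = np.sqrt(np.diag(cov))
t_stats = coef / se                     # one t per coefficient
```

Here the true b_1 is 0.8, so its t-statistic should be comfortably past any conventional critical value, while the true-zero b_2 typically is not.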

Inference for the entire model
Before we said that R^2 is intuitively Explained Variance / Total Variance. It makes sense then that (1 - R^2) is the share of variance we haven't explained.

It turns out that:

F-statistic for regression:
F = (R^2 / k) / ((1 - R^2) / [n - (k+1)])

Here k is the number of covariates (gdp, incumbent, etc.), and n is the number of observations. This will be distributed according to the F-distribution with df_1 = k and df_2 = n - (k+1).
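Plugging in Model 2's reported values (R^2 = 0.781, n = 18, k = 2) gives the model F-statistic:

```python
# F-statistic from R^2, using Model 2's reported values.
R2, n, k = 0.781, 18, 2
F = (R2 / k) / ((1 - R2) / (n - (k + 1)))
# df1 = k = 2, df2 = n - (k+1) = 15; F is roughly 26.7
```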

Flashback: Interpreting the F-test
Is our model any good? The F-test is a formalized way of asking this question.
We compare the amount of variance explained by the regression to the amount unexplained.
H_0: Y_i = a + e_i
H_1: Y_i = a + X_{i1} b_1 + X_{i2} b_2 + X_{i3} b_3 + ... + e_i


Controlling for a variable can change the sign
[Scatterplot: Simpson's Paradox; outcome against X1 for two groups of observations (black and red)]

E(Y) = 1 - 0.25 X_1 + 2 X_2

The relationship between X_1 and Y is the same across groups.
We can solve: X_2 = 0 for the black observations, X_2 = 2 for the red.
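A simulation of this setup, E(Y) = 1 - 0.25 X_1 + 2 X_2 with X_2 in {0, 2}: within either group the slope on X_1 is about -0.25, but because the X_2 = 2 group also sits at higher X_1 (our illustrative assumption about the group placement), the pooled slope flips to positive:

```python
import numpy as np

# Simpson's paradox: negative within-group slope, positive pooled slope.
rng = np.random.default_rng(6)
n = 200
X2 = np.repeat([0.0, 2.0], n // 2)
X1 = np.where(X2 == 0, rng.uniform(0, 5, n), rng.uniform(7, 12, n))
Y = 1 - 0.25 * X1 + 2 * X2 + rng.normal(scale=0.2, size=n)

def slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

pooled = slope(X1, Y)                    # positive: ignores the groups
within = slope(X1[X2 == 0], Y[X2 == 0])  # about -0.25 inside one group
```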


Applied example: Income in presidential voting
[Scatterplot: % Clinton vote (30-60) against state income ($40,000-$70,000), with states labeled by postal abbreviation]

Applied example
[Two panels (2000 and 2004): probability of voting Republican against individual income for Mississippi, Ohio, and Connecticut]
Gelman et al. (2007): Rich states are more likely to vote D (solid circles); rich voters within states are more likely to vote GOP (open circles).

Get in the game

Incumbent party vote share
                    Model 1   Model 2
Intercept           49.27     49.35
                    (1.35)    (4.51)
2nd Qtr GDP         0.754     0.451
                    (0.248)   (0.161)
June Polling                  0.147
                              (0.085)
Multiple R-Squared  0.366     0.781
Standard errors are in parentheses. N = 18.

What is the predicted value for 2016? (2nd Quarter GDP = 1.4, June Approval = 6)
Interpret Model 2. This should include a discussion of both substantive and statistical significance.
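For reference, the first question is a straight plug-in of Model 2's coefficients from the table:

```python
# Model 2 plug-in prediction for 2016 (coefficients from the table above).
q2_gdp, june_approval = 1.4, 6
pred = 49.35 + 0.451 * q2_gdp + 0.147 * june_approval  # about 50.86
```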