Causal Analysis: Analysis Options for Panel Data

What are the following sessions about?
09.04.2008  Introduction and overview
16.04.2008  The general linear model
23.04.2008  Pooled cross-section data I
30.04.2008  (cancelled)
07.05.2008  Pooled cross-section data II
14.05.2008  Analysis options for panel data (despite the Whitsun holidays)
21.05.2008  Panel data analysis of continuous dependent variables I
28.05.2008  Panel data analysis of continuous dependent variables II
04.06.2008  Panel data analysis of continuous dependent variables III
11.06.2008  Panel data analysis of categorical dependent variables I
18.06.2008  Panel data analysis of categorical dependent variables II
25.06.2008  Panel data analysis of categorical dependent variables III
02.07.2008  Event history analysis I
09.07.2008  Event history analysis II
16.07.2008  Event history analysis III
22.07.2008  Exam (60 minutes)

Topics
1. Introduction
2. How to manage panel data
3. Independent observations?
4. Describing panel data
5. Explaining panel data
6. Modeling panel data
7. How to estimate models for panel data

Aims
The analysis of panel data is a rapidly growing field in statistics, and its terminology is heterogeneous, so novice users are easily lost. I want to illustrate the problems of panel data analysis, give a basic orientation on first-choice methods, and introduce the most important technical terms. Audience: beginners in panel analysis.

Important distinction: categorical dependent variables versus continuous dependent variables

Categorical dependent variables
Few discrete values, e.g., employment status, political attitudes, number of children. We model the probability of observing a certain value (category) of the variable. Example: Does the probability of being unemployed change with labor force experience?

Continuous dependent variables
Many different values on a continuous scale, e.g., income, firm size, gross domestic product, amount of social expenditures. We model certain distributional characteristics of these variables (e.g., the expected value). Example: Does income on average increase with educational attainment?

2. How to manage panel data

Definition: panel data
Panel data are repeated observations of the same units over time; units can be individuals, firms, nations, etc. Panel data have three dimensions:
units: i = 1, ..., n
variables: ν = 1, ..., V (time-constant or time-dependent)
measurements (panel waves): t = 1, ..., T
How can this be put into a two-dimensional data matrix?

Wide format
ID  Kids84  Kids85  Educ84  Educ85
1   0       0       12      12
2   2       2       9       9
3   0       1       10      11
4   1       2       8       8
5   3       3       13      13
6   2       2       15      15
7   0       1       9       10
...
Suitable for few measurements: with n individuals and T measurements, the data matrix has n rows (units) and V·T columns (variables).

Long format (Jahr = year)
ID  Jahr  Kids  Educ
1   1984  0     12
1   1985  0     12
...
2   1984  2     9
2   1985  2     9
...
3   1984  0     10
3   1985  1     11
...
4   1984  1     8
4   1985  2     8
...
5   1984  3     13
5   1985  3     13
...
6   1984  2     15
6   1985  2     15
...
7   1984  0     9
7   1985  1     10
...
7   2000  2     13
Suitable for many measurements: with n individuals and T measurements, the data matrix has N = n·T rows (observations) and V columns (variables). Which observations belong to the same unit? In Stata this is declared with: tsset id jahr
The result is a hierarchical data set: observations are clustered within units. Attention! N = n·T makes it look as if you have a lot of observations, but observations of the same unit are not independent.
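As a minimal sketch of moving between the two layouts, the following Python converts the first three units of the wide-format example into long format using only the standard library. The function name `wide_to_long` and the convention that each column name is a stub plus a two-digit year (e.g., `Kids84`) are assumptions made for this illustration; statistics packages provide this directly (in Stata, for instance, via the `reshape long` command).

```python
# Minimal sketch: reshape panel data from wide format (one row per unit)
# to long format (one row per unit-year). Hypothetical helper, stdlib only.

wide = [
    {"ID": 1, "Kids84": 0, "Kids85": 0, "Educ84": 12, "Educ85": 12},
    {"ID": 2, "Kids84": 2, "Kids85": 2, "Educ84": 9, "Educ85": 9},
    {"ID": 3, "Kids84": 0, "Kids85": 1, "Educ84": 10, "Educ85": 11},
]

def wide_to_long(rows, stubs, years):
    """Return one dict per (unit, year); column 'Kids84' becomes 'Kids' in year 1984."""
    long_rows = []
    for row in rows:
        for year in years:
            record = {"ID": row["ID"], "Jahr": 1900 + year}
            for stub in stubs:
                record[stub] = row[f"{stub}{year}"]
            long_rows.append(record)
    return long_rows

long_rows = wide_to_long(wide, stubs=["Kids", "Educ"], years=[84, 85])
for r in long_rows:
    print(r["ID"], r["Jahr"], r["Kids"], r["Educ"])
```

Each unit now occupies T = 2 rows, so the three units yield N = n·T = 6 observations.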

Pooling
Pooled time series (sorted by unit, then year):
ID  Jahr  Kids  Educ
1   1984  0     12
1   1985  0     12
...
2   1984  2     9
2   1985  2     9
...
3   1984  0     10
3   1985  1     11
...
4   1984  1     8
4   1985  2     8
...
5   1984  3     13
5   1985  3     13
...
6   1984  2     15
6   1985  2     15
...
Pooled cross-sections (sorted by year, then unit):
ID  Jahr  Kids  Educ
1   1984  0     12
2   1984  2     9
3   1984  0     10
4   1984  1     8
5   1984  3     13
6   1984  2     15
7   1984  0     9
...
1   1985  0     12
2   1985  2     9
3   1985  1     11
4   1985  2     8
5   1985  3     13
6   1985  2     15
7   1985  1     10
...
7   2000  2     13
...
Panel data = pooled time-series and cross-section data (TSCS).

Micro and macro panels
Macro panel: e.g., OECD data (countries over time); T ≈ n or even T > n.
Micro panel: e.g., household panel studies (GSOEP); n >> T.
Why is this distinction important? Macro panels contain more information for analyzing the time dimension. I focus on micro panels!

3. Independent observations?

Example 1: y continuous
545 males observed 1980-1987 (NLS Youth Sample): n = 545, T = 8, y = log hourly wage. Source: Vella and Verbeek (1998).
Log hourly wage, original data:
                   Mean                   Serial correlation
Year   n     log(y)   y       Sd      (t, t-1)   (t, t=1)
1980   545   1.393    4.03    0.558
1981   545   1.513    4.54    0.531   0.454      0.454
1982   545   1.572    4.81    0.497   0.611      0.432
1983   545   1.619    5.05    0.481   0.690      0.408
1984   545   1.690    5.42    0.524   0.675      0.316
1985   545   1.739    5.69    0.523   0.664      0.356
1986   545   1.800    6.05    0.515   0.632      0.297
1987   545   1.866    6.47    0.467   0.693      0.310
Serial dependence is high, and it decreases with the time lag between measurements.
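The serial correlations in such a table are Pearson correlations computed across the n units between two waves. A minimal sketch in Python, assuming a balanced panel stored as a dictionary that maps each unit to its list of y values by wave; the helper names and the three-unit panel are hypothetical and only illustrate the computation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def wave_correlation(panel, t, s):
    """Correlation across units between wave t and wave s (0-based wave indices)."""
    return pearson([y[t] for y in panel.values()], [y[s] for y in panel.values()])

# Hypothetical balanced panel: unit -> y in waves 1, 2, 3
panel = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 1.0],
    "C": [3.0, 6.0, 2.0],
}
print(wave_correlation(panel, 1, 0))  # (t, t-1) for t = wave 2
print(wave_correlation(panel, 2, 0))  # (t, t=1) for t = wave 3
```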

Example 2: y categorical
700 females observed 1970-1973 (NLS Women Sample): n = 700, T = 4, y = union membership (no, yes). Source: Stata Manual.
                                 State         First-order transition   Higher-order transition
                                 probability   matrix (t, t+1), row %   matrix (t=1, t+1), row %
Year t   Union membership   n    (%)           no member   member       no member   member
1970     no member          535  76.43         91.96       8.04         91.96       8.04
         member             165  23.57         25.45       74.55        25.45       74.55
1971     no member          534  76.29         92.88       7.12         91.59       8.41
         member             166  23.71         23.49       76.51        27.27       72.73
1972     no member          535  76.43         93.08       6.92         90.84       9.16
         member             165  23.57         26.67       73.33        33.94       66.06
1973     no member          542  77.43
         member             158  22.57
Serial dependence is high, and it decreases with the time lag between measurements.

Consequences
Conventional statistical methods assume independent observations. Applied to panel data, where repeated observations of the same unit are serially dependent, they tend to produce estimated standard errors that are too low, test statistics that are too high, and p-values that are too low, so significance tests may lead to erroneous conclusions.
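The size of the standard-error problem can be quantified in a simple special case. A minimal Python sketch, assuming (this is not from the slides) that every unit contributes T observations with common variance σ² and the same correlation ρ between any two of them (exchangeable correlation), so that the variance of the overall mean is inflated by the design effect 1 + (T - 1)ρ. The values n = 545 and T = 8 are taken from Example 1; σ = 0.5 and ρ = 0.5 are hypothetical, chosen to lie roughly in the range of the standard deviations and serial correlations reported there.

```python
import math

def clustered_se_of_mean(sigma, n, T, rho):
    """SE of the overall mean of y from a balanced panel with n units and T waves.

    Assumes equal variance sigma**2 and the same correlation rho between any two
    observations of the same unit (exchangeable correlation); units are independent.
    """
    naive = sigma / math.sqrt(n * T)   # treats all n*T observations as independent
    deff = 1 + (T - 1) * rho           # design effect for clusters of size T
    return naive, naive * math.sqrt(deff), deff

naive, correct, deff = clustered_se_of_mean(sigma=0.5, n=545, T=8, rho=0.5)
print(f"naive SE = {naive:.4f}, SE allowing for serial dependence = {correct:.4f}")
print(f"effective number of independent observations = {545 * 8 / deff:.0f}")
```

Under these assumptions the naive standard error is too small by a factor of about 2.1 (the square root of 4.5), and the 4,360 person-years carry roughly as much information about the mean as 969 independent observations.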

4. Describing panel data

Simple techniques
              Trend                      Serial dependence        Sequence
Continuous    mean, standard deviation   correlation              graphs? tables?
Categorical   proportion                 transition probability   graphs? tables?

Example 2: y categorical
Sequence 1970-1973   n     %
0000                 442   63.14
0001                 24    3.43
0010                 19    2.71
0011                 7     1.00
0100                 21    3.00
0101                 3     0.43
0110                 4     0.57
0111                 15    2.14
Total                535
1000                 27    3.86
1001                 3     0.43
1010                 4     0.57
1011                 8     1.14
1100                 8     1.14
1101                 7     1.00
1110                 17    2.43
1111                 91    13.00
Total                165
0 = no union member, 1 = union member.
For example, there are four alternative sequences leading from y70 = 0 to y73 = 1 (0001, 0011, 0101, 0111). Sequences are too detailed, especially with large T.
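Sequence frequencies can be condensed into transition matrices. The Python sketch below recomputes row-percentage transition probabilities from the 16 sequence counts in the table above; it should reproduce, for instance, the first-order 1970 to 1971 and the higher-order 1970 to 1972 percentages of the union-membership transition table. The function name `transition_matrix` is hypothetical.

```python
# Sequence frequencies for union membership 1970-1973 (0 = no member, 1 = member),
# copied from the sequence table (n = 700).
SEQ_COUNTS = {
    "0000": 442, "0001": 24, "0010": 19, "0011": 7,
    "0100": 21, "0101": 3, "0110": 4, "0111": 15,
    "1000": 27, "1001": 3, "1010": 4, "1011": 8,
    "1100": 8, "1101": 7, "1110": 17, "1111": 91,
}

def transition_matrix(seq_counts, origin, dest):
    """Row-percentage transition matrix from the state in wave `origin`
    to the state in wave `dest` (0-based wave indices)."""
    counts = [[0, 0], [0, 0]]
    for seq, n in seq_counts.items():
        counts[int(seq[origin])][int(seq[dest])] += n
    return [[100.0 * c / sum(row) for c in row] for row in counts]

first_order = transition_matrix(SEQ_COUNTS, 0, 1)    # 1970 -> 1971
higher_order = transition_matrix(SEQ_COUNTS, 0, 2)   # 1970 -> 1972
for label, m in [("1970->1971", first_order), ("1970->1972", higher_order)]:
    print(label, [[round(p, 2) for p in row] for row in m])
```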

... continued
Alternatively, focus on one origin state and plot the probability of survival in that origin state, or plot the conditional transition probability.
[Figure: Union membership over time and region. y-axis: probability of survival (0 to 1); x-axis: wave (0 to 4); separate curves for region = other and region = South.]
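A survival curve of this kind can be computed from the same sequence counts. A minimal Python sketch, assuming survival means staying in the origin state from the first wave onward; because the sequence table is not broken down by region, the curve below pools both regions and is not the one plotted in the figure. The function name `survival_curve` is hypothetical.

```python
# Sequence frequencies for union membership 1970-1973 (0 = no member, 1 = member).
SEQ_COUNTS = {
    "0000": 442, "0001": 24, "0010": 19, "0011": 7,
    "0100": 21, "0101": 3, "0110": 4, "0111": 15,
    "1000": 27, "1001": 3, "1010": 4, "1011": 8,
    "1100": 8, "1101": 7, "1110": 17, "1111": 91,
}

def survival_curve(seq_counts, origin_state="0"):
    """Share of units starting in `origin_state` that have stayed in it
    through each wave (the first entry is 1.0 by construction)."""
    T = len(next(iter(seq_counts)))
    at_risk = {s: n for s, n in seq_counts.items() if s[0] == origin_state}
    total = sum(at_risk.values())
    curve = []
    for t in range(1, T + 1):
        # units whose first t states all equal the origin state
        stayed = sum(n for s, n in at_risk.items() if set(s[:t]) == {origin_state})
        curve.append(stayed / total)
    return curve

print("non-members:", [round(p, 3) for p in survival_curve(SEQ_COUNTS, "0")])
print("members:    ", [round(p, 3) for p in survival_curve(SEQ_COUNTS, "1")])
```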

Example 1: y continuous
[Listing: individual log hourly wages for the years 1981-1987 plus a mean column.]
A simple data listing shows no duplicate patterns and no visible structure.

... continued
Alternative: use line plots for each unit; this technique is of limited use with many units.
[Figure: Income trajectories for 19 men from high and low income groups. y-axis: log hourly wage; x-axis: year (1980-1988).]

5. Explaining panel data

Why different income trajectories?
[Figure: log hourly wage trajectories by year (1980-1988).]

... continued
Explanatory factors (of the unit or its context):
time-constant variables Z, e.g., ethnicity, national language;
time-dependent variables X, e.g., labor force experience, economic growth;
time t, e.g., calendar time, time since an event (birth, labor force entry).
Notes: It is a misunderstanding that Z explains only the level and X only the change of Y. Few variables are time-constant by nature, yet often only time-constant information is available. Time is an indicator rather than a causal factor.

Why serial dependence?
1. Time-constant variables Z, e.g., ethnicity affecting income in every wave.
2. Time-dependent variables X: more complicated, but X is often similar in the next year, i.e., the X are themselves serially dependent.
3. Y is influenced by former values of Y, e.g., bureaucratic behavior.
Technical terms: spurious state dependence: (1, 2); true state dependence: (3); dynamic models: (3).

6. Modeling panel data

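Types of longitudinal models (independent variables explaining the dependent variable, with examples)
level X, Z explaining level Y: human capital theory
level X, Z explaining change ΔY: gender-specific mortality
change ΔX explaining level Y: adaptive behavior
change ΔX explaining change ΔY: many change processes
level Y as an independent variable: learning processes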

Mathematics: y continuous
Level model:
$$E(y_{it}) = \underbrace{\beta_0(t) + \beta_1 x_{1it} + \dots + \beta_k x_{kit}}_{\text{time-dependent part}} + \underbrace{\gamma_1 z_{1i} + \dots + \gamma_j z_{ji}}_{\text{time-constant part}}$$
The model in levels is easily transformed into a model of change. Both models are conceptually equivalent; however, their empirical estimates differ.
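A short derivation of why the level model is easily transformed into a model of change: subtracting the level equation for wave t-1 from the one for wave t, assuming the coefficients are the same in both waves, the time-constant terms γ1 z1i + ... + γj zji appear in both equations and cancel.

```latex
\begin{aligned}
\Delta E(y_{it}) &= E(y_{it}) - E(y_{i,t-1}) \\
&= \bigl[\beta_0(t) - \beta_0(t-1)\bigr] + \beta_1 \Delta x_{1it} + \dots + \beta_k \Delta x_{kit},
\qquad \Delta x_{mit} = x_{mit} - x_{mi,t-1}.
\end{aligned}
```

One consequence is that the coefficients γ of time-constant variables cannot be estimated from the change model.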

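Mathematics: y categorical
Level model:
$$\Pr(y_{it} = k) = G\bigl(\beta_0(t) + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + \gamma_1 z_{1i} + \dots + \gamma_j z_{ji}\bigr)$$
G(·) is a suitable distribution function: the normal distribution gives the probit model, the logistic distribution gives logistic regression.
Change model:
$$\Pr(y_{it} = k \mid y_{i1} = j, \dots, y_{i,t-1} = j) = G\bigl(\beta_0(t) + \beta_1 x_{1it} + \dots + \gamma_1 z_{1i} + \dots\bigr)$$
This is a conditional transition probability, modeled by logistic regression for discrete event histories.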
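To make the role of G(·) concrete, a minimal Python sketch evaluates the level model for a binary outcome (k = 1) with both choices of G. The coefficient and covariate values are hypothetical, and the helper names are illustrative, not part of any library.

```python
import math

def logistic_cdf(eta):
    """Logistic distribution function: G for logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))

def normal_cdf(eta):
    """Standard normal distribution function: G for the probit model."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def prob_yes(beta0_t, betas, xs, gammas, zs, G):
    """Pr(y_it = 1) = G(beta0(t) + sum of beta*x_it + sum of gamma*z_i)."""
    eta = beta0_t
    eta += sum(b * x for b, x in zip(betas, xs))    # time-dependent part
    eta += sum(g * z for g, z in zip(gammas, zs))   # time-constant part
    return G(eta)

# Hypothetical values for one unit i in wave t
args = dict(beta0_t=-1.0, betas=[0.25], xs=[6.0], gammas=[-0.4], zs=[1.0])
print(round(prob_yes(G=logistic_cdf, **args), 3))
print(round(prob_yes(G=normal_cdf, **args), 3))
```

Here the linear predictor is 0.1, giving probabilities of about 0.525 (logit) and 0.540 (probit). Plugging the same predictor into both functions is for comparison only; fitted logit and probit coefficients are on different scales.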

7. Estimating models for panel data

What to do?
Switch from E(y) and Pr(y) to the observed data (include an error term) and proceed with one of two alternative options:
1. treat serial dependence as a nuisance;
2. explicitly model serial dependence.
$$y_{it} = \underbrace{\beta_0(t) + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + e_{it}}_{\text{time-dependent part}} + \underbrace{\gamma_1 z_{1i} + \dots + \gamma_j z_{ji} + u_i}_{\text{time-constant part}}$$
The unit-specific effect u_i captures whatever the observations of unit i have in common that is not captured by X and Z.
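Under the standard random-effects assumptions (added here for illustration and stated in the comments, not taken from the slides), the unit-specific effect alone generates serial dependence: the two observations of unit i share u_i, so their covariance equals its variance.

```latex
% Assumptions: u_i has variance sigma_u^2; e_it has variance sigma_e^2 and is
% i.i.d. over t; u_i and e_it are mutually independent and independent of X and Z.
\operatorname{Var}(y_{it} \mid X, Z) = \sigma_u^2 + \sigma_e^2, \qquad
\operatorname{Cov}(y_{it}, y_{is} \mid X, Z) = \sigma_u^2 \quad (t \neq s),
\qquad
\rho = \operatorname{Corr}(y_{it}, y_{is} \mid X, Z) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
```

This correlation is the same for every pair of waves, so u_i by itself cannot reproduce a serial correlation that declines with the time lag, as in Example 1; that pattern requires additional serial dependence, e.g., in X or in e_it.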

Strategies of estimation
Strategy                    Continuous y                                            Categorical y
control serial dependence   robust std. errors; generalized estimating equations   robust std. errors; generalized estimating equations
level Y                     linear regression with fixed or random (unit) effects   logistic regression with fixed or random (unit) effects
change ΔY                   linear regression with first differences               event history analysis
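The fixed-effects and first-difference strategies for continuous y can be sketched in a few lines of Python. This is a minimal sketch, assuming a balanced panel with a single regressor x, no time effects (β0(t) constant) and waves stored in order; the function names are hypothetical. The toy data are generated as y_it = 2·x_it + u_i with no error term, and the unit with the larger u_i also has larger x values, so u_i is correlated with x.

```python
def pooled_ols_slope(panel):
    """OLS slope of y on x, ignoring that observations belong to units."""
    pairs = [p for obs in panel.values() for p in obs]
    xbar = sum(x for x, _ in pairs) / len(pairs)
    ybar = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    return sxy / sxx

def within_slope(panel):
    """Fixed-effects (within) estimator: demean x and y within each unit, so u_i drops out."""
    sxy = sxx = 0.0
    for obs in panel.values():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    return sxy / sxx

def first_difference_slope(panel):
    """First-difference estimator (no intercept): regress dy on dx between adjacent waves."""
    sxy = sxx = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx, dy = x1 - x0, y1 - y0
            sxy += dx * dy
            sxx += dx * dx
    return sxy / sxx

# Hypothetical panel: unit -> [(x_it, y_it), ...] in wave order, y_it = 2 * x_it + u_i
panel = {
    "A": [(5, 20), (6, 22), (7, 24)],   # u_A = 10
    "B": [(1, 2), (2, 4), (4, 8)],      # u_B = 0
}
print(pooled_ols_slope(panel), within_slope(panel), first_difference_slope(panel))
```

Both the within (fixed-effects) and the first-difference estimators remove u_i and recover the slope 2, whereas pooled OLS, which ignores the unit structure, returns about 4.05 because it attributes part of the difference in u_i between the two units to x.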

Finally

What I did not talk about: statistical assumptions, dynamic models, models with reciprocal causation, measurement error, panel attrition, missing data, ...

Important technical terms: categorical / continuous variable, (conditional) transition probability, dynamic model, event history analysis, first differences (FD), fixed effects (FE), generalized estimating equations (GEE), hierarchical data, macro / micro panel, pooling, random effects (RE), robust standard errors, sequence, serial correlation / dependence, spurious / true state dependence, survival probability, transition matrix, unit-specific effect, wide / long format

More info
Introductory textbook: Chapters 13 and 14 of Wooldridge, J. (2005): Introductory Econometrics: A Modern Approach. South-Western College Publishing.