Causality in Econometrics (3)


Causality in Econometrics (3): Graphical Causal Models
Alessio Moneta, Max Planck Institute of Economics, Jena (moneta@econ.mpg.de)
26 April 2011, GSBC Lecture, Friedrich-Schiller-Universität Jena

Graphical Causal Models: Terminology and Representation of Statistical Dependence

Sources and Motivations
The graphical-models approach to causal inference was mainly developed by:
- Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, 2nd edition
- Pearl (2000), Causality: Models, Reasoning, and Inference
Forerunners: J.S. Mill; C. Spearman; T. Haavelmo, H. Wold, H. Simon; H. Reichenbach, P. Suppes.

Sources and Motivations
Ideas:
- use of probability plus diagrams to represent associations in the data;
- use of graph theory to represent and analyze causal relations.
This permits, in particular:
- addressing the symmetry problem typical of probabilistic approaches;
- representing structures in which interventions are possible.
Formalization of the relationship between the probabilistic and the causal representation.
Emphasis on inference, agnosticism about causal ontology. But: many points of contact with the probabilistic approach (Reichenbach) and with manipulability theory (Woodward).

Formal preliminaries
Graph: <V, M, E>
- a set V of vertices (or nodes), representing variables;
- a set M of marks, such as >, EM (the empty mark), or o, representing the directions of causal influence;
- a set E of edges, i.e. pairs of the form {[V1, M1], [V2, M2]}, representing causal relationships.
Example [figure: V1 → V2, V3 → V2, and an undirected edge V1 - V3]:
G = < {V1, V2, V3}, {EM, >}, {{[V1, EM], [V2, >]}, {[V1, EM], [V3, EM]}, {[V3, EM], [V2, >]}} >

Formal preliminaries
Undirected graph: a graph in which the set of marks is M = {EM}.
Directed graph: a graph in which the set of marks is M = {EM, >} and each edge in E carries exactly the marks EM and >.
Directed edge: A → B (i.e. {[A, EM], [B, >]}), where A is a parent of B and B is a child (descendant) of A.

Formal preliminaries
Paths:
- undirected path: a sequence of vertices A, ..., B such that every pair of vertices X, Y adjacent in the sequence is connected by an edge {[X, M1], [Y, M2]};
- directed path: a sequence of vertices A, ..., B such that every pair of vertices X, Y adjacent in the sequence is connected by an edge {[X, EM], [Y, >]};
- acyclic path: a path that contains no vertex more than once; otherwise the path is cyclic.

Example
[Figure: DAG over V1, ..., V5 with edges V3 → V1, V3 → V2, V1 → V2, V2 → V4, V4 → V5]
Directed paths: <V1, V2, V4, V5>; <V3, V2, V4, V5>; <V2, V4, V5>; etc.
Undirected paths: <V1, V3, V2, V4, V5>; <V1, V2, V3>; etc.
Undirected cyclic path: <V1, V2, V3, V1>.
There are no directed cyclic paths.
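
The path terminology lends itself to a small computational check. The sketch below (Python, not part of the original slides) stores the example graph as a hypothetical children dictionary, with the edge orientations read off the factorization given later in the deck, and enumerates directed paths between two vertices by depth-first search.

```python
# A minimal sketch: the example graph as directed edges (orientations assumed
# from the factorization slide) and a depth-first enumeration of directed paths.
edges = {"V3": ["V1", "V2"], "V1": ["V2"], "V2": ["V4"], "V4": ["V5"], "V5": []}

def directed_paths(src, dst, path=None):
    """Return every directed path from src to dst as a list of vertex lists."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    return [p for nxt in edges[src] if nxt not in path
            for p in directed_paths(nxt, dst, path)]

print(directed_paths("V1", "V5"))   # [['V1', 'V2', 'V4', 'V5']]
print(directed_paths("V3", "V5"))   # two directed paths, via V1 and directly via V2
```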

More terminology
Collider: a vertex V such that A → V ← B.
Unshielded collider: a vertex V such that A → V ← B and A and B are not adjacent (i.e. not connected by an edge) in the graph.
Complete graph: a graph in which every pair of vertices is adjacent.
Directed Acyclic Graph (DAG): a directed graph that contains no directed cyclic paths.
Directed Cyclic Graph (DCG): a directed graph that contains directed cyclic paths.

Graphs and probabilistic dependence
First use of graphs: representing probabilistic dependence and independence.
Nodes: random variables (discrete or continuous). Edges: probabilistic dependence.
Bayesian networks (Pearl 1985).

Conditional Independence
If X, Y, Z are random variables, we say that X is conditionally independent of Y given Z, and write X ⊥ Y | Z, if
- for discrete variables: P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
- for continuous variables: f_{XY|Z}(x, y | z) = f_{X|Z}(x | z) f_{Y|Z}(y | z)
We can also write (simplifying the notation): X ⊥ Y | Z ⟺ f(x, y, z) f(z) = f(x, z) f(y, z).

Conditional independence
Some equivalent characterizations:
- X ⊥ Y | Z ⟺ f(x, y | z) = f(x | z) f(y | z)
- X ⊥ Y | Z ⟺ f(x, y, z) f(z) = f(x, z) f(y, z)
- X ⊥ Y | Z ⟺ f(x | y, z) = f(x | z)
- X ⊥ Y | Z ⟺ f(x, z | y) = f(x | z) f(z | y)
- X ⊥ Y | Z ⟺ f(x, y, z) = f(x | z) f(y, z)
Note: f(x, y | z) = f(x, y, z)/f(z).
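
As an illustration (not in the original slides), the following NumPy sketch builds a small hypothetical discrete joint distribution in which X ⊥ Y | Z holds by construction, and verifies two of the characterizations above numerically; all probability values are made up for the example.

```python
# A minimal sketch: check X _||_ Y | Z for a discrete joint built as
# P(x, y, z) = P(z) P(x | z) P(y | z), using two equivalent characterizations.
import numpy as np

p_z = np.array([0.4, 0.6])                      # P(Z = 0), P(Z = 1)
p_x_given_z = np.array([[0.2, 0.8],             # P(X | Z = 0)
                        [0.7, 0.3]])            # P(X | Z = 1)
p_y_given_z = np.array([[0.5, 0.5],
                        [0.1, 0.9]])
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

f_z = joint.sum(axis=(0, 1))                    # f(z)
f_xz = joint.sum(axis=1)                        # f(x, z)
f_yz = joint.sum(axis=0)                        # f(y, z)

# Characterization: f(x, y, z) f(z) = f(x, z) f(y, z)
lhs = joint * f_z[None, None, :]
rhs = f_xz[:, None, :] * f_yz[None, :, :]
print(np.allclose(lhs, rhs))                    # True

# Characterization: f(x | y, z) = f(x | z)
f_x_given_yz = joint / joint.sum(axis=0, keepdims=True)
f_x_given_z = f_xz / f_z[None, :]
print(np.allclose(f_x_given_yz, f_x_given_z[:, None, :]))   # True
```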

Conditional independence
It also holds that X ⊥ Y | Z ⟺ Y ⊥ X | Z (symmetry).
If Z is empty (trivial), X ⊥ Y: X is independent of Y.
Other properties:
- X ⊥ YW | Z ⟹ X ⊥ Y | Z (decomposition)
- X ⊥ YW | Z ⟹ X ⊥ Y | ZW (weak union)
See Pearl (2000: 11).

Interpretations of C.I.
Useful interpretations of X ⊥ Y | Z:
- once we know Z, learning the value of Y provides no additional information about X;
- once we know Z, reading X is irrelevant for reading Y;
- once we observe realizations of Z, observing realizations of Y is irrelevant for predicting the frequency of the realizations of X.

Independence and uncorrelatedness
It is important to distinguish between (conditional) independence and (conditional or partial) correlation. Recall:
- Variance of X: σ²_X := E[(X − E(X))²]
- Covariance between X and Y: σ_XY := E[(X − E(X))(Y − E(Y))]
- Correlation coefficient (Pearson): ρ_XY := σ_XY / (σ_X σ_Y)
- Linear regression coefficient: r_XY := σ_XY / σ²_Y = ρ_XY σ_X / σ_Y
This suggests that correlation is a measure of linear dependence.
Notice: σ_XY = σ_YX and ρ_XY = ρ_YX, but r_XY ≠ r_YX.
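
A quick numerical illustration of these definitions (simulated data, assumptions mine, not from the slides): the sketch computes the sample covariance, the correlation, and the two regression coefficients, showing that the correlation is symmetric while the regression coefficients are not.

```python
# A minimal sketch with assumed data: sample analogues of covariance,
# Pearson correlation, and the two regression coefficients.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)      # Y depends linearly on X

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho_xy = cov_xy / (x.std() * y.std())
r_xy = cov_xy / y.var()                    # coefficient of regressing X on Y
r_yx = cov_xy / x.var()                    # coefficient of regressing Y on X

print(round(rho_xy, 3))                    # symmetric measure of linear dependence
print(round(r_xy, 3), round(r_yx, 3))      # asymmetric: r_xy != r_yx
```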

Independence and uncorrelatedness
Recall the partial correlation between X and Y given Z:
ρ_XY.Z = (ρ_XY − ρ_YZ ρ_XZ) / ( √(1 − ρ²_YZ) √(1 − ρ²_XZ) )
and conditional independence X ⊥ Y | Z: f_{XY|Z}(x, y | z) = f_{X|Z}(x | z) f_{Y|Z}(y | z).
It holds that:
- X ⊥ Y ⟹ ρ_XY = 0
- X ⊥ Y | Z ⟹ ρ_XY.Z = 0
and (of course):
- ρ_XY ≠ 0 ⟹ X and Y are not independent
- ρ_XY.Z ≠ 0 ⟹ X and Y are not independent given Z
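
The sketch below (simulated data, assumptions mine) computes the partial correlation both from the formula above and from the correlation of regression residuals; the two computations are equivalent and should agree.

```python
# A minimal sketch: partial correlation via the formula vs. via residuals of
# linear regressions on Z. X and Y are made dependent only through Z.
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=50_000)
x = z + 0.5 * rng.normal(size=50_000)
y = -z + 0.5 * rng.normal(size=50_000)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

rho_xy, rho_xz, rho_yz = corr(x, y), corr(x, z), corr(y, z)
partial = (rho_xy - rho_xz * rho_yz) / np.sqrt((1 - rho_xz**2) * (1 - rho_yz**2))

bx = np.polyfit(z, x, 1)                 # slope and intercept of X on Z
by = np.polyfit(z, y, 1)                 # slope and intercept of Y on Z
res_x = x - (bx[0] * z + bx[1])
res_y = y - (by[0] * z + by[1])

print(round(corr(x, y), 3))              # strongly negative: X and Y correlated
print(round(partial, 3))                 # close to 0: conditioning on Z removes it
print(round(corr(res_x, res_y), 3))      # agrees with the formula
```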

Independence and uncorrelatedness
In general:
- ρ_XY = 0 does not imply X ⊥ Y
- ρ_XY.Z = 0 does not imply X ⊥ Y | Z
However, if the joint distribution F(X, Y, Z) is normal:
- ρ_XY = 0 ⟹ X ⊥ Y
- ρ_XY.Z = 0 ⟹ X ⊥ Y | Z
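
A standard counterexample, sketched below with simulated data (not from the slides): with X standard normal and Y = X², the correlation is approximately zero although Y is a deterministic function of X, which is why zero correlation does not imply independence outside the joint-normal case.

```python
# A minimal sketch: uncorrelated but clearly dependent variables.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = x**2                                     # purely non-linear dependence

print(round(np.corrcoef(x, y)[0, 1], 3))     # approximately 0
print(round(np.corrcoef(x**2, y)[0, 1], 3))  # 1.0: the dependence is there
```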

Population and sample
Notice also the difference between population parameters and sample statistics:
- ρ_XY = σ_XY / (σ_X σ_Y); sample analogue: ρ̂_XY = Σ_{k=1}^n (X_k − X̄)(Y_k − Ȳ) / ( √(Σ_{k=1}^n (X_k − X̄)²) √(Σ_{k=1}^n (Y_k − Ȳ)²) )
- r_YX = σ_XY / σ²_X; sample analogue: r̂_YX = Σ_{k=1}^n (X_k − X̄)(Y_k − Ȳ) / Σ_{k=1}^n (X_k − X̄)²
- β̂_OLS = (X′X)⁻¹ X′Y, for vectors of data X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_n), where X̄ = n⁻¹ Σ X_i.
Notice that when X̄ = 0 and Ȳ = 0, r̂_YX = β̂_OLS.
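
A small check of the last remark (simulated data, assumptions mine): after centering X and Y, the sample regression coefficient r̂_YX coincides with the matrix OLS formula (X′X)⁻¹X′Y.

```python
# A minimal sketch: centered sample regression coefficient vs. OLS formula.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, size=1_000)
y = 1.5 * x + rng.normal(size=1_000)

xc, yc = x - x.mean(), y - y.mean()                  # mean-deviation form
r_hat = np.sum(xc * yc) / np.sum(xc**2)              # sample r_YX
X = xc.reshape(-1, 1)                                # T x 1 data matrix
beta_ols = np.linalg.inv(X.T @ X) @ X.T @ yc         # (X'X)^{-1} X'Y

print(round(r_hat, 6), round(float(beta_ols[0]), 6)) # identical
```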

Other concepts related to independence
If, for the random variables X and Y, the moments E(X^k) < ∞ and E(Y^m) < ∞, it turns out that X ⊥ Y iff E(X^k Y^m) = E(X^k) E(Y^m) for all k, m = 1, 2, ...
X and Y are (k, m)-order dependent iff E(X^k Y^m) ≠ E(X^k) E(Y^m), for k, m = 1, 2, ...
(1-1)-order (linear) dependence: E(XY) ≠ E(X) E(Y).
(1-1)-order independence: E(XY) = E(X) E(Y) ⟺ E{[X − E(X)][Y − E(Y)]} = 0 ⟺ σ_XY = 0 ⟺ ρ_XY = 0.
Orthogonality: E(XY) = 0.
Note:
1. if X and Y are uncorrelated (ρ_XY = 0), this is equivalent to saying that their mean deviations are orthogonal (if X and Y are centered by subtracting their means, they become orthogonal);
2. if X and Y are orthogonal, ρ_XY = 0 only if E(X) = 0 or E(Y) = 0.
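
A Monte Carlo illustration of (k, m)-order dependence (assumed example, not from the slides): with X standard normal and Y = X², the (1,1)-order moment condition holds but the (2,1)-order one fails, so a purely linear measure misses the dependence.

```python
# A minimal sketch: a dependence visible only at higher moment orders.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200_000)
y = x**2

print(round(np.mean(x * y) - np.mean(x) * np.mean(y), 3))        # about 0
print(round(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y), 3))  # about 2, not 0
```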

Other concepts related to independence
r-th order independence: E(Y^r | X = x) = 0, for all x ∈ R_X.
In summary:
independence ⟹ 1st-order independence ⟹ non-correlation ⟺ orthogonality of the mean-subtracted variables;
non-correlation ⇏ independence (there could be non-linear dependencies!)
(cf. Spanos 1999: 272-279)

Statistical model
It is important to define a statistical model. A typical statistical model for a set of n continuous random variables X has two components:
- Probability model: a family of density functions f(x; θ) defined over the range of values of X;
- Sampling model: X (a T × n matrix of data) is a random sample.
(cf. Spanos 1999: 33)

The Markov Condition
The Markov condition permits the representation of probabilistic dependence through a DAG. In particular, it imposes a relationship between the Bayesian network (a DAG whose nodes are random variables) and the probabilistic structure.
A directed acyclic graph G over V (a set of vertices) and a probability distribution P(V) satisfy the Markov condition iff for every W ∈ V, W ⊥ V \ (Descendants(W) ∪ Parents(W)) given Parents(W). (Spirtes et al. 2000: 11)
Or, in other words: any vertex (node) is conditionally independent of its nondescendants (other than its parents), given its parents.

Markov Condition (example)
[Figure: DAG with edges V3 → V1, V3 → V2, V1 → V2, V2 → V4, V4 → V5]
The DAG above and the probability distribution P(V1, V2, V3, V4, V5) satisfy the MC iff:
(1) V4 ⊥ {V1, V3} | V2
(2) V5 ⊥ {V1, V2, V3} | V4
Notice that many other c.i. relations follow from (1) and (2) by applying symmetry, decomposition, and weak union (see the properties listed above). For example: {V1, V3} ⊥ V4 | V2; V1 ⊥ V4 | V2; V3 ⊥ V4 | V2; V1 ⊥ V4 | {V2, V3}; etc.; {V1, V2, V3} ⊥ V5 | V4; V5 ⊥ {V1, V2} | V4; etc.
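
The independencies (1) and (2) can also be read off mechanically. The Python sketch below (not from the slides) encodes the parent sets of the example DAG and prints, for each vertex, its independence from its nondescendants other than its parents, given its parents.

```python
# A minimal sketch: reading the Markov-condition independencies off the example
# DAG (V3 -> V1, V3 -> V2, V1 -> V2, V2 -> V4, V4 -> V5).
parents = {
    "V1": {"V3"}, "V2": {"V1", "V3"}, "V3": set(),
    "V4": {"V2"}, "V5": {"V4"},
}
children = {v: {w for w, ps in parents.items() if v in ps} for v in parents}

def descendants(v):
    out, stack = set(), list(children[v])
    while stack:
        w = stack.pop()
        if w not in out:
            out.add(w)
            stack.extend(children[w])
    return out

for v in parents:
    rest = set(parents) - {v} - descendants(v) - parents[v]
    if rest:
        print(f"{v} _||_ {sorted(rest)} | {sorted(parents[v])}")
# Prints V4 _||_ ['V1', 'V3'] | ['V2'] and V5 _||_ ['V1', 'V2', 'V3'] | ['V4'].
```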

Markov condition (factorization)
The M.C. permits the following factorization:
- discrete case: P(V1, ..., Vn) = ∏_{i=1}^n P(Vi | Parents(Vi)), where if Parents(Vi) = ∅, P(Vi | Parents(Vi)) = P(Vi)
- continuous case: f(V1, ..., Vn) = ∏_{i=1}^n f(Vi | Parents(Vi)), where if Parents(Vi) = ∅, f(Vi | Parents(Vi)) = f(Vi)
[Figure: DAG with edges V3 → V1, V3 → V2, V1 → V2, V2 → V4, V4 → V5]
For this DAG we have: P(V1, V2, V3, V4, V5) = P(V1 | V3) P(V2 | V1, V3) P(V3) P(V4 | V2) P(V5 | V4).
Recall the chain rule: in general, P(V1, ..., Vn) = P(Vn | Vn−1, ..., V2, V1) · ... · P(V2 | V1) P(V1).
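
As a numerical illustration (assumed linear-Gaussian mechanisms, chosen only for the example and not part of the slides), the sketch simulates data whose joint density factorizes as above and checks one implied conditional independence, V1 ⊥ V4 | V2, through the partial correlation introduced earlier.

```python
# A minimal sketch: simulate data consistent with the factorization and verify
# that V1 and V4 are correlated marginally but not partially given V2.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
v3 = rng.normal(size=n)
v1 = 0.8 * v3 + rng.normal(size=n)             # f(V1 | V3)
v2 = 0.6 * v1 + 0.7 * v3 + rng.normal(size=n)  # f(V2 | V1, V3)
v4 = 0.9 * v2 + rng.normal(size=n)             # f(V4 | V2)
v5 = 0.5 * v4 + rng.normal(size=n)             # f(V5 | V4)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r14, r12, r42 = corr(v1, v4), corr(v1, v2), corr(v4, v2)
partial = (r14 - r12 * r42) / np.sqrt((1 - r12**2) * (1 - r42**2))
print(round(r14, 3))      # clearly non-zero: V1 and V4 are dependent
print(round(partial, 3))  # approximately 0: independent given V2
```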

The d-separation criterion
d-separation is a graphical criterion which captures exactly all the C.I. relationships implied by the M.C., including those derived from the M.C. through symmetry, decomposition, and weak union.
Consider a graph G, two distinct nodes X and Y, and a set of nodes W, where neither X nor Y belongs to W. We say that X and Y are d-separated given W in G iff there exists no undirected path U between X and Y such that:
1. every collider C (→ C ←) on U is in W or has a descendant in W, and
2. no other vertex on U is in W.
If there is such a path, then X and Y are d-connected. (cf. Spirtes et al. 2000: 14)
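
A d-separation test can be implemented compactly via the moral-graph criterion, which is equivalent to the path-based definition above: restrict the DAG to the ancestors of X, Y and W, moralize it, delete W, and check whether X and Y are still connected. The Python sketch below (not from the slides) does this for the example DAG.

```python
# A minimal sketch of a d-separation test via the moral-graph criterion.
def d_separated(parents, x, y, w):
    """parents: dict vertex -> set of parents; w: conditioning set."""
    # 1. Ancestral subgraph of {x, y} and W.
    relevant, stack = set(), [x, y, *w]
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # 2. Moralize: undirected edges between each vertex and its parents,
    #    and between every pair of parents of a common child.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = parents[v] & relevant
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p in ps:
            for q in ps - {p}:
                adj[p].add(q); adj[q].add(p)
    # 3. Remove the conditioning set and test reachability of y from x.
    blocked = set(w)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for u in adj[v] - blocked - seen:
            if u == y:
                return False        # d-connected
            seen.add(u); stack.append(u)
    return True                     # d-separated

# The example DAG: V3 -> V1, V3 -> V2, V1 -> V2, V2 -> V4, V4 -> V5.
parents = {"V1": {"V3"}, "V2": {"V1", "V3"}, "V3": set(),
           "V4": {"V2"}, "V5": {"V4"}}
print(d_separated(parents, "V1", "V4", {"V2"}))   # True: blocked by V2
print(d_separated(parents, "V1", "V4", set()))    # False: V1 -> V2 -> V4 is open
print(d_separated(parents, "V5", "V3", {"V4"}))   # True: every path is blocked at V4
```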

The d-separation criterion (Pearl's definition)
d-separation: consider a graph G, two distinct nodes X and Y, and a set of nodes W, where neither X nor Y belongs to W. A path U is said to be d-separated by the set of nodes W iff
1. U contains a chain (→ C → or ← C ←) or a fork (← C →) such that the middle node C ∈ W, or
2. U contains a collider C (→ C ←) such that C ∉ W and no descendant of C is in W.
A set W is said to d-separate X from Y iff every path from X to Y is d-separated by W. Otherwise X and Y are d-connected by W. (cf. Pearl 2000: 16-17)

Reading List
- Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, 2nd edition, MIT Press: Chapters 1 and 2.
- Pearl (2000), Causality: Models, Reasoning, and Inference, CUP: Sections 1.1 and 1.2.
- Spanos, A. (1999), Probability Theory and Statistical Inference, CUP: Sections 2.2 and 6.4.
Further reading:
- Cooper, G.F. (1999), "An Overview of the Representation and Discovery of Causal Relationships Using Bayesian Networks", in C. Glymour and G.F. Cooper (eds.), Computation, Causation, and Discovery, MIT Press.
- Scheines, R. (1997), An Introduction to Causal Inference. www