Pairwise measures of causal direction in linear non-gaussian acyclic models

Pairwise measures of causal direction in linear non-gaussian acyclic models

Dept. of Mathematics and Statistics & Dept. of Computer Science, University of Helsinki, Finland

Abstract

- Estimating causal direction is a fundamental problem in science.
- Bayesian networks or structural equation models are ill-defined for gaussian data, but they can be estimated using non-gaussianity (Shimizu et al., JMLR 2006).
- Here, we develop a new approach based on likelihood ratios of variable pairs, approximated by simple nonlinear correlations.
- These further lead to higher-order cumulants, which allow deeper theoretical analysis, give an intuitive interpretation, and are noise-robust.

Introduction

Model connections between the measured variables: which variable causes which? Correlation does not equal causation, but we can go beyond correlation.

Two fundamental approaches:
- If we have time series and the time-resolution of the measurements is fast enough: use autoregressive modelling.
- Otherwise, use structural equation models (the approach taken here).


Structural equation models

How does an externally imposed change in one variable affect the others?

$$x_i = \sum_{j \neq i} b_{ij}\, x_j + e_i$$

[Figure: an example directed acyclic graph on seven variables x1, ..., x7, with signed edge coefficients b_ij.]

Difficult to estimate, not simple regression; classic methods fail in general.

Can be estimated if (Shimizu et al., JMLR, 2006):
1. the e_i(t) are mutually independent
2. the e_i(t) are non-gaussian, e.g. sparse
3. the b_ij are acyclic: there is an ordering of the x_i in which all effects point forward
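
To make the model concrete, here is a minimal sketch in Python of sampling from such a model; the three-variable graph and coefficients are hypothetical, not the ones in the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000  # number of observations

# Hypothetical acyclic coefficient matrix B: strictly lower triangular in
# the causal ordering x1 -> x2 -> x3, so all effects point forward.
B = np.array([[ 0.0, 0.0, 0.0],
              [ 0.8, 0.0, 0.0],
              [-0.3, 0.5, 0.0]])

# Mutually independent, non-gaussian (here Laplace, i.e. sparse) disturbances.
e = rng.laplace(size=(3, T))

# The structural equations x = B x + e solve to x = (I - B)^{-1} e.
x = np.linalg.solve(np.eye(3) - B, e)  # shape (3, T): one row per variable
```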

Estimation of SEM by ICA

We have thus defined a linear non-gaussian acyclic model (LiNGAM; Shimizu et al., JMLR, 2006). Previously, we proposed estimation using ICA: transform

$$x = Bx + e \quad\Longrightarrow\quad x = (I - B)^{-1} e$$

and it becomes an ICA model!

But one complication: ICA does not estimate the order of the $e_i$, whereas in the SEM the $e_i$ have a specific order. Acyclicity allows the right order to be determined.
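
As an illustration of the ICA step, a sketch using scikit-learn's FastICA on data generated as above; the LiNGAM-specific permutation step is only indicated in comments:

```python
from sklearn.decomposition import FastICA

# ICA recovers W, an estimate of (I - B) up to row permutation and scaling;
# scikit-learn expects the data as (samples, features).
ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
ica.fit(x.T)
W = ica.components_

# LiNGAM post-processing (omitted): permute the rows of W so the diagonal
# has no near-zero entries, rescale each row to unit diagonal, and read off
# B_hat = I - W. Acyclicity makes the right permutation identifiable.
```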

Pairwise likelihood ratio approach

Consider two variables, x and y, both standardized and non-gaussian. Goal: distinguish between two causal models:

$$y = \rho x + d \quad (x \to y) \qquad (1)$$
$$x = \rho y + e \quad (y \to x) \qquad (2)$$

where the disturbance d is independent of x, and e is independent of y.

Simple solution: compute the likelihoods of the two models and take their ratio.

Deriving likelihood ratios

The two models are special cases of LiNGAM, and their likelihoods can be obtained in closed form (Hyvärinen, JMLR, 2010). For $x \to y$:

$$\log L(x \to y) = \sum_t \left[ G_x(x_t) + G_d\!\left(\frac{y_t - \rho x_t}{\sqrt{1-\rho^2}}\right) \right] - \frac{T}{2}\log(1-\rho^2)$$

where $G_x(u) = \log p_x(u)$, and $G_d$ is the standardized log-pdf of the residual when regressing y on x. The likelihood of $y \to x$ is obtained symmetrically.

This gives the likelihood ratio in closed form, provided we have good approximations of the pdf's of the variables and the residuals. In ICA we would typically approximate

$$G(u) = -2\log\cosh\!\left(\frac{\pi}{2\sqrt{3}}\, u\right) + \text{const.} \qquad (3)$$
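
As a sketch, this likelihood can be computed directly with G taken to be the standardized logistic log-pdf of eq. (3); additive constants cancel in the ratio, the inputs are assumed standardized, and the helper names are hypothetical:

```python
import numpy as np

def G(u):
    # Standardized logistic log-pdf, up to an additive constant (eq. 3).
    return -2.0 * np.log(np.cosh(np.pi / (2.0 * np.sqrt(3.0)) * u))

def loglik(x, y):
    # log L(x -> y) for standardized x, y: log-pdf of x, plus log-pdf of the
    # standardized regression residual, plus the Jacobian term.
    T = len(x)
    rho = np.mean(x * y)
    resid = (y - rho * x) / np.sqrt(1.0 - rho**2)
    return np.sum(G(x) + G(resid)) - T / 2.0 * np.log(1.0 - rho**2)

def likelihood_ratio(x, y):
    # Positive => decide x -> y; negative => decide y -> x.
    return loglik(x, y) - loglik(y, x)
```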

Approximation of likelihood ratios

Assume the pdf's of x and y are equal, and take a Taylor expansion:

$$G\!\left(\frac{y - \rho x}{\sqrt{1-\rho^2}}\right) = G(y) - \rho x\, g(y) + O(\rho^2) \qquad (4)$$

where g is the derivative of G. We obtain

$$R = \frac{1}{T}\left[\log L(x \to y) - \log L(y \to x)\right] \approx \frac{\rho}{T} \sum_t \left[x_t\, g(y_t) - g(x_t)\, y_t\right]$$

where typically g(u) = tanh(u). (For sparse data the actual derivative of G is proportional to −tanh; the sign is absorbed into the convention so that positive R points to x → y.)

Choosing between the models is thus reduced to considering the sign of a nonlinear correlation! If R > 0, decide x → y; otherwise decide y → x.
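
The approximate ratio is then a single nonlinear correlation (a sketch under the same assumptions, with g = tanh as on the slide):

```python
import numpy as np

def pairwise_R(x, y, g=np.tanh):
    # R ~ rho * mean(x g(y) - g(x) y); R > 0 suggests x -> y.
    rho = np.mean(x * y)
    return rho * np.mean(x * g(y) - g(x) * y)
```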

Approach using higher-order cumulants

The nonlinear correlations can be replaced by cumulants, e.g.

$$\rho\, \hat{E}\{x^3 y - x y^3\} = \rho\,[\mathrm{cum}(x,x,x,y) - \mathrm{cum}(y,y,y,x)]$$

This is likely to have similar qualitative behaviour as $\rho\, \hat{E}\{x \tanh(y) - \tanh(x)\, y\}$, since it is related to the Taylor expansion $\tanh(u) = u - \frac{1}{3}u^3 + o(u^3)$.

For skewed distributions, we can use the third-order cumulant $\rho\, \hat{E}\{x^2 y - x y^2\}$.

Important points:
- The cumulant-based measures can be proven to give the right direction.
- They are immune to additive noise.
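
Sample versions of these cumulant-based measures (a sketch; for zero-mean standardized variables, the difference of cross-cumulants above reduces to the moment difference used here):

```python
import numpy as np

def R_kurt(x, y):
    # rho * E{x^3 y - x y^3}: fourth-order measure for sparse variables.
    return np.mean(x * y) * np.mean(x**3 * y - x * y**3)

def R_skew(x, y):
    # rho * E{x^2 y - x y^2}: third-order measure for skewed variables.
    return np.mean(x * y) * np.mean(x**2 * y - x * y**2)
```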

Intuitive interpretation

Suppose $x \to y$, i.e. $y = \rho x + d$, and the variables are very sparse.

Regression toward the mean: $|\rho| < 1$ for standardized variables.

The nonlinear correlation $E\{x^3 y\}$ is larger than $E\{x y^3\}$ because, when both variables are simultaneously large, x typically takes larger values than y, due to regression towards the mean.

Using pairwise measures with more variables

Assume the LiNGAM model for n variables:

$$x = Bx + e \qquad (5)$$

Compute the nonlinear correlation matrix derived above,

$$M = \widehat{\mathrm{cov}}(x) \odot \hat{E}\{x\, g(x)^T - g(x)\, x^T\} \qquad (6)$$

where $\odot$ denotes the element-wise product, for some nonlinearity such as $g(u) = u^3$, $g(u) = u^2$, or $g(u) = \tanh(u)$.

For an $x_i$ with no parents, all entries in the i-th row of M are non-negative, neglecting random errors. This allows us to find the (non-unique) root of the graph; iterating, we find an ordering of the directed acyclic graph. Closely related to DirectLiNGAM (Shimizu et al., 2009).
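
A sketch of the resulting ordering procedure (hypothetical helper; it assumes standardized variables in the rows of X and, like the simpler variant without deflation, recomputes M on the remaining variables instead of regressing each root out):

```python
import numpy as np

def causal_ordering(X, g=np.tanh):
    # X: array of shape (n_variables, T) with standardized rows.
    # Returns an estimated causal ordering, roots first.
    remaining = list(range(X.shape[0]))
    order = []
    while remaining:
        Xr = X[remaining]
        T = Xr.shape[1]
        C = (Xr @ Xr.T) / T                        # correlations rho_ij
        M = C * (Xr @ g(Xr).T - g(Xr) @ Xr.T) / T  # eq. (6), element-wise
        # A root's row of M is non-negative up to random error, so pick
        # the variable whose row has the least negative mass.
        scores = np.minimum(M, 0.0).sum(axis=1)
        root = remaining[int(np.argmax(scores))]
        order.append(root)
        remaining.remove(root)
    return order
```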

One simulation

[Figure: four panels comparing the algorithms, with added measurement noise; five variables, 10,000 data points. a) First variable found; b) Two first variables found; c) Mean rank correlations (all on a 0 to 1 scale); d) Computation time (1 to 10,000, log scale).]

Algorithms:
- LR: true likelihood ratios
- LRap: LR approximations based on tanh
- LRnd: simpler variant with no deflation
- kdir: KernelDirectLiNGAM (Sogawa et al., 2010)
- dir: original DirectLiNGAM (Shimizu et al., 2009)
- ICA: LiNGAM estimated by ICA (Shimizu et al., 2006)

Conclusion

- Estimating causal direction is possible using something more than correlations.
- Structural equation models can be estimated by non-gaussianity (Shimizu et al., JMLR, 2006).
- Here I propose likelihood ratio tests for two variables; the log-likelihood ratio is approximated by nonlinear correlations.
- This further leads to higher-order cumulants (which otherwise have no known intuitive interpretation).
- Pairwise tests can be used to estimate a model with many variables, using methods like DirectLiNGAM.
- The approach is particularly efficient when there is measurement noise, or few data points.