The joint distribution criterion and the distance tests for selective probabilistic causality

Similar documents
The Joint Distribution Criterion and the Distance Tests for Selective Probabilistic Causality

On selective influences, marginal selectivity, and Bell/CHSH inequalities

Dissimilarity, Quasimetric, Metric

arxiv: v4 [math-ph] 28 Aug 2013

Some Background Material

Seminaar Abstrakte Wiskunde Seminar in Abstract Mathematics Lecture notes in progress (27 March 2010)

Contextuality-by-Default: A Brief Overview of Ideas, Concepts, and Terminology

Finite Automata. Mahesh Viswanathan

Chapter 2 Classical Probability Theories

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Vector Spaces. Chapter 1

Partial cubes: structures, characterizations, and constructions

Dr. Relja Vulanovic Professor of Mathematics Kent State University at Stark c 2008

Tree sets. Reinhard Diestel

The Integers. Peter J. Kahn

2 Generating Functions

Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes

Hierarchy among Automata on Linear Orderings

The small ball property in Banach spaces (quantitative results)

Estimates for probabilities of independent events and infinite series

LINEAR ALGEBRA: THEORY. Version: August 12,

The Integers. Math 3040: Spring Contents 1. The Basic Construction 1 2. Adding integers 4 3. Ordering integers Multiplying integers 12

Checking Consistency. Chapter Introduction Support of a Consistent Family

Set theory. Math 304 Spring 2007

POL502: Foundations. Kosuke Imai Department of Politics, Princeton University. October 10, 2005

Economics 204 Fall 2011 Problem Set 1 Suggested Solutions

Boolean Algebras. Chapter 2

Hardy s Paradox. Chapter Introduction

CHAPTER 1: Functions

Principles of Real Analysis I Fall I. The Real Number System

Lebesgue Measure on R n

A CRITERION FOR THE EXISTENCE OF TRANSVERSALS OF SET SYSTEMS

STAT 7032 Probability Spring Wlodek Bryc

Universal Algebra for Logics

Set, functions and Euclidean space. Seungjin Han

After taking the square and expanding, we get x + y 2 = (x + y) (x + y) = x 2 + 2x y + y 2, inequality in analysis, we obtain.

DERIVATIONS IN SENTENTIAL LOGIC

Computational Tasks and Models

1 Differentiable manifolds and smooth maps

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Stochastic Processes

The Skorokhod reflection problem for functions with discontinuities (contractive case)

Course 212: Academic Year Section 1: Metric Spaces

Lecture Notes on Inductive Definitions

MA103 Introduction to Abstract Mathematics Second part, Analysis and Algebra

Discrete Probability Refresher

Chapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses

CONSTRUCTION OF THE REAL NUMBERS.

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Notes on the Dual Ramsey Theorem

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

1.1. MEASURES AND INTEGRALS

A Class of Partially Ordered Sets: III

Stochastic Quantum Dynamics I. Born Rule

On Universality of Classical Probability with Contextually Labeled Random Variables: Response to A. Khrennikov

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models

Context-free grammars and languages

Proving languages to be nonregular

Lecture Notes on Inductive Definitions

Standard forms for writing numbers

1. Continuous Functions between Euclidean spaces

Homework 1 (revised) Solutions

Basic counting techniques. Periklis A. Papakonstantinou Rutgers Business School

Singular Failures of GCH and Level by Level Equivalence

Axiomatic set theory. Chapter Why axiomatic set theory?

REVIEW OF ESSENTIAL MATH 346 TOPICS

Coins and Counterfactuals

Topological properties

CS632 Notes on Relational Query Languages I

Discrete Mathematics: Lectures 6 and 7 Sets, Relations, Functions and Counting Instructor: Arijit Bishnu Date: August 4 and 6, 2009

Euler s, Fermat s and Wilson s Theorems

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Math 105A HW 1 Solutions

Countability. 1 Motivation. 2 Counting

Continuum Probability and Sets of Measure Zero

Mathematics 114L Spring 2018 D.A. Martin. Mathematical Logic

Axioms of Kleene Algebra

Metric Spaces and Topology

Notes on State Minimization

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

MATH 102 INTRODUCTION TO MATHEMATICAL ANALYSIS. 1. Some Fundamentals

Additive factors and stages of mental processes in task networks.

A Note On Comparative Probability

The Lebesgue Integral

Sample Spaces, Random Variables

Measure and integration

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).

Review of Probability Theory

INTRODUCTION TO FURSTENBERG S 2 3 CONJECTURE

2. Two binary operations (addition, denoted + and multiplication, denoted

Construction of a general measure structure

Equivalent Forms of the Axiom of Infinity

Rose-Hulman Undergraduate Mathematics Journal

DR.RUPNATHJI( DR.RUPAK NATH )

Exercises for Unit VI (Infinite constructions in set theory)

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

HOW DO ULTRAFILTERS ACT ON THEORIES? THE CUT SPECTRUM AND TREETOPS

EXAMPLES AND EXERCISES IN BASIC CATEGORY THEORY

Supplementary Notes on Inductive Definitions

Transcription:

Hypothesis and Theory Article published: 7 September 00 doi: 03389/fpsyg00005 The joint distribution criterion and the distance tests for selective probabilistic causality Ehtibar N Dzhafarov * and Janne V Kujala 3 Department of Psychological Sciences Purdue University West Lafayette IN USA Swedish Collegium for Advanced Study Uppsala Sweden 3 University of Jyväskylä Jyväskylä Finland Edited by: Hans Colonius Carl von Ossietzky Universitaet Oldenburg Germany Reviewed by: Hans Colonius Carl von Ossietzky Universitaet Oldenburg Germany Ali Ünlü University of Dortmund Germany *Correspondence: Ehtibar N Dzhafarov Department of Psychological Sciences Purdue University 703 Third Street West Lafayette IN 47907 USA e-mail: ehtibar@purdueedu A general definition and a criterion (a necessary and sufficient condition) are formulated for an arbitrary set of eternal factors to selectively influence a corresponding set of random entities (generalized random variables with values in arbitrary observation spaces) jointly distributed at every treatment (a set of factor values containing precisely one value of each factor) The random entities are selectively influenced by the corresponding factors if and only if the following condition called the joint distribution criterion is satisfied: there is a jointly distributed set of random entities one entity for every value of every factor such that every subset of this set that corresponds to a treatment is distributed as the original variables at this treatment The distance tests (necessary conditions) for selective influence previously formulated for two random variables in a two-by-two factorial design (Kujala and Dzhafarov 008 J Math Psychol 5 8 44) are etended to arbitrary sets of factors and random variables The generalization turns out to be the simplest possible one: the distance tests should be applied to all two-by-two designs etractable from a given set of factors Keywords: eternal factors joint distribution probabilistic causality selective influence systems of random variables stochastic dependence stochastically unrelated A system s behavior be the system biological social or technological can be thought of as a network of stochastically interdependent random entities The eternal world provides inputs (influences interventions conditions) presumably affecting some of the components of the network and not affecting the others The question arises therefore as to how based on the joint distributions of all these random entities to distinguish the components affected and not affected by each of these eternal inputs The notion of selective influence under stochastic interdependence was introduced and systematically analyzed in the behavioral contet by Townsend (984) although implicitly it had been used before (Lazarsfeld 965; Bloom 97; Schweickert 98) Townsend s approach to selective influence (further developed in Townsend and Thomas 994 and mathematically characterized in Dzhafarov 999) is however very different from the present one In fact in all non-trivial cases they are incompatible Our approach gradually developed starting with Dzhafarov (00) based on Dzhafarov s earlier work on response time analysis (see Dzhafarov 997 for an overview) In Dzhafarov (003) the definition of selective influence adopted in the present paper was given for finite systems of random entities This notion was put on a more solid probabilistic foundation in Dzhafarov and Gluhovsky (006) and further developed in Kujala and Dzhafarov (008) In the latter work for the first time workable tests for selective influence were formulated The present paper continues this line of research on a higher level of mathematical rigor and arguably the highest possible level of generality The abstract nature of the mathematical theory makes it rather difficult reading with notation which though carefully chosen may appear complicated As a partial remedy we precede the formal development in Sections 5 by Section in which we provide a more intuitive account of some of the basic notions and results We do this on simple eamples involving just two random variables influenced by two factors Intuitive introduction What is selective influence? Consider a simple double-detection eperiment: there are two stimuli each of which may possess or lack a certain feature (signal property) and an observer has to respond Yes (signal present) or No (signal absent) to each of the two stimuli For instance the stimuli may be two spatially separated line segments in a frontal plane each of which may be either vertical (signal absent) or tilted by a fied small angle (signal present); the observer says Yes Yes if both lines appear to be tilted Yes No if the left line appears tilted and the right one not etc These responses are random variables: A (response to the left stimulus) and B (response to the right one) each with two possible values {Yes No occurring with some probabilities They are jointly distributed in the sense that by the virtue of co-occurring in the same trial the values of A and B are naturally paired enabling one to meaningfully pose questions like What is the joint probability of A = Yes and B = No? The joint distribution of A and B may change depending on the values of the following eternal factors: = Tilt of the left line with two possible values {absent present and = Tilt of the right line with the same two values The combination of factor values chosen one for each of the factors is traditionally referred to as a treatment With wwwfrontiersinorg September 00 Volume Article 5

this terminology and notation the population-level (idealized) results of the eperiment in question can be presented in the form of four matrices: a = absent Treatment b = absent B = yes B = no A = yes p p p A = no p p P p p a = absent Treatment b = present B = yes B = no A = yes q q q A = no q q q q q a = present Treatment b = absent B = yes B = no A = yes r r r A = no r r r r r a = present Treatment b = present B = yes B = no A = yes s s s A = no s s s s s The letters p q r s here represent theoretical probabilities with the usual meaning of the subscripts: p = p + p p = p + p p + p = etc It is natural to surmise that unless the observer does not look at the stimuli at all the random variable A should depend on (be influenced by) the value of and B should be influenced by the value of It is not obvious however whether factor only (or selectively) influences random variable A without affecting B and whether factor only (selectively) influences random variable B without affecting A: A B What does this mean? The meaning of the relation is only obvious if A and B are stochastically independent for all four treatments ie if p ij = p i p j q ij = q i q j etc where i j { In this case all one has to establish to prove (A B) ( ) is that the marginal distribution of A is not affected by changes in and the marginal distribution of B is not affected by changes in To look at this in detail let the pair of our random variables (A B) at the four treatments be denoted (A B) (A B) (A B) and (A B) where denote absent and present respectively If A and B are independent at all four treatments then the selectiveness (A B) ( ) simply means that the marginal distribution of A does not depend on (ie p = q and r = s ) and the marginal distribution of B does not depend on (ie p = q and r = s ) The problem arises when A and B are not independent for at least one of the treatments: how should one determine then if (A B) ( )? This is the problem addressed in this paper only we do not confine the consideration to the case of two factors and two random variables Rather we generalize the problem to an arbitrary set of eternal factors and an arbitrary (but one-to-one corresponding to the set of factors) set of random entities For a finite set of random variables the definition of selective influence was given in Dzhafarov (003) and then refined in Dzhafarov and Gluhovsky (006) and Kujala and Dzhafarov (008) Applying it to our eample (A B) is selectively influenced by ( ) if and only if one can find functions f and g and a random entity C whose distribution does not depend on such that ( ) ( ( ) ( )) AB f C g C () where stands for is distributed as That is denoting f( = C) by f (C) g( = C) by g (C) etc ( AB ) ~( f ( C) g ( C)) ( AB ) ~( f ( C) g ( C)) ( AB ) ~( f ( C) g ( C)) ( A B) ~( f ( C) g ( C)) As an eample let C be a random vector (C 0 C I C ) with stochastically independent components having the following interpretation: C 0 is a random entity representing the general level of visual attention while C and C are stimulus-specific sources of randomness (which with no loss of generality can be taken to be uniformly distributed between 0 and ) Let as opposed to the possibilities A B A B A B Thus we will have one of the latter scenarios if the present value of visually masks or enhances the salience of the present value of or if the values of somehow affect the level of attention the observer pays to the factor We denote the case when ( ) selectively influence (A B) respectively by ( ) ( ) AB As eplained in Section we distinguish random entities and their special case random variables Random entities take on values in arbitrary measurable spaces while random variables map or can be redefined to map into real numbers endowed with the Borel sigma-algebra Note also that the notion of a random entity (or variable) should always be taken to include deterministic entities (variables) as a special case the same as the notion of stochastic interdependence unless otherwise indicated should be taken to include stochastic independence as a special case It is usually the case that the possibility of selectiveness is considered when it is known that the factors are effective in their influence upon (A B) meaning that for at least one value of either of the two factors the change of the other factor from to changes the joint distribution of (A B) This aspect of the dependence of (A B) on ( ) being relatively trivial we do not include it in the definition of selective influence In other words (A B) ( ) is taken to mean that does not influence A and does not influence B leaving open the question of whether influences A and/or influences B (see Dzhafarov 003 p 0) Frontiers in Psychology Quantitative Psychology and Measurement September 00 Volume Article 5

C h C C h C A = if > ( 0) if > ( 0) B = 0 if C h( C0) 0 if C h( C0) where h h are some measurable functions from the set of possible values of C into interval [0 ] One can see that A and B are generally stochastically interdependent by virtue of depending on one and the same random entity C 0 but that A does not depend on in the sense that for any given values of the other arguments C 0 = c 0 C = c C = c and = or the value of A does not change as a function of ; and B does not depend on in the analogous sense The definition of selective influence can also be looked at in a simpler and more fundamental way The fact that for any given treatment A and B are stochastically related (ie paired whether independent or interdependent) means in Kolmogorov s probability theory that A and B are measurable functions of one and the same random entity It is always true therefore that ( ) ( ( ) ( )) AB f C g C The random entities C C C C can always be replaced with a single C eg by putting C = (C C C C ) and redefining the functions f g accordingly: ( ) ( ( ) ( )) AB f C g C () Comparing this universal representation with () we see that the assumption of selective influence is that in f and in g are dummy arguments Main properties of selective influence There are three main properties of the selective influence relation (A B) ( ) First selective influence is invariant with respect to all (measurable) transformations of the random variables AB even if transformations of A are allowed to depend on values of factor and transformations of B are allowed to depend on values of factor In our eample the values of A and B are denoted yes and no Clearly we can encode them 0 and respectively or by any other two numbers or words Moreover we can if we so choose denote {yes no for A by { 0 if = absent but by { 3 5 if = present; analogously we can denote {yes no for B by {lion crocodile if = absent but by {zebra cheetah if = present (A B) ( ) if it holds for the original values for A and B must also hold after any such transformations This follows from the fact that if () holds then after any factor-value-specific transformations F( A) and G( B) we have ( ) ( F( A) G( B) ) ~ F( f ( C) ) G( g( C) ) = ( f* ( C) g* ( C) ) Second selective influence implies marginal selectivity the term coined by Townsend and Schweickert (989) for the situation when the marginal distribution of A does not depend on and the marginal distribution of B does not depend on This is an obvious consequence of () The reverse is not true as illustrated by eamples in Dzhafarov (003) and more systematically in Kujala and Dzhafarov (008) Other eamples are given in this paper: in fact in all our eamples where the selective influence relation does not hold marginal selectivity is satisfied Third selective influence relation satisfies the nestedness property: if some random variables are selectively influenced by corresponding factors (say (A B C) ( γ) we need more than two factor-variable pairs for this property to be non-trivial) then any subset of these variables is selectively influenced by the corresponding subset of factors: (A B) ( ) (A C) ( γ) and (B C) ( z) This property is obvious as soon as () is generalized to larger sets In this paper the three properties of selective influence will be demonstrated on the maimal level of generality for arbitrary sets of random entities and corresponding sets of factors of arbitrary nature 3 Distance tests for selective influence How can one determine that (A B) ( )? In Kujala and Dzhafarov (008) two types of necessary conditions for selective influence were formulated termed cosphericity tests and distance tests As we only generalize in this paper the latter class of tests we need not discuss the former To apply a distance test to our eample means to do the following First the values of A and B have to be encoded by real numbers In accordance with what we know about the transformations we can use any functions f( A) and g( B) with numerical values Second one chooses a number r Third for each of the four treatments = one computes the quantity D = E A B where E denotes epected value and (A B ) is an alternative (and more convenient) way of designating (A B) Note that D (the same as etc) is a string of symbols with no multiplication involved It has been shown in Kujala and Dzhafarov (008) that if (A B) ( ) then considering each random variable at each value of the corresponding factor as a point (this yields four points A A B B ) these points can be placed in a metric space in which the values D D D D are with some caveats distances between A points and B points (D between A and B D between A and B etc) As these distances by definition should satisfy the triangle inequality we conclude with a bit of algebra (see Section 5) that ma{ D D D D ( D+ D + D+ D) A distance test consists in checking if this inequality is satisfied: if not (at least for one choice of the numerical values and the eponent r) then the selective influence relation is ruled out In this paper we generalize this test to arbitrary sets of random entities selectively influenced by arbitrary sets of eternal factors As it turns out all one has to do to prove that all random entities taken one for each value of the corresponding factor can be embedded in a metric space is to apply the test just described to wwwfrontiersinorg September 00 Volume Article 5 3

all pairs of treatments for all pairs of factors To present this result in an unambiguous form we have to introduce some notation that may appear cumbersome at first: since a value of a factor generally does not itself indicate which factor it is a value of (eg absent or can be a value of both and ) we superscript each factor value by the corresponding factor name In our eample it would be We call these pairs factor value with factor name factor points The four distances will now be written D D D D Note that we could only get away with the previous notation because the identity of the factors in it was encoded by the order of their values within pairs: in D the first belonged to and the second one to This convention cannot work of course for more than two factors In the new notation the distance test acquires the form where { ( ) ma D D D D D + D + D + D D y = Dy = E A B y { y y 4 The joint distribution criterion for selective influence Compliance with a given set of distance tests is only a necessary condition for selective influence Is there a way to definitively prove that selective influence (A B) ( ) does hold if it is not ruled out by distance tests? As it turns out the answer is affirmative and it is an almost immediate consequence of our definition of selective influence if presented at a sufficiently high level of mathematical rigor Stated in intuitive terms and applied to our eample consider four hypothetical random variables one for each of our factor points: H H H H Suppose that they are jointly distributed ie we can speak of co-occurring quadruples of values There are si pairwise combinations of the four factor points but only four of them those of the form y ( y { ) form treatments whereas the remaining two and do not The four treatments correspond to pairs ( A B ) whose joint distributions y y are well defined Suppose now that for all those cases when a pair of factor points forms a treatment we have ( y ) ( y y) { (3) H H A B y (4) Then and only then (A B) ( ) and we can write then A instead of A y and B y instead of B y We call this the joint distribution criterion for selective influence The joint distributions of ( H H ) and ( H H ) must also be well defined even though they do not correspond to any pairs of random variables one can choose from the observable A B A B Let us look at this in detail The observed joint distributions of ( A B ) are represented by four probabilities each denoted y y in the four matrices introducing our eample by p ij q ij r ij s ij We now switch to a more convenient notation (although again more cumbersome at first glance): ( ) { { P y = Pr A = i B = j i j yes no y ij y y where P ij y is a string of symbols with no multiplication implied To ascertain if (A B) ( ) using (4) we have to see if we can find 6 probabilities ( ) Qijkl = H = i H = j H = k H = l Pr { ijkl yes no for four binary variables { H H H H subject to the basic constraints and such that Q ijkl (5) 0 Q = (6) ijkl ijkl Q = P Q = P kl ijkl ij jk ijkl il il Q = P Q = P ijkl for all i j k l {yes no Indeed Pr kj ij ijkl kl by (5) ( = = ) = ijkl = kl H i H j Q P = Pr A = i B = j ij ( ) which shows that the first of the equations (7) is equivalent to ( H H ) ( A B ) the application of (4) to y = ; and analogously for the other three equations Note that (7) implies marginal selectivity For instance it follows from (7) that Pr( A = i)= Pij = Qijkl = Qijkl = Pr H = i and j j kl jkl (7) ( ) Pr( A = i)= Pil = Qijkl = Qijkl = Pr H = i ( ) l l jk jkl that is Pr( A = i) = Pr( A = i) Eample Let the dependence of {A B on { be described by the distributions Treatment = B = yes B = no A = yes 05 0 05 A = no 0 Treatment = B = yes B = no A = yes 0 A = no 05 0 05 Treatment = B = yes B = no A = yes 0 A = no 05 0 05 Frontiers in Psychology Quantitative Psychology and Measurement September 00 Volume Article 5 4

Treatment = B = yes B = no A = yes 05 0 05 A = no 0 and { ma D D D D => = + D D + D + D ( )/ Note that marginal selectivity here is satisfied trivially as the marginal distributions remain fied Consider the distribution of { H H H H with Q ijkl = 5 for ijkl =00 5 for ijkl = 00 0 otherwise It is easy to check that this distribution satisfies (6) and (7) hence also (4) By the joint distribution criterion we conclude that {A B { Eample No such probabilities Q ijkl can be found for the distributions Treatment = B = yes B = no A = yes 05 0 05 A = no 0 Treatment = B = yes B = no A = yes 05 0 05 A = no 0 Treatment = B = yes B = no A = yes 05 0 05 A = no 0 Treatment = B = yes B = no A = yes 0 A = no 05 0 05 so {A B are not selectively influenced by { in this case This can be shown by direct algebra but there is a simpler method: this dependence of {A B on { fails a distance test Indeed let us transform yes into 0 and no into and let us choose the eponent r = (although in this eample the value of r does not matter) Then D = D = D = 0 0 05 + 0 0+ 0 0+ 05 = 0 D = 0 0 0+ 0 0 5+ 0 0 5+ 0= which contravenes (3) In this paper the joint distribution criterion is formulated in complete generality for arbitrary sets of random entities and corresponding sets of eternal factors 5 The need for generalization In a controlled eperiment or systematic survey we usually focus on a small number of random entities such as which of several responses is given and how long it has taken and try to selectively target some of them by eperimental manipulations or selectively relate them to concomitant factors Relatively small networks of random entities and eternal factors are therefore of paramount practical importance But a network of random entities and the set of eternal factors that may be thought to affect them selectively can be quite large even infinitely large in theoretical considerations dealing with comple observable behaviors such as a person s activities within a typical day or unobservable mental networks behind even relatively simple tasks such as pushing a key in response to a stimulus varying in two binary properties (see Dzhafarov Schweickert and Sung 004 for an eample) Random processes are routinely used in modeling simple forms of decision making (see eg Diederich and Busemeyer 003) Any random process can be viewed as a system of stochastically interdependent random entities indeed by intervention values (including no intervention ) at every moment of time An intervention at moment t can be thought to selectively affect a portion of the random process in some interval [t t ] (perhaps even with t = t ) and the problem arises as to how to identify such an interval from the observed joint distribution of the random entities constituting the process It is important therefore to be able to apply the notion of selective influence to arbitrary finite and infinite systems of random entities and eternal factors Conventions and notation A factor is defined as a non-empty set of factor points (a dummy factor can be defined as a set containing a single point) Denoting factors by lowercase Greek letters γ the factor points of say factor are formally pairs ( ) consisting of a factor value (or level) and a unique factor name (read: value/level of factor ) This ensures that no two distinct factors have common points: eg level of factor size ( size ) is distinct from level of factor shape ( shape ) It is convenient to write in place of ( ): shape (50 db) intensity present left stilumus etc Let Φ be a non-empty set of factors A set φ containing precisely one factor point ( )= for each factor in Φ φ = { Φ Φ wwwfrontiersinorg September 00 Volume Article 5 5

is called a treatment 3 When the set of factors Φ is finite treatments will be presented as strings of factor points without commas or parentheses: y z γ µ µ µ k k etc Eample If Φ = { γ with = { = { and γ = { γ then the treatments φ (written as strings) are γ γ γ and γ A random entity A is a triad consisting of a measurable function f : A A a sample (probability) space ( A Σ M ) and an observation (measurable) space (A Σ) on which f induces a probability measure M Traditionally A is simply identified with f the sample space and the observation space being assumed implicitly or A is viewed as the identity function on A with (A Σ M ) = (A Σ M) The latter view is often the only practical one as we almost never know anything about a sample space as separate from the observation space A random variable is a random entity whose observation space is a subset of reals endowed with the Borel sigma-algebra 4 Given an arbitrary indeing set Ω any set of random entities whose measurable functions {f ω : A A ω ω Ω map from one and the same sample space (A Σ M ) into respective observation spaces {(A ω Σ ω ) ω Ω possesses a joint distribution ie a probability measure M induced by M on the product space ω Ω (A ω Σ ω ) 5 Eample Let the sample space consist of A = {0 {0 {0 Σ = A and M derived from elementary probabilities p ijk (i j k {0 ) Then the random variables A B C defined on (A Σ M ) by the coordinate projections a : (i j k) i b : (i j k) j and c : (i j k) k respectively (i j k {0 ) possess the joint distribution M = M on ({0 {0 ) ({0 {0 ) ({0 {0 ) = (A Σ ) Two random entities A and B defined on different sample spaces are called (stochastically or probabilistically) unrelated (see Dzhafarov and Gluhovsky 006) They do not possess a joint distribution Note that two unrelated random variables can be identically distributed if they map into one and the same observation space on which they induce one and the same probability measure Throughout this paper we deal with a set of probabilistically unrelated random entities {A φ φ ΠΦ indeed by treatments φ ΠΦ with measures {M φ φ ΠΦ induced on one and the same observation space (A Σ) For convenience we refer to A φ as a random entity A at φ as if A φ and A φ for φ φ were a single entity A at two 3 Strictly speaking an element of the Cartesian product ΠΦ is a choice function {( ) Φ whereas a treatment φ is the range of a choice function { We Φ conveniently confuse the two notions Also for convenience only in this paper we assume completely crossed design ie that every member of ΠΦ is a possible treatment With only slight modifications ΠΦ can be replaced with any nonempty subset thereof 4 A random entity A with A a finite or infinite denumerable set and Σ the set of all its subsets can also be (and traditionally is) considered a random variable because such an A can always be injectively mapped into the set of reals or into a partition of an interval of reals 5 Recall that in the product measurable space ω Ω (A ω Σ ω ) = (A Σ) the set A is the Cartesian product Π ω Ω A ω while Σ = ω Ω Σ ω is the smallest sigma algebra containing all sets of the form aω Π A ω Ω { ω ω aω Σ ω 0 0 0 0 different treatments Note that A φ and A φ are defined on different sample spaces: they do not possess a joint distribution In particular they are not mutually independent 6 Eample 3 The random variables A and B in the eamples of Section are called A and B by the abuse of language just mentioned Strictly speaking we deal with four pairwise stochastically unrelated A A A A and four pairwise stochastically unrelated B B B B such that A y and B z u possess a joint distribution if and only if y = z u A set of random entities {A ω ω Ω on one and the same sample space is a random entity whose observation space (A Σ) is the conventionally understood product of the observation spaces (A ω Σ ω ) for A ω ω Ω If the set of random entities {A ω ω Ω depends on Φ we present {A ω ω ω Ω at a treatment φ as { A φ ω Ω instead of the more correct but less convenient ({A ω ω Ω ) φ Eample 4 The variables {A B C of Eample depend on the factors Φ = { γ of Eample if M = M φ is viewed as a function of φ = γ γ In this eample A φ= A Σ φ = Σ and {a φ b φ c φ = {a b c 3 In accordance with the previous section given a set of factors Φ a corresponding set of random entities is denoted {A Φ For each Φ the entity A may in fact be a shortcut notation for a set of stochastically unrelated random entities indeed by different treatments { A φ φ Φ In other words A φ is treated as a random entity A corresponding to factor and taken at treatment φ The complete notation for the set of random entities {A Φ then is { { A φ Φ φ ΠΦ (8) where the elements of { A φ Φ for a given φ are stochastically interrelated (possess a joint distribution) while the sets { A φ Φ and { A φ Φ for distinct φ φ are stochastically unrelated It is more convenient however not to use this eplicit notation and to speak instead of {A Φ depending on Φ Definition 3 Let a set of random entities {A Φ indeed by a set of factors Φ depend on this set of factors (ie be presentable as (8)) We say that the dependence of {A Φ on Φ is marginally selective (satisfies the property of marginal selectivity) if for any subset Φ Φ and any φ ΠΦ the distribution of { A φ Φ is the same for all treatments φ containing φ (that is it does not depend on { φ : Φ Φ ) The notion of marginal selectivity was introduced by Townsend and Schweickert (989) for two random variables In Dzhafarov (003) it was generalized to a finite set of random variables under 6 We could have etended the scope of this definition by allowing A φ to be a function f φ : A φ A φ relating ( A φ Σ φ M φ ) to (A φ Σ φ ) ie by allowing the set and the sigma algebra not only the measure M φ to depend on treatment φ This would have however made our abuse of language (in treating different A φ s as a single A at different φ s) even more abusive Moreover this general approach can always be reduced to the set-up with a φ-independent (A Σ) by putting A= ( φ ΠΦ A φ { φ ) and Σ the sigma algebra consisting of all countable unions of the sets a { φ for all a Σ φ and all φ ΠΦ Frontiers in Psychology Quantitative Psychology and Measurement September 00 Volume Article 5 6

the name of complete marginal selectivity The adjective complete (omitted in the present paper for simplicity) distinguishes this notion from a weaker and less useful generalization of Townsend and Schweickert s term: for any factor Φ and any treatment φ the distribution of A φ does not depend on { φ : Φ {a Note that Definition 3 does not mean that for distinct treatments φ and φ which include φ = { φ : Φ = { φ : Φ { A A φ φ φ = { φ φ φ ( ) Φ ( ) Φ This equality is not legitimate as the two sets of random variables do not possess a joint distribution One can only say that { A A φ φ φ { φ φ φ ( ) Φ ( ) Φ where as before means is distributed as Eample 3 Let the variables {A B C of Eample be indeed by the factors Φ = { γ of Eample respectively That is {A B C stands for {A A A γ in accordance with the general notation of Definition 3 Then the dependence of {A A A γ on { γ is marginally selective if and only if { { { { { { { { γ γ γ γ A A ~ A A & A A ~ A A γ γ γ γ γ γ γ γ A A ~ A A & A A ~ A A γ γ γ γ A γ A γ & A γ A γ A γ A γ& A γ A γ A A A A γ γ γ γ γ γ γ γ Under marginal selectivity it is sometimes admissible by abuse of notation to write { A φ Φ as { A φ Φ where φ = { φ : Φ Thus in Eample 3 it may be convenient to present both γ { A A and { A A γ as { A A γ γ γ γ present both A γ and A γ as A etc This does not lead to complications provided one remembers that say A γ and A γ are identically distributed rather than identical One can therefore deal with A and A in all considerations involving only A or with A and A in all considerations involving only A It would be incorrect however to speak of stochastic relationships between A or A and A or A : to depict such relationships one needs both A and A to be double-indeed by the factor points involved (unless we have selective influence in addition to marginal selectivity as eplained in the net section) Eample 33 Continuing Eample 3 let us return to writing {A B C instead of {A A A γ We can omit the factor γ when dealing with {A B only and write { AB { AB { AB { Let the distributions of these random pairs be AB Treatment = B = 0 B = A = 0 06 0 06 A = 0 04 04 06 04 Treatment = B = 0 B = A = 0 0 06 06 A = 04 0 04 04 06 Treatment = B = 0 B = A = 0 03 0 05 A = 03 0 05 06 04 Treatment = B = 0 B = A = 0 05 05 05 A = 05 035 05 04 06 We verify that marginal selectivity holds for A and B and introduce the abridged indeing A A etc: A A A A A= 0 A= = A 06 04 A= 0 A= = A 05 05 B B B B B= 0 B= = B 06 04 B= 0 B= = B 04 06 We can use the abridged indeing in all considerations involving A alone and B alone Consider however this: from the joint distribution matrices we have A = B A = B ( ) A B A B where indicates stochastic independence and negation (ie A and B are not stochastically independent) If now we attempt to use in these relations the abridged indeing we will run into a contradiction: from A = B and A = B we conclude B = B but then it is impossible for B to be independent of A and for B not to be It is therefore necessary to retain the notation A A etc in all considerations involving both A and B even though we know that the dependence of {A B on { is marginally selective Definition 34 Let a set of random entities {A a Φ indeed by a set of factors Φ depend on this set of factors We say that the set {A a Φ is selectively influenced by Φ and write { A Φ Φ if for some random entity C and every Φ there is a measurable function f such that for every treatment φ wwwfrontiersinorg September 00 Volume Article 5 7

{ A φ f ( C) Φ { φ Remark 35 Alternatively one could posit for every treatment φ where { φ { A φ = f ( Cφ ) Φ { C φ φ ΠΦ is a set of pairwise unrelated random entities all distributed as C This formulation is more cumbersome but it correctly emphasizes the stochastic unrelatedness of { A φ Φ for different treatments φ Definition 34 however is more parsimonious as the stochastic unrelatedness property is known from the contet Remark 36 If applied to finite sets Φ Definition 34 becomes equivalent to the formulations of selective influence given in Dzhafarov (003) Dzhafarov and Gluhovsky (006) and Kujala and Dzhafarov (008) Even for the finite case however the present definition is mathematically more rigorous and it profits from the precision offered by the notation = ( ) for factor points More importantly it can be seen more immediately than the previous definitions to be reformulable into the joint distribution criterion for selective influence as discussed in the net section The following statements are obvious Lemma 37 If {A Φ Φ then (i) { A Φ Φ for any Φ Φ; (ii) the dependence of {A Φ on Φ is marginally selective That is (refer to Section ) selective influence has the nestedness property and implies marginal selectivity The net lemma says that if a set of random entities {A Φ is selectively influenced by Φ then the set of individually transformed versions of these random variables is also selectively influenced by Φ (refer to the first property in Section ) Individual transformations of A can be different for different factor points Lemma 38 If {A Φ Φ then {B Φ Φ where for any Φ any and any treatment φ containing B = h ( A ) for some measurable function h Proof By definition which implies φ { A φ ~ { f ( C ) Φ φ φ { { ( ) = { B ~ h f ( φ C ) g ( C ) Φ φ φ where g h f is a measurable function 4 The joint distribution criterion Definition 34 suggests a way of looking at the selective influence relation directly in terms of the (product) observation space for the system of the random entities involved making the overt reconstruction of C and the functions f unnecessary (or trivial as in the proof of the theorem below) Theorem 4 A necessary and sufficient condition for { A Φ Φ is the eistence of a jointly distributed system { H Φ such that for every subset φ of Φ that forms a treatment (ie belongs to ΠΦ) { H A { φ φ Φ Remark 4 We call this the joint distribution criterion for selective influence Proof The necessity is proved by observing that if {A Φ Φ then the system { H = f C { ( ) Φ Φ is a jointly distributed system of random entities To prove the sufficiency define and for every define C = { H Φ f ( C) = Pr oj ( C) = H where Proj denotes the th coordinate projection As a very simple application of the joint distribution criterion we prove the following (intuitively quite obvious) statement Lemma 43 If the dependence of {A Φ on Φ is marginally selective and { A φ Φ is a set of mutually independent random entities for every treatment φ then {A Φ Φ Proof For any factor Φ marginal selectivity implies that the distribution of A φ depends only on the factor point φ Form the set { H consisting of mutually independent random entities such that for any Φ H ~ A Then for every treatment φ { H ~{ A φ φ Φ and Theorem 4 implies {A Φ Φ In Section 4 we have seen illustrations of the criterion on interdependent random entities Here is another eample Eample 44 Let Φ = { = { = { and let {A φ B φ for every treatment φ be a pair of Bernoulli variables Consider the distributions below: Treatment = B = 0 B = A = 0 05 0 05 A = 0 Frontiers in Psychology Quantitative Psychology and Measurement September 00 Volume Article 5 8

Treatment = B = 0 B = A = 0 0 A = 05 0 05 Treatment = B = 0 B = A = 0 0 A = 05 0 05 Treatment = B = 0 B = A = 0 0 A = 05 0 05 The criterion of the joint distribution of { H H H H rejects the possibility of {A B { as it can be shown by direct algebra that there are no 6 probabilities Q ijkl generating the distributions in question (cf Eamples and ) We do not need to provide such a demonstration as it is obvious in this case from probabilistic considerations that { H H H H cannot be jointly distributed and satisfy { { { { { { { H H ~ A B H H ~ A B { H H ~ A B H H ~ A B Otherwise the four joint distribution matrices shown above would have implied respectively H = H H = H H = H and H = H which are not mutually compatible equations If we take the numerical values of A and B in the last eample as they are then with any eponent r (see Section 3) the distance test is passed: { ma D D D D ( 3 = = D + D + D + D)/ This is just another demonstration that a distance test is only a necessary condition for selective influence: a dependence of random entities on eternal factors can pass such a test but still fail the joint distribution criterion It is instructive to see however in reference to Lemma 38 that with appropriately chosen transformations of the random variables the distributions in question can be made to fail the respective distance tests Thus the possibility of selective influence in Eample 44 will be rejected if we apply the simple transformation h B B y y ( y )= = = y while leaving A y untransformed The distributions then become essentially the same as in Eample The distance tests therefore can be conjectured to have considerable rejection power if one combines it with adeptly chosen transformations (see the open question we pose at the conclusion of the paper) In any case the eceptional simplicity of these tests makes it worthwhile to always consider them before applying the joint distribution criterion 5 Distance tests In Kujala and Dzhafarov (008) the distance tests were formulated for two variables influenced by two factors in a two-by-two factorial design In this section we generalize these tests to arbitrary random variables {A Φ whose dependence on factors Φ is marginally selective Perhaps surprisingly we show that this generalization requires nothing more and nothing less than applying the original tests to all possible two-by-two factorial designs one can etract from Φ Distance tests can be applied to non-numerical random entities only after they have been numerically transformed (thus for the distance test applied to Eample we transformed yes into 0 and no into ) In this section therefore we confine our discussion to random variables We will need some auiliary notions and notation conventions n Any finite sequence of factor points ( n ) is called a chain n Chains will be written as strings n without commas and parentheses (this generalizes the convention we have already used for chains which are finite treatments) Chains can be denoted by n capital Roman letters X = n (from the second half of the alphabet to distinguish them from random variables and entities for which we use the first half) A chain X may be empty or consist of a single element (factor point) A subsequence of points belonging to a chain forms its subchain A concatenation of two chains X and Y is written as XY So we can have chains Xy XYy X y Y Xy Z etc The number of points in a chain X is its cardinality X and any chain with the smallest cardinality within a set of chains is referred to as a minimal chain (in this set) In particular one can speak of a minimal subchain of a chain among all subchains with a certain property (this notion is used in the proof of Theorem 5 below) Definition 5 Let the dependence of a set of random variables {A Φ on factors Φ be marginally selective Let r be fied For any ( y ) with we define D y = A A where A - B r for any jointly distributed A and B is defined as y y E A B A B = for < esssup A B for = Remark 5 Here ess sup is the essential supremum the lowest upper bound that holds almost surely; it is the limit of A - B r as r Remark 53 Note that D y is well-defined only under the assumption that the dependence of a set of random variables {A Φ on factors Φ is marginally selective Otherwise A A y y would not be determined by y only and it would not even be legitimate to inde the two variables by y alone wwwfrontiersinorg September 00 Volume Article 5 9

We will rely on the following result whose proof we omit as its only non-trivial part follows from the Minkowski inequality (a somewhat abridged proof can be found in Kujala and Dzhafarov 008) Lemma 54 Given a sample space let R be a set of all random variables A B (jointly distributed) on this space For any r A - B r is an etended metric on R provided we do not distinguish AB identical on a set of measure Remark 55 The adjective etended means that is included in the set of possible values The norms E[ A B ] and ess sup A - B as they only involve non-negative values always eist finite or infinite Convention 56 In the remainder of this section we will tacitly assume that the dependence of {A Φ on Φ is marginally selective We will also tacitly assume that r in the definition of r and D is fied n For any chain X = n such that a n and i a i+ for i = n define n = i + i= DX D i (with the understanding that the sum is zero if n is 0 or ) The operator D always acts upon the entire chain following it eg µ ν µ Du Xv D DX D n ν = + + Definition 57 A chain Xy b is said to be compliant with the chain inequality (or simply compliant) if D Xy D y The chain is said to be contravening (the chain inequality) if D Xy < D y It follows from this definition that if Xy is contravening or compliant then (otherwise D y is not defined) and no factor in Xy occurs twice in succession For a chain to be contravening in addition X must be non-empty (ie Xy 3; Lemma 59 below shows that in fact Xy 4) A non-contravening chain need not be compliant: it may eg be any chain with fewer than 3 elements or it can be any chain of the form Xy Analogously a non-compliant chain is not necessarily contravening 7 Lemma 58 Let U = Xy Yz γ Z be a contravening chain with a compliant subchain y Yz γ Then U* = Xy z γ Z (ie U without Y) is a contravening subchain of U Proof Let and u δ be the first and the last elements of U respectively (then necessarily δ) Note that may coincide with y or u δ with z γ (but not both) From and δ γ γ D u > DXy + Dy Yz + Dz Z Dy Yz Dy z γ γ 7 We cannot resist mentioning at this point a surprising mathematical similarity between the conceptual apparatus of (hence also the notation adopted in) the present theory especially in this section and that of the completely unrelated theory of regular well-matched spaces developed in Dzhafarov and Dzhafarov (00) for comparative judgments In particular factors and factor points seem to be formally homologous to stimulus areas and stimuli respectively and the contravening chains of the present theory essentially mirror the soritical sequences for comparative judgments so that the proof of Theorem 5 below is almost identical to that of Lemma 33 of Dzhafarov and Dzhafarov (00) we get δ γ γ D u > DXy + Dy z + DzZ Lemma 59 Every triadic chain y z γ with pairwise distinct γ is compliant Proof Denoting the random variables corresponding to the factors γ by A B C respectively marginal selectivity implies D y = A B = A B γ Dy z = B γ C γ = B C y z y z y z γ D z = A C = A C γ γ y y y z y z γ γ y z γ γ γ γ z z y z y z Since { A γ B γ C γ are jointly distributed the statement y z y z y z follows from Lemma 54 8 We are ready now to prove the main theorems regarding the distance tests for selective influence Theorem 50 Let {A Φ be selectively influenced by Φ Then any n chain n such that i i+ for i = n and n is compliant n n i i+ D n Di i+ i= Proof By the joint distribution criterion there is a jointly distributed system { H Φ such that for any { y within a treatment (ie with ) But then { y y { y A A ~ H H D y = A B = H H y y y and the statement of the lemma follows from Lemma 54 For the net theorem recall that we are following Convention 56 Theorem 5 Every contravening chain X contains a contravening tetradic subchain X of the form y v u Proof Let X = Pu be a minimal contravening subchain of X Then and by Lemma 59 X 4 If for some z γ in X we had γ then the subchains Qz γ and z γ Ru with Qz γ R = P would have to be compliant (otherwise X would not be minimal) Then by Lemma 58 we would have a contravening triadic chain z γ u which is impossible by Lemma 59 For every z γ in X therefore either γ = or γ = Since a contravening chain cannot contain repeating superscripts X is of the form y v Su 8 n One can easily generalize this reasoning to show that every chain n with pairwise distinct { n is compliant As will be apparent from the proof of Theorem 5 however in the present development we should only be concerned with n = 3 Frontiers in Psychology Quantitative Psychology and Measurement September 00 Volume Article 5 0

But then v Su must be compliant (otherwise X would not be minimal) and by Lemma 58 y v u is contravening Since X is minimal we conclude that S is empty and X = y v u It follows that the task of testing the compliance of D with all possible chain inequalities as stated in Theorem 50 is reduced to testing the compliance with only the inequalities involving tetradic chains: if {A Φ Φ then for any chain μ y ν u μ v ν with distinct μ and ν µ ν µ ν ν µ µ ν D v D y + Dy u + Duv and if all such inequalities are satisfied then there can be no other contravening chains Given any four factor points μ y ν u μ v ν one can form four different chains with alternating factors and four corresponding inequalities µ ν µ ν ν µ µ ν D v D y + Dy u + Duv µ ν µ ν ν µ µ ν D y Dv + Dv u + Du y µ ν µ ν ν µ µ ν Du y Duv + Dv + D y µ ν µ ν ν µ µ ν Du v Du y + Dy + Dv Following Kujala and Dzhafarov (008) these are easy to see (by adding the left-hand sides to themselves and to the right-hand sides) to be equivalent to the single inequality { ma D y Dv Du y Duv ( D y + Dv + Du y + Duv) / We call this a tetradic inequality Note that it is always satisfied if μ = u μ or y ν = v ν so we only have to look at μ y ν u μ v ν with two distinct points of each factor The theorem below shows that we have to check all such tetradic inequalities (for any given r) Theorem 5 The tetradic inequalities are mutually independent in the sense that any one of them can be violated while the rest of them hold Proof Let μ and ν be distinct factors in Φ and let for all points μ and y ν A µ µ ~ A ν ν ~ E where E is some non-singular random variable (ie no constant equals E with probability ) Let μ μ ν ν be distinct fied points of μ and ν and let y ( A µ A ν A µ A ν A µ A ν µ ν µ ν EE ) ~ ( ) ~ ( ~ ) ( ) (9) Let A for any point of any factor {μ ν be distributed arbitrarily and let the random variables { A φ Φ be mutually independent for any treatment φ ecept if the latter includes one of the pairs { μ ν { μ ν or { μ ν : in those cases the joint distribution of µ ν ( A µ ν A µ ν) is given by (9) while { A u v u v φ Φ -{ µ and { A φ Φ -{ ν remain the sets of mutually independent variables It is easy to see that { A φ thus defined satisfy the marginal selectivity property Φ Now D = D = D = 0 D = E E > 0 where E and E are identically distributed and independent The tetradic inequality on { μ ν μ ν is therefore violated: { ma D D D D E E D + D + D +D = E E > = Clearly the tetradic inequality holds on any set { y s t that does not include { μ ν { μ ν or { μ ν as this inequality then only involves mutually independent random variables (Lemma 43 and Theorem 50) Denoting by 3 μ any point other than μ and μ (if such a point eists) and analogously for 3 ν it remains to consider the cases { μ ν 3 μ 3 ν { μ ν μ 3 ν { μ ν 3 μ ν { μ ν 3 μ 3 ν { μ ν μ 3 ν { μ ν 3 μ ν and { μ ν 3 μ 3 ν (note that the order of the points is immaterial here) It is easy to check that the four distances in each of these quadruples equal either 0 or E E > 0 (with E E independent identically distributed) and that the number of zero distances in these quadruples is never greater than two The tetradic inequality therefore always holds: either { ma{ = 3 E E sum E E < = or ma{ = E E sum{ E E = = This completes the proof Theorem 5 is proved for a fied r and for untransformed {A Φ The application scope of the distance tests can be significantly broadened by using various values of r and by applying to {A Φ various transformations as specified in Lemma 38 It is clear that the tetradic inequalities cannot be independent across all r and/or all transformations Since a violation of a tetradic inequality means a strict inequality the inequality involving the same quadruple of factor points will have to hold also for sufficiently close values of r and sufficiently slight transformations This also applies to r = : every violated inequality for r = will have to remain violated for all sufficiently large values of r since the difference between ess sup A B and E[ A B ] can be made arbitrarily small (or if the former is infinite the latter can be made arbitrarily large) 6 Conclusion We have advanced the theory of selective influence in three ways The notion of selective influence (together with the related but weaker notion of marginal selectivity) has been generalized to arbitrary sets of random entities whose joint distributions depend on arbitrary sets of eternal factors by which the random entities are indeed (Definition 34) The joint distribution criterion has been formulated for random entities to be selectively influenced by their indeing eternal factors: this happens if and only if there is a jointly wwwfrontiersinorg September 00 Volume Article 5