Dynamics in Social Networks and Causality

Web Science & Technologies, University of Koblenz-Landau, Germany
JProf. Dr., University of Koblenz-Landau and GESIS Leibniz Institute for the Social Sciences

Last time: case study on the spread of Facebook; (causal) relationships; correlation.
Today: (causal) relationships; regression analysis; matching methods; spreading of culture.

Isolating mechanisms is difficult but important. We should think about mechanisms that may explain causal effects before we measure them. See http://www.bitbybitbook.com/en/running-experiments/beyond-simple/mechanisms/

Is X related to Y? RELATIONS

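Correlation Coefficient
The correlation coefficient is the normalized version of the covariance: $\rho_{X,Y} = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$, with $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$.
Variance: $\sigma_X^2 = E[(X - \mu_X)^2]$.
The sign of the correlation follows from the deviations of X and Y from their means:
Frequently the same sign: positive correlation.
Frequently different signs: negative correlation.
Varying signs: no correlation.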
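The definition can be checked numerically. The following is a minimal sketch in plain Python; the function names and the toy data are assumptions chosen for illustration and are not part of the slides.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average product of the deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # the variance is the covariance of X with itself
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Made-up data: deviations from the mean always share a sign, or always differ.
xs = [1, 2, 3, 4, 5]
same_sign = [2, 4, 6, 8, 10]
opposite_sign = [10, 8, 6, 4, 2]

r_pos = correlation(xs, same_sign)      # close to +1
r_neg = correlation(xs, opposite_sign)  # close to -1
```

Because the covariance is divided by both standard deviations, the result stays between -1 and 1 regardless of the units of X and Y.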

Correlation coefficient: see http://guessthecorrelation.com/ and http://greenteapress.com/thinkstats/thinkstats.pdf

Can we predict Y from X? LINEAR REGRESSION

Observations: Can we predict Y from X? Can we formulate a model that expresses Y as a function of X?

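Assume Linear Relationship
$Y_i = b_0 + b_1 X_i + \varepsilon_i$, with $\varepsilon_1, \dots, \varepsilon_n \overset{iid}{\sim} N(0, \sigma^2)$, so that $Y_i \sim N(b_0 + b_1 X_i, \sigma^2)$.
Predicted value: $\hat{Y}_i = b_0 + b_1 X_i$.
[Figure: scatter plot of Y against X with the fitted line $\hat{Y}_i = 2 + 0.5 X_i$. The error $\varepsilon_i$ is the vertical distance between the observed value and the predicted value; $b_0$ is the intercept on the Y axis.]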

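Bivariate Regression
How to estimate $b_0$ and $b_1$? Choose the values that minimize the sum of squared errors:
$\mathrm{SSE} = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} (Y_i - b_0 - b_1 X_i)^2$
[Figure: scatter plot of Y against X with the fitted line and the vertical residuals.]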
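For a single predictor, the SSE has a well-known closed-form minimizer: $b_1 = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$ and $b_0 = \bar{Y} - b_1 \bar{X}$. A minimal sketch, assuming made-up data points scattered around the line $2 + 0.5 X$; the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x + error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up observations (not taken from the slide's figure).
xs = [0, 2, 4, 6, 8]
ys = [2.5, 2.0, 4.0, 6.0, 5.5]

b0, b1 = fit_line(xs, ys)
best_sse = sse(xs, ys, b0, b1)
```

Any other choice of intercept or slope, for example `sse(xs, ys, b0 + 0.1, b1)`, yields a larger SSE on the same data.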

Interpreting coefficients
$Y = b_0 + b_1 X + \varepsilon$
$b_1$ is the average effect of X on Y: a one-unit increase in X corresponds to an average change of $b_1$ units in Y.
$b_0$ is the best guess for Y if X = 0.

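Example: Binary Predictor
How much does the IQ of a kid differ on average depending on whether the mother has a high school degree or not?
$\mathrm{IQ}_{kid} = b_0 + b_1 \mathrm{HC}_{mom} + \varepsilon$
Here $\mathrm{IQ}_{kid}$ is the outcome, $\mathrm{HC}_{mom}$ the binary predictor, $b_0$ the intercept, $b_1$ the coefficient and $\varepsilon$ the error.
$b_1$ tells us how much $\mathrm{IQ}_{kid}$ differs between the two groups.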

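Example: Continuous Predictor
How much does the IQ of a kid change on average if we increase the IQ of the mother?
$\mathrm{IQ}_{kid} = b_0 + b_1 \mathrm{IQ}_{mom} + \varepsilon$
Here $\mathrm{IQ}_{kid}$ is the outcome, $\mathrm{IQ}_{mom}$ the predictor and $\varepsilon$ the error.
Intercept $b_0$: $\mathrm{IQ}_{kid}$ when $\mathrm{IQ}_{mom} = 0$.
Coefficient $b_1$: the difference in $\mathrm{IQ}_{kid}$ between two groups that differ by one unit of $\mathrm{IQ}_{mom}$.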

Example
$\mathrm{IQ}_{kid} = 26 + 0.6\, \mathrm{IQ}_{mom}$
If we compare children whose mothers differ by one IQ point, we expect a difference of 0.6 points in the kid's IQ. A 10-point difference in the mother's IQ corresponds to a 6-point difference in the kid's IQ.
What does the intercept tell us? Kids of mothers with IQ = 0 would have a predicted IQ of 26.

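Multiple Predictors
$\mathrm{IQ}_{kid} = b_0 + b_1 \mathrm{IQ}_{mom} + b_2 \mathrm{HC}_{mom} + \varepsilon$
$\mathrm{IQ}_{kid}$ is the outcome, $\mathrm{IQ}_{mom}$ a continuous predictor and $\mathrm{HC}_{mom}$ a binary predictor.
Intercept $b_0$: $\mathrm{IQ}_{kid}$ when $\mathrm{IQ}_{mom} = 0$ and $\mathrm{HC}_{mom} = 0$.
Coefficient $b_1$: the difference in $\mathrm{IQ}_{kid}$ between two groups that differ by one unit of $\mathrm{IQ}_{mom}$ but have the same $\mathrm{HC}_{mom}$ value.
Coefficient $b_2$: the difference in $\mathrm{IQ}_{kid}$ between two groups that differ in $\mathrm{HC}_{mom}$ but have equal $\mathrm{IQ}_{mom}$.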

Correlation versus Linear Regression
Both measure the strength of the linear relationship between X and Y.
Correlation gives a bounded measure (between -1 and 1) of how closely X and Y follow a perfect linear relationship.
The regression coefficient indicates the estimated change in the expected value of Y for a one-unit change in X. It depends on the scale of the variables.
The Pearson correlation coefficient is the slope of the regression line when both variables have been standardized first.
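The last two statements can be verified on a small example. This is a sketch with made-up data and illustrative helper names: rescaling X changes the slope but not the correlation, and the slope on standardized variables equals the correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def slope(xs, ys):
    """Least-squares slope of the regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std(xs) * std(ys))

def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m, s = mean(values), std(values)
    return [(v - m) / s for v in values]

# Made-up data.
xs = [0, 2, 4, 6, 8]
ys = [2.5, 2.0, 4.0, 6.0, 5.5]

r = correlation(xs, ys)
slope_raw = slope(xs, ys)
slope_standardized = slope(standardize(xs), standardize(ys))  # equals r

xs_rescaled = [100 * x for x in xs]       # e.g. metres expressed in centimetres
slope_rescaled = slope(xs_rescaled, ys)   # 100 times smaller than slope_raw
r_rescaled = correlation(xs_rescaled, ys) # unchanged
```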

Regression Models: Causal Effects?
Do people who went to an elite college earn more, on average, later in life?
earn ~ b0 + b1*college + error
[Figure: College followed by Salary along a time axis.]

Problem with Regressions
Being accepted at an elite college correlates with motivation and socio-economic status. These factors also correlate with salary.
[Figure: socio-economic status, motivation, College and Salary along a time axis.]

Observational Data
Randomization in experiments assures that $T \perp O \mid C$ and $T \perp O$, where O stands for the potential outcomes.
In observational studies we need to control for the covariates C that affect both the outcome O and the treatment assignment T, so that $T \perp O \mid C$.

Regression Models: Causality
Even if we include all covariates that are relevant to ensure that $T \perp O \mid C$, a regression model only measures population-wide average effects: the average effect of treatment T on outcome O when controlling for covariates C.
earn ~ b0 + b1*college + b2*motivation + b3*socio-econ + error
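A small simulation can illustrate why the covariates matter. The sketch below is an assumption-laden toy example, not the lecture's analysis: it uses a single made-up confounder (motivation), a made-up true college effect of 1.0, and a hand-written least-squares solver. The regression without the confounder overestimates the effect; the regression that controls for it recovers roughly the true value.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(predictors, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated world (all numbers are assumptions): motivation raises both the
# chance of attending an elite college and later earnings.
rng = random.Random(0)
n = 5000
TRUE_EFFECT = 1.0
motivation = [rng.gauss(0, 1) for _ in range(n)]
college = [1.0 if m + rng.gauss(0, 1) > 0 else 0.0 for m in motivation]
earn = [TRUE_EFFECT * c + 2.0 * m + rng.gauss(0, 1)
        for c, m in zip(college, motivation)]

naive_effect = ols([college], earn)[1]                   # earn ~ b0 + b1*college
controlled_effect = ols([college, motivation], earn)[1]  # adds b2*motivation
```

In this setup `naive_effect` absorbs part of the motivation effect, while `controlled_effect` stays near `TRUE_EFFECT`. With unobserved confounders, no regression specification can do this.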

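Causal Effect: Individual-Level Effect
Did going to an elite college impact your future earnings?
$Y_i(T) - Y_i(C)$: the difference between person i's outcome with treatment and without it.
[Figure: wealth over time, with college marked on the time axis.]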

Does X cause Y? CAUSAL RELATIONSHIPS

Solutions: Matching Methods
Idea: find people who look like twins in their pre-treatment covariates.

Matching Methods
Goal: treatment assignment T should be conditionally independent of outcome O given the observed covariates X: $T \perp O \mid X$.
Balance the distribution of observed covariates in the treated and control groups (matching == pruning).
Unobserved covariates that are related to both treatment and outcome, and are not correlated with observed covariates, are still a problem!

Matching Methods: 4 Steps
1. Define a distance measure.
2. Find matches (e.g., use greedy k:1 matching).
3. Assess the quality of the matches: are the covariate distributions similar for the treated and untreated groups?
4. Analyze the effect of treatment on outcome.

Does Special Training Help Job Promotion?
Treatment: elite education. Outcome: position. Covariate (1-dimensional): education (in years).
[Figure: position plotted against education (in years) for treated and control subjects.]
Ho, Daniel, Kosuke Imai, Gary King, and Elizabeth Stuart. 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference." Political Analysis 15: 199-236. Copy at http://j.mp/jpupwz

Example: How to match people based on education? Greedy Distance Matching
Treated  Edu  Pos
1        4    5
1        7    5
1        10   9
0        6    5
0        10   5
0        1    9
Greedy matches (treated Edu - control Edu): 4-6, 7-10, 10-1. One really bad match (10-1).
Optimal solution: 4-1, 7-6 and 10-10.
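The example can be reproduced in a few lines. This sketch assumes that greedy matching processes the treated subjects in the listed order and always takes the nearest control that is still available. The brute-force optimal matcher is only a stand-in that is feasible for tiny examples.

```python
from itertools import permutations

# Education values from the example: treated and control subjects.
treated_edu = [4, 7, 10]
control_edu = [6, 10, 1]

def greedy_match(treated, control):
    """Match each treated unit, in order, to the nearest unused control."""
    available = list(control)
    pairs = []
    for t in treated:
        best = min(available, key=lambda c: abs(t - c))
        available.remove(best)
        pairs.append((t, best))
    return pairs

def optimal_match(treated, control):
    """Try every assignment and keep the one with the smallest total distance."""
    best = min(permutations(control),
               key=lambda perm: sum(abs(t - c) for t, c in zip(treated, perm)))
    return list(zip(treated, best))

def total_distance(pairs):
    return sum(abs(t - c) for t, c in pairs)

greedy_pairs = greedy_match(treated_edu, control_edu)    # [(4, 6), (7, 10), (10, 1)]
optimal_pairs = optimal_match(treated_edu, control_edu)  # [(4, 1), (7, 6), (10, 10)]
```

The greedy total distance is 14, the optimal one is 4. Greedy matching depends on the order in which treated subjects are processed, so a different order can give different pairs.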

Greedy Distance Matching
Greedy matching may not find the optimal matches!
Introduce a caliper (the maximal acceptable distance) and throw bad matches away.
Pair matching (1:1 matching) is common, but 1:k matching is also possible.
What if our data is multivariate?
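A caliper can be added to greedy matching with one extra check. In this sketch, a treated subject whose nearest available control is farther away than the caliper stays unmatched and the control remains available; that handling, like the caliper value of 3 years, is an assumption made for illustration.

```python
def greedy_match_with_caliper(treated, control, caliper):
    """Greedy 1:1 matching that rejects pairs farther apart than the caliper."""
    available = list(control)
    pairs, unmatched = [], []
    for t in treated:
        if available:
            best = min(available, key=lambda c: abs(t - c))
            if abs(t - best) <= caliper:
                available.remove(best)
                pairs.append((t, best))
                continue
        unmatched.append(t)  # no acceptable control: drop this treated unit
    return pairs, unmatched

# Education values from the greedy matching example, caliper of 3 years.
pairs, unmatched = greedy_match_with_caliper([4, 7, 10], [6, 10, 1], caliper=3)
```

Here the bad match 10-1 is discarded: `pairs` is `[(4, 6), (7, 10)]` and the treated subject with 10 years of education ends up in `unmatched`.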

Euclidean distance
Euclidean distance doesn't make sense when different dimensions are on different scales, for example yearly income, age, gender and body weight.
Problem: the distance is dominated by the dimension with the largest values.

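Mahalanobis Distance
A multivariate distance measure: it measures how many standard deviations apart two points are.
It rescales the variables based on their direction and variance:
$d(x, y) = \sqrt{(x - y)^\top S^{-1} (x - y)}$, where $S^{-1}$ is the inverse of the variance-covariance matrix.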
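To see the rescaling at work, here is a minimal two-dimensional sketch in plain Python with the 2x2 matrix inverse written out by hand. The sample of (yearly income, age) pairs is made up for illustration.

```python
import math

def covariance_matrix(points):
    """Sample variance-covariance matrix of 2-D points: [[sxx, sxy], [sxy, syy]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis(a, b, cov):
    """sqrt((a - b)^T S^-1 (a - b)) for 2-D points, using the 2x2 inverse."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Made-up sample of (yearly income, age) pairs.
people = [(20000, 30), (40000, 50), (60000, 30), (80000, 50)]
cov = covariance_matrix(people)

base = (50000, 30)
richer = (51000, 30)  # differs by 1000 in income, a small step on that scale
older = (50000, 50)   # differs by 20 years, a large step on that scale

euclid_income, euclid_age = euclidean(base, richer), euclidean(base, older)
maha_income, maha_age = mahalanobis(base, richer, cov), mahalanobis(base, older, cov)
```

Euclidean distance ranks the income difference as far larger (1000 versus 20), whereas the Mahalanobis distance ranks the 20-year age gap as the bigger difference. Expressing income in thousands would leave the Mahalanobis distances unchanged.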

Example: Distance Matching
Distance matching approximates a fully blocked experiment.
Completely randomized: flip a coin for each subject. Heads -> T, tails -> C. We could get unlucky: all men assigned to T.
Fully blocked experiment: first pair up similar subjects (e.g. same gender, age). Then flip a coin for each pair: one gets T, one gets C. This balances the known covariates.
[Figure: position plotted against education.]
Gary King, "Why Propensity Scores Should Not Be Used for Matching," Methods Colloquium, 2015

Assess Quality of Matches
Was the matching successful? Are the covariates balanced between the two groups?
The standardized mean difference (SMD) is the difference between the two group means of a covariate, expressed in units of standard deviation.
SMD < 0.1 is good; SMD > 0.2 indicates serious imbalance.
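A balance check for one covariate might look as follows. The slide does not say which standard deviation is used; this sketch assumes the common choice of pooling the two group variances, and the age values are made up.

```python
import math
from statistics import mean, variance

def smd(treated, control):
    """Absolute difference in means divided by the pooled standard deviation.

    Pooling as sqrt((var_t + var_c) / 2) is an assumption; other
    conventions (e.g. the treated-group SD) are also in use.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd

# Made-up ages: all controls before matching, and the matched subset.
treated_age = [30, 35, 40, 45]
control_age_all = [20, 25, 31, 36, 41, 44, 60, 65]
control_age_matched = [31, 36, 41, 44]

smd_before = smd(treated_age, control_age_all)     # above 0.2: imbalanced
smd_after = smd(treated_age, control_age_matched)  # below 0.1: balanced
```

In practice this check is repeated for every covariate, before and after matching.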

Matching Methods: 4 Steps
1. Define a distance measure.
2. Create matches (e.g., use greedy k:1 matching).
3. Assess the quality of the matches: are the covariate distributions similar for the treated and untreated groups?
4. Analyze the effect of treatment on outcome (ATE, regression).

Average Treatment Effect (ATE): Randomization Tests
1. Compute the test statistic from the observed (matched) data: ATE = mean(Outcome_Treated) - mean(Outcome_Control).
2. Assume that the null hypothesis is true (no treatment effect): ATE = E[Outcome_Treated] - E[Outcome_Control] = 0.
3. Randomly permute the treatment assignments (shuffle the treatment labels).
4. Recompute the test statistic, and repeat steps 3 and 4 many times.
5. Is our observed test statistic surprising compared with the permuted ones?
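The procedure translates almost line by line into code. A minimal sketch with made-up outcomes: the number of permutations, the seed and the two-sided comparison are assumptions.

```python
import random

def mean_difference(outcomes, labels):
    """Test statistic: mean outcome of the treated minus mean outcome of controls."""
    treated = [o for o, t in zip(outcomes, labels) if t == 1]
    control = [o for o, t in zip(outcomes, labels) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def permutation_test(outcomes, labels, n_permutations=10000, seed=0):
    """Share of label shufflings whose statistic is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = mean_difference(outcomes, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # under the null hypothesis the labels are exchangeable
        if abs(mean_difference(outcomes, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Made-up matched sample: outcome (position) and treatment label per subject.
outcomes = [9, 8, 9, 7, 8, 5, 6, 5, 7, 6]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

observed_ate, p_value = permutation_test(outcomes, labels)
```

A small `p_value` means that a difference as large as the observed one rarely arises when the labels are assigned at random, so the observed statistic is surprising under the null hypothesis.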

Regression After Matching
1) Preprocessing (matching)
2) Estimation of effects (regression models): pos = b0 + b1*edu + b2*is_treated
Matching can help to reduce model dependence!
[Figure: position (p) plotted against education (e).]

What if we simply use regression analysis?
pos = b0 + b1*edu + b2*is_treated, where is_treated is a binary variable and b2 is the estimated treatment effect.
Correcting for education, the treated group has higher positions.
[Figure: position (p) plotted against education (e).]

Quadratic Regression
pos = b0 + b1*edu + b2*edu^2 + γ*is_treated
Correcting for education, the treated group now has lower positions.
Model dependence: too much freedom is given to the analyst. Reason: imbalance of covariates.
[Figure: position (p) plotted against education (e).]

Distance Matching
Distance matching works well if we have few covariates (fewer than 50).
What if we have high-dimensional data with many covariates? Which ones should we include?
Idea: project the covariates into a lower-dimensional space. For each observation (a high-dimensional vector), compute one number, the probability of being treated (the propensity score), and match on this number.

Propensity Score Matching
The propensity score is the probability that a subject receives the treatment, given all covariates we want to control for.
A propensity score of 0.3 means that a subject with these covariate values has a 30% chance of receiving the treatment.
Idea: subjects are matched not on the covariates themselves but on the propensity score.

Propensity Score Matching
Propensity score matching approximates complete randomization.
Completely randomized: flip a coin for each subject. Heads -> T, tails -> C. We could get unlucky: all men assigned to T.
[Figure: position plotted against education.]

Propensity Score Matching
Estimate the propensity score from the observed data. Outcome variable: T. Independent variables: X.
Logistic regression: take the predicted probability of T = 1 as the propensity score $\pi(X)$.
Idea: achieve balance in the covariates by conditioning on the propensity score. This works if:
$P(X = x \mid \pi(X) = p, T = 1) = P(X = x \mid \pi(X) = p, T = 0)$
Note: in a randomized trial the propensity score (== allocation probability) is known: $P(T = 1 \mid X) = P(T = 1) = 0.5$
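Estimating the propensity score amounts to fitting a classifier for T. The sketch below is a toy version under several assumptions: one covariate, simulated data with made-up coefficients (-1.0 and 1.5), and a logistic regression fitted by plain gradient descent instead of a statistics library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ts, learning_rate=1.0, n_iterations=500):
    """Fit P(T = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iterations):
        errors = [sigmoid(b0 + b1 * x) - t for x, t in zip(xs, ts)]
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Simulated observational data: a larger covariate value makes treatment more likely.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ts = [1 if rng.random() < sigmoid(-1.0 + 1.5 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ts)
# The propensity score of each subject is its predicted treatment probability.
propensity = [sigmoid(b0 + b1 * x) for x in xs]
```

With several covariates the same idea applies: the model maps the whole covariate vector to a single number between 0 and 1, and matching then happens on that number.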

Propensity Score Matching
Matching: greedy, nearest neighbor.
Match on logit(propensity score), i.e. the log odds of the propensity score. We do this to stretch the values, because treatment probabilities are often very small.
Look at the standard deviation of the transformed propensity scores.
Caliper: remove bad matches, i.e. subjects that are more than 0.2 standard deviations away from the propensity score of their match.
A smaller caliper gives lower bias but more variance.
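These steps can be sketched as follows. The propensity scores are made up, and computing the standard deviation over the pooled logits of both groups is an assumption, since the slide does not say over which subjects it is taken.

```python
import math
from statistics import stdev

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def match_on_logit(treated_ps, control_ps, caliper_in_sd=0.2):
    """Greedy nearest-neighbor matching on logit(propensity) with a caliper.

    Returns (treated index, control index) pairs and the caliper width.
    """
    treated_logits = [logit(p) for p in treated_ps]
    control_logits = [logit(p) for p in control_ps]
    caliper = caliper_in_sd * stdev(treated_logits + control_logits)
    available = list(range(len(control_ps)))
    pairs = []
    for i, lt in enumerate(treated_logits):
        if not available:
            break
        j = min(available, key=lambda k: abs(lt - control_logits[k]))
        if abs(lt - control_logits[j]) <= caliper:  # keep only good matches
            available.remove(j)
            pairs.append((i, j))
    return pairs, caliper

# Made-up propensity scores.
treated_ps = [0.02, 0.10, 0.60]
control_ps = [0.021, 0.09, 0.05, 0.30]

pairs, caliper = match_on_logit(treated_ps, control_ps)
```

The first two treated subjects find a control within the caliper, while the subject with score 0.60 has no close counterpart and is pruned. The stretching is visible in `logit(0.02) - logit(0.01)`, which is about 0.7 although the probabilities differ by only 0.01.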

Assess Quality of Matches
Is the distribution of propensity scores similar for the treated and control groups?

Instrumental Variables
Problem: let's assume we cannot observe socio-economic status and motivation. What could we do?
[Figure: socio-economic status, motivation, College and Salary along a time axis.]

Instrumental Variables
Instrumental variables are highly correlated with the covariates but not with the outcome.
Example: applying to an elite college correlates with motivation; living area correlates with socio-economic status.
earn ~ b0 + b1*college + b2*collegeap + b3*living + error
We come closer to the causal effect of college choice on earnings after controlling for the confounders (motivation and socio-economic status) through these variables.

Summary
Matching methods and instrumental variables are powerful and help to approximate causality.
Problems:
Instrumental variables are often hard to find.
Researchers have a lot of freedom when deciding how to match.
Because we remove data, we need to specify for which group the causal effect holds.
Compare results from different matching methods, different dimensionality reduction methods and different models. Avoid model dependence and method dependence!

Dissemination of Culture
How does culture evolve?

Cultural practices

Dynamics of Culture
What is culture? How does it diffuse?
Culture can be seen as an agglomerate of beliefs, opinions, values, behaviour and other things that certain groups of people have agreed on.
People learn culture by interacting, and people are more likely to interact with similar people. For example, if they share a language they are more likely to interact and to start sharing other traits over time.
Why do we see global polarization and local convergence?


Axelrod Model
Each agent has a vector of f different features. Each feature may take one of q different traits.
Example with f = 3: the features (Ethnicity, Political Orientation, Religion) with the traits (1, 5, 4).

Initial Setup

Axelrod Model: Dynamic Process
1. An agent i and one of its neighbours j are randomly selected.
2. The overlap $w_{i,j}$ between their cultural vectors is computed.
3. With probability $w_{i,j}$ the interaction takes place.
4. If the interaction takes place, one feature is selected randomly and the trait of the neighbour j is set to the trait of i for this feature.
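The process can be simulated in a few lines. This sketch follows the steps as stated above. The grid topology (a 5x5 non-wrapping lattice with four neighbours), the values f = 3 and q = 2, the number of steps and the seed are assumptions; the overlap is taken to be the fraction of features on which both agents carry the same trait. A frequently used variant picks only among the features on which the two agents differ.

```python
import random

def overlap(a, b):
    """Fraction of features on which two cultural vectors carry the same trait."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def neighbours(i, width, height):
    """Indices of the up to four grid neighbours of cell i (no wrap-around)."""
    x, y = i % width, i // width
    cells = []
    if x > 0:
        cells.append(i - 1)
    if x < width - 1:
        cells.append(i + 1)
    if y > 0:
        cells.append(i - width)
    if y < height - 1:
        cells.append(i + width)
    return cells

def step(cultures, width, height, rng):
    """One update: pick i and a neighbour j, interact with probability overlap."""
    i = rng.randrange(len(cultures))
    j = rng.choice(neighbours(i, width, height))
    if rng.random() < overlap(cultures[i], cultures[j]):
        feature = rng.randrange(len(cultures[i]))
        cultures[j][feature] = cultures[i][feature]  # j adopts i's trait

def count_cultures(cultures):
    """Number of distinct cultural vectors currently present."""
    return len({tuple(c) for c in cultures})

rng = random.Random(42)
width = height = 5
f, q = 3, 2  # 3 features with 2 possible traits each
cultures = [[rng.randrange(q) for _ in range(f)] for _ in range(width * height)]

initial_count = count_cultures(cultures)
for _ in range(20000):
    step(cultures, width, height, rng)
final_count = count_cultures(cultures)
```

Agents with no trait in common never interact (overlap 0), while identical agents always interact but do not change. Comparing `initial_count` with `final_count` for different values of `f` and `q` shows how often the population ends in a single culture.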

Will we converge to one mono-culture? http://www-personal.umich.edu/~axe/research/dissemination.pdf


Cultural Complexity
The more cultural features we have, the more likely it is that two agents have something in common and can interact.
With few features and many traits, the probability that two agents do not share anything is high.
http://www-personal.umich.edu/~axe/research/dissemination.pdf
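Both statements follow from a one-line calculation, assuming that the initial traits are drawn independently and uniformly at random: two agents disagree on a single feature with probability $1 - 1/q$, so they share nothing with probability $(1 - 1/q)^f$. The function name below is illustrative.

```python
def prob_no_overlap(f, q):
    """Probability that two random agents differ on all f features,
    assuming each trait is drawn uniformly and independently from q options."""
    return (1 - 1 / q) ** f

few_features = prob_no_overlap(f=5, q=10)    # 0.9 ** 5, about 0.59
many_features = prob_no_overlap(f=15, q=10)  # more features: sharing nothing is rarer
many_traits = prob_no_overlap(f=5, q=15)     # more traits: sharing nothing is likelier
```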

Any further questions? See you next week