Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons
1 Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons. Murray Aitkin, Duy Vu and Brian Francis murray.aitkin@unimelb.edu.au duy.vu@unimelb.edu.au b.francis@lancaster.ac.uk School of Mathematics and Statistics, University of Melbourne, and Department of Mathematics and Statistics, Lancaster University, UK. BOB 2015 Terrorist network p. 1
2 Statistical modelling of social networks. Work supported by the Australian Research Council. Aim: to evaluate latent class modelling by maximum likelihood and Bayesian methods for the analysis of social network and criminal career data. Participants: Murray Aitkin, Pip Pattison, Brian Francis, Duy Vu. Main contribution: identifying subgroups, their number and membership, by latent class modelling and Bayesian model comparison. Two examples: the social network of the Natchez, Mississippi women, see Aitkin, M., Vu, D. and Francis, B.J. (2014), Statistical modelling of the group structure of social networks, Social Networks 38; and the Noordin Top terrorist network, Aitkin, M., Vu, D. and Francis, B.J. (2016), to appear in an RSS journal (A or C).
3 Where do network data come from? For social networks (networks of people or other social creatures), data come from either direct observation, or indirect data gathering through newspapers and other recording instruments. These sources of information provide the evidence of connections among actors. The connections are represented mathematically, and are analysed through properties of the mathematical structure.
4 Facebook friendship network (Wikipedia)
5 Unipartite and bipartite networks. The Facebook network is a unipartite network: it represents direct connections between the Facebook users. These connections may be directed (A likes B does not imply that B likes A) or undirected/reciprocal (A and B are connected through something, like Facebook). We discuss bipartite networks, in which the connections between actors are through their joint participation in events.
6 The Natchez social network
7 The adjacency matrix. To perform any analysis we need to re-express the table elements mathematically through a link, or tie, variable $Y_{ij}$, with the presence of woman $i$ at event $j$ defining $Y_{ij} = 1$, and her absence from the event defining $Y_{ij} = 0$. We use $n$ to denote the number of rows (women), and $r$ to denote the number of columns (events). The resulting table is expressed as an $n \times r$ matrix, called the adjacency matrix, denoted by $Y$. Marginal totals ($T$) have been added to the table, giving the total number of events attended by each woman, and the total number of women attending each event. We see that women vary in their propensity to attend events, and events vary in their attractiveness to women. We also record the marital status of woman $i$ in a variable $x_i$, coded 1 for married and 0 for unmarried.
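As a concrete illustration of the adjacency matrix and its marginal totals, a sketch with a toy 3 × 4 example (hypothetical data, not the Natchez table):

```python
# The adjacency matrix Y is an n x r array of 0/1 ties: row totals give
# the events attended by each woman, column totals the attendance at each event.
Y = [
    [1, 1, 0, 1],  # woman 1
    [1, 0, 0, 1],  # woman 2
    [0, 1, 1, 0],  # woman 3
]
n = len(Y)     # number of actors (rows)
r = len(Y[0])  # number of events (columns)

row_totals = [sum(row) for row in Y]                              # per-woman totals
col_totals = [sum(Y[i][j] for i in range(n)) for j in range(r)]   # per-event totals

print(row_totals)  # -> [3, 2, 2]
print(col_totals)  # -> [2, 2, 1, 2]
```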
8 Two-mode network data. [Table: the women-by-events (W\E) adjacency matrix, with the marital-status covariate $x$ and marginal totals $T$.]
9 Two-mode network data, zeros suppressed. [Table: the same adjacency matrix with the zero entries suppressed.]
10 Original matrix: 18 actors, 14 events.
11 Randomly shuffled matrix: 18 actors, 14 events.
12 Probability models for actors and events. Analysis needs to allow for uncertainty in the behaviour of actors: even if they form an established group with other actors, this does not mean that they all attend the same events. We consider the presence or absence of an actor at an event as a random process: attendance is determined by a possibly large number of factors unknown to us, so we represent the process outcome as a Bernoulli random variable. The probability that actor $i$ attends event $j$ ($Y_{ij} = 1$) is $p_{ij}$, and the probability that actor $i$ does not attend event $j$ ($Y_{ij} = 0$) is $1 - p_{ij}$. We want to bring the actor and event structures into the event attendance probability in some way.
13 Models. The "null" model is a single-parameter model, giving the same constant probability $p_{ij} = p$ that every actor attends every event, independently across events and actors: all actors have the same attendance probability, and all events have the same attraction probability. The Rasch model has a parameter for each actor and a parameter for each event: each actor $i$ has a propensity $\theta_i$ to attend any event, and each event $j$ has an attractiveness $\varphi_j$ to any actor; actors attend events independently. The Rasch model is a main-effect or additive exponential random graph model (ERGM), in events and actors, on the logit-transformed probability scale: $$\mathrm{logit}\, p_{ij} = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_i + \varphi_j.$$ It has no subgroup structure.
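The logit relation above inverts to $p_{ij} = 1/(1 + e^{-(\theta_i + \varphi_j)})$; a minimal sketch with hypothetical parameter values:

```python
import math

def rasch_prob(theta_i, phi_j):
    """Attendance probability under the Rasch model:
    logit p_ij = theta_i + phi_j, so p_ij = 1 / (1 + exp(-(theta_i + phi_j)))."""
    return 1.0 / (1.0 + math.exp(-(theta_i + phi_j)))

# When propensity and attractiveness cancel, the probability is exactly 0.5.
p = rasch_prob(0.5, -0.5)
print(p)  # -> 0.5
```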
14 The latent class model. This model specifies a $K$-class latent structure for actors. The $K$ classes are distinguished by $K$ sets of event attendance parameters $q_{jk}$, different among classes but identical within classes. The proportion of actors in class $k$ is $\pi_k$; $\theta_K = (K, \{\pi_k\}, \{q_{jk}\})$. The class structure is unobserved; it is implied and identified by the actors' different patterns of event attendance. The (observed data) likelihood $L(\theta_K)$, the probability of the observed data, is given by
$$\Pr[\{y_{ij}\} \mid k, i] = \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}},$$
$$\Pr[\{y_{ij}\} \mid i] = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}},$$
$$L(\theta_K) = \Pr[\{y_{ij}\}] = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}.$$
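The observed-data likelihood above can be evaluated directly (on the log scale, so the product over actors does not underflow); a sketch in plain Python with hypothetical toy inputs:

```python
import math

def observed_loglik(Y, pi, q):
    """Observed-data log-likelihood of the K-class latent class model.
    Y: n x r 0/1 matrix; pi: K class proportions; q: K x r attendance
    probabilities q[k][j]. Each row is an independent mixture over classes."""
    K, r = len(pi), len(q[0])
    total = 0.0
    for row in Y:
        mix = 0.0
        for k in range(K):
            contrib = pi[k]
            for j in range(r):
                contrib *= q[k][j] if row[j] == 1 else (1.0 - q[k][j])
            mix += contrib
        total += math.log(mix)
    return total

# With K = 1 this reduces to independent Bernoulli terms:
# Pr[1, 0] with q = (0.5, 0.5) is 0.25, so the log-likelihood is log(0.25).
print(observed_loglik([[1, 0]], [1.0], [[0.5, 0.5]]))
```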
15 Analysis with the complete data likelihood. Bayesian analysis is greatly simplified by introducing counterfactual missing data: the class identification of each actor. We define $Z_{ik} = 1$ if actor $i$ belongs to class $k$, and zero otherwise, with (prior) probability $\pi_k$. If the complete data $y_{ij}$ and $Z_{ik}$ were observed, the complete data likelihood $CL(\theta_K)$ for the $K$-class model would be
$$CL(\theta_K) = \Pr[\{y_{ij}\}, \{Z_{ik}\}] = \Pr[\{y_{ij}\} \mid \{Z_{ik}\}] \Pr[\{Z_{ik}\}]$$
$$= \left[\prod_{i=1}^{n} \prod_{j=1}^{r} \prod_{k=1}^{K} \left(q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right)^{Z_{ik}}\right] \prod_{i=1}^{n} \prod_{k=1}^{K} \pi_k^{Z_{ik}}$$
$$= \prod_{i=1}^{n} \prod_{k=1}^{K} \left[\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right]^{Z_{ik}}.$$
16 MCMC analysis. MCMC iterates between making random draws of the $Z_{ik}$ given the current parameter draws, and random draws of the parameters given the current $Z_{ik}$ draws. With flat or non-informative priors on the parameters and the $Z_{ik}$, the conditional distributions can be inferred from the complete data likelihood: the $Z_{ik}$ given the parameters are multinomial with probabilities proportional to $\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$; for the parameters given the $Z_{ik}$, the $\pi_k$ are Dirichlet with parameters $Z_{+k} = \sum_{i=1}^{n} Z_{ik}$, and the $q_{jk}$ are Beta with parameters $\sum_{i=1}^{n} Z_{ik} y_{ij}$ and $\sum_{i=1}^{n} Z_{ik} (1 - y_{ij})$.
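The two conditional draws above can be sketched as a single Gibbs sweep. This is an illustrative implementation, not the authors' code, and it assumes flat Dirichlet and Beta(1, 1) priors, so 1 is added to each of the counts on the slide; the Dirichlet draw is obtained by normalizing Gamma variates:

```python
import math
import random

def gibbs_sweep(Y, pi, q):
    """One Gibbs sweep for the K-class latent class model (sketch only;
    flat priors assumed, adding 1 to the Dirichlet and Beta counts)."""
    n, r, K = len(Y), len(Y[0]), len(pi)
    # 1. Z_i | pi, q: multinomial with weights pi_k * prod_j q_jk^y (1-q_jk)^(1-y),
    #    computed on the log scale for numerical stability.
    Z = []
    for i in range(n):
        logw = []
        for k in range(K):
            lw = math.log(pi[k])
            for j in range(r):
                lw += math.log(q[k][j]) if Y[i][j] == 1 else math.log(1.0 - q[k][j])
            logw.append(lw)
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        Z.append(random.choices(range(K), weights=w)[0])
    # 2. pi | Z: Dirichlet(1 + Z_+k), drawn as normalized Gamma variates.
    g = [random.gammavariate(1.0 + sum(z == k for z in Z), 1.0) for k in range(K)]
    pi_new = [x / sum(g) for x in g]
    # 3. q_jk | Z, Y: Beta(1 + sum_i Z_ik y_ij, 1 + sum_i Z_ik (1 - y_ij)).
    q_new = [[random.betavariate(
                  1.0 + sum(Y[i][j] for i in range(n) if Z[i] == k),
                  1.0 + sum(1 - Y[i][j] for i in range(n) if Z[i] == k))
              for j in range(r)] for k in range(K)]
    return Z, pi_new, q_new

random.seed(0)
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
Z, pi_draw, q_draw = gibbs_sweep(Y, [0.5, 0.5], [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
```

In a full analysis this sweep would be repeated many times, discarding a burn-in period before collecting the draws.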
17 Bayesian model comparison. Bayesian theory allows us to decide which of these (or other) models is the most plausible for a given data set, through the deviance distributions for the competing models (deviance = $-2 \log$ likelihood = "badness of fit" of the model). This approach (Dempster 1997; Aitkin 1997; Aitkin, Boys and Chadwick 2005; Aitkin 2010) is to use the posterior distributions of the likelihoods, by substituting $M$ (typically 10,000) random draws $\theta_k^{[m]}$ of the parameters $\theta_k$ ($k = 1, \ldots, K$) from their posterior distributions into the (observed data) likelihoods $L(\theta_k)$, giving $M$ corresponding random draws $L_k^{[m]} = L(\theta_k^{[m]})$ from the posterior distributions of the observed data likelihoods.
18 Model comparison through posterior deviances. Because of the scale of likelihoods, we use (observed data) deviances $D_k(\theta_k) = -2 \log L_k(\theta_k)$ rather than likelihoods $L$. Models are compared for the stochastic ordering of their posterior deviance distributions, initially by graphing the cdfs of the deviance draws $D_k^{[m]} = D_k(\theta_k^{[m]})$ for each number of components; the left-most cdf defines the best-supported model.
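The stochastic-ordering comparison can be sketched numerically: sorting each model's deviance draws gives its empirical cdf, and the better-supported model's sorted draws lie to the left at every quantile. A toy check with hypothetical deviance values:

```python
def deviance(loglik):
    """Deviance = -2 log likelihood."""
    return -2.0 * loglik

def stochastically_smaller(draws_a, draws_b):
    """Crude check of the stochastic ordering: model A dominates model B
    if A's sorted deviance draws are no larger than B's at every quantile
    (i.e. A's empirical cdf lies to the left of B's)."""
    a, b = sorted(draws_a), sorted(draws_b)
    assert len(a) == len(b)
    return all(x <= y for x, y in zip(a, b))

print(stochastically_smaller([10, 12, 14], [11, 13, 15]))  # -> True
```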
19 DIC. The DIC of Spiegelhalter et al. (2002), implemented in BUGS, also uses deviance draws, but these are of the complete data deviance rather than the observed data deviance, and are used only to compute the mean complete data deviance across the draws. The complete data likelihood and deviance treat the latent class as an observed structure, overstating the data information. The DIC, like AIC, BIC and some other decision criteria, requires a penalty (in this case using the effective number of parameters) on the mean complete data deviance to account for this overstatement. This is not needed for the comparison of observed data deviance distributions: models with increasing numbers of components are effectively penalized for their increasing parametrization, as they have increasingly diffuse deviance distributions because of the decreasing data information about each component. But does it work? See later.
20 Class membership probabilities. The Bayesian analysis also provides the full posterior distribution of the class membership probabilities. The probability of membership of actor $i$ in class $k$, given the data, follows from Bayes's theorem:
$$\pi_{ik \mid \text{data}} = \Pr(i \in \text{class } k \mid \text{data}) = \frac{\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}}{\sum_{k=1}^{K} \left[\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right]}.$$
Substituting the parameter draws $\theta_k^{[m]} = (\pi_k^{[m]}, q_{jk}^{[m]})$ into the membership probability gives its posterior distribution from the $M$ values $\pi_{ik \mid \text{data}}^{[m]}$. Label-switching can be a problem.
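For a single parameter draw, the Bayes-theorem formula above can be computed directly; a sketch with hypothetical values:

```python
def membership_probs(y_i, pi, q):
    """Posterior class-membership probabilities for one actor, for one
    parameter draw: normalize pi_k * prod_j q_jk^y (1-q_jk)^(1-y) over k."""
    K, r = len(pi), len(y_i)
    w = []
    for k in range(K):
        contrib = pi[k]
        for j in range(r):
            contrib *= q[k][j] if y_i[j] == 1 else (1.0 - q[k][j])
        w.append(contrib)
    s = sum(w)
    return [x / s for x in w]

# With identical classes, membership is split evenly.
p_i = membership_probs([1, 0], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
print(p_i)  # -> [0.5, 0.5]
```

Repeating this over the $M$ parameter draws gives the posterior distribution of each membership probability.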
21 The Natchez women. [Figure: cdfs of the posterior deviance draws, with asymptotic deviances, for the Rasch, $K = 2$ and $K = 3$ models.]
22 The Natchez women. The two-class latent class model fits better than the Rasch, and the three-class model does not fit any better than the two-class model. We conclude that the two-class model is best.
23 Posterior membership distributions for women 1–9
24 Posterior membership distributions for women 10–18
25 Summary. Women 1–6 clearly belong to class 1. Women 10–18 clearly belong to class 2. Women 7–9 have grades of membership in both classes: woman 7, 93% in class 1 and 7% in class 2; woman 8, 11% in class 1 and 89% in class 2; woman 9, 43% in class 1 and 57% in class 2. Woman 9 was claimed by both classes in interviews; woman 8 was placed in a separate class in many analyses.
26 The Noordin Top terrorist network (Wikipedia). Noordin Mohammad Top, a Malaysian citizen, was a Muslim extremist and Indonesia's most wanted Islamist militant. He is thought to have been a key bomb maker and/or financier. Noordin and Azahari Husin were thought to have masterminded the 2003 Marriott hotel bombing in Jakarta, the 2004 Australian embassy bombing in Jakarta, the 2005 Bali bombings and the 2009 JW Marriott-Ritz-Carlton bombings, and Noordin may have assisted in the 2002 Bali bombings. Noordin was an indoctrinator who specialized in recruiting militants into becoming suicide bombers and collecting funds for militant activities. Husin was killed in a police raid on his hideout in Batu, near Malang in East Java, on 9 November 2005. Top was killed during a police raid in Solo, Central Java, on 17 September 2009, conducted by an Indonesian anti-terrorist team.
27 Data source. The data come from the book Disrupting Dark Networks by Everton, in the Structural Analysis in the Social Sciences series, Cambridge (2012). Appendix 1 of this book provides the data: the subset of the Noordin Top terrorist network was drawn primarily from Terrorism in Indonesia: Noordin's Networks, a 2006 publication of the International Crisis Group. It includes relational data on 79 individuals listed in Appendix C of that publication. The data were initially coded as 45 binary items. Our analysis is restricted to 75 of these individuals: four individuals were eliminated as they were not present at any of the 45 events used in the analysis.
28 Original matrix: 74 actors, 45 events.
29 Interpretation. The appearance of the full adjacency matrix is quite different from that for the Natchez women. It is very sparse, and appears random, without the structure we can see in the Natchez women network matrix. Re-ordering the rows and columns does not much change the appearance of the matrix: there is no clear division into sub-groups.
30 The Noordin network. [Figure: cdfs of the posterior deviance draws for the Rasch, $K = 2$, $K = 3$ and $K = 4$ models.]
31 The Noordin network. The three-class model fits better than the two-class or Rasch models, and the four-class model does not fit any better than the three-class. We conclude that the three-class model is best. Which actors are in which classes? We need a probabilistic expression for this: the ternary plot.
32 Ternary plot. [Figure: ternary plot of posterior class membership probabilities ($G_1$, $G_2$, $G_3$), with Noordin Mohammed Top and Azahari Husin labelled.]
33 The Noordin network. Top and Husin define class 1: the planners and leaders. Class 3 contains all the actors who attended 6–9 events: the trainers, who meet the planners and train the footsoldiers. Actors who attended 5 or fewer events are spread along the class 2–3 axis; class 2 are the footsoldiers, who are present at organisation, training and operations, never at finance, meetings or logistics. The division between classes 2 and 3 is not absolute. This may be because of the short "working life" of many actors, killed in actions or arrested and imprisoned. Trainers who are killed in actions or imprisoned may be replaced by footsoldiers who have survived actions. Of the 74 actors, 45 were dead or in prison.
34 What works in Bayesian model comparison? AIC chooses over-complex models asymptotically. BIC chooses the true model asymptotically but chooses under-complex models in finite samples. Simulation studies with the posterior deviance distribution: comparison of normal mixture models with 1–7 components on samples generated from each model, derived from real astronomical galaxy recession velocity data, with DIC comparison: Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron, DOI: 10.1007/s40300-015-0068-1. Comparison of four single-population models (normal, lognormal, gamma and multinomial) on samples generated from each model (Aitkin 2010), derived from real family income data.
35 Galaxy recession velocity data. 82 observations of recession velocities of galaxies, modelled as a mixture of $K$ normals with different means and variances. [Figure: empirical cdf of velocity, shown with the single-normal cdf.]
36 Simulations. 100 samples of sizes 82, 164, 328 and 656 were generated from normal mixture distributions for $K = 1, \ldots, 7$ components. The means, variances and proportions for the simulated data were equal to the MLEs from the galaxy data. For each sample the posterior deviance for each model was computed for each of 1,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the percentages in the 100 samples of "best model" for each $K$ and sample size, by both the DIC and the posterior deviance: DIC (solid), posterior deviance (dash). Panel 1: sample size 82; panel 2 (across): sample size 164; panel 3 (under): sample size 328; panel 4: sample size 656.
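The "best model per draw" tally described above can be sketched as follows (hypothetical deviance draws, not the actual simulation output):

```python
def best_model_percentages(deviance_draws):
    """deviance_draws: dict mapping model name -> list of posterior deviance
    draws, aligned by draw index. For each draw the model with the smallest
    deviance is 'best'; return the percentage of draws each model wins."""
    models = list(deviance_draws)
    M = len(deviance_draws[models[0]])
    wins = {m: 0 for m in models}
    for t in range(M):
        best = min(models, key=lambda m: deviance_draws[m][t])
        wins[best] += 1
    return {m: 100.0 * wins[m] / M for m in models}

# Toy example: K=2 has the smaller deviance on 3 of the 4 draws.
draws = {"K=1": [20, 21, 19, 25], "K=2": [18, 22, 17, 23]}
print(best_model_percentages(draws))  # -> {'K=1': 25.0, 'K=2': 75.0}
```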
37 Galaxy data simulations. [Figure: four panels of percent correct against $K$, one per sample size.]
38 Simulation results. The DIC performed well for up to 3 components but poorly for more than 3. The posterior deviance worked uniformly better: its correct identification probabilities increased steadily with increasing sample size. The posterior deviance performed similarly in simulations from latent class models with varying numbers of classes (Aitkin, Vu and Francis 2015).
39 Family income data. [Figure: histogram of counts against income in hundreds.]
40 Model fit. [Figure: deviate plots against income (and log income) for the normal, lognormal and gamma populations.]
41 Simulations. 1,000 samples of sizes from 10 to 1,000 were generated from normal, lognormal, gamma and multinomial populations. The mean and variance for the parametric models were equal to those of the income data population. For each sample the posterior deviance for each model was computed for each of 10,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the proportions in the 1,000 samples of "best model" for each of the four: normal (solid), lognormal (dotted), gamma (dot-dash), multinomial (dash). Panel 1: true normal; panel 2 (across): true lognormal; panel 3 (under): true gamma; panel 4: true multinomial.
42 Model choice performance. [Figure: four panels of best-model probability against sample size, one per true model.]
43 Simulation results. For the true multinomial, by sample size 120 the multinomial model was best, and its probability increased rapidly with increasing sample size. For the parametric models: the true normal was best for all sample sizes; the true lognormal was best for sample sizes greater than 120; the true gamma was best for sample sizes greater than 50; the multinomial always had low probability for small sample sizes, but this approached 0.5 in large samples.
44 References. Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with Discussion). Statistics and Computing 7. Aitkin, M. (2001). Likelihood and Bayesian analysis of mixtures. Statistical Modelling 1. Aitkin, M., Boys, R.J. and Chadwick, T. (2005). Bayesian point null hypothesis testing via the posterior likelihood ratio. Statistics and Computing 15. Aitkin, M. (2010). Statistical Inference: an Integrated Likelihood/Bayesian Approach. Chapman and Hall/CRC Press, Boca Raton, FL. Aitkin, M. (2011). How many components in a finite mixture? In Mixtures: Estimation and Applications, eds. K.L. Mengersen, C.P. Robert and D.M. Titterington. Wiley, Chichester. Aitkin, M., Vu, D. and Francis, B.J. (2014). Statistical modelling of the group structure of social networks. Social Networks 38. Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron, DOI: 10.1007/s40300-015-0068-1.
45 References. Davis, A., Gardner, B.B. and Gardner, M.R. (1941). Deep South: A Social Anthropological Study of Caste and Class. Chicago: University of Chicago Press. Dempster, A.P. (1997). The direct use of likelihood in significance testing. Statistics and Computing 7. Everton, S.F. (2012). Disrupting Dark Networks. Cambridge: Cambridge University Press. Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with Discussion). Journal of the Royal Statistical Society B 64.
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationTruncation and Censoring
Truncation and Censoring Laura Magazzini laura.magazzini@univr.it Laura Magazzini (@univr.it) Truncation and Censoring 1 / 35 Truncation and censoring Truncation: sample data are drawn from a subset of
More informationMultiple QTL mapping
Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationClustering bi-partite networks using collapsed latent block models
Clustering bi-partite networks using collapsed latent block models Jason Wyse, Nial Friel & Pierre Latouche Insight at UCD Laboratoire SAMM, Université Paris 1 Mail: jason.wyse@ucd.ie Insight Latent Space
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationPackage effectfusion
Package November 29, 2016 Title Bayesian Effect Fusion for Categorical Predictors Version 1.0 Date 2016-11-21 Author Daniela Pauger [aut, cre], Helga Wagner [aut], Gertraud Malsiner-Walli [aut] Maintainer
More informationThe Bayesian Approach to Multi-equation Econometric Model Estimation
Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation
More informationSeminar über Statistik FS2008: Model Selection
Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationQuantifying the Price of Uncertainty in Bayesian Models
Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Quantifying the Price of Uncertainty in Bayesian Models Author(s)
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationBayesian Nonparametric Rasch Modeling: Methods and Software
Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationDetermining the number of components in mixture models for hierarchical data
Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationCategorical and Zero Inflated Growth Models
Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).
More informationBayesian estimation of complex networks and dynamic choice in the music industry
Bayesian estimation of complex networks and dynamic choice in the music industry Stefano Nasini Víctor Martínez-de-Albéniz Dept. of Production, Technology and Operations Management, IESE Business School,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationStatistical Model for Soical Network
Statistical Model for Soical Network Tom A.B. Snijders University of Washington May 29, 2014 Outline 1 Cross-sectional network 2 Dynamic s Outline Cross-sectional network 1 Cross-sectional network 2 Dynamic
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationBayesian Model Diagnostics and Checking
Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationLecture 2: Simple Classifiers
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationInverse Sampling for McNemar s Test
International Journal of Statistics and Probability; Vol. 6, No. 1; January 27 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar s Test
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationPartial effects in fixed effects models
1 Partial effects in fixed effects models J.M.C. Santos Silva School of Economics, University of Surrey Gordon C.R. Kemp Department of Economics, University of Essex 22 nd London Stata Users Group Meeting
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationRepresent processes and observations that span multiple levels (aka multi level models) R 2
Hierarchical models Hierarchical models Represent processes and observations that span multiple levels (aka multi level models) R 1 R 2 R 3 N 1 N 2 N 3 N 4 N 5 N 6 N 7 N 8 N 9 N i = true abundance on a
More information