Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons
1 Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons. Murray Aitkin, Duy Vu and Brian Francis murray.aitkin@unimelb.edu.au duy.vu@unimelb.edu.au b.francis@lancaster.ac.uk School of Mathematics and Statistics, University of Melbourne, and Department of Mathematics and Statistics, Lancaster University, UK. BOB 2015 Terrorist network p. 1
2 Statistical modelling of social networks. Work supported by the Australian Research Council. Aim: to evaluate latent class modelling by maximum likelihood and Bayesian methods for the analysis of social network and criminal career data. Participants: Murray Aitkin, Pip Pattison, Brian Francis, Duy Vu. Main contribution: identifying subgroups, their number and membership, by latent class modelling and Bayesian model comparison. Two examples: the social network of the Natchez, Mississippi women, see Aitkin, M., Vu, D. and Francis, B.J. (2014), Statistical modelling of the group structure of social networks, Social Networks 38; and the Noordin Top terrorist network, Aitkin, M., Vu, D. and Francis, B.J. (2016), to appear in an RSS journal (A or C).
3 Where do network data come from? For social networks (networks of people or other social creatures), data come from either direct observation, or indirect data gathering through newspapers and other recording instruments. These sources of information provide the evidence of connections among actors. The connections are represented mathematically, and are analysed through properties of the mathematical structure.
4 Facebook friendship network (Wikipedia)
5 Unipartite and bipartite networks. The Facebook network is a unipartite network: it represents direct connections between the Facebook users. These connections may be directed (A likes B does not imply that B likes A) or undirected/reciprocal (A and B are connected through something, like Facebook). We discuss bipartite networks, in which the connections between actors are through their joint participation in events.
6 The Natchez social network
7 The adjacency matrix. To perform any analysis we need to re-express the table elements mathematically through a link, or tie, variable $Y_{ij}$, with the presence of woman $i$ at event $j$ defining $Y_{ij} = 1$, and her absence from the event defining $Y_{ij} = 0$. We use $n$ to denote the number of rows (women), and $r$ to denote the number of columns (events). The resulting table is expressed as an $n \times r$ matrix, called the adjacency matrix, denoted by $Y$. Marginal totals ($T$) have been added to the table, giving the total number of events attended by each woman, and the total number of women attending each event. We see that women vary in their propensity to attend events, and events vary in their attractiveness to women. We also record the marital status of woman $i$ in a variable $x_i$, coded 1 for married and 0 for unmarried.
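As a concrete illustration of the adjacency matrix and its marginal totals, a sketch with a toy 3 × 4 example (hypothetical data, not the Natchez table):

```python
# The adjacency matrix Y is an n x r array of 0/1 ties: row totals give
# the events attended by each woman, column totals the attendance at each event.
Y = [
    [1, 1, 0, 1],  # woman 1
    [1, 0, 0, 1],  # woman 2
    [0, 1, 1, 0],  # woman 3
]
n = len(Y)     # number of actors (rows)
r = len(Y[0])  # number of events (columns)

row_totals = [sum(row) for row in Y]                              # per-woman totals
col_totals = [sum(Y[i][j] for i in range(n)) for j in range(r)]   # per-event totals

print(row_totals)  # -> [3, 2, 2]
print(col_totals)  # -> [2, 2, 1, 2]
```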
8 Two-mode network data. [Table: the women-by-events (W\E) adjacency matrix, with the marital-status covariate $x$ and marginal totals $T$.]
9 Two-mode network data, zeros suppressed. [Table: the same adjacency matrix with the zero entries suppressed.]
10 Original matrix: 18 actors, 14 events.
11 Randomly shuffled matrix: 18 actors, 14 events.
12 Probability models for actors and events. Analysis needs to allow for uncertainty in the behaviour of actors: even if they form an established group with other actors, this does not mean that they all attend the same events. We consider the presence or absence of an actor at an event as a random process: attendance is determined by a possibly large number of factors unknown to us, so we represent the process outcome as a Bernoulli random variable. The probability that actor $i$ attends event $j$ ($Y_{ij} = 1$) is $p_{ij}$, and the probability that actor $i$ does not attend event $j$ ($Y_{ij} = 0$) is $1 - p_{ij}$. We want to bring the actor and event structures into the event attendance probability in some way.
13 Models. The "null" model is a single-parameter model, giving the same constant probability $p_{ij} = p$ that every actor attends every event, independently across events and actors: all actors have the same attendance probability, and all events have the same attraction probability. The Rasch model has a parameter for each actor and a parameter for each event: each actor $i$ has a propensity $\theta_i$ to attend any event, and each event $j$ has an attractiveness $\varphi_j$ to any actor; actors attend events independently. The Rasch model is a main-effect or additive exponential random graph model (ERGM), in events and actors, on the logit-transformed probability scale: $$\mathrm{logit}\, p_{ij} = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_i + \varphi_j.$$ It has no subgroup structure.
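The logit relation above inverts to $p_{ij} = 1/(1 + e^{-(\theta_i + \varphi_j)})$; a minimal sketch with hypothetical parameter values:

```python
import math

def rasch_prob(theta_i, phi_j):
    """Attendance probability under the Rasch model:
    logit p_ij = theta_i + phi_j, so p_ij = 1 / (1 + exp(-(theta_i + phi_j)))."""
    return 1.0 / (1.0 + math.exp(-(theta_i + phi_j)))

# When propensity and attractiveness cancel, the probability is exactly 0.5.
p = rasch_prob(0.5, -0.5)
print(p)  # -> 0.5
```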
14 The latent class model. This model specifies a $K$-class latent structure for actors. The $K$ classes are distinguished by $K$ sets of event attendance parameters $q_{jk}$, different among classes but identical within classes. The proportion of actors in class $k$ is $\pi_k$; $\theta_K = (K, \{\pi_k\}, \{q_{jk}\})$. The class structure is unobserved; it is implied and identified by the actors' different patterns of event attendance. The (observed data) likelihood $L(\theta_K)$, the probability of the observed data, is given by
$$\Pr[\{y_{ij}\} \mid k, i] = \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}},$$
$$\Pr[\{y_{ij}\} \mid i] = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}},$$
$$L(\theta_K) = \Pr[\{y_{ij}\}] = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}.$$
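The observed-data likelihood above can be evaluated directly (on the log scale, so the product over actors does not underflow); a sketch in plain Python with hypothetical toy inputs:

```python
import math

def observed_loglik(Y, pi, q):
    """Observed-data log-likelihood of the K-class latent class model.
    Y: n x r 0/1 matrix; pi: K class proportions; q: K x r attendance
    probabilities q[k][j]. Each row is an independent mixture over classes."""
    K, r = len(pi), len(q[0])
    total = 0.0
    for row in Y:
        mix = 0.0
        for k in range(K):
            contrib = pi[k]
            for j in range(r):
                contrib *= q[k][j] if row[j] == 1 else (1.0 - q[k][j])
            mix += contrib
        total += math.log(mix)
    return total

# With K = 1 this reduces to independent Bernoulli terms:
# Pr[1, 0] with q = (0.5, 0.5) is 0.25, so the log-likelihood is log(0.25).
print(observed_loglik([[1, 0]], [1.0], [[0.5, 0.5]]))
```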
15 Analysis with the complete data likelihood. Bayesian analysis is greatly simplified by introducing counterfactual missing data: the class identification of each actor. We define $Z_{ik} = 1$ if actor $i$ belongs to class $k$, and zero otherwise, with (prior) probability $\pi_k$. If the complete data $y_{ij}$ and $Z_{ik}$ were observed, the complete data likelihood $CL(\theta_K)$ for the $K$-class model would be
$$CL(\theta_K) = \Pr[\{y_{ij}\}, \{Z_{ik}\}] = \Pr[\{y_{ij}\} \mid \{Z_{ik}\}] \Pr[\{Z_{ik}\}]$$
$$= \left[\prod_{i=1}^{n} \prod_{j=1}^{r} \prod_{k=1}^{K} \left(q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right)^{Z_{ik}}\right] \prod_{i=1}^{n} \prod_{k=1}^{K} \pi_k^{Z_{ik}}$$
$$= \prod_{i=1}^{n} \prod_{k=1}^{K} \left[\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right]^{Z_{ik}}.$$
16 MCMC analysis. MCMC iterates between making random draws of the $Z_{ik}$ given the current parameter draws, and random draws of the parameters given the current $Z_{ik}$ draws. With flat or non-informative priors on the parameters and the $Z_{ik}$, the conditional distributions can be inferred from the complete data likelihood: the $Z_{ik}$ given the parameters are multinomial with probabilities proportional to $\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$; for the parameters given the $Z_{ik}$, the $\pi_k$ are Dirichlet with parameters $Z_{+k} = \sum_{i=1}^{n} Z_{ik}$, and the $q_{jk}$ are Beta with parameters $\sum_{i=1}^{n} Z_{ik} y_{ij}$ and $\sum_{i=1}^{n} Z_{ik} (1 - y_{ij})$.
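The two conditional draws above can be sketched as a single Gibbs sweep. This is an illustrative implementation, not the authors' code, and it assumes flat Dirichlet and Beta(1, 1) priors, so 1 is added to each of the counts on the slide; the Dirichlet draw is obtained by normalizing Gamma variates:

```python
import math
import random

def gibbs_sweep(Y, pi, q):
    """One Gibbs sweep for the K-class latent class model (sketch only;
    flat priors assumed, adding 1 to the Dirichlet and Beta counts)."""
    n, r, K = len(Y), len(Y[0]), len(pi)
    # 1. Z_i | pi, q: multinomial with weights pi_k * prod_j q_jk^y (1-q_jk)^(1-y),
    #    computed on the log scale for numerical stability.
    Z = []
    for i in range(n):
        logw = []
        for k in range(K):
            lw = math.log(pi[k])
            for j in range(r):
                lw += math.log(q[k][j]) if Y[i][j] == 1 else math.log(1.0 - q[k][j])
            logw.append(lw)
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        Z.append(random.choices(range(K), weights=w)[0])
    # 2. pi | Z: Dirichlet(1 + Z_+k), drawn as normalized Gamma variates.
    g = [random.gammavariate(1.0 + sum(z == k for z in Z), 1.0) for k in range(K)]
    pi_new = [x / sum(g) for x in g]
    # 3. q_jk | Z, Y: Beta(1 + sum_i Z_ik y_ij, 1 + sum_i Z_ik (1 - y_ij)).
    q_new = [[random.betavariate(
                  1.0 + sum(Y[i][j] for i in range(n) if Z[i] == k),
                  1.0 + sum(1 - Y[i][j] for i in range(n) if Z[i] == k))
              for j in range(r)] for k in range(K)]
    return Z, pi_new, q_new

random.seed(0)
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
Z, pi_draw, q_draw = gibbs_sweep(Y, [0.5, 0.5], [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
```

In a full analysis this sweep would be repeated many times, discarding a burn-in period before collecting the draws.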
17 Bayesian model comparison. Bayesian theory allows us to decide which of these (or other) models is the most plausible for a given data set, through the deviance distributions for the competing models (deviance = $-2 \log$ likelihood = "badness of fit" of the model). This approach (Dempster 1997; Aitkin 1997; Aitkin, Boys and Chadwick 2005; Aitkin 2010) is to use the posterior distributions of the likelihoods, by substituting $M$ (typically 10,000) random draws $\theta_k^{[m]}$ of the parameters $\theta_k$ ($k = 1, \ldots, K$) from their posterior distributions into the (observed data) likelihoods $L(\theta_k)$, giving $M$ corresponding random draws $L_k^{[m]} = L(\theta_k^{[m]})$ from the posterior distributions of the observed data likelihoods.
18 Model comparison through posterior deviances. Because of the scale of likelihoods, we use (observed data) deviances $D_k(\theta_k) = -2 \log L_k(\theta_k)$ rather than likelihoods $L$. Models are compared for the stochastic ordering of their posterior deviance distributions, initially by graphing the cdfs of the deviance draws $D_k^{[m]} = D_k(\theta_k^{[m]})$ for each number of components; the left-most cdf defines the best-supported model.
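The stochastic-ordering comparison can be sketched numerically: sorting each model's deviance draws gives its empirical cdf, and the better-supported model's sorted draws lie to the left at every quantile. A toy check with hypothetical deviance values:

```python
def deviance(loglik):
    """Deviance = -2 log likelihood."""
    return -2.0 * loglik

def stochastically_smaller(draws_a, draws_b):
    """Crude check of the stochastic ordering: model A dominates model B
    if A's sorted deviance draws are no larger than B's at every quantile
    (i.e. A's empirical cdf lies to the left of B's)."""
    a, b = sorted(draws_a), sorted(draws_b)
    assert len(a) == len(b)
    return all(x <= y for x, y in zip(a, b))

print(stochastically_smaller([10, 12, 14], [11, 13, 15]))  # -> True
```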
19 DIC. The DIC of Spiegelhalter et al. (2002), implemented in BUGS, also uses deviance draws, but these are of the complete data deviance rather than the observed data deviance, and are used only to compute the mean complete data deviance across the draws. The complete data likelihood and deviance treat the latent class as an observed structure, overstating the data information. The DIC, like AIC, BIC and some other decision criteria, requires a penalty (in this case using the effective number of parameters) on the mean complete data deviance to account for this overstatement. This is not needed for the comparison of observed data deviance distributions: models with increasing numbers of components are effectively penalized for their increasing parametrization, as they have increasingly diffuse deviance distributions because of the decreasing data information about each component. But does it work? See later.
20 Class membership probabilities. The Bayesian analysis also provides the full posterior distribution of the class membership probabilities. The probability of membership of actor $i$ in class $k$, given the data, follows from Bayes's theorem:
$$\pi_{ik \mid \text{data}} = \Pr(i \in \text{class } k \mid \text{data}) = \frac{\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}}{\sum_{k=1}^{K} \left[\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}\right]}.$$
Substituting the parameter draws $\theta_k^{[m]} = (\pi_k^{[m]}, q_{jk}^{[m]})$ into the membership probability gives its posterior distribution from the $M$ values $\pi_{ik \mid \text{data}}^{[m]}$. Label-switching can be a problem.
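For a single parameter draw, the Bayes-theorem formula above can be computed directly; a sketch with hypothetical values:

```python
def membership_probs(y_i, pi, q):
    """Posterior class-membership probabilities for one actor, for one
    parameter draw: normalize pi_k * prod_j q_jk^y (1-q_jk)^(1-y) over k."""
    K, r = len(pi), len(y_i)
    w = []
    for k in range(K):
        contrib = pi[k]
        for j in range(r):
            contrib *= q[k][j] if y_i[j] == 1 else (1.0 - q[k][j])
        w.append(contrib)
    s = sum(w)
    return [x / s for x in w]

# With identical classes, membership is split evenly.
p_i = membership_probs([1, 0], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
print(p_i)  # -> [0.5, 0.5]
```

Repeating this over the $M$ parameter draws gives the posterior distribution of each membership probability.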
21 The Natchez women. [Figure: cdfs of the posterior deviance draws, with asymptotic deviances, for the Rasch, $K = 2$ and $K = 3$ models.]
22 The Natchez women. The two-class latent class model fits better than the Rasch, and the three-class model does not fit any better than the two-class model. We conclude that the two-class model is best.
23 Posterior membership distributions for women 1–9
24 Posterior membership distributions for women 10–18
25 Summary. Women 1–6 clearly belong to class 1. Women 10–18 clearly belong to class 2. Women 7–9 have grades of membership in both classes: woman 7, 93% in class 1 and 7% in class 2; woman 8, 11% in class 1 and 89% in class 2; woman 9, 43% in class 1 and 57% in class 2. Woman 9 was claimed by both classes in interviews; woman 8 was placed in a separate class in many analyses.
26 The Noordin Top terrorist network (Wikipedia). Noordin Mohammad Top, a Malaysian citizen, was a Muslim extremist and Indonesia's most wanted Islamist militant. He is thought to have been a key bomb maker and/or financier. Noordin and Azahari Husin were thought to have masterminded the 2003 Marriott hotel bombing in Jakarta, the 2004 Australian embassy bombing in Jakarta, the 2005 Bali bombings and the 2009 JW Marriott-Ritz-Carlton bombings, and Noordin may have assisted in the 2002 Bali bombings. Noordin was an indoctrinator who specialized in recruiting militants into becoming suicide bombers and collecting funds for militant activities. Husin was killed in a police raid on his hideout in Batu, near Malang in East Java, on 9 November 2005. Top was killed during a police raid in Solo, Central Java, on 17 September 2009, conducted by an Indonesian anti-terrorist team.
27 Data source. The data come from the book Disrupting Dark Networks by Everton, in the Structural Analysis in the Social Sciences series, Cambridge (2012). Appendix 1 of this book provides the data: the subset of the Noordin Top terrorist network was drawn primarily from Terrorism in Indonesia: Noordin's Networks, a 2006 publication of the International Crisis Group. It includes relational data on 79 individuals listed in Appendix C of that publication. The data were initially coded as 45 binary items. Our analysis is restricted to 75 of these individuals: four individuals were eliminated as they were not present at any of the 45 events used in the analysis.
28 Original matrix: 74 actors, 45 events.
29 Interpretation. The appearance of the full adjacency matrix is quite different from that for the Natchez women. It is very sparse, and appears random, without the structure we can see in the Natchez women network matrix. Re-ordering the rows and columns does not much change the appearance of the matrix: there is no clear division into sub-groups.
30 The Noordin network. [Figure: cdfs of the posterior deviance draws for the Rasch, $K = 2$, $K = 3$ and $K = 4$ models.]
31 The Noordin network. The three-class model fits better than the two-class or Rasch models, and the four-class model does not fit any better than the three-class. We conclude that the three-class model is best. Which actors are in which classes? We need a probabilistic expression for this: the ternary plot.
32 Ternary plot. [Figure: ternary plot of posterior class membership probabilities ($G_1$, $G_2$, $G_3$), with Noordin Mohammed Top and Azahari Husin labelled.]
33 The Noordin network. Top and Husin define class 1: the planners and leaders. Class 3 contains all the actors who attended 6–9 events: the trainers, who meet the planners and train the footsoldiers. Actors who attended 5 or fewer events are spread along the class 2–3 axis; class 2 are the footsoldiers, who are present at organisation, training and operations, never at finance, meetings or logistics. The division between classes 2 and 3 is not absolute. This may be because of the short "working life" of many actors, killed in actions or arrested and imprisoned. Trainers who are killed in actions or imprisoned may be replaced by footsoldiers who have survived actions. Of the 74 actors, 45 were dead or in prison.
34 What works in Bayesian model comparison? AIC chooses over-complex models asymptotically. BIC chooses the true model asymptotically but chooses under-complex models in finite samples. Simulation studies with the posterior deviance distribution: comparison of normal mixture models with 1–7 components on samples generated from each model, derived from real astronomical galaxy recession velocity data, with DIC comparison: Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron, DOI: 10.1007/s40300-015-0068-1. Comparison of four single-population models (normal, lognormal, gamma and multinomial) on samples generated from each model (Aitkin 2010), derived from real family income data.
35 Galaxy recession velocity data. 82 observations of recession velocities of galaxies, modelled as a mixture of $K$ normals with different means and variances. [Figure: empirical cdf of velocity, shown with the single-normal cdf.]
36 Simulations. 100 samples of sizes 82, 164, 328 and 656 were generated from normal mixture distributions for $K = 1, \ldots, 7$ components. The means, variances and proportions for the simulated data were equal to the MLEs from the galaxy data. For each sample the posterior deviance for each model was computed for each of 1,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the percentages in the 100 samples of "best model" for each $K$ and sample size, by both the DIC and the posterior deviance: DIC (solid), posterior deviance (dash). Panel 1: sample size 82; panel 2 (across): sample size 164; panel 3 (under): sample size 328; panel 4: sample size 656.
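The "best model per draw" tally described above can be sketched as follows (hypothetical deviance draws, not the actual simulation output):

```python
def best_model_percentages(deviance_draws):
    """deviance_draws: dict mapping model name -> list of posterior deviance
    draws, aligned by draw index. For each draw the model with the smallest
    deviance is 'best'; return the percentage of draws each model wins."""
    models = list(deviance_draws)
    M = len(deviance_draws[models[0]])
    wins = {m: 0 for m in models}
    for t in range(M):
        best = min(models, key=lambda m: deviance_draws[m][t])
        wins[best] += 1
    return {m: 100.0 * wins[m] / M for m in models}

# Toy example: K=2 has the smaller deviance on 3 of the 4 draws.
draws = {"K=1": [20, 21, 19, 25], "K=2": [18, 22, 17, 23]}
print(best_model_percentages(draws))  # -> {'K=1': 25.0, 'K=2': 75.0}
```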
37 Galaxy data simulations. [Figure: four panels of percent correct against $K$, one per sample size.]
38 Simulation results. The DIC performed well for up to 3 components but poorly for more than 3. The posterior deviance worked uniformly better: its correct identification probabilities increased steadily with increasing sample size. The posterior deviance performed similarly in simulations from latent class models with varying numbers of classes (Aitkin, Vu and Francis 2015).
39 Family income data. [Figure: histogram of counts against income in hundreds.]
40 Model fit. [Figure: deviate plots against income (and log income) for the normal, lognormal and gamma populations.]
41 Simulations. 1,000 samples of sizes from 10 to 1,000 were generated from normal, lognormal, gamma and multinomial populations. The mean and variance for the parametric models were equal to those of the income data population. For each sample the posterior deviance for each model was computed for each of 10,000 draws, and for each draw the model with the smallest deviance was called "best". The graphs show the proportions in the 1,000 samples of "best model" for each of the four: normal (solid), lognormal (dotted), gamma (dot-dash), multinomial (dash). Panel 1: true normal; panel 2 (across): true lognormal; panel 3 (under): true gamma; panel 4: true multinomial.
42 Model choice performance. [Figure: four panels of best-model probability against sample size, one per true model.]
43 Simulation results. For the true multinomial, by sample size 120 the multinomial model was best, and its probability increased rapidly with increasing sample size. For the parametric models: the true normal was best for all sample sizes; the true lognormal was best for sample sizes greater than 120; the true gamma was best for sample sizes greater than 50; the multinomial always had low probability for small sample sizes, but this approached 0.5 in large samples.
44 References. Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with Discussion). Statistics and Computing 7. Aitkin, M. (2001). Likelihood and Bayesian analysis of mixtures. Statistical Modelling 1. Aitkin, M., Boys, R.J. and Chadwick, T. (2005). Bayesian point null hypothesis testing via the posterior likelihood ratio. Statistics and Computing 15. Aitkin, M. (2010). Statistical Inference: an Integrated Likelihood/Bayesian Approach. Chapman and Hall/CRC Press, Boca Raton, FL. Aitkin, M. (2011). How many components in a finite mixture? In Mixtures: Estimation and Applications, eds. K.L. Mengersen, C.P. Robert and D.M. Titterington. Wiley, Chichester. Aitkin, M., Vu, D. and Francis, B.J. (2014). Statistical modelling of the group structure of social networks. Social Networks 38. Aitkin, M., Vu, D. and Francis, B.J. (2015). A new Bayesian approach for determining the number of components in a finite mixture. Metron, DOI: 10.1007/s40300-015-0068-1.
45 References. Davis, A., Gardner, B.B. and Gardner, M.R. (1941). Deep South: A Social Anthropological Study of Caste and Class. Chicago: University of Chicago Press. Dempster, A.P. (1997). The direct use of likelihood in significance testing. Statistics and Computing 7. Everton, S.F. (2012). Disrupting Dark Networks. Cambridge: Cambridge University Press. Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with Discussion). Journal of the Royal Statistical Society B 64.
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationTruncation and Censoring
Truncation and Censoring Laura Magazzini laura.magazzini@univr.it Laura Magazzini (@univr.it) Truncation and Censoring 1 / 35 Truncation and censoring Truncation: sample data are drawn from a subset of
More informationMultiple QTL mapping
Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationClustering bi-partite networks using collapsed latent block models
Clustering bi-partite networks using collapsed latent block models Jason Wyse, Nial Friel & Pierre Latouche Insight at UCD Laboratoire SAMM, Université Paris 1 Mail: jason.wyse@ucd.ie Insight Latent Space
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationPackage effectfusion
Package November 29, 2016 Title Bayesian Effect Fusion for Categorical Predictors Version 1.0 Date 2016-11-21 Author Daniela Pauger [aut, cre], Helga Wagner [aut], Gertraud Malsiner-Walli [aut] Maintainer
More informationThe Bayesian Approach to Multi-equation Econometric Model Estimation
Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation
More informationSeminar über Statistik FS2008: Model Selection
Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationQuantifying the Price of Uncertainty in Bayesian Models
Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Quantifying the Price of Uncertainty in Bayesian Models Author(s)
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationBayesian Nonparametric Rasch Modeling: Methods and Software
Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationDetermining the number of components in mixture models for hierarchical data
Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationCategorical and Zero Inflated Growth Models
Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).
More informationBayesian estimation of complex networks and dynamic choice in the music industry
Bayesian estimation of complex networks and dynamic choice in the music industry Stefano Nasini Víctor Martínez-de-Albéniz Dept. of Production, Technology and Operations Management, IESE Business School,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationStatistical Model for Soical Network
Statistical Model for Soical Network Tom A.B. Snijders University of Washington May 29, 2014 Outline 1 Cross-sectional network 2 Dynamic s Outline Cross-sectional network 1 Cross-sectional network 2 Dynamic
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationBayesian Model Diagnostics and Checking
Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationLecture 2: Simple Classifiers
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationInverse Sampling for McNemar s Test
International Journal of Statistics and Probability; Vol. 6, No. 1; January 27 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar s Test
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationPartial effects in fixed effects models
1 Partial effects in fixed effects models J.M.C. Santos Silva School of Economics, University of Surrey Gordon C.R. Kemp Department of Economics, University of Essex 22 nd London Stata Users Group Meeting
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationRepresent processes and observations that span multiple levels (aka multi level models) R 2
Hierarchical models Hierarchical models Represent processes and observations that span multiple levels (aka multi level models) R 1 R 2 R 3 N 1 N 2 N 3 N 4 N 5 N 6 N 7 N 8 N 9 N i = true abundance on a
More information