Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks

Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks
Creighton Heaukulani and Zoubin Ghahramani, University of Cambridge
TU Denmark, June 2013

A Network

Dynamic network data record the link statuses in the network at T time points.

Generative Models of Networks

The link statuses are encoded as binary adjacency matrices {Y^{(1)}, Y^{(2)}, ..., Y^{(T)}}, with Y^{(t)} \in \{0,1\}^{N \times N}.

Example: friendship statuses in a social network,

    y_{ij}^{(t)} = 1: actors i and j are friends at time t;
    y_{ij}^{(t)} = 0: actors i and j are not friends at time t.

Latent feature representations

Assume there are K latent features underlying the population. Associate actor n with feature indicators h_n^{(t)} \in \{0,1\}^K at time t, collected row-wise into a binary matrix H^{(t)}.

Interpretation: the features represent unobserved hobbies/interests. For example, if feature k represents "plays tennis", then

    h_{nk}^{(t)} = 1: actor n plays tennis at time t;
    h_{nk}^{(t)} = 0: actor n doesn't play tennis at time t.
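
For concreteness, a minimal sketch (NumPy; the array names, shapes and random contents are illustrative, not from the slides) of how the dynamic network {Y^{(t)}} and the feature indicators {H^{(t)}} can be stored as binary arrays:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 100, 10          # actors, time steps, latent features (sizes as in the simulated dataset)

# Y[t] is the symmetric binary adjacency matrix of the network at time t (zero diagonal).
Y = np.zeros((T, N, N), dtype=int)
for t in range(T):
    upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
    Y[t] = upper + upper.T

# H[t, n, k] = 1 iff actor n has latent feature k switched on at time t.
H = rng.integers(0, 2, size=(T, N, K))
```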

Hidden Markov models

Hidden Markov models assume the features evolve independently,

    h_{ik}^{(t)} \mid h_{ik}^{(t-1)} \sim Q(h_{ik}^{(t-1)}, h_{ik}^{(t)}),

where Q is a Markov transition matrix. The edges are then conditionally independent given the latent features,

    y_{ij}^{(t)} \mid h_i^{(t)}, h_j^{(t)} \sim \mathrm{Bernoulli}\left( \sigma\left( \sum_{k,l} h_{ik}^{(t)} h_{jl}^{(t)} v_{kl} \right) \right),

where V is a feature-interaction weight matrix with entries

    v_{kl} \sim N(0, \sigma_H^2).
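
A hedged sketch of this edge likelihood (NumPy; helper names are mine): the link probability for dyad (i, j) is \sigma(\sum_{k,l} h_{ik} h_{jl} v_{kl}), i.e. the (i, j) entry of \sigma(H V H^T).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
N, K = 20, 5
sigma_H = 1.0

V = rng.normal(0.0, sigma_H, size=(K, K))      # v_kl ~ N(0, sigma_H^2)
H_t = rng.integers(0, 2, size=(N, K))          # latent features at time t

# P(y_ij = 1 | h_i, h_j) = sigma( h_i^T V h_j ) = sigma( (H V H^T)_ij )
P = sigmoid(H_t @ V @ H_t.T)

# Sample one network snapshot (assumed undirected: symmetrise, ignore self-links).
upper = np.triu(rng.random((N, N)) < P, k=1)
Y_t = (upper | upper.T).astype(int)
```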

An Example

Consider the following example, a finite version of the DRIFT model of Foulds et al. (2011): feature h_{ik}^{(t)} evolves as

    h_{ik}^{(t)} \mid h_{ik}^{(t-1)} \sim \mathrm{Bernoulli}\left( a_k^{\,1 - h_{ik}^{(t-1)}} \; b_k^{\,h_{ik}^{(t-1)}} \right),

where

    a_k \in [0,1] controls the probability of feature k switching from off to on;
    b_k \in [0,1] controls the persistence of feature k once it is on.
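
One transition step of this example can be simulated directly from the stated Bernoulli probability; the following is an illustration under assumed parameter values, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 50, 10

a = rng.uniform(0.0, 1.0, size=K)   # a_k: P(feature k turns on, given it was off)
b = rng.uniform(0.0, 1.0, size=K)   # b_k: P(feature k stays on, given it was on)

def step_features(H_prev, a, b, rng):
    """One independent-chain transition: P(h_nk = 1 | h_prev_nk) = a_k if previously off, b_k if on."""
    p_on = np.where(H_prev == 1, b, a)               # broadcast a_k, b_k across actors
    return (rng.random(H_prev.shape) < p_on).astype(int)

H_prev = rng.integers(0, 2, size=(N, K))
H_next = step_features(H_prev, a, b, rng)
```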

Factorial HMM

[Diagram: for each actor n = 1, ..., N, the K binary feature chains h_{n1}^{(1:T)}, ..., h_{nK}^{(1:T)} evolve as independent Markov chains, and at each time step the feature configuration generates the observed network Y^{(t)}.]

Each feature evolves over time independently. The latent feature configuration at a given time step produces the observed network. See Ghahramani and Jordan (1997).

Latent Feature Evolution

But consider the following:

- If my friends enjoy playing tennis, I am likely to start playing tennis.
- If a friend gets me to join the tennis team, then I am more likely to befriend other tennis players.

We call this phenomenon latent feature propagation.

Latent Feature Propagation

We want something more like this:

[Diagram: the latent feature matrices H^{(1)}, ..., H^{(T)} form a chain over the observed networks Y^{(1)}, ..., Y^{(T)}, with each Y^{(t)} generated from H^{(t)} and each H^{(t+1)} depending on both H^{(t)} and Y^{(t)}.]

Network observations influence future latent features; information propagates between the observed and latent structures throughout the network over time.

Latent Feature Propagation

We use the following model (Heaukulani and Ghahramani, 2013):

    h_{ik}^{(t+1)} \mid \mu_{ik}^{(t+1)} \sim \mathrm{Bernoulli}\left( \sigma\left( c_k \left[ \mu_{ik}^{(t+1)} - b_k \right] \right) \right),

    \mu_{ik}^{(t+1)} = (1 - \lambda_i)\, h_{ik}^{(t)} + \lambda_i \, \frac{ h_{ik}^{(t)} + \sum_{j \in \varepsilon(i,t)} w_j h_{jk}^{(t)} }{ 1 + \sum_{j \in \varepsilon(i,t)} w_j },

where \varepsilon(i,t) denotes the set of actor i's friends at time t, and

1. \lambda_i \in (0,1): actor i's susceptibility to the influence of friends; (1 - \lambda_i) is the corresponding measure of social independence;
2. w_i \in R_+: the weight of influence of person i;
3. c_k \in R_+: a scale parameter for the persistence of feature k;
4. b_k \in R_+: a bias parameter for feature k.
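
A minimal sketch of one propagation step under this model (NumPy; \varepsilon(i,t) is read off a symmetric adjacency matrix Y^{(t)}, and all parameter values are invented for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lfp_step(H_t, Y_t, lam, w, c, b, rng):
    """One latent-feature-propagation step.

    H_t : (N, K) binary features at time t
    Y_t : (N, N) symmetric adjacency matrix at time t (row i gives eps(i, t))
    lam : (N,) susceptibilities lambda_i in (0, 1)
    w   : (N,) influence weights w_i > 0
    c,b : (K,) per-feature scale and bias parameters
    """
    friend_sum = Y_t @ (w[:, None] * H_t)           # sum_{j in eps(i,t)} w_j h_jk
    friend_norm = 1.0 + Y_t @ w                     # 1 + sum_{j in eps(i,t)} w_j
    mu = (1 - lam)[:, None] * H_t \
         + lam[:, None] * (H_t + friend_sum) / friend_norm[:, None]
    p_on = sigmoid(c[None, :] * (mu - b[None, :]))  # P(h_ik^{(t+1)} = 1 | mu)
    return (rng.random(H_t.shape) < p_on).astype(int)

rng = np.random.default_rng(3)
N, K = 30, 8
H_t = rng.integers(0, 2, size=(N, K))
upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
Y_t = upper + upper.T
lam = rng.uniform(0.1, 0.9, size=N)
w = rng.gamma(2.0, 1.0, size=N)
c = np.full(K, 5.0)
b = np.full(K, 0.5)
H_next = lfp_step(H_t, Y_t, lam, w, c, b, rng)
```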

Latent Feature Propagation

Inference is by MCMC, using the forward-backward recursions of Scott (2002).

Datasets:

- Simulated: N = 50, T = 100, K = 10.
- NIPS: N = 110, T = 17, K = 15.
- INFOCOM (Scott et al., 2009): N = 78, T = 50, K = 10.
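
The slides give no implementation detail, but a generic forward-filtering, backward-sampling pass for a single binary feature chain, the building block of the cited recursive computation, looks roughly as follows; the emission terms lik[t, s] stand in for the likelihood of the observed networks at time t given the chain's state and would come from the edge model above:

```python
import numpy as np

def ffbs_binary_chain(lik, Q, pi0, rng):
    """Forward filtering, backward sampling for one binary hidden chain.

    lik : (T, 2) p(observations at time t | state s), up to a constant
    Q   : (2, 2) transition matrix, Q[s_prev, s_next]
    pi0 : (2,)   initial state distribution
    Returns a sampled state sequence of shape (T,).
    """
    T = lik.shape[0]
    alpha = np.zeros((T, 2))
    alpha[0] = pi0 * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                      # forward pass (normalised filtering)
        alpha[t] = (alpha[t - 1] @ Q) * lik[t]
        alpha[t] /= alpha[t].sum()
    states = np.zeros(T, dtype=int)
    states[T - 1] = rng.choice(2, p=alpha[T - 1])
    for t in range(T - 2, -1, -1):             # backward sampling
        post = alpha[t] * Q[:, states[t + 1]]
        post /= post.sum()
        states[t] = rng.choice(2, p=post)
    return states

rng = np.random.default_rng(4)
T = 20
lik = rng.random((T, 2))                       # stand-in emission likelihoods
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])
sample = ffbs_binary_chain(lik, Q, pi0, rng)
```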

Prediction of Missing Links

[Figure: box plots comparing LFP, DRIFT and LFRM on (d) simulated data, (e) the NIPS dataset and (f) the INFOCOM dataset. Top row: log-likelihood of test edges. Bottom row: AUC scores for classifying test edges.]

10 repeats on different 20% hold-outs, averaged over 300 samples. Significance indicated by asterisks (α = 0.05).
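
As an illustration of the hold-out protocol only (my own sketch; scikit-learn's roc_auc_score does the scoring, and the model's predicted link probabilities are stubbed with random numbers):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
N, T = 50, 10

Y = rng.integers(0, 2, size=(T, N, N))     # observed dynamic network (toy)
P_hat = rng.random((T, N, N))              # model's predicted link probabilities (stub)

# Hold out 20% of the dyads (upper-triangular entries) uniformly at random.
iu = np.triu_indices(N, k=1)
mask = rng.random((T, len(iu[0]))) < 0.2

y_true = np.concatenate([Y[t][iu][mask[t]] for t in range(T)])
y_score = np.concatenate([P_hat[t][iu][mask[t]] for t in range(T)])

print("held-out AUC:", roc_auc_score(y_true, y_score))
```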

Forecasting Future Networks

[Figure: (g) NIPS dataset, K = 15; (h) INFOCOM dataset, K = 10.]

Forecasting a future, unseen network: differences from a naive baseline in the log-likelihood of Y^{(t)} after training on Y^{(1:t-1)}.

Visualising Feature Propagation

[Figure panels: (i) H^{(1997)}, Y^{(1997)}; (j) H^{(1998)}, Y^{(1997)}; (k) H^{(1998)}, Y^{(1998)}; (l) H^{(1999)}, Y^{(1998)}.]

Visualising feature propagation in the NIPS dataset (K = 15).

Visualising Feature Propagation

We can look at the research interests of a trio of sparsely linked authors between 1997 and 1998 who are nearby in latent space:

    Author & year                   Topics
    Top cluster                     Prior Knowledge in Support Vector Kernels
    Bengio Y, Singer Y, et al.      Shared Context Probabilistic Transducers
    Hinton G, Ghahramani Z          Hierarchical Non-linear Factor Analysis and Topographic Maps
    Bishop C, Williams C, et al.    Regression w/ Input-depend. Noise: A Gaussian Process Treatment
    Sollich P, Barber D             On-line Learning from Finite Training Sets in Nonlinear Networks
    Barber D, Bishop C              Ensemble Learning for Multi-Layer Networks
    Bishop C, Jordan M              Approx. Posterior Distributions in Belief Networks Using Mixtures

Visualising Feature Propagation

We can also examine the five authors with the largest inferred weights w_n and some of their research interests (1995-1999):

    Author         w_n (relative)   Topics
    Barto A        1.55             reinforcement learning, planning algorithms
    Rasmussen C    1.32             Gaussian processes, Bayesian methods
    Vapnik V       1.29             SVMs, learning theory
    Scholkopf B    1.28             SVMs, kernel methods
    Tresp V        1.26             neural networks, Bayesian methods

References

Foulds, J., DuBois, C., Asuncion, A., Butts, C., and Smyth, P. (2011). A dynamic relational infinite feature model for longitudinal social networks. In Proc. AISTATS.

Ghahramani, Z. and Jordan, M. (1997). Factorial hidden Markov models. Machine Learning, 29:245-273.

Heaukulani, C. and Ghahramani, Z. (2013). Dynamic probabilistic models for latent feature propagation in social networks. In Proc. ICML.

Scott, J., Gass, R., Crowcroft, J., Hui, P., Diot, C., and Chaintreau, A. (2009). CRAWDAD data set cambridge/haggle (v. 2009-05-29). Downloaded from http://crawdad.cs.dartmouth.edu/cambridge/haggle.

Scott, S. (2002). Bayesian methods for hidden Markov models: Recursive computing in the 21st century. J. Am. Statist. Assoc., 97(457):337-351.