Distributed ML for DOSNs: giving power back to users


Amira Soliman, KTH
iSocial Marie Curie Initial Training Networks

Agenda

Part 1:
- DOSNs and Machine Learning
- DIVa: Decentralized Identity Validation for Social Networks

Part 2:
- Topic Models
- Latent Dirichlet Allocation (LDA)
- LDA for DOSNs

DOSNs and Shifting Roles

(Figure: apps, SNoT, trust, and data shifting from providers to users.)

Benefits of DML in DOSNs:
- Self-adaptive components
- Personalized services

Distributed ML for DOSNs

Challenges:
1. Heterogeneity:
   - Different behavioral patterns
   - Different data generation rates
   - Different connectivity and roles
   - Availability
2. Incremental updates:
   - Social feeds are streams
   - Lifetime of models

DIVa: Decentralized Identity Validation

1. DIVa is a decentralized identity validation model.
2. DIVa provides users with community-aware validation rules that conceptualize users' identities better than the centralized approach.

DIVa: Main Steps

(Figure: an example ego network partitioned into three communities. Each community carries its own locally correlated attribute sets (LCAS) over profile attributes such as School, University, City, Degree, Job, Employer, and Interests, e.g. {City, University}, {University, School}, {Employer, Degree}, {Degree, Interests}, and {City, Interests}.)

DIVa: Main Steps (cont.)

1. Association rule mining, e.g. the rule Degree("Eng") -> Employer("X") with its support s (0.75 in the figure's example), yielding decision rules.
2. Community detection.
3. Community-level aggregation:
   - Dominant CID among direct friends
   - Max CID among direct friends

(Figure: a small social graph whose numbered nodes are grouped into communities.)
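The first step above mines association rules between profile attributes. A minimal sketch of the support and confidence computations behind a rule like Degree("Eng") -> Employer("X"), using small hypothetical profiles (the data and function names are illustrative, not DIVa's actual implementation):

```python
# Hypothetical profiles: attribute -> value pairs seen in one community.
profiles = [
    {"Degree": "Eng", "Employer": "X"},
    {"Degree": "Eng", "Employer": "X"},
    {"Degree": "Eng", "Employer": "Y"},
    {"Degree": "Sci", "Employer": "X"},
]

def support(itemset, profiles):
    """Fraction of profiles containing every (attribute, value) pair."""
    hits = sum(all(p.get(a) == v for a, v in itemset) for p in profiles)
    return hits / len(profiles)

def confidence(antecedent, consequent, profiles):
    """Support of the full rule divided by support of its antecedent."""
    return support(antecedent + consequent, profiles) / support(antecedent, profiles)

# The rule Degree("Eng") -> Employer("X"):
s = support([("Degree", "Eng"), ("Employer", "X")], profiles)
c = confidence([("Degree", "Eng")], [("Employer", "X")], profiles)
print(s, c)  # 0.5 0.666...
```

Rules whose support and confidence clear a threshold become the community's decision rules.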

Results

DIVa achieved improvements over the centralized approach (measured as the loss ratio if centralized validation is applied).

WWW'15 feedback on DIVa:
- Deeper analysis of attributes (PCA)
- Overlapping communities (soft clustering)
- Incremental updates (community detection, community-level aggregation)

TOPIC MODELING

iSocial Marie Curie Initial Training Networks
iSocial meeting, 27-28/1/2015, Crete
http://isocial-itn.eu/

Topic Models

Document clustering:
1. Uncover hidden topics,
2. Annotate documents according to those topics,
3. Use the annotations to organize, summarize, and understand the documents.

Document Clustering

Each document is a bag of words. How many clusters? Bayesian nonparametric methods (e.g., Dirichlet processes) automatically detect how many clusters there are.
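The bag-of-words representation above discards word order and keeps only counts. A minimal sketch with three toy documents (the sentences are illustrative):

```python
from collections import Counter

docs = [
    "users share posts with friends",
    "friends comment on posts",
    "topic models uncover hidden topics",
]

# Bag of words: each document becomes an unordered multiset of word counts.
vocab = sorted({w for d in docs for w in d.split()})
bows = [Counter(d.split()) for d in docs]

# Fixed-length count vectors over the shared vocabulary.
vectors = [[b[w] for w in vocab] for b in bows]
print(vectors[0])
```

These count vectors are exactly the input a topic model such as LDA consumes.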

Latent Dirichlet Allocation (LDA)

LDA Generative Model

LDA model from Blei (2011):
- Each document is a random mixture of corpus-wide topics.
- Each word is drawn from one of those topics.

LDA Graphical Model

For each document d = 1, ..., M:
  generate theta_d ~ Dir(alpha)
  For each position n = 1, ..., N_d:
    generate z_n ~ Mult(theta_d)
    generate w_n ~ Mult(beta_{z_n})
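The generative process above can be run forward to sample a synthetic corpus. A minimal sketch with hypothetical toy dimensions (K topics, V word types, M documents, N_d words per document):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (all hypothetical).
K, V, M, N_d = 3, 8, 2, 10
alpha = np.full(K, 0.1)                         # document-topic Dirichlet prior
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

corpus = []
for d in range(M):
    theta = rng.dirichlet(alpha)                # theta_d ~ Dir(alpha)
    doc = []
    for n in range(N_d):
        z = rng.choice(K, p=theta)              # z_n ~ Mult(theta_d)
        w = rng.choice(V, p=beta[z])            # w_n ~ Mult(beta_{z_n})
        doc.append(int(w))
    corpus.append(doc)
print(corpus[0])
```

Inference is this process in reverse: given only the words, recover the per-document mixtures theta_d and the topics beta.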

Dirichlet Distribution

The Dirichlet distribution is parameterised by a vector of concentration parameters alpha = (alpha_1, ..., alpha_k) with alpha_i > 0 for i in {1, ..., k}, and is defined over the k-simplex, i.e. over multinomial probability vectors theta = (theta_1, ..., theta_k) with theta_i >= 0 and sum_i theta_i = 1:

  p(theta | alpha) = ( Gamma(sum_i alpha_i) / prod_i Gamma(alpha_i) ) * prod_i theta_i^(alpha_i - 1)

(Figure: example points on the 3-simplex, e.g. theta = (0.2, 0.5, 0.3) and the degenerate corners (1, 0, 0) and (0, 0, 1).)
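To make the "defined over the k-simplex" point concrete, every draw from a Dirichlet is itself a valid multinomial probability vector. A minimal sketch (the alpha values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Each draw from Dir(alpha) lies on the simplex: nonnegative components
# summing to one, i.e. a multinomial probability vector such as a
# document's topic mixture theta_d in LDA.
alpha = np.array([0.5, 0.5, 0.5])   # alpha_i < 1 favours sparse, corner-like draws
draws = rng.dirichlet(alpha, size=1000)

assert np.allclose(draws.sum(axis=1), 1.0)
print(draws.min() >= 0)  # True
```

Larger concentration values push draws toward the centre of the simplex; values below one push mass toward its corners, which is why LDA typically uses small alpha for sparse topic mixtures.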

LDA Inference

Treat data as observations that arise from a generative probabilistic process that includes hidden variables. For documents, the hidden variables reflect the thematic structure of the collection. Infer the hidden structure using posterior inference.

LDA Inference

We want to calculate the posterior over the hidden variables given the observed documents. Two main ways to get the posterior:
- Sampling methods: time consuming, with lots of black magic in sampling tricks.
- Variational methods: an approximation, but faster.
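The sampling route above is usually done with collapsed Gibbs sampling: repeatedly resample each token's topic assignment from its conditional given all other assignments. A minimal sketch on a tiny hypothetical corpus (hyperparameters and data are illustrative; real use needs convergence checks and far larger corpora):

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (a sketch, not a tuned implementation)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkv = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # per-topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):   # initialise counts from random assignments
        for n, w in enumerate(doc):
            k = z[d][n]; ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]          # remove the current assignment
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
                # p(z_n = k | rest) is proportional to
                # (ndk + alpha) * (nkv + eta) / (nk + V * eta)
                p = (ndk[d] + alpha) * (nkv[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    return ndk, nkv

# Two toy documents over a 4-word vocabulary (word ids 0..3, hypothetical).
docs = [[0, 0, 1, 0, 1], [2, 3, 2, 2, 3]]
ndk, nkv = gibbs_lda(docs, K=2, V=4)
print(ndk)
```

The "black magic" the slide mentions lives in choices like burn-in length, thinning, and hyperparameter tuning, none of which this sketch attempts.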

Topic Models Based on LDA

- Variants: unsupervised LDA, supervised LDA, LDA for OSNs
- Training: batch, incremental
- Inference: Gibbs sampling, variational methods
- Scaling: parallel, online

Topic Models for DOSNs

Issues to be addressed:
1. Short and noisy text,
2. Linked words instead of bag-of-words,
3. Global vs. localized models (community-based models, models at the most influential nodes),
4. Dynamic models.

Applications of Topic Models

- Topically-based community detection (LDA)
- Context-aware individual recommendation system (LDA, DL)
- Context-aware group recommendation system (LDA, DL)

Conclusion

- DOSNs create the possibility of applying distributed and online learning in a highly dynamic and heterogeneous environment.
- DIVa is a practical example of empowering users with customizable services.
- We aim to implement topic models in DOSNs and to provide services such as ranking, summarization, and recommendation.